Re: [HACKERS] Partitioning option for COPY
Hannu Krosing wrote:
After triggers can't change the tuple, thus can't change routing.
An after trigger can always issue an update of the tuple, but that should be trapped by the regular mechanism that will deal with updates (when we have it available). Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement trigger modifies the routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables?
What statement triggers do you mean? I don't think we have ON COPY triggers?
I mean CREATE TRIGGER /name/ { BEFORE | AFTER } /event/ ON /table/ FOR EACH STATEMENT
Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
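For reference, a statement-level trigger of the kind being discussed looks like this (the table and function names are illustrative, not taken from any patch):

```sql
-- A FOR EACH STATEMENT trigger fires once per statement, not per row.
-- The open question above is what this should mean for child tables
-- when COPY routes rows of one statement into several partitions.
CREATE TABLE parent (i int);

CREATE FUNCTION note_stmt() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE 'statement trigger fired on %', TG_TABLE_NAME;
  RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_after_stmt
  AFTER INSERT ON parent
  FOR EACH STATEMENT EXECUTE PROCEDURE note_stmt();
```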
Re: [HACKERS] Partitioning option for COPY
Robert Haas wrote: On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing wrote: I'd propose that triggers on both parent table and selected child are executed. I was thinking we should make the partitioning decision FIRST, before any triggers are fired, and then fire only those triggers relevant to the selected partition. If the BEFORE triggers on the partition modify the tuple in a way that makes it incompatible with the table constraints on that partition, the insert (or update) fails. Firing triggers on more than one table is pretty substantially incompatible with what we do elsewhere and I'm not clear what we get in exchange. What is the use case for this? I don't have a use case for this but I was puzzled with that when I had to implement trigger support in COPY with partitioning. I came to the same conclusion as you and made the operation fail if the trigger was trying to move the tuple to another partition. However, I had a problem with after row triggers that I had to call synchronously to be able to detect the change. We will need something to tell us that an after row trigger did not mess with the routing decision. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hannu Krosing wrote:
On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning
I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row.
A simple update to the row can cause it to move between partitions, no?
Yes. I am not sure what the semantics should be if you have triggers defined on both the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table?
I'd propose that triggers on both the parent table and the selected child are executed:
1. first you execute before triggers on the parent table, which may change which partition the row belongs to
2. then you execute before triggers on the selected child table
2.1 if this changes the child table selection, repeat from 2.
3. save the tuple in the child table
4. execute after triggers of the final selected child table
What if that trigger changes the child table selection again?
5. execute after triggers of the parent table
Same here: what if the trigger changes the child table selection? Do we re-execute triggers on the new child table? It is also debatable whether we should execute an after trigger on a table where nothing was really inserted.
The order of 4 and 5 is selected arbitrarily; the others are determined by flow.
Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement trigger modifies the routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables?
You cannot know in advance where the tuples are going to go and fire the before statement triggers. 
If you had to fire after statement triggers, in which order would you fire them?
If the new implementation hides the child tables...
If you hide child tables, you suddenly need a lot of new syntax to "unhide" them, so that partitions can be manipulated. Currently it is easy to do it with INHERIT / NO INHERIT.
Agreed, but I think that we will discover some restrictions that will apply to child tables.
Emmanuel
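To make the proposed flow above concrete, here is a sketch against today's inheritance-based setup (all names illustrative); only step 1 behaves this way in current PostgreSQL — the re-routing of steps 2–5 is the proposal, which no released server implements:

```sql
CREATE TABLE parent (i int);
CREATE TABLE child1 (CHECK (i <= 10)) INHERITS (parent);
CREATE TABLE child2 (CHECK (i > 10)) INHERITS (parent);

-- Step 1: a BEFORE trigger on the parent can rewrite the key and thus
-- change which partition the row belongs to.
CREATE FUNCTION shift_key() RETURNS trigger AS $$
BEGIN
  NEW.i := NEW.i + 5;  -- e.g. i = 8 becomes 13, moving the row to child2
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_before BEFORE INSERT ON parent
  FOR EACH ROW EXECUTE PROCEDURE shift_key();
-- Steps 2-5 (child BEFORE triggers, re-routing on key change, the
-- insert, then child and parent AFTER triggers) are the proposal
-- under discussion, not existing behavior.
```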
Re: [HACKERS] Syntax for partitioning
Greg Smith wrote: I just made a few updates to http://wiki.postgresql.org/wiki/Table_partitioning , merging in the stuff that had been on the ToDo page and expanding the links to discussion on this list a bit. The number of submitted patches over the last couple of years that handle some subset of the desired feature set here is really remarkable when you see them all together.
Should we add the 'WITH (...) TABLESPACE tbs' options to the syntax since they are supported? Do we support ALTER ... SET TABLESPACE?
Emmanuel
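Both options are indeed supported for plain tables today, so carrying them over to the partition syntax would be consistent (the tablespace names here are illustrative and must already exist):

```sql
-- Storage parameters and tablespace at creation time:
CREATE TABLE child_2009q4 (i int)
  WITH (fillfactor = 70) TABLESPACE fast_disk;

-- Moving a table afterwards is also supported:
ALTER TABLE child_2009q4 SET TABLESPACE archive_disk;
```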
Re: [HACKERS] Syntax for partitioning
Hi,
Sorry for commenting only now, but I think that we need to be able to store the partitions in different tablespaces. Even if the create table originally creates all partitions in the same tablespace, individual partitions should be allowed to be moved to different tablespaces using alter table or alter partition. I think that other databases allow the user to define a tablespace for each partition in the create table statement. In a warehouse, you might want to split your partitions over different volumes and, over time, move older partitions to storage with higher compression if that data is not to be accessed frequently anymore. Altering tablespaces for partitions is important in that context. Are you also planning to provide partitioning extensions to 'create table as'?
Thanks
Emmanuel
Here is a WIP partitioning patch. The new syntax is:
1. CREATE TABLE parent (...);
2. ALTER TABLE parent PARTITION BY { RANGE | LIST } ( key );
3. CREATE TABLE child (...);
4. ALTER TABLE child INHERIT parent AS PARTITION VALUES ...;
We can also use "CREATE TABLE PARTITION BY" as 1+2+3+4 and "CREATE PARTITION" as 3+4. I think "INHERIT AS PARTITION" will typically be used rarely, but such orthogonality seems cleaner. The most complex logic of the patch is in ATExecAddInherit(). It scans existing partitions and generates a CHECK constraint for the new partition. Any comments on the design? If there are no objections, I'd like to stop adding features in this CommitFest and go for the remaining auxiliary work -- pg_dump, object dependency checking, documentation, etc.
- Catalog changes -
In addition to pg_partition, I added a pg_inherits.inhvalues field. The type of the field is "anyarray" and it stores partition values. For a range partition, the upper bound value is stored in the array. For a list partition, the list values are stored in it. These separated value fields will be useful to implement partition triggers in the future. In contrast, reverse engineering of check constraints is messy. 
CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
{
	Oid			inhrelid;
	Oid			inhparent;
	int4		inhseqno;
	anyarray	inhvalues;		/* values for partition */
} FormData_pg_inherits;

CREATE TABLE pg_partition (
  partrelid oid REFERENCES pg_class (oid),    -- partitioned table oid
  partopr   oid REFERENCES pg_operator (oid), -- operator to compare keys
  partkind  "char", -- kind of partition: 'R' (range) or 'L' (list)
  partkey   text,   -- expression tree of partition key
  PRIMARY KEY (partrelid)
) WITHOUT OIDS;

-- Limitations and Restrictions --
* We can create a new partition as long as the partitioning keys do not conflict with existing partitions. In particular, we cannot add any partitions if we have overflow partitions, because a new partition always splits the overflow partition.
* We cannot reuse an existing check constraint as a partition constraint. ALTER TABLE INHERIT AS PARTITION brings on a table scan to add a new CHECK constraint.
* No partition triggers nor planner and executor improvements. They would come in future development.
Regards,
--- ITAGAKI Takahiro NTT Open Source Software Center ----
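As a rough illustration of what ATExecAddInherit() would generate, here is the equivalent done by hand with today's mechanisms (table names and bounds are illustrative; the patch would derive the CHECK constraint from the stored partition values instead):

```sql
CREATE TABLE sales (sale_date date, amount numeric);
CREATE TABLE sales_2009q4 (LIKE sales);

-- Attach the child and add the range constraint the patch would build
-- automatically from pg_inherits.inhvalues:
ALTER TABLE sales_2009q4 INHERIT sales;
ALTER TABLE sales_2009q4 ADD CONSTRAINT sales_2009q4_range
  CHECK (sale_date >= '2009-10-01' AND sale_date < '2010-01-01');
```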
Re: [HACKERS] Partitioning option for COPY
Itagaki Takahiro wrote:
Emmanuel Cecchet wrote: I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing.
Infinite loops are not a partition-related problem, no? We can also find infinite loops in user defined functions, recursive queries, etc. I think the only thing we can do for it is to *stop* loops instead of preventing them, like max_stack_depth.
I was thinking of a trigger on child1 updating the partition key, forcing the tuple to move to child2. And then a trigger on child2 updating the key again to move the tuple back to child1. You end up with an infinite loop.
With the current proposed implementation, would it be possible to define a view using child tables?
No, if you mean using a partition-view. I'm thinking we are moving our implementation of partitioning from view-based to a built-in feature. Do you have any use-cases that require view-based partitioning? Was the inheritance-based partitioning not enough for it?
Never mind, I was thinking about the implications of materialized views, but Postgres does not have materialized views! I have other questions related to create table but I will post them in the 'syntax for partitioning' thread.
Thanks
Emmanuel
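A sketch of the loop described here, assuming a hypothetical router that re-fires child triggers after each move (today such an insert would simply fail the CHECK constraint rather than loop; all names illustrative):

```sql
CREATE TABLE parent (i int);
CREATE TABLE child1 (CHECK (i <= 10)) INHERITS (parent);
CREATE TABLE child2 (CHECK (i > 10)) INHERITS (parent);

CREATE FUNCTION push_up() RETURNS trigger AS $$
BEGIN NEW.i := 11; RETURN NEW; END;  -- forces the row into child2
$$ LANGUAGE plpgsql;

CREATE FUNCTION push_down() RETURNS trigger AS $$
BEGIN NEW.i := 1; RETURN NEW; END;   -- forces the row back into child1
$$ LANGUAGE plpgsql;

CREATE TRIGGER t1 BEFORE INSERT ON child1
  FOR EACH ROW EXECUTE PROCEDURE push_up();
CREATE TRIGGER t2 BEFORE INSERT ON child2
  FOR EACH ROW EXECUTE PROCEDURE push_down();
-- With re-routing after every trigger, t1 and t2 would bounce the
-- tuple between the partitions forever unless the router stops it.
```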
Re: [HACKERS] Partitioning option for COPY
Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. I am not sure what the semantic should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? If the new implementation hides the child tables, it might be safer to not allow triggers on child tables altogether and use the parent table as the single point of entry to access the partition (and define triggers). With the current proposed implementation, would it be possible to define a view using child tables? Emmanuel
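Today such a view is straightforward because the children are ordinary tables (names illustrative); the open question is whether it stays possible once an implementation hides them:

```sql
-- Two partitions exposed directly through a view:
CREATE TABLE child_2009q3 (i int);
CREATE TABLE child_2009q4 (i int);

CREATE VIEW recent_data AS
  SELECT * FROM child_2009q3
  UNION ALL
  SELECT * FROM child_2009q4;
```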
Re: [HACKERS] Partitioning option for COPY
Simon Riggs wrote: I was unaware you were developing these ideas and so was unable to provide comments until now. The first patch was published to this list on September 10 (almost 2.5 months ago) along with the wiki page describing the problem and the solution. What should I have done to raise awareness further? /E
Re: [HACKERS] Partitioning option for COPY
Robert Haas wrote:
On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet wrote: I think you should read the thread and the patch before making any false statements like you did in your email. 1. The patch does not use any trigger for routing.
Whoa, whoa! I don't think that Simon said that it did. But even if I am wrong and he did...
Quote from Simon's email: "It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff." You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers.
...I certainly don't think this comment is justified. This is a technical discussion about the best way of solving a certain problem, and I don't believe that any of the discussion up to this point has been anything other than civil. I can tell that you are frustrated that your patch is not getting the support you would like to see it get, but launching ad hominem attacks on Simon or anyone else is not going to help.
We certainly don't live in the same civilization then. I am not frustrated if my patch does not get in because of technical considerations, and I am happy so far with Jan's feedback that helped a lot. I think there is a misunderstanding between what Simon wants ('Anyway, I want data routing, as is the intention of this patch.') and what this patch is about. This patch is just supposed to load tuples into a hierarchy of tables, as this is a recurrent use case in data warehouse scenarios. It is not supposed to solve data routing in general (otherwise it would be integrated as standard in COPY and not as an option). But it looks like it is a waste of everybody's time to continue this discussion further. Just move the patch to the rejected patches and let's wait for Itagaki's implementation. 
Emmanuel
Re: [HACKERS] Partitioning option for COPY
Simon,
I think you should read the thread and the patch before making any false statements like you did in your email.
1. The patch does not use any trigger for routing.
2. This is just an option for COPY that is useful for loading operations in the data warehouse world. It is not meant to implement full partitioning, as explained many times already in this thread.
3. This patch elaborates on existing mechanisms and cannot rely on a meta-data representation of partitions, which does not exist yet and will probably not exist in 8.5.
You should justify your statements when you say 'potentially buggy in its approach to developing a cache and using trigger-like stuff'. I understand that you don't like it because this is not what you want, but this is not my fault. This is not an implementation of partitioning, just as COPY does not do update/delete/alter/... And yes, the use case is 'narrow', like any option in COPY. It is like complaining that the CSV option is not useful because you want to load binary dumps. If Itagaki gets the support of the community to get his implementation accepted, I will gladly use it. Contributing? If Aster is willing to contribute a code monkey to implement your specs, why not, but you will have to convince them. You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers.
Emmanuel
On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote: Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch.
We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY?
Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. 
Also, it would be nicer if the router could work not only in COPY but also in INSERT.
As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY, since we will not have it anywhere otherwise.
BTW, I'm working on metadata for partitioning now. Your "partitioning" option in COPY could be replaced with the catalog.
This implementation is only for the current 8.5 and it will not be needed anymore once we get fully functional partitioning in Postgres, which seems to be for a future version.
Yes, the trigger way of doing this is a bad way. I regret to say that the way proposed here isn't much better, AFAICS. Let me explain why I think that, but -1 to anyone applying this patch. This patch proposes keeping a cache of last visited partitions to reduce the overhead of data routing. What I've requested is that partitioning work by using a data structure held in relcache for inheritance parents. This differs in 3 ways from this patch:
a) it has a clearly defined location for the cached metadata, with clearly identified and well worked out mechanisms for cache invalidation
b) the cache can be built once when it is first needed, not slowly grown as parts of the metadata are used
c) it would be available for all parts of the server, not just COPY.
The easiest way to build that metadata is when structured partitioning info is available, i.e. the best next action is to complete and commit Itagaki's partitioning syntax patch. Then we can easily build the metadata for partitioning, which can then be used in COPY for data routing. Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. ISTM that with the right metadata in the right place, a cleaner and easier solution is still possible for 8.5. 
The code within COPY should really just reduce to a small piece of code to derive the correct relation for the desired row and then use that during heap_insert(). I have just discussed partitioning with Itagaki-san at JPUG, so I know his plans. Itagaki-san and Manu, please can you work together to make this work for 8.5? --- A more detailed explanation of Partitioning Metadata: Partitioning Metadata is information held on the relcache for a table that has child partitions. Currently, a table does not cache info about its children, which prevents various optimisations. We would have an extra pointer on the Relation struct that points to a PartitioningMetadata struct. We can fill in this information when we construct the relcache for a relation, or we can populate it on demand the first time we attempt to use that information (if it exists). We want to hold an array of partition boundary values.
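For context, the usage the patch proposed looked like this (the partitioning option was never committed to core PostgreSQL; file path and table names are illustrative — only the option spelling comes from the patch's own examples):

```sql
CREATE TABLE logs (ts date, msg text);
CREATE TABLE logs_2009_11
  (CHECK (ts >= '2009-11-01' AND ts < '2009-12-01')) INHERITS (logs);

-- With the option, COPY consults the children's CHECK constraints to
-- route each row into the matching partition:
COPY logs FROM '/tmp/logs.dat' WITH (partitioning);
```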
Re: [HACKERS] Partitioning option for COPY
Stephan Szabo wrote:
On Sun, 22 Nov 2009, Emmanuel Cecchet wrote: As I explained to Tom, if the after row trigger is called asynchronously I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger) it works fine. Also calling the after row trigger synchronously allows me to detect any potential problem between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution.
Well, I think there are still some issues there that at least need to be better documented. For example,

create or replace function fi() returns trigger as '
begin
  if (NEW.p is not null) then
    if (select count(*) from i where i.i = NEW.p) = 0 then
      raise exception ''No parent'';
    end if;
  end if;
  return NEW;
end;
' language 'plpgsql';

create or replace function fc() returns trigger as '
begin
  if (NEW.p is not null) then
    if (select count(*) from c where c.i = NEW.p) = 0 then
      raise exception ''No parent'';
    end if;
  end if;
  return NEW;
end;
' language 'plpgsql';

create or replace function fp() returns trigger as '
begin
  if (NEW.p is not null) then
    if (select count(*) from p where p.i = NEW.p) = 0 then
      raise exception ''No parent'';
    end if;
  end if;
  return NEW;
end;
' language 'plpgsql';

drop table i;
drop table c;
drop table p cascade;
create table i(i int, p int);
create trigger tri after insert on i for each row execute procedure fi();
create table c(i int, p int);
create trigger trc after insert on c for each row execute procedure fc();
create table p(i int, p int);
create table p1 (check (i > 0 and i <= 10)) inherits (p);
create table p2 (check (i > 10 and i <= 20)) inherits (p);
create table p3 (check (i > 20 and i <= 30)) inherits (p);
create trigger trp1 after insert on p1 for each row execute procedure fp();
create trigger trp2 after insert on p2 for each row execute procedure fp();
create trigger trp3 after insert on p3 for each row execute procedure fp();
insert into i values (1,3),(2,1),(3,NULL);
copy c from stdin;
1	3
2	1
3	\N
\.
copy p from stdin with (partitioning);
1	3
2	1
3	\N
\.

gives me a successful load into i and c, but not into p with the current patch AFAICS, while a load where the 3 row is first does load.
Well, if you don't insert anything in p (the table; try to avoid using the same name for the table and the column in an example), copy will insert (1,3) in p1 and then the trigger will evaluate select count(*) from p where p.i = NEW.p => NEW.p is 3 and the only p.i available is 1. This should return 0 rows and raise the exception. This seems normal to me. The only reason it works for i is because you inserted the values before the copy. Am I missing something?
Emmanuel
Re: [HACKERS] Partitioning option for COPY
Jan Urbański wrote:
o) my main concern is still valid: the design was never agreed upon. The approach of using inheritance info for automatic partitioning is, at least IMHO, too restricted. Regular INSERTs won't get routed to child tables. Data from writable CTEs won't get routed. People wanting to do partitioning on something else than constraints are stuffed.
Well, this patch does not claim to implement partitioning for Postgres; it just offers partitioning as an option for COPY (and COPY only) based on the existing mechanism in Postgres. I have already participated in lengthy and relatively sterile discussions on how to implement full-blown partitioning, but we never reached the beginning of an agreement and it was decided that a step-by-step approach would be better. I will propose another implementation of partitioning in COPY once Postgres has another representation than constraints on child tables to implement it.
I strongly suspect the patch will get rejected on the grounds of lack of community agreement on partitioning, but I'd hate to see your work wasted. It's not too late to open a discussion on how automatic partitioning could work (or start working out a common proposal with the people discussing in the "Syntax for partitioning" thread).
This is not my call. Right now the syntax for partitioning does not change anything in Postgres; it just adds syntactic sugar on top of the existing implementation. It will not route anything or answer any of the needs you mentioned in your previous point.
Marking as Waiting on Author, although I'd really like to see a solid design being agreed upon, and then the code.
You are asking the wrong person if you want me to lead the partitioning design discussions. I already tried once and I was unsuccessful. As nothing has changed, I don't see why I would be more successful this time. 
Emmanuel
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote:
Jan Urbański writes: that got broken by the WHEN triggers patch (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the TriggerEnabled function signature; the code currently does not compile.
[ squint... ] What is that patch doing touching the innards of trigger.c in the first place? I can't see any reason for trigger.c to be associated with partitioning.
The problem I had is that if I used the standard trigger mechanism for after row inserts on a child table, where the trigger is called asynchronously, I had a relcache leak on the child table. I tried to ask for help on that earlier on but it got lost with other discussions on the patch. So I tried to call the after trigger synchronously on the child table and it worked. So the patch is just adding a synchronous call to after row insert triggers that is invoked when the tuple is moved to a child table (this also allows detecting triggers that are messing with the routing). I would be happy to follow any recommendation for a more elegant solution to the problem.
Emmanuel
Re: [HACKERS] Union test case broken in make check?
Tom Lane wrote: Kenneth Marshall writes: Without an order by, the order is not defined. Yeah, but with the same data and the same software it should generally give the same result; as evidenced by the fact that these same regression tests have worked for most people for years. There's something odd happening on Emmanuel's machine. Maybe he's changed the hashing algorithms or some planner cost parameters? I did not change anything to Postgres and I can reproduce the problem with a vanilla checkout of HEAD. However, I think I started to see the problem since my last VMWare and OS upgrade (unfortunately I did both simultaneously). For info, I am using VMWare Workstation v6.5.3 build-185404 on Vista 6.0.60002 SP2. The VM is an Ubuntu 9.04 (Jaunty) with a 2.6.28-16 SMP kernel on x86_64 (2 cores available for the VM, 4 cores total). gcc is 4.3.3 (Ubuntu 4.3.3-5ubuntu4). Let me know if you need additional info about my setup. Emmanuel
Re: [HACKERS] Union test case broken in make check?
Then I guess that we need to fix the test.
Emmanuel
Kenneth Marshall wrote:
Without an order by, the order is not defined. The answers are the same but the test gives a false failure because of the lack of ordering.
Regards, Ken
On Thu, Nov 19, 2009 at 07:54:30PM -0500, Emmanuel Cecchet wrote:
Tom Lane wrote:
Andrew Dunstan writes:
Emmanuel Cecchet wrote: Is it just me or the union test case fails in CVS head?
The buildfarm is pretty much all green: <http://www.pgbuildfarm.org/cgi-bin/show_status.pl> So it looks like it's you :-)
When in doubt, try "make distclean" and a full rebuild before assuming you've got a problem worth tracking down ...
Well, I did:
1. make distclean
2. configure with CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-tcl --without-openssl
3. make (everything normal)
4. make check
And it still fails for me. I am attaching my regression.diffs if someone thinks it is worth tracking down ...
Emmanuel

*** /home/manu/workspace/PG-HEAD/src/test/regress/expected/union.out	2009-02-09 16:18:28.0 -0500
--- /home/manu/workspace/PG-HEAD/src/test/regress/results/union.out	2009-11-19 19:37:32.0 -0500
***************
*** 198,208 ****
  WHERE f1 BETWEEN 0 AND 100;
  five
  ---
- -1004.3
- -34.84
- -1.2345678901234e-200
  0
  123456
  (5 rows)
  SELECT CAST(f1 AS char(4)) AS three FROM VARCHAR_TBL
--- 198,208 ----
  WHERE f1 BETWEEN 0 AND 100;
  five
  ---
  0
  123456
+ -34.84
+ -1.2345678901234e-200
+ -1004.3
  (5 rows)
  SELECT CAST(f1 AS char(4)) AS three FROM VARCHAR_TBL
***************
*** 263,278 ****
  SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
  q2
  --
- 4567890123456789
  123
  (2 rows)
  SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
  q2
  --
  4567890123456789
  4567890123456789
- 123
  (3 rows)
  SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
--- 263,278 ----
  SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
  q2
  --
  123
+ 4567890123456789
  (2 rows)
  SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
  q2
  --
+ 123
  4567890123456789
  4567890123456789
  (3 rows)
  SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
***************
*** 305,320 ****
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
  q1
  --
- 4567890123456789
  123
  (2 rows)
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
  q1
  --
  4567890123456789
  4567890123456789
- 123
  (3 rows)
  --
--- 305,320 ----
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
  q1
  --
  123
+ 4567890123456789
  (2 rows)
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
  q1
  --
+ 123
  4567890123456789
  4567890123456789
  (3 rows)
  --
***************
*** 341,348 ****
  SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
- 4567890123456789
  123
  456
  4567890123456789
  123
--- 341,348 ----
  SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
  123
+ 4567890123456789
  456
  4567890123456789
  123
***************
*** 353,367 ****
  SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
  q1
  --
- 4567890123456789
  123
  (2 rows)
  (((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
- 4567890123456789
  123
  456
  4567890123456789
  123
--- 353,367 ----
  SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
  q1
  --
  123
+ 4567890123456789
  (2 rows)
  (((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
  123
+ 45678901234
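Fixing the test along the lines Emmanuel suggests just means making the row order deterministic, e.g. for two of the queries appearing in the diff:

```sql
-- Adding an explicit sort makes the expected output independent of
-- the plan chosen (hash- vs. sort-based set operations):
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl ORDER BY 1;
```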
Re: [HACKERS] Union test case broken in make check?
Tom Lane wrote:
Andrew Dunstan writes:
Emmanuel Cecchet wrote: Is it just me or the union test case fails in CVS head?
The buildfarm is pretty much all green: <http://www.pgbuildfarm.org/cgi-bin/show_status.pl> So it looks like it's you :-)
When in doubt, try "make distclean" and a full rebuild before assuming you've got a problem worth tracking down ...
Well, I did:
1. make distclean
2. configure with CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-tcl --without-openssl
3. make (everything normal)
4. make check
And it still fails for me. I am attaching my regression.diffs if someone thinks it is worth tracking down ...
Emmanuel
[HACKERS] Union test case broken in make check?
Hi, Is it just me or does the union test case fail in CVS head? manu -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Emmanuel Cecchet writes: Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Cache? Why do you need a cache for COPY? Repeated bulk loads into the same table within a single session don't seem to me to be a case that is common enough to justify a cache. Actually the cache is only activated if you use the partitioning option. It is just a list of oids of child tables where tuples were inserted. It is common to have multiple COPY operations in the same session when you are doing bulk loading in a warehouse. (BTW, the quoted code seems to be busily reinventing OID Lists. Don't do that.) Yes, I understood that I should use an OidList instead. But I was trying to understand what I did wrong here (besides reinventing the oid list ;-)). Why do I get this segfault if I use memory from CacheMemoryContext? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Emmanuel Cecchet writes: Tom Lane wrote: This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Well, (a) I doubt you really were creating the list in CacheMemoryContext, else it'd have not gotten clobbered; (b) creating statement-local data structures in CacheMemoryContext is entirely unacceptable anyway, because then they represent a permanent memory leak. Well I thought that this code would do it:

child_table_lru = (OidLinkedList *) MemoryContextAlloc(
    CacheMemoryContext, sizeof(OidLinkedList));
...
/* Add the new entry in head of the list */
new_head = (OidCell *) MemoryContextAlloc(
    CacheMemoryContext, sizeof(OidCell));

The right context for statement-lifetime data structures is generally the CurrentMemoryContext the statement code is called with. Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Jan Urbański writes:

Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821    child_relation_id = child_oid_cell->oid_value;
(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f

This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch committers
The following email expresses my personal opinion and does not reflect the opinion of my employers. Bruce Momjian wrote: I also think the bad economy is making it harder for people/companies to devote time to community stuff when paid work is available. Actually the bad economy should be a booster for open source projects. There should be more developers with time to acquire new skills on projects that will get them a better job when the economy comes back. I think the problems are more rooted in the developer community itself. The pg-hackers mailing list is probably the least socially skilled developer community I have ever seen in all the open source projects I have been involved with. A very high standard is set for contributions, which is good for the quality of the code, but the lack of development process and clear decision chain turns every new contribution into an endless source of frustration. For a patch to be committed, a vague consensus has to arise among the strong technical voice(s) (usually convincing Tom is enough). If a more complex feature needs to be implemented, the lack of decision process results in a first long round of emails until everybody gets tired of it. Then sometime later someone tries to re-activate the debate for another round, and so on (partitioning is a good example). You lose potential committers at each of these rounds. The way I see it, most companies try to push their agenda, contribute their patches back to the community if that works, and just go with their own fork and closed implementation if it is too much work or burden. Whatever the economy, very few people can commit an indefinite amount of time to get a feature integrated in Postgres. Now you should probably ask yourself what you should do as a community to get more committers. As was said at PGCon, getting a patch in is going to be hard and painful. How do you make it less hard and less painful?
On the other hand, how do we, simple developers, become good enough to reach the status of committers? How can we endure the constant bashing in the early stages of our learning phase (especially if we are not paid to do it)? How do we justify to our employers that they should persist through this long process without visibility, so that eventually their contribution will make it back to the community? How do we make it easier for companies to contribute code? The lightness of the development process (no project manager, no steering committee, no voting, no product management, ...) is both a strength and a weakness that makes Postgres what it is today. The commitfest started to address some of the most obvious issues but there is probably much more that can be done by looking at what is happening in other open source communities. Even if the economy is hard, I found time to invest my own 2 cents in this email ;-) Emmanuel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. Also, it would be nicer if the router worked not only in COPY but also in INSERT. As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY since we will not have it anywhere otherwise. BTW, I'm working on partitioning metadata now. Your "partitioning" option in COPY could be replaced with the catalog. This implementation is only for the current 8.5 and will not be needed anymore once we get fully functional partitioning in Postgres, which seems to be slated for a future version. Best regards, Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Partitioning option for COPY
Hi all, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. The documentation and test suite sample are provided as well. More details are on the wiki page at http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY. Ignore the error logging related comments that do not apply here. Looking forward to your feedback Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com

Index: src/test/regress/parallel_schedule
===
RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v
retrieving revision 1.57
diff -c -r1.57 parallel_schedule
*** src/test/regress/parallel_schedule	24 Aug 2009 03:10:16 -	1.57
--- src/test/regress/parallel_schedule	11 Nov 2009 03:17:48 -
***************
*** 47,53 ****
  # execute two copy tests parallel, to check that copy itself
  # is concurrent safe.
  # --
! test: copy copyselect
  # --
  # Another group of parallel tests
--- 47,53 ----
  # execute two copy tests parallel, to check that copy itself
  # is concurrent safe.
  # --
! test: copy copyselect copy_partitioning
  # --
  # Another group of parallel tests

Index: src/backend/utils/adt/ruleutils.c
===
RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/adt/ruleutils.c,v
retrieving revision 1.314
diff -c -r1.314 ruleutils.c
*** src/backend/utils/adt/ruleutils.c	5 Nov 2009 23:24:25 -	1.314
--- src/backend/utils/adt/ruleutils.c	11 Nov 2009 03:17:48 -
***************
*** 218,224 ****
  static Node *processIndirection(Node *node, deparse_context *context, bool printit);
  static void printSubscripts(ArrayRef *aref, deparse_context *context);
! static char *generate_relation_name(Oid relid, List *namespaces);
  static char *generate_function_name(Oid funcid, int nargs, List *argnames, Oid *argtypes, bool *is_variadic);
  static char *generate_operator_name(Oid operid, Oid arg1, Oid arg2);
--- 218,224 ----
  static Node *processIndirection(Node *node, deparse_context *context, bool printit);
  static void printSubscripts(ArrayRef *aref, deparse_context *context);
! char *generate_relation_name(Oid relid, List *namespaces);
  static char *generate_function_name(Oid funcid, int nargs, List *argnames, Oid *argtypes, bool *is_variadic);
  static char *generate_operator_name(Oid operid, Oid arg1, Oid arg2);
***************
*** 6347,6353 ****
   * We will forcibly qualify the relation name if it equals any CTE name
   * visible in the namespace list.
   */
! static char *
  generate_relation_name(Oid relid, List *namespaces)
  {
  	HeapTuple	tp;
--- 6347,6353 ----
   * We will forcibly qualify the relation name if it equals any CTE name
   * visible in the namespace list.
   */
! char *
  generate_relation_name(Oid relid, List *namespaces)
  {
  	HeapTuple	tp;

Index: doc/src/sgml/ref/copy.sgml
===
RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.92
diff -c -r1.92 copy.sgml
*** doc/src/sgml/ref/copy.sgml	21 Sep 2009 20:10:21 -	1.92
--- doc/src/sgml/ref/copy.sgml	11 Nov 2009 03:17:48 -
***************
*** 41,46 ****
--- 41,47 ----
  ESCAPE 'escape_character'
  FORCE_QUOTE { ( column [, ...] ) | * }
  FORCE_NOT_NULL ( column [, ...] )
+ PARTITIONING [ boolean ]
***************
*** 282,287 ****
--- 283,301 ----
+ PARTITIONING
+
+ In PARTITIONING mode, COPY to a parent table will automatically move
+ each row to the child table that has the matching constraints. This
+ feature can be used with ERROR_LOGGING to capture rows that do not
+ match any constraint in the table hierarchy. See the notes below for
+ the limitations.
+
***************
*** 384,389 ****
--- 398,421 ----
  VACUUM to recover the wasted space.
+
+ PARTITIONING mode scans each child table constraint in the hierarchy
+ to find a match. As an optimization, a cache of the last child tables
+ where tuples have been routed is kept and tried first. The size of the
+ cache is set by the copy_partitioning_cache_size session variable. If
+ the size is set to 0, the cache is disabled; otherwise the indicated
+ number of child tables is kept in the cache (at most).
+
+ PARTITIONING mode assumes that every child table has at least one
+ constraint defined, otherwise an error is thrown. If child tables have
+ overlapping constraints, the row is i
Re: [HACKERS] DISTINCT ON
Tom Lane wrote: Greg Stark writes: On Wed, Nov 4, 2009 at 3:17 AM, Emmanuel Cecchet wrote: SELECT DISTINCT ON ('1'::varchar, '1'::varchar) a FROM (SELECT 1 AS a) AS a ORDER BY '1'::varchar, '1'::varchar, '2'::varchar; This sounds familiar. What version of Postgres are you testing this on? Presumably something before this bug http://archives.postgresql.org/pgsql-sql/2008-07/msg00123.php got fixed http://archives.postgresql.org/pgsql-committers/2008-07/msg00341.php I am using 8.3.6 and it looks like the fix was only integrated in 8.4. So using 8.4 should solve the problem. Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DISTINCT ON
Hi all, It looks like Postgres has a restriction in DISTINCT ON queries where the DISTINCT ON expressions must match the left side of the ORDER BY list. The issue is that if a DISTINCT ON ... has multiple instances of a particular expression, this check doesn't seem to fire correctly. For example, this query returns an error (but I guess it shouldn't): SELECT DISTINCT ON ('1'::varchar, '1'::varchar) a FROM (SELECT 1 AS a) AS a ORDER BY '1'::varchar, '1'::varchar, '2'::varchar; And this query doesn't return an error (but I guess it should): SELECT DISTINCT ON ('1'::varchar, '2'::varchar, '1'::varchar) a FROM (SELECT 1 AS a) AS a ORDER BY '1'::varchar, '2'::varchar, '2'::varchar; Am I misunderstanding something or is there a bug? Thanks for the help Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Tom, Emmanuel Cecchet writes: Tom Lane wrote: There aren't any. You can *not* put a try/catch around arbitrary code without a subtransaction. Don't even think about it Well then why are the tests provided with the patch working? Because they carefully exercise only a tiny fraction of the system. The key word in my sentence above is "arbitrary". You don't know what a datatype input function might try to do, let alone triggers or other functions that COPY might have to invoke. They might do things that need to be cleaned up after, and subtransaction rollback is the only mechanism we have for that. Is it possible to use the try/catch mechanism to detect errors and log them and only abort the command at the end of the processing? This would have the advantage of being able to log all errors without the overhead of subtransactions and even add some additional information on where the error happened (parsing, constraints, trigger, ...) for further automated treatment. The result would still be clean since we would abort the COPY command in case of an error as we do currently (but we would not stop on the first error). I guess that it would make more sense to log to a file than to a table in that case. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Tom Lane wrote: Emmanuel Cecchet writes: - speed with error logging best effort: no use of sub-transactions but errors that can safely be trapped with pg_try/catch (no index violation, There aren't any. You can *not* put a try/catch around arbitrary code without a subtransaction. Don't even think about it. Well then why are the tests provided with the patch working? I hear you when you say that it is not a generally applicable idea, but it seems that at least a couple of errors can be trapped with this mechanism. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Tom Lane wrote: Ultimately, there's always going to be a tradeoff between speed and flexibility. It may be that we should just say "if you want to import dirty data, it's gonna cost ya" and not worry about the speed penalty of subtransaction-per-row. But that still leaves us with the 2^32 limit. I wonder whether we could break down COPY into sub-sub transactions to work around that... Regarding that tradeoff between speed and flexibility, I think we could propose multiple options:
- maximum speed: current implementation, fails on first error
- speed with error logging: the copy command fails if there is an error but continues to log all errors
- speed with error logging, best effort: no use of sub-transactions, but errors that can safely be trapped with pg_try/catch (no index violation, no before insert trigger, etc...) are logged and the command can complete
- pre-loading (2-phase copy): phase 1 copies good tuples into a [temp] table and bad tuples into an error table; phase 2 pushes good tuples to the destination table. Note that if phase 2 fails, it could be retried since the temp table would be dropped only on success of phase 2.
- slow but flexible: have every row in a sub-transaction -> are there any real benefits compared to pg_loader?
Tom was also suggesting 'refactoring COPY into a series of steps that the user can control'. What would these steps be? Would that be per row and would it allow discarding a bad tuple? Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Greg Smith wrote: Absolutely, that's the whole point of logging to a file in the first place. What needs to happen here is that when one is aborted, you need to make sure that fact is logged, and with enough information (the pid?) to tie it to the COPY that failed. Then someone can crawl the logs to figure out what happened and what data did and didn't get loaded. Ideally you'd want to have that as database table information instead, to allow automated recovery and re-commit in the cases where the error wasn't inherent in the data but instead some other type of database failure. If one is aborted, nothing gets loaded anyway. Now if you want a separate log of the successful commands, I don't think that should apply just to COPY but to any operation (insert, update, DDL, ...) if you want to build some automatic retry mechanism. But this seems independent from COPY to me. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Robert Haas wrote: On Wed, Oct 7, 2009 at 11:39 AM, Emmanuel Cecchet wrote: Robert Haas wrote: On Wed, Oct 7, 2009 at 9:12 AM, Emmanuel Cecchet wrote: Hi all, I think there is a misunderstanding about what the current patch is about. The patch includes 2 things: - error logging in a table for bad tuples in a COPY operation (see http://wiki.postgresql.org/wiki/Error_logging_in_COPY for an example; the error message, command and so on are automatically logged) - auto-partitioning in a hierarchy of child table if the COPY targets a parent table. The patch does NOT include: - logging errors into a file (a feature we can add later on (next commit fest?)) My failure to have read the patch is showing here, but it seems to me that error logging to a table could be problematic: if the transaction aborts, we'll lose the log. If this is in fact a problem, we should be implementing logging to a file (or stdout) FIRST. I don't think this is really a problem. You can always do a SELECT in the error table after the COPY operation if you want to diagnose what happened before the transaction rollbacks. Uhhh if the transaction has aborted, no new commands (including SELECT) will be accepted until you roll back. You are suggesting then that it is the COPY command that aborts the transaction. That would only happen if you had set a limit on the number of errors that you want to accept in a COPY command (in which case you know that there is something really wrong with your input file and you don't want the server to process that file). manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
The roadmap I would propose for the current list of enhancements to COPY is as follows:
1. new syntax for COPY options (already committed)
2. error logging in a table
3. auto-partitioning (just relies on basic error logging, so can be scheduled anytime after 2)
4. error logging in a file
manu
Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan writes: Emmanuel Cecchet wrote: If you prefer to postpone the auto-partitioning to the next commit fest, I can strip it from the current patch and re-submit it for the next fest (but it's just 2 isolated methods really easy to review). I certainly think this should be separated out. In general it is not a good idea to roll distinct features together. It complicates both the reviewing process and the discussion. I think though that Greg was suggesting that we need some more thought about the overall road map. Agglomerating "independent" features onto COPY one at a time is going to lead to a mess, unless they fit into an overall design plan. I don't disagree with that. But even if we get a roadmap of some set of features we want to implement, rolling them all together isn't a good way to go. cheers andrew -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Robert Haas wrote: On Wed, Oct 7, 2009 at 9:12 AM, Emmanuel Cecchet wrote: Hi all, I think there is a misunderstanding about what the current patch is about. The patch includes 2 things: - error logging in a table for bad tuples in a COPY operation (see http://wiki.postgresql.org/wiki/Error_logging_in_COPY for an example; the error message, command and so on are automatically logged) - auto-partitioning in a hierarchy of child tables if the COPY targets a parent table. The patch does NOT include: - logging errors into a file (a feature we can add later on (next commit fest?)) My failure to have read the patch is showing here, but it seems to me that error logging to a table could be problematic: if the transaction aborts, we'll lose the log. If this is in fact a problem, we should be implementing logging to a file (or stdout) FIRST. I don't think this is really a problem. You can always do a SELECT on the error table after the COPY operation if you want to diagnose what happened before the transaction rolls back. I think that using a file to process the bad tuples is actually less practical than a table if you want to re-insert these tuples in the database. But anyway, the current implementation captures the tuple with all the needed information for logging and delegates the logging to a specific method. If we want to log to a file (or stdout), we can just provide another method to log the already captured info in a file. I think that the file approach is a separate feature but can be easily integrated in the current design. There is just extra work to make sure concurrent COPY commands writing to the same error file use proper locking. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Hi all, I think there is a misunderstanding about what the current patch is about. The patch includes 2 things:
- error logging in a table for bad tuples in a COPY operation (see http://wiki.postgresql.org/wiki/Error_logging_in_COPY for an example; the error message, command and so on are automatically logged)
- auto-partitioning in a hierarchy of child tables if the COPY targets a parent table.
The patch does NOT include:
- logging errors into a file (a feature we can add later on (next commit fest?))
I can put the auto-partitioning in a separate patch if that helps but this patch relies on error logging to log possible errors in the routing of tuples. I think that the way to move forward is first to have a basic error logging facility that logs into a database table (like the current patch does) and then add enhancements. I don't think we should try to do the file logging at the same time. If you prefer to postpone the auto-partitioning to the next commit fest, I can strip it from the current patch and re-submit it for the next fest (but it's just 2 isolated methods really easy to review). Emmanuel Robert Haas wrote: On Wed, Oct 7, 2009 at 3:17 AM, Greg Smith wrote: I know this patch is attracting more reviewers lately, is anyone tracking the general architecture of the code yet? Emmanuel's work is tough to review just because there's so many things mixed together, and there's other inputs I think should be considered at the same time while we're all testing in there (such as the COPY patch Andrew Dunstan put together). I hadn't realized this was an issue, but I think it's a good point: a patch that does one thing well is much more likely to get accepted than a patch that does two things well, let alone two things poorly. It's just much easier to review and verify. Or maybe the name of the patch should have tipped me off: "COPY enhancements" vs. "make COPY have feature X".
What I'd like to see is for everything to get broken more into component chunks that can get committed and provide something useful one at a time, because I doubt taskmaster Robert is going to let this one linger around with scope creep for too long before being pushed out to the next CommitFest. I can't decide whether to feel good or bad about that appellation, so I'm going with both. But in all seriousness, if this patch needs substantial reworking (which it sounds like it does) we should postpone it to the next CF; we are quickly running out of days, and it's not fair to reviewers or committers to ask for new reviews of substantially revised code with only a week to go. ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
I just realized that I forgot to CC the list when I answered to Josh... resending! Josh, >> I think that this was the original idea but we should probably rollback >> the error logging if the command has been rolled back. It might be more >> consistent to use the same hi_options as the copy command. Any idea what >> would be best? >> > > Well, if we're logging to a file, you wouldn't be *able* to roll them > back. Also, presumbly, if you abort a COPY because of errors, you > probably want to keep the errors around for later analysis. No? > You are right about the file (though this functionality is not implemented yet). I don't have any strong opinion about the right behavior if COPY aborts. The error log table would contain tuples that were not inserted anyway. Now knowing whether the bad tuples come from a successful COPY command or not will be left to the user. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Hi Selena, This is my first pass at the error logging portion of this patch. I'm going to take a break and try to go through the partitioning logic as well later this afternoon. caveat: I'm not familiar with most of the code paths that are being touched by this patch. Overall: * I noticed '\see' included in the comments in your code. These should be removed. Should we avoid any doxygen tags in general in the Postgres code? * The regression is failing, as Jeff indicated, and I didn't figure out why yet either. Hopefully I will have a closer look this afternoon. Let me know if the response I sent to Jeff works for you. Comments: * copy.c: Better formatting, maybe rewording needed for comment starting on line 1990. Some of the bad formatting has been left on purpose to minimize the size of the patch, otherwise there would have been many tab/whitespace changes due to the indentation in the PG_TRY blocks. I suggest that whoever is going to commit the code runs pg_indent, or I can fix the formatting once the reviewing is done. ** Maybe say: "Check that errorData->sqlerrcode only logged tuples that are malformed. This ensures that we let other errors pass through." Ok. * copy.c: line: 2668 -> need to fix that comment :) (/* Regular code */) -- this needs to be more descriptive of what is actually happening. We fell through the partitioning logic, right, and back to the default behavior? Yes. * copy.c: line 2881, maybe say instead: "Mark that we have not read a line yet. This is necessary so that we can perform error logging on complete lines only." Ok. Formatting: * copy.c: whitespace is maybe a little off at lines 2004-2009 * NEW FILES: src/backend/utils/misc/errorlogging.c & errorlogging.h need headers I was not sure what is auto-generated by SVN commit scripts. I'd be happy to add headers. Is there a specification somewhere or should I just copy/paste from another file? Code: * copy.c: line 1990 -> cur_lineno_str[11] & related: why is this conversion necessary?
(sorry if that is a dumb question) This conversion is necessary if you log in the error table the index of the row that is causing the error (this is what is logged by default in ERROR_LOGGING_KEY). * copy.c: line 2660 -> what if we are error logging for copy? Does this get logged properly? Yes, this is in a try/catch block that performs error logging. * errorlogging.c: Noticed the comment at 503 -> 'note that we always enable wal logging'.. Thinking this through - the reasoning is that you won't be rolling back the error logging no matter what, right? I think that this was the original idea but we should probably rollback the error logging if the command has been rolled back. It might be more consistent to use the same hi_options as the copy command. Any idea what would be best? * src/include/commands/copy.c and copy.h: struct CopyStateData was moved from copy.c to copy.h; this made sense to me, just noting. That move made it a little tricky to find the changes that were made. There were 10 items added. Yes, patch can be a little bit lost when you move a big data structure like this one. ** super nit pick: 'readlineOk' uses camel-case instead of underscores like the rest of the new variables Yes, queryDesc also has camel-case. I will fix readlineOk. * errorlogging.c: could move currentCommand pg_stat call in Init routine into MalformedTupleIs, or maybe move the setting of that parameter into the Init routine for consistency's sake. The advantage of calling pg_stat in InitializeErrorLogging is that it is evaluated only once for the entire copy command (and it's not going to change during the execution of the command). I am not sure to understand what your second suggestion is since currentCommand is set and initialized in Init. Documentation: * doc/src/sgml/ref/copy.sgml: line 313: 'failif' needs a space ok ** Also: The error table log examples have relations shown in a different order than the actual CREATE TABLE declaration in the code. 
The order of the columns does not really matter but for consistency sake we can put the same order. * src/test/regress/sql/copyselect.sql: Format of options changed.. added parenthesis. Is this change documented elsewhere? (sorry if I just missed this in the rest of the thread/patch) Yes this has already been committed by Tom. The new format of options has been introduced just before this patch. I am attaching a revised version of the patch. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com aster-copy-newsyntax-patch-8.5v6context.txt.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
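A plausible explanation for the cur_lineno_str[11] buffer size discussed above: a 32-bit row counter renders as at most 10 decimal digits (4294967295) plus the terminating NUL. This sketch is an assumption about the patch's intent; the function name is invented here, not taken from the patch.

```c
#include <stdio.h>
#include <stdint.h>

/* Assumed rationale for an 11-byte buffer: a 32-bit line counter needs
 * at most 10 decimal digits plus the NUL terminator.  Illustrative
 * helper, not the patch's actual code. */
static void lineno_to_str(uint32_t lineno, char dest[11])
{
    snprintf(dest, 11, "%u", (unsigned) lineno);
}
```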
Re: [HACKERS] COPY enhancements
The problem comes from the foo_malformed_terminator.data file. It is supposed to have a malformed terminator that was not caught by the patch. The second line should look like: 2 two^M If it does not, you can edit it with emacs: go to the end of the second line and press Ctrl+q followed by Ctrl+m, and that will do the trick. I am attaching a copy of the file, hoping that the attachment will arrive with the malformed terminator in your inbox! manu Jeff Davis wrote: On Fri, 2009-09-25 at 10:01 -0400, Emmanuel Cecchet wrote: Robert, Here is the new version of the patch that applies to CVS HEAD as of this morning. I just started looking at this now. It seems to fail "make check", diffs attached. I haven't looked into the cause of the failure yet. Regards, Jeff Davis -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com 1 one 2 two 3 three 4 four 5 five
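Since mailers tend to normalize line endings (as this thread worries), the file can also be recreated programmatically instead of hand-editing in emacs. This is a sketch assuming the tab-separated layout shown in the message, with the carriage return ("^M") inserted at the end of the second line:

```c
#include <stdio.h>
#include <string.h>

/* Write a data file whose second line ends in a carriage return before
 * the newline -- the "^M" terminator described above.  Returns 1 if the
 * file was written and the CR is really present on read-back, else 0. */
static int write_malformed_file(const char *path)
{
    FILE *f = fopen(path, "wb");        /* "wb": no newline translation */
    if (f == NULL)
        return 0;
    fputs("1\tone\n", f);
    fputs("2\ttwo\r\n", f);             /* the malformed terminator */
    fputs("3\tthree\n", f);
    fclose(f);

    /* Verify the CR survived a round trip through the filesystem. */
    char buf[64];
    f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    int ok = fgets(buf, sizeof buf, f) != NULL   /* line 1 */
          && fgets(buf, sizeof buf, f) != NULL   /* line 2 */
          && strchr(buf, '\r') != NULL;          /* CR still there? */
    fclose(f);
    return ok;
}
```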
Re: [HACKERS] Join optimization for inheritance tables
Hi Simon, Thanks for the insight. I might take that on as a long-term project. I have to discuss it with my colleagues at Aster. It would certainly help to put together a wiki page with the key insights on the design of such an implementation, to be able to better scope the project and agree on the right approach. manu On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote: If the partitioning implementation does not make progress (and does not make it for 8.5) Manu, not sure if you are referring to Kedar's patch I reviewed earlier in July, but that patch didn't implement anything like the right internals for this to work. As Tom says, what we need is a single internal data structure that describes all the partitions. That can then be used to automatically route inserts, perform the push-down merge joins, and speed up constraint exclusion. We should build this once and store it in the relcache for the parent relation, and so would need some changes in relcache invalidation as well to make sure child DDL changes invalidate it. -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
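The "single internal data structure that describes all the partitions" idea above can be sketched in outline: one descriptor per parent relation, consulted both for routing inserts and for planner-side exclusion. Every name and type here is illustrative — nothing reflects actual PostgreSQL internals, present or proposed:

```c
/* Hypothetical per-parent partition descriptor, as discussed in this
 * thread: built once, cached (e.g. in the relcache), and invalidated
 * when a child's DDL changes.  Ranges are [lower, upper). */
typedef struct PartitionBound {
    int lower;          /* inclusive lower bound of the range */
    int upper;          /* exclusive upper bound of the range */
    int child_relid;    /* stand-in for the child table's OID */
} PartitionBound;

typedef struct PartitionDesc {
    int            nparts;
    PartitionBound bounds[8];
} PartitionDesc;

/* Route a partitioning-key value to a child; -1 if no child accepts it.
 * The same structure could drive constraint exclusion and push-down
 * merge joins without re-deriving ranges from CHECK constraints. */
static int route_tuple(const PartitionDesc *desc, int key)
{
    for (int i = 0; i < desc->nparts; i++)
        if (key >= desc->bounds[i].lower && key < desc->bounds[i].upper)
            return desc->bounds[i].child_relid;
    return -1;
}
```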
Re: [HACKERS] COPY enhancements
Robert, Here is the new version of the patch that applies to CVS HEAD as of this morning. Emmanuel On Fri, Sep 18, 2009 at 12:14 AM, Emmanuel Cecchet wrote: Here is a new version of error logging and autopartitioning in COPY based on the latest COPY patch that provides the new syntax for copy options (this patch also includes the COPY option patch). New features compared to the previous version: - removed the GUC variables for error logging and used copy options instead (renamed options as suggested by Josh) - added documentation and examples (see copy.sgml) - partitioning also activated by a copy option rather than a GUC variable (renamed as well) - added a new ERROR_LOGGING_MAX_ERRORS option that allows aborting the COPY operation once a certain number of bad rows has been encountered - updated unit tests I also tried to update the wiki pages, but it's late and the doc is probably better for now. This addresses most of the comments so far except for the format of the error table (lack of a natural key) and a possible pg_error_table as a default name. Emmanuel Emmanuel Cecchet wrote: Hi all, Finally the error logging and autopartitioning patches for COPY that I presented at PGCon are here! Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY The attached patch contains both contribs as well as unit tests. I will shortly submit the new patch at commitfest.postgresql.org. Thanks in advance for your feedback, Emmanuel This no longer applies. ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com aster-copy-newsyntax-patch-8.5v5context.txt.gz Description: GNU Zip compressed data
Re: [HACKERS] COPY enhancements
Yes, I have to update the patch following what Tom already integrated of the COPY patch. I will get a new version posted as soon as I can. Emmanuel Robert Haas wrote: On Fri, Sep 18, 2009 at 12:14 AM, Emmanuel Cecchet wrote: Here is a new version of error logging and autopartitioning in COPY based on the latest COPY patch that provides the new syntax for copy options (this patch also includes the COPY option patch). New features compared to the previous version: - removed the GUC variables for error logging and used copy options instead (renamed options as suggested by Josh) - added documentation and examples (see copy.sgml) - partitioning also activated by a copy option rather than a GUC variable (renamed as well) - added a new ERROR_LOGGING_MAX_ERRORS option that allows aborting the COPY operation once a certain number of bad rows has been encountered - updated unit tests I also tried to update the wiki pages, but it's late and the doc is probably better for now. This addresses most of the comments so far except for the format of the error table (lack of a natural key) and a possible pg_error_table as a default name. Emmanuel Emmanuel Cecchet wrote: Hi all, Finally the error logging and autopartitioning patches for COPY that I presented at PGCon are here! Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY The attached patch contains both contribs as well as unit tests. I will shortly submit the new patch at commitfest.postgresql.org. Thanks in advance for your feedback, Emmanuel This no longer applies. ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Join optimization for inheritance tables
Tom, If the partitioning implementation does not make progress (and does not make it for 8.5), don't you think that this could be an interesting contribution to consider for this release? I have put on the wiki (http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables) the results obtained so far and the use case where it is most used. As I understand it, partitioning will certainly lead to some significant changes/enhancements to the planner. Do you think it is realistic to get that for 8.5? manu Herodotos Herodotou writes: This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together. I looked over this patch a bit. I am of the opinion that this is a lot of work towards a dead-end direction :-(. The patch contains a large amount of rather inefficient code that attempts to reverse-engineer knowledge about whether an inheritance tree forms a range-partitioned table. We already know that reasoning from first principles using a set of arbitrary per-table constraints doesn't scale very well, and this would make it a lot worse. pgsql-hackers have been talking for some time about implementing an explicit partitioning feature, which would give the planner direct knowledge about this type of table structure without expending nearly so much work (and then expending it all over again for the next query). I think the right way to go at it is to implement that first, and then see about teaching the planner to plan like this. We could possibly extract a subset of the patch that just deals with pushing joins down through appends, without the constraint-exclusion logic that tries to eliminate provably empty join pairs. But I'm dubious that that would be worthwhile by itself. I think you need the constraint exclusion to eliminate a good deal of work before this is actually better than doing a single join above the append. 
There are a number of other things I don't like, such as the restrictive and not-obviously-necessary requirement that all the join clauses be simple "Var op Var" clauses. But the main thing is that a large fraction of the patch is doing something I don't think we should try to do. regards, tom lane -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Tom Lane wrote: I wrote: As far as I can tell, the majority opinion is to use "format csv" BTW, if we're going to do that, shouldn't the "binary" option instead be spelled "format binary"? Looking at the doc, it looks like FORMAT should be mandatory and be either text, binary or csv (for now). manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
The easiest for both implementation and documentation might just be to have a matrix of options. Each option has a row and a column in the matrix. The intersection of a row and a column is set to 0 if the options are not compatible and to 1 if they are. This way we are sure to capture all possible combinations. Each time we add a new option, we just have to check in the matrix whether it is compatible with the already existing options. Note that we can also replace the 0 with an index into an error message array. I can provide an implementation of that if this looks interesting to anyone. Emmanuel Robert Haas wrote: On Sun, Sep 20, 2009 at 2:25 PM, Emmanuel Cecchet wrote: Tom Lane wrote: Emmanuel Cecchet writes: Here you will force every format to use the same set of options How does this "force" any such thing? As far as I understand it, every format will have to handle every format option that may exist so that it can either implement it or throw an error. I don't think this is really true. To be honest with you, I think it's exactly backwards. The way the option-parsing logic works, we parse each option individually FIRST. Then at the end we do cross-checks to see whether there is an incompatibility in the combination specified. So if two different formats support the same option, we just change the cross-check to say that foo is OK with either format bar or format baz. On the other hand, if we split the option into bar_foo and baz_foo, then the first loop that does the initial parsing has to support both cases, and then you still need a separate cross-check for each one. That would argue in favor of a format option that defines the format. Right now I find it bogus to have to say (csv on, csv_header on). If csv_header is on, that should imply csv on. The only problem I have is that it is not obvious which options are generic COPY options and which are options of an option (like format options). So maybe a tradeoff is to differentiate format-specific options, as in: (delimiter '.', format csv, format_header, format_escape...) This should also make it clear, to someone developing a new format, which options need to be addressed. I think this is a false dichotomy. It isn't necessarily the case that every format will support a delimiter option either. For example, if we were to add an XML or JSON format (which I'm not at all convinced is a good idea, but I'm sure someone is going to propose it!) it certainly won't support specifying an arbitrary delimiter. IOW, *every* format will have different needs and we can't necessarily know which options will be applicable to those needs. But as long as we agree that we won't use the same option for two different format-specific options with wildly different semantics, I don't think that undecorated names are going to cause us much trouble. It's also less typing. PS: I don't know why, but as I write this message I already feel that Tom hates this new proposal :-D I get those feelings sometimes myself. :-) Anyway, FWIW, I think Tom has analyzed this one correctly... ...Robert -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
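The compatibility matrix proposed at the top of this message might look like the following in outline. This is only a sketch of the idea: the option list is invented for illustration and is not the patch's actual option set.

```c
#include <stdbool.h>

/* Sketch of the proposed compatibility matrix: one row and one column
 * per COPY option; the intersection says whether the pair may be
 * combined.  A 0 cell could instead hold an index into an error
 * message array, as suggested above.  Option names are illustrative. */
enum copy_option {
    OPT_FORMAT_TEXT,
    OPT_FORMAT_CSV,
    OPT_CSV_HEADER,
    OPT_DELIMITER,
    NUM_COPY_OPTIONS
};

static const bool copy_option_compat[NUM_COPY_OPTIONS][NUM_COPY_OPTIONS] = {
    /*               text   csv    header delim */
    /* text   */   { true,  false, false, true  },
    /* csv    */   { false, true,  true,  true  },
    /* header */   { false, true,  true,  true  },
    /* delim  */   { true,  true,  true,  true  },
};

/* True if every pair of options in the requested set is compatible. */
static bool copy_options_valid(const enum copy_option *opts, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (!copy_option_compat[opts[i]][opts[j]])
                return false;
    return true;
}
```

The appeal of the matrix is that adding an option means filling in one new row and column, rather than auditing ad-hoc cross-check code scattered through the parser.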
Re: [HACKERS] generic copy options
Tom Lane wrote: Emmanuel Cecchet writes: Here you will force every format to use the same set of options How does this "force" any such thing? As far as I understand it, every format will have to handle every format option that may exist so that it can either implement it or throw an error. and if someone introduces a new option, you will have to modify all other formats to make sure they throw an error telling the user that this option is not supported. Well, if we do it your way then we will instead need a collection of code to throw errors for combinations like (xml on, csv_header on). I don't really see any improvement there. That would argue in favor of a format option that defines the format. Right now I find it bogus to have to say (csv on, csv_header on). If csv_header is on, that should imply csv on. The only problem I have is that it is not obvious which options are generic COPY options and which are options of an option (like format options). So maybe a tradeoff is to differentiate format-specific options, as in: (delimiter '.', format csv, format_header, format_escape...) This should also make it clear, to someone developing a new format, which options need to be addressed. Emmanuel PS: I don't know why, but as I write this message I already feel that Tom hates this new proposal :-D -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Tom Lane wrote: No, I don't think so. Suppose I write COPY ... (xml_header on) If HEADER isn't actually an option supported by XML format, what I will get here is an "unknown option" error, which conveys just about nothing --- is it really an unsupported combination, or did I just misspell the option name? Well, I don't see why you would write that if the option is not documented. Usually, as a user, when I need to use a command, I look at the doc/man page and use the options that are indicated; I don't try to invent new options. That should prevent the kind of scenario you describe here: If we go the other way then I would expect COPY ... (xml, header on) to draw a specific "HEADER is not supported in XML format" error. Of course, that will require some extra code to make it happen. So you could argue that format-specific option names are easier from the lazy programmer's viewpoint. But I don't believe the argument that they're better from the user's viewpoint. Here you will force every format to use the same set of options, and if someone introduces a new option, you will have to modify all other formats to make sure they throw an error telling the user that this option is not supported. I don't think this is a great design, or that it will be easy to extend. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
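The two error behaviours contrasted in this exchange — a generic "unknown option" for a misspelled prefixed name versus a specific "HEADER is not supported in XML format" for a shared name checked against a per-format support table — can be sketched like so. Format and option names are illustrative only; this is not the patch's code:

```c
#include <string.h>

/* Sketch of undecorated option names plus a per-format support check.
 * A name not in the known list yields the generic error (likely a
 * typo); a known name unsupported by the chosen format yields the
 * specific, more helpful error Tom describes. */
static const char *check_copy_option(const char *format, const char *option)
{
    static const char *known[] = { "header", "delimiter", "quote" };
    int is_known = 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(option, known[i]) == 0)
            is_known = 1;
    if (!is_known)
        return "unknown option";                        /* misspelling? */
    if (strcmp(format, "xml") == 0 && strcmp(option, "header") == 0)
        return "HEADER is not supported in XML format"; /* specific error */
    return "ok";
}
```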
Re: [HACKERS] generic copy options
Tom Lane wrote: Robert Haas writes: ... A related question is whether we should replace the "CSV" option with a "FORMAT" option for which one of the possible choices is "CSV" (much as we did with EXPLAIN). That might be a good idea --- otherwise we'd need some ad-hoc code to check that only one format option has been selected. On the other hand, it wouldn't be all that much code; and it would be a rather larger change from previous behavior than we were contemplating to start with. Comments anyone? That would assume that the semantics of the other options (header, quote, escape, ...) are exactly the same for each format. Otherwise this will be a nightmare to document. If we don't prefix with CSV, I guess that some users will spend some time figuring out that NULL is not a format option but FORCE NOT NULL is. If an option is only supported by one format (let's say XML), we will have to document every time which options are supported by which format, which would be much easier and more intuitive if options were prefixed by the format name. If you look at the COPY documentation in the error logging patch, if we strip the error_logging prefix, it is going to be very confusing. But I am willing to let Tom choose whatever he prefers as his birthday gift ;-) One other minor point is that the patch introduces an empty-list syntax for individual option values, but then treats it the same as specifying nothing: It seemed like a good idea because if you can do force_quote (a, b, c) and force_quote (a, b) you might think that you could also do force_quote (). Particularly for the sake of people generating SQL automatically by some means, it seems like this might simplify life. But possibly it shouldn't evaluate to the same value, so that you can't write OIDS () to mean the same thing as OIDS ON. Yeah, that type of scenario was why I didn't like it. I am not impressed by the empty-list argument, either. We don't support empty lists with that sort of syntax in most other places, so why here? There are counter-precedents even in the syntax of COPY itself: you can't write "()" for an empty column name list, and you can't write "()" for an empty copy_generic_option_list. Well, this one was in Robert's initial patch and I left it as is. I don't have any strong opinion for or against it. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Josh Berkus wrote: Nope, but it was on the checklist and I was being thorough. That's a good thing. I was just seeing if I needed to get involved in performance testing. It would be good to have more people test the autopartitioning feature in COPY. If you want to be involved in performance testing on that, that would be great. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
On that particular patch, as Robert mentioned, only the parsing has changed. In the case of \copy, the parsing is much lighter than before in psql (remains the same in the server). The bigger the COPY operation the less you will see the impact of the parsing since it is done only once for the entire operation. Emmanuel Dan Colish wrote: On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote: Greg Smith wrote: On Thu, 17 Sep 2009, Dan Colish wrote: - Performance appears to be the same although I don't have a good way for testing this at the moment Here's what I do to generate simple COPY performance test cases: CREATE TABLE t (i integer); INSERT INTO t SELECT x FROM generate_series(1,10) AS x; \timing COPY t TO '/some/file' WITH [options]; BEGIN; TRUNCATE TABLE t; COPY t FROM '/some/file' WITH [options]; COMMIT; You can adjust the size of the generated table based on whether you want to minimize (small number) or maximize (big number) the impact of the setup overhead relative to actual processing time. Big numbers make sense if there's a per-row change, small ones if it's mainly COPY setup that's been changed if you want a small bit of data to test against. An example with one column in it is a good test case for seeing whether per-row impact has gone up. You'd want something with a wider row for other types of performance tests. The reason for the BEGIN/COMMIT there is that form utilizes an optimization that lowers WAL volume when doing the COPY insertion, which makes it more likely you'll be testing performance of the right thing. I usually prefer to test with a table that is more varied than anything you can make with generate_series. When I tested my ragged copy patch the other day I copied 1,000,000 rows out of a large table with a mixture of dates, strings, numbers and nulls. But then, it has a (tiny) per field overhead so I wanted to make sure that was well represented in the test. 
You are certainly right about wrapping it in begin/truncate/commit (and when you do, make sure that archiving is not on). You probably want to make sure that the file is not on the same disk as the database, to avoid disk contention. Or, better, make sure that it is in the OS file system cache, or on a RAM disk. cheers andrew If someone with a more significant setup can run tests, that would be ideal. I only have my laptop, which is a single disk and fairly underpowered. That said, here are my results running the script above; it looks like the patch improves performance. I would be really interested to see results on a larger data set and heavier iron. -- --Dan Without Patch: CREATE TABLE INSERT 0 10 Timing is on. COPY 10 Time: 83.273 ms BEGIN Time: 0.412 ms TRUNCATE TABLE Time: 0.357 ms COPY 10 Time: 140.911 ms COMMIT Time: 4.909 ms With Patch: CREATE TABLE INSERT 0 10 Timing is on. COPY 10 Time: 80.205 ms BEGIN Time: 0.351 ms TRUNCATE TABLE Time: 0.346 ms COPY 10 Time: 124.303 ms COMMIT Time: 4.130 ms -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Here you go. Emmanuel Dan Colish wrote: On Thu, Sep 17, 2009 at 02:56:07PM -0400, Emmanuel Cecchet wrote: Dan, My bad, I copy/pasted the hard link in output/copy.source instead of @abs_build...@. Here is a complete version of the patch with the fix on output/copy.source. Emmanuel Emmanuel, Thanks for getting that back so quickly. As I said before, it applies cleanly and passes regression tests. I'm reading through the changes now. When you get a moment could you send me the patch as a context diff? -- --Dan -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -c -r1.18 copy2.sql *** src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 --- src/test/regress/sql/copy2.sql 17 Sep 2009 20:59:50 - *** *** 73,89 \. -- various COPY options: delimiters, oids, NULL string ! COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. ! COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; 3000;;c;; \. ! COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X --- 73,89 \. -- various COPY options: delimiters, oids, NULL string ! COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. ! COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. ! COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X *** *** 108,120 INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail ! COPY no_oids FROM stdin WITH OIDS; ! COPY no_oids TO stdout WITH OIDS; -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; ! COPY x (b, e) TO stdout WITH NULL 'I''m null'; CREATE TEMP TABLE y ( col1 text, --- 108,120 INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail ! COPY no_oids FROM stdin (OIDS); ! 
COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; ! COPY x (b, e) TO stdout (NULL 'I''m null'); CREATE TEMP TABLE y ( col1 text, *** *** 130,140 COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); ! COPY testnl FROM stdin CSV; 1,"a field with two LFs inside",2 --- 130,152 COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; + -- Test new 8.5 syntax + + COPY y TO stdout (CSV); + COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|'); + COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\'); + COPY y TO stdout (CSV, CSV_FORCE_QUOTE *); + + \COPY y TO stdout (CSV) + \COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|') + \COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\') + \COPY y TO stdout (CSV, CSV_FORCE_QUOTE *) + --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); ! COPY testnl FROM stdin (CSV); 1,"a field with two LFs inside",2 *** *** 143,156 -- test end of copy marker CREATE TEMP TABLE testeoc (a text); ! COPY testeoc FROM stdin CSV; a\. \.b c\.d "\." \. ! COPY testeoc TO stdout CSV; DROP TABLE x, y; DROP FUNCTION fn_x_before(); --- 155,168 -- test end of copy marker CREATE TEMP TABLE testeoc (a text); ! COPY testeoc FROM stdin (CSV); a\. \.b c\.d "\." \. ! COPY testeoc TO stdout (CSV); DROP TABLE x, y; DROP FUNCTION fn_x_before(); Index: src/test/regress/sql/aggregates.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/aggregates.sql,v retrieving revision 1.15 diff -c -r1.15 aggregates.sql *** src/test/regress/sql/aggregates.sql 25 Apr 2009 16:44:56 - 1.15 --- src/test/regress/sql/aggregates.sql 17 Sep 2009 20:59:50 - *** *** 104,110 BIT_OR(i4) AS "?" FROM bitwise_test; ! 
COPY bitwise_test FROM STDIN NULL 'null'; 1 1 1 1 1 B0101 3 3 3 null2 B0100 7 7 7 3 4 B1100 --- 104,110 BIT_OR(i4) AS
Re: [HACKERS] generic copy options
Dan, My bad, I copy/pasted the hard link in output/copy.source instead of @abs_build...@. Here is a complete version of the patch with the fix on output/copy.source. Emmanuel On Thu, Sep 17, 2009 at 11:07:33AM -0400, Emmanuel Cecchet wrote: Tom Lane wrote: I wonder though if we couldn't simplify matters. Offhand it seems to me that psql doesn't need to validate the command's syntax fully. All it really needs to do is find the target filename and replace it with STDIN/STDOUT. Could we have it just treat the remainder of the line literally, and not worry about the details of what the options might be? Let the backend worry about throwing an error if they're bad. New version with the simplified psql. Still supports the 7.3 syntax. I am going to look into porting the other COPY enhancements (error logging and autopartitioning) on this implementation. We might come up with new ideas for the documentation side of things with more options. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com Hi, I've been working on a review of this patch and currently it's failing regression tests. Here's the regression.diff. I'm going to spend some time today trying to figure out if the tests need to change or there is an actual issue in the patch. Just wanted to give a heads up. -- --Dan *** /home/dc0lish/workspace/postgresql/src/test/regress/expected/copy.out 2009-09-17 11:45:04.041818319 -0700 --- /home/dc0lish/workspace/postgresql/src/test/regress/results/copy.out 2009-09-17 11:45:14.215152558 -0700 *** *** 72,88 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax ! copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv); truncate copytest2; ! copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv); select * from copytest except select * from copytest2; style | test | filler ---+--+ (0 rows) truncate copytest2; !
copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); ! copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); select * from copytest except select * from copytest2; style | test | filler ---+--+ --- 72,88 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax ! copy copytest to '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv); truncate copytest2; ! copy copytest2 from '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv); select * from copytest except select * from copytest2; style | test | filler ---+--+ (0 rows) truncate copytest2; ! copy copytest to '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); ! copy copytest2 from '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); select * from copytest except select * from copytest2; style | test | filler ---+--+ *** *** 95,111 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax from psql ! \copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv) truncate copytest2; ! \copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv) select * from copytest except select * from copytest2; style | test | filler ---+--+ (0 rows) truncate copytest2; ! \copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\') ! 
\copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\') select * from copytest except select * from copytest2; style | test | filler ---+--+ --- 95,111 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax from psql ! \copy copytest to '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv) truncate copytest2; ! \copy copytest2 from '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv) select * from copytest except select * fro
Re: [HACKERS] generic copy options
Tom Lane wrote: I wonder though if we couldn't simplify matters. Offhand it seems to me that psql doesn't need to validate the command's syntax fully. All it really needs to do is find the target filename and replace it with STDIN/STDOUT. Could we have it just treat the remainder of the line literally, and not worry about the details of what the options might be? Let the backend worry about throwing an error if they're bad. New version with the simplified psql. Still supports the 7.3 syntax. I am going to look into porting the other COPY enhancements (error logging and autopartitioning) on this implementation. We might come up with new ideas for the documentation side of things with more options. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 17 Sep 2009 15:04:06 - @@ -73,17 +73,17 @@ \. -- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; +COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. 
-COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; +COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X @@ -108,13 +108,13 @@ INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail -COPY no_oids FROM stdin WITH OIDS; -COPY no_oids TO stdout WITH OIDS; +COPY no_oids FROM stdin (OIDS); +COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; -COPY x (b, e) TO stdout WITH NULL 'I''m null'; +COPY x (b, e) TO stdout (NULL 'I''m null'); CREATE TEMP TABLE y ( col1 text, @@ -130,11 +130,23 @@ COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; +-- Test new 8.5 syntax + +COPY y TO stdout (CSV); +COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE *); + +\COPY y TO stdout (CSV) +\COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE *) + --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); -COPY testnl FROM stdin CSV; +COPY testnl FROM stdin (CSV); 1,"a field with two LFs inside",2 @@ -143,14 +155,14 @@ -- test end of copy marker CREATE TEMP TABLE testeoc (a text); -COPY testeoc FROM stdin CSV; +COPY testeoc FROM stdin (CSV); a\. \.b c\.d "\." \. -COPY testeoc TO stdout CSV; +COPY testeoc TO stdout (CSV); DROP TABLE x, y; DROP FUNCTION fn_x_before(); Index: src/test/regress/sql/aggregates.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/aggregates.sql,v retrieving revision 1.15 diff -u -r1.15 aggregates.sql --- src/test/regress/sql/aggregates.sql 25 Apr 2009 16:44:56 - 1.15 +++ src/test/regress/sql/aggregates.sql 17 Sep 2009 15:04:06 - @@ -104,7 +104,7 @@ BIT_OR(i4) AS "?" 
FROM bitwise_test; -COPY bitwise_test FROM STDIN NULL 'null'; +COPY bitwise_test FROM STDIN (NULL 'null'); 1 1 1 1 1 B0101 3 3 3 null2 B0100 7 7 7 3 4 B1100 @@ -171,7 +171,7 @@ BOOL_OR(b3)AS "n" FROM bool_test; -COPY bool_test FROM STDIN NULL 'null'; +COPY bool_test FROM STDIN (NULL 'null'); TRUE nullFALSE null FALSE TRUEnullnull null TRUEFALSE null Index: src/test/regress/sql/copyselect.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copyselect.sql,v retrieving revision 1.2 diff -u -r1.2 copyselect.sql --- src/test/regress/sql/copyselect.sql 7 Aug 2008 01:11:52 - 1.2 +++ src/test/regress/sql/copyselect.sql 17 Sep 2009 15:04:06 - @@ -61,7 +61,7 @@ -- -- Test headers, CSV and quotes -- -copy (select t from test1 where id = 1) to stdout csv header force quote t; +copy (se
Re: [HACKERS] generic copy options
Tom Lane wrote: Emmanuel Cecchet writes: Tom Lane wrote: psql has MORE need to support old syntax than the backend does, because it's supposed to work against old servers. Well, I wonder how many users just upgrade psql vs upgrade the server. We have established a project policy that psql backslash commands will support servers at least back to 7.4, and a great deal of work has already been expended in support of that goal. It is not within the charter of this patch to ignore or redefine that policy. Does that mean that we can drop the 7.3 syntax or should we just keep it? For future reference, where can I find the various project policies? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] generic copy options
Pavel Stehule wrote: Well, I wonder how many users just upgrade psql vs upgrade the server. I was thinking that when users perform a database upgrade their applications often remain the same and therefore the server needs to support the old syntax. Unless you are upgrading a machine where a bunch of psql-based scripts are running to update various remote Postgres instances with older versions, I would guess that it is unlikely that someone is going to upgrade psql and keep the old instance of the server on the same machine. I just wonder how many users are using a single psql to manage multiple server instances of different older versions. What about applications that use the current COPY format for fast data import? I think that making incompatible changes to the COPY statement syntax is a very bad idea. The old syntax is still supported in both psql and the server but I am not sure how many applications are relying on psql to perform a copy operation (actually a \copy). manu
Re: [HACKERS] generic copy options
Tom Lane wrote: While I understand the need for the server to still support the syntax, is it necessary for newer versions of psql to support the old syntax? psql has MORE need to support old syntax than the backend does, because it's supposed to work against old servers. Well, I wonder how many users just upgrade psql vs upgrade the server. I was thinking that when users perform a database upgrade their applications often remain the same and therefore the server needs to support the old syntax. Unless you are upgrading a machine where a bunch of psql-based scripts are running to update various remote Postgres instances with older versions, I would guess that it is unlikely that someone is going to upgrade psql and keep the old instance of the server on the same machine. I just wonder how many users are using a single psql to manage multiple server instances of different older versions. I wonder though if we couldn't simplify matters. Offhand it seems to me that psql doesn't need to validate the command's syntax fully. All it really needs to do is find the target filename and replace it with STDIN/STDOUT. Could we have it just treat the remainder of the line literally, and not worry about the details of what the options might be? Let the backend worry about throwing an error if they're bad. As the only difference between \copy and copy seems to be the ability to stream the file from the client, I guess that everything else should be sent as is to the server as you suggest. I'll come up with a patch for that today. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
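The pass-through approach agreed on here is easy to picture. A hypothetical before/after sketch (the table and file names are invented for illustration): psql would only locate the filename, substitute STDIN/STDOUT, and stream the file itself, forwarding the option list verbatim for the backend to validate.

```sql
-- Typed at the psql prompt:
\copy mytable from '/tmp/data.csv' (csv, delimiter ';')

-- What psql would send to the server, while streaming /tmp/data.csv itself:
COPY mytable FROM STDIN (csv, delimiter ';');
```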
Re: [HACKERS] generic copy options
Pavel, I am not sure about the syntax change. Isn't it better to solve this problem properly? This is too simple a solution. I think we should be able to add a new parser for the COPY statement and share this parser between SQL and psql. Refactoring COPY to put new parsers seems to be another project. The idea here is that the parser does not have to be changed again if we add new options, we just have to handle them in the COPY code (which will probably have to be refactored at some point as it was discussed in a previous thread). manu 2009/9/17 Emmanuel Cecchet : Robert Haas wrote: I don't think the way the doc changes are formatted is consistent with what we've done elsewhere. I think that breaking the options out as a separate block could be OK (because otherwise they have to be duplicated between COPY TO and COPY FROM) but it should be done more like the way that the SELECT page is done. I looked at the way it is done in SELECT and there is a section per clause (from clause, where clause, ...). So I am not sure how you want to apply that here besides the copy parameters and the option clause. Also, you haven't documented the syntax 100% correctly: the boolean options work just like the boolean explain options - they take an optional argument which if omitted defaults to true, but you can also specify 0, 1, true, false, on, off. See defGetBoolean. So those should be specified as: BINARY [boolean] OIDS [boolean] CSV [boolean] CSV_HEADER [boolean] See how we did it in sql-explain.html. Ok, fixed. I changed the name of the CSV options to prefix them with csv_ to avoid confusion with any future options. I also had to change the grammar to allow '*' as a parameter (needed for csv_force_quote). You seem to have introduced a LARGE number of unnecessary whitespace changes here which are not going to fly. You need to go through and revert all of those. 
It's hard to tell what you've really changed here, but also every whitespace change that gets committed is a potential merge conflict for someone else; plus pgindent will eventually change it back, thus creating another potential merge conflict for someone else. Sorry, I overlooked a format in Eclipse that formatted the whole file instead of the block I was working on. This should be fixed now. I am not 100% sold on renaming all of the CSV-specific options to add "csv_". I would like to get an opinion from someone else on whether that is a good idea or not. I am fairly certain it is NOT a good idea to support BOTH the old and new option names, as you've done here. If you're going to rename them, you should update gram.y and change the makeDefElem() calls within the copy_opt_list productions to emit the new names. Agreed for the makeDefElem(). For changing the names, I think that names like 'header', 'escape' and 'quote' are too generic to not conflict with something that is not csv. If you think of another format that could be added to copy, it is likely to re-use the same variable names. The only thing that seems odd is that if you use a CSV_* option, you still have to add CSV [on] to the option list which seems kind of redundant. When we decide to drop the old syntax (in 8.6?), we will be able to clean a lot especially in psql. Considering that we are still carrying syntax that was deprecated in 7.3, I don't think it's likely that we'll phase out the present syntax anywhere nearly that quickly. But it's reasonable to ask whether we should think about removing support for the pre-7.3 syntax altogether for 8.5. It doesn't seem to cost us much to keep that support around, but then again it's been deprecated for seven major releases, so it might be about time. While I understand the need for the server to still support the syntax, is it necessary for newer version of psql to support the old syntax? 
I am attaching the new version of the patch with the current modifications addressing your comments. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 17 Sep 2009 03:14:48 - @@ -73,17 +73,17 @@ \. -- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS
Re: [HACKERS] generic copy options
Robert Haas wrote: I don't think the way the doc changes are formatted is consistent with what we've done elsewhere. I think that breaking the options out as a separate block could be OK (because otherwise they have to be duplicated between COPY TO and COPY FROM) but it should be done more like the way that the SELECT page is done. I looked at the way it is done in SELECT and there is a section per clause (from clause, where clause, ...). So I am not sure how you want to apply that here besides the copy parameters and the option clause. Also, you haven't documented the syntax 100% correctly: the boolean options work just like the boolean explain options - they take an optional argument which if omitted defaults to true, but you can also specify 0, 1, true, false, on, off. See defGetBoolean. So those should be specified as: BINARY [boolean] OIDS [boolean] CSV [boolean] CSV_HEADER [boolean] See how we did it in sql-explain.html. Ok, fixed. I changed the name of the CSV options to prefix them with csv_ to avoid confusion with any future options. I also had to change the grammar to allow '*' as a parameter (needed for csv_force_quote). You seem to have introduced a LARGE number of unnecessary whitespace changes here which are not going to fly. You need to go through and revert all of those. It's hard to tell what you've really changed here, but also every whitespace change that gets committed is a potential merge conflict for someone else; plus pgindent will eventually change it back, thus creating another potential merge conflict for someone else. Sorry, I overlooked a format in Eclipse that formatted the whole file instead of the block I was working on. This should be fixed now. I am not 100% sold on renaming all of the CSV-specific options to add "csv_". I would like to get an opinion from someone else on whether that is a good idea or not. I am fairly certain it is NOT a good idea to support BOTH the old and new option names, as you've done here. 
If you're going to rename them, you should update gram.y and change the makeDefElem() calls within the copy_opt_list productions to emit the new names. Agreed for the makeDefElem(). For changing the names, I think that names like 'header', 'escape' and 'quote' are too generic to not conflict with something that is not csv. If you think of another format that could be added to copy, it is likely to re-use the same variable names. The only thing that seems odd is that if you use a CSV_* option, you still have to add CSV [on] to the option list which seems kind of redundant. When we decide to drop the old syntax (in 8.6?), we will be able to clean a lot especially in psql. Considering that we are still carrying syntax that was deprecated in 7.3, I don't think it's likely that we'll phase out the present syntax anywhere nearly that quickly. But it's reasonable to ask whether we should think about removing support for the pre-7.3 syntax altogether for 8.5. It doesn't seem to cost us much to keep that support around, but then again it's been deprecated for seven major releases, so it might be about time. While I understand the need for the server to still support the syntax, is it necessary for newer version of psql to support the old syntax? I am attaching the new version of the patch with the current modifications addressing your comments. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 17 Sep 2009 03:14:48 - @@ -73,17 +73,17 @@ \. 
-- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; +COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. -COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; +COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X @@ -108,13 +108,13 @@ INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail -COPY no_oids FROM stdin WITH OIDS; -COPY no_oids TO stdout WITH OIDS; +COPY no_oids FROM stdin (OIDS); +COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; -COPY x (b, e) TO st
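Robert's point above about defGetBoolean-style options can be illustrated with a short sketch (using the option names from this patch; the behavior is as described in the message, not verified against a server):

```sql
-- With the argument omitted, a boolean option defaults to true,
-- so these three should be equivalent:
COPY y TO stdout (CSV);
COPY y TO stdout (CSV true);
COPY y TO stdout (CSV on);

-- 0/1, true/false, and on/off are all accepted; this form disables the option:
COPY y TO stdout (CSV off);
```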
Re: [HACKERS] generic copy options
Robert, Here is a new version of the patch with an updated doc and psql. I changed the name of the CSV options to prefix them with csv_ to avoid confusion with any future options. I also had to change the grammar to allow '*' as a parameter (needed for csv_force_quote). When we decide to drop the old syntax (in 8.6?), we will be able to clean a lot especially in psql. Emmanuel On Mon, Sep 14, 2009 at 2:51 PM, Emmanuel Cecchet wrote: This looks good. Should I try to elaborate on that for the patch with error logging and autopartitioning in COPY? That makes sense to me. You shouldn't need to do anything else in gram.y; whatever you want to add should just involve changing copy.c. If not, please post the details. We also need to fix the psql end of this, and the docs... any interest in taking a crack at either of those? ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 16 Sep 2009 22:37:31 - @@ -73,17 +73,17 @@ \. -- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; +COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. 
-COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; +COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X @@ -108,13 +108,13 @@ INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail -COPY no_oids FROM stdin WITH OIDS; -COPY no_oids TO stdout WITH OIDS; +COPY no_oids FROM stdin (OIDS); +COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; -COPY x (b, e) TO stdout WITH NULL 'I''m null'; +COPY x (b, e) TO stdout (NULL 'I''m null'); CREATE TEMP TABLE y ( col1 text, @@ -130,11 +130,23 @@ COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; +-- Test new 8.5 syntax + +COPY y TO stdout (CSV); +COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE *); + +\COPY y TO stdout (CSV) +\COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE *) + --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); -COPY testnl FROM stdin CSV; +COPY testnl FROM stdin (CSV); 1,"a field with two LFs inside",2 @@ -143,14 +155,14 @@ -- test end of copy marker CREATE TEMP TABLE testeoc (a text); -COPY testeoc FROM stdin CSV; +COPY testeoc FROM stdin (CSV); a\. \.b c\.d "\." \. -COPY testeoc TO stdout CSV; +COPY testeoc TO stdout (CSV); DROP TABLE x, y; DROP FUNCTION fn_x_before(); Index: src/test/regress/sql/aggregates.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/aggregates.sql,v retrieving revision 1.15 diff -u -r1.15 aggregates.sql --- src/test/regress/sql/aggregates.sql 25 Apr 2009 16:44:56 - 1.15 +++ src/test/regress/sql/aggregates.sql 16 Sep 2009 22:37:31 - @@ -104,7 +104,7 @@ BIT_OR(i4) AS "?" 
FROM bitwise_test; -COPY bitwise_test FROM STDIN NULL 'null'; +COPY bitwise_test FROM STDIN (NULL 'null'); 1 1 1 1 1 B0101 3 3 3 null2 B0100 7 7 7 3 4 B1100 @@ -171,7 +171,7 @@ BOOL_OR(b3)AS "n" FROM bool_test; -COPY bool_test FROM STDIN NULL 'null'; +COPY bool_test FROM STDIN (NULL 'null'); TRUE nullFALSE null FALSE TRUEnullnull null TRUEFALSE null Index: src/test/regress/sql/copyselect.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copyselect.sql,v retrieving revision 1.2 diff -u -r1.2 copyselect.sql --- src/test/regress/sql/copyselect.sql 7 Aug 2008 01:11:52 - 1.2 +++ src/test/regress/sql/copyselect.sql 1
Re: [HACKERS] generic copy options
Robert Haas wrote: On Mon, Sep 14, 2009 at 2:51 PM, Emmanuel Cecchet wrote: This looks good. Should I try to elaborate on that for the patch with error logging and autopartitioning in COPY? That makes sense to me. You shouldn't need to do anything else in gram.y; whatever you want to add should just involve changing copy.c. If not, please post the details. Ok, I'll keep you posted. We also need to fix the psql end of this, and the docs... any interest in taking a crack at either of those? I can certainly help with the doc. I have never looked at the psql code but that could be a good way to get started on it. If you can point me at where to look, I'll give it a try. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
This looks good. Should I try to elaborate on that for the patch with error logging and autopartitioning in COPY? manu Robert Haas wrote: Here's a half-baked proof of concept for the above approach. This probably needs more testing than I've given it, and I haven't attempted to fix the psql parser or update the documentation, but it's at least an outline of a solution. I did patch all the regression tests to use the new syntax, so you can look at that part of the patch to get a flavor for it. If this is broadly acceptable I can attempt to nail down the details, or someone else is welcome to pick it up. It's on my git repo as well, as usual. -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Greg Smith wrote: On Fri, 11 Sep 2009, Emmanuel Cecchet wrote: I guess the problem with extra or missing columns is to make sure that you know exactly which data belongs to which column so that you don't put data in the wrong columns, which is likely to happen if this is fully automated. Allowing the extra column case is easy: everywhere in copy.c you find the error message "extra data after last expected column", just ignore the overflow fields rather than rejecting the line just based on that. And the default information I mentioned you might want to substitute for missing columns is already being collected by the code block with the comment "Get default info if needed". If I understand it well, you expect the garbage to be after the last column. But what if the extra or missing column is somewhere upfront or in the middle? Sometimes you might have a type conflict problem that will help you detect the problem, sometimes you will just insert garbage. This might call for another mechanism that would log the lines that are automatically 'adjusted' to be able to roll back any mistake that might happen during this automated process. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Patch for automating partitions in PostgreSQL 8.4 Beta 2
Grzegorz Jaskiewicz wrote: Anyone know what's the latest on that patch? To be honest, this was the thing that I was looking forward to most in 8.5 ... (and probably not only me). We are also interested in integrating our autopartitioning patch for COPY with that implementation. I can help with the partitioning implementation and/or testing of that feature since this is of interest for Aster too. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Robert Haas wrote: http://developer.postgresql.org/pgdocs/postgres/sql-explain.html Just out of curiosity, it looks like I could write something like: EXPLAIN (ANALYZE TRUE, COSTS FALSE, VERBOSE TRUE, COSTS TRUE) statement What is the expected behavior if someone specifies the same option multiple times with different values? Does the last value prevail? I know that this example looks stupid, but when you have a lot of options it sometimes happens that you put an option twice with different values (that happens with some JDBC driver options...). manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Greg Smith wrote: The full set of new behavior here I'd like to see allows adjusting:
- Accept or reject rows with extra columns?
- Accept or reject rows that are missing columns at the end?
  -- Fill them with the default for the column (if available) or NULL?
- Save rejected rows?
  -- To a single system table?
  -- To a user-defined table?
  -- To the database logs?
The proposed patch saves all rejected rows (with extra or missing columns) to a user-defined table (that can be created automatically). If you want to handle these bad rows on the fly, I guess you could have a trigger on the error table that does the appropriate processing depending on the data you are processing. In that case, having multiple error tables allows you to plug different triggers to handle possible error cases differently. The other option is to process the error table after COPY to handle the bad rows. I guess the problem with extra or missing columns is to make sure that you know exactly which data belongs to which column so that you don't put data in the wrong columns, which is likely to happen if this is fully automated. I will try to re-read the thread on your proposal to better understand how you figure out which rows are missing or extra. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
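The idea of plugging a trigger into each error table could look roughly like this. All names and the error-table layout below are hypothetical, loosely following the key/label fields discussed in this thread rather than any committed schema:

```sql
-- A user-defined error table that COPY would fill with rejected rows
CREATE TABLE copy_errors (
    key     bigint,   -- position of the bad row in the input file
    label   text,     -- the COPY statement that produced the error
    rawdata text      -- the rejected input line
);

-- Per-error-table processing hooked in as an insert trigger
CREATE FUNCTION handle_copy_error() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'rejected row % from: %', NEW.key, NEW.label;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER copy_errors_trig
    AFTER INSERT ON copy_errors
    FOR EACH ROW EXECUTE PROCEDURE handle_copy_error();
```

A different error table could carry a different trigger function, which is what would let distinct COPY streams get distinct error handling.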
Re: [HACKERS] COPY enhancements
Tom Lane wrote: Robert Haas writes: Or look at your CVS/git checkout. The important point is to look at the grammar, which doesn't have any idea what the specific options are in the list. (Well, okay, it had to have special cases for ANALYZE and VERBOSE because those are reserved words :-(. But future additions will not need to touch the grammar. In the case of COPY that also means not having to touch psql \copy.) I understand the convenience from a developer perspective but I wonder how this improves the user experience. If options are not explicitly part of the grammar:
- you cannot do automated testing based on the grammar
- it seems that it will be harder to document
- it still requires the same amount of work in psql and 3rd party tools to support command-completion and so on
- why would only COPY or EXPLAIN use that syntax? What is the right boundary between an option and something that is part of the grammar?
It looks like passing the current GUC variables as options to COPY. Isn't there a design problem with the parser if it is so hard to add a new option to a command? In all cases, both the client and the server will have to support the new extension (and it will have to be documented), so it should not make a big difference whether it is explicitly part of the command grammar or a set of generic options. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Tom, I looked at EXPLAIN (http://www.postgresql.org/docs/current/interactive/sql-explain.html) and there is not a single line of what you are talking about. And the current syntax is just:

EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

If I try to decrypt what you said, you are looking at something like:

COPY tablename [ ( column [, ...] ) ] FROM { 'filename' | STDIN } [option1=value[,...]]

That would give something like:

COPY foo FROM 'input.txt' binary=on, oids=on, errors=skip, max_errors=10

If this is not what you are thinking, please provide an example. Emmanuel

Emmanuel Cecchet writes: The new syntax could look like:

COPY tablename [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [, BINARY ] [, OIDS ] [, DELIMITER [ AS ] 'delimiter' ] [, NULL [ AS ] 'null string' ]
    [, CSV [ HEADER ] [ QUOTE [ AS ] 'quote' ] [ ESCAPE [ AS ] 'escape' ] [ FORCE NOT NULL (column [, ...]) ]
    [, ERRORS { SKIP | LOG INTO { tablename | 'filename' } [ LABEL label_name ] [ KEY key_name ] [ MAX ERRORS count ] } ]

Is this what you had in mind? No, because that doesn't do a darn thing to make the option set less hard-wired into the syntax. I was thinking of a strict keyword/value format with non-wired-in keywords ... and only *one* keyword per value. See EXPLAIN. regards, tom lane -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
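For comparison, the EXPLAIN-style format Tom is pointing at wraps a free-form keyword/value list in parentheses rather than using `option=value` pairs; transposed to COPY it would look more like the sketch below (the COPY option names here are purely illustrative, not an agreed-upon set):

```sql
-- EXPLAIN's generic option list, one keyword per value:
EXPLAIN (ANALYZE true, VERBOSE false) SELECT 1;

-- The same shape applied to COPY:
COPY foo FROM 'input.txt' (BINARY on, OIDS on, ERRORS 'skip', MAX_ERRORS 10);
```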
Re: [HACKERS] COPY enhancements
Hi Robert, I like this idea, perhaps not surprisingly (for those not following at home: that was my patch). Unfortunately, it looks to me like there is no way to do this without overhauling the syntax. If the existing syntax required a comma between options (i.e. "copy blah to stdout binary, oids" rather than "copy blah to stdout binary oids", this would be pretty straightforward; but it doesn't even allow one). Well, some options like CSV FORCE ... take a comma-separated list of columns. This would require all options to become reserved keywords or force parentheses around option parameters. I wonder if we should consider allowing COPY options to be comma-separated beginning with 8.5, and then require it in 8.6. Other options include continuing to support the old syntax for the existing options, but allowing some new syntax as well and requiring its use for all new options (this is what we did with EXPLAIN, but there were only two pre-existing options there), and just changing the syntax incompatibly and telling users to suck it up. But I'm not sure I like either of those choices. We could keep the current syntax for backward compatibility only (can be dropped in a future release) and have a new syntax (starting in 8.5). To avoid confusion between both, we could just replace WITH with something else (or just drop it) to indicate that this is the new syntax. The new syntax could look like:

COPY tablename [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [, BINARY ] [, OIDS ] [, DELIMITER [ AS ] 'delimiter' ] [, NULL [ AS ] 'null string' ]
    [, CSV [ HEADER ] [ QUOTE [ AS ] 'quote' ] [ ESCAPE [ AS ] 'escape' ] [ FORCE NOT NULL (column [, ...]) ]
    [, ERRORS { SKIP | LOG INTO { tablename | 'filename' } [ LABEL label_name ] [ KEY key_name ] [ MAX ERRORS count ] } ]

Is this what you had in mind? Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
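Spelled out, the error-handling clause in the proposal above would be used like this. This is only a sketch of the proposed (never-committed) syntax; the table, file, and label names are made up:

```sql
-- Log bad rows into a table, tagging them with a label, stopping after 10 errors:
COPY foo (a, b) FROM 'input.txt',
    CSV HEADER,
    ERRORS LOG INTO foo_errors LABEL nightly_load MAX ERRORS 10;

-- Or simply skip bad rows:
COPY foo (a, b) FROM 'input.txt', CSV, ERRORS SKIP;
```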
Re: [HACKERS] COPY enhancements
Tom Lane wrote: It might be time to switch COPY over to a more easily extensible option syntax, such as we just adopted for EXPLAIN. Could you point me to the thread detailing the extensible option syntax for EXPLAIN? Is that new to 8.5? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Josh Berkus wrote: I am not really sure why you need a natural key. a) because we shouldn't be building any features which teach people bad db design, and b) because I will presumably want to purge records from this table periodically and doing so without a key is likely to result in purging the wrong records. Agreed, but I am not sure that unilaterally imposing a key is going to suit everyone. By default, the partition_key contains the index of the faulty entry and the label contains the COPY command. This could be your key. Well, you still haven't explained the partition_key to me, so I'm not quite clear on that. Help? Please re-read my previous message for the default behavior. If you look at the example at http://wiki.postgresql.org/wiki/Error_logging_in_COPY, input_file.txt has 5 rows out of which only 1 and 5 are correct (2, 3 and 4 are bad). The partition key indicates the row that caused the problem (2 for row 2, 3 for row 3, ...) and the label contains the full COPY statement. If you want to know how it is used in the Aster product, we will have to ask Alex who implemented the feature in the product. The reason why I'd like to have a session_id or pid or similar is so that I can link the copy errors to which backend is erroring in the other system views or in the pg_log. Imagine a system where you have multiple network clients doing COPYs; if one of them starts bugging out and all I have is a tablename, filename and time, I'm not going to be able to figure out which client is causing the problems. The reason I mention this case is that I have a client who has a production application like this right now. All the clients are copying the same file to the same table? I would imagine that every client processes different files and from the file names it would be easy to identify the faulty client. I am not sure how you would use the pid to identify the faulty client more easily?
Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
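[Editor's note: to make the default key/label behavior described above concrete, a query against the error table might look like this. The table and column names (copy_errors, key, label, rawdata) are assumptions based on the wiki description quoted in the thread, not the patch's exact schema.]

```sql
-- assumed error-table columns, per the description above:
--   key     = index of the faulty row in the source file
--   label   = the full COPY statement (the default label)
--   rawdata = the rejected input line (bytea)
SELECT key, label, rawdata
  FROM copy_errors
 ORDER BY key;
-- for the wiki's input_file.txt this would list rows 2, 3 and 4
```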
Re: [HACKERS] COPY enhancements
It can be another GUC variable or an option to COPY. If the COPY command fails, everything gets rolled back (data in the destination table and error table). That would be harder to implement with a file (the rollback part). With a logging file, you wouldn't worry about rollback. You'd just log a statement to the file that it was rolled back. Ok. I) Aster's current implementation has the advantage of being able to log to any user-defined table, giving users the flexibility to log different COPYs to different tables, or even put them on various tablespaces. Is that what we want, though? Clearly it would make things simpler for most (but not all) users to have just a canonical pg_copy_errors table which was in pg_catalog. It would also presumably remove some code from the patch (usually a good thing). What do people think? The code that would be removed would just be the table creation and the GUC variable for the table name. That would not remove much code. J) One option I think we need if we're going to support logging to "any defined logging table" is the option to make that table a temporary table. Actually you can create the table beforehand (check unit tests for an example) and so it is possible to make it a temporary table if you want (I would just have to test it, I did not try it yet). O) Is this capable of dealing with partitioning by more than one column, or by an expression? Yes, we just use a brute force technique where we try all child tables 1-by-1 and rely on the existing Postgres constraint checking mechanism (no new or duplicated code there). Sounds disastrous performance-wise. There's no easy way to test the expression without attempting an actual insert? An option that was proposed at PGCon by someone in the audience (sorry, I don't remember who) was to build a query plan and find out from there which child table should get the tuple. It will be disastrous performance-wise if you always have misses and need to check a lot of constraints (we can always build a worst case scenario test). 
However, in my tests, even with completely random data, routing is faster than a script sorting the input into multiple files + manual inserts into each child table. In practice, even a small cache size usually works very well as soon as you have enough locality in your source data. The problem is that faulty tuples will have to check all constraints to figure out that there is no child table for them. Note that if you have a deep tree hierarchy, you don't necessarily have to insert at the top of the tree, you can start anywhere down the tree. Once we have the full partitioning support (in 8.5?), we can probably reuse some of that mechanism to route the tuple directly to the right child. Thanks for your great feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
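[Editor's note: the brute-force routing described above leans entirely on the existing CHECK-constraint machinery. A minimal setup it would operate on looks like this; table names and constraint ranges are invented for illustration.]

```sql
CREATE TABLE logs (ts date NOT NULL, msg text);
CREATE TABLE logs_q1 (CHECK (ts >= '2009-01-01' AND ts < '2009-04-01')) INHERITS (logs);
CREATE TABLE logs_q2 (CHECK (ts >= '2009-04-01' AND ts < '2009-07-01')) INHERITS (logs);

-- with routing enabled, each input row is tried against the children
-- one by one until an existing CHECK constraint accepts it; a row that
-- matches no child pays for every constraint check before failing:
COPY logs FROM '/tmp/logs.txt';
```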
Re: [HACKERS] COPY enhancements
Josh, See the answers inlined. Thank you for tackling this very long-time TODO. Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Questions & Comments: A) Why would someone want to turn error_logging on, but leave error_logging_skip_tuples off? The pg_log already logs errors which copy throws by default. When error_logging is on and skip_tuples is off, errors are logged in the error table. If skip_tuples is on, tuples are not logged in the error table. B) As I mentioned earlier, we'll want to provide the option of logging to a file instead of to a table. That's not a reason to reject this patch, but probably a TODO for 8.5. Ok, but what should be the format of that file? C) Are we sure we want to handle this via GUCs rather than extensions to COPY syntax? It seems like fairly often users would want to log different COPY sources to different tables/files. I agree that new COPY options could be easier to use; the implementation is just more complex. However, the label allows you to select the tuples related to specific COPY commands. D) These GUCs are userset, I hope? (haven't dug into the code far enough to tell yet). Yes. E) What is error_logging_tuple_label for? You don't explain/give examples. And how is error_logging_tuple_partition_key used? We use the label and partition key in Aster products to easily retrieve which COPY command on which partition generated the bad tuples. By default, the tuple_label contains the COPY command that was executed (see example on Wiki) and the key contains the index of the tuple in the source file (see example on Wiki). F) Rawdata for rejected tuples is presumably BYTEA? Yes. I forgot to put back the table description that can be seen in the unit tests. I have updated the Wiki with the table definition. G) We should probably have a default for error_logging_table_name, such as pg_copy_errors. Does that table get automatically created if it doesn't exist? 
Yes, as indicated on the wiki the table is created automatically (see config variable section). H) Finally, one request of the TODO is some way to halt import after a specified number of bad tuples because it probably means you have the wrong file or wrong table. Do we still want that? We can still do that. It can be another GUC variable or an option to COPY. If the COPY command fails, everything gets rolled back (data in the destination table and error table). That would be harder to implement with a file (the rollback part). Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY M) tuple_routing_in_copy should take "on" or "off", not 0 or 1. Ok. N) Have you measured the overhead & speed of this kind of COPY as opposed to COPY into a single table? Have you checked the overhead if tuple_routing_in_copy is on, but you are not loading into a partitioned table? Yes. There is no noticeable overhead if there is no routing to do (but routing is on). If routing is involved, the overhead depends on how sorted your input data is. If it all goes to the same partition, the caching effect works well and there is no noticeable overhead. The cost is in the constraint check and it depends on the complexity of the constraint. The more constraints you have to check and the more complex they are, the more overhead on each tuple routing. O) Is this capable of dealing with partitioning by more than one column, or by an expression? Yes, we just use a brute force technique where we try all child tables 1-by-1 and rely on the existing Postgres constraint checking mechanism (no new or duplicated code there). Finally, I'm going to suggest different names for the GUCs, as the names you've chosen don't group well and would likely cause confusion. 
Here are my suggestions, which all begin with "copy_" for prefix matching: error_logging --> probably not needed, see above error_logging_skip_tuples --> copy_skip_bad_rows error_logging_schema_name --> copy_logging_schema_name error_logging_relation_name --> copy_logging_table_name error_logging_tuple_label --> don't know what this is for, see above error_logging_tuple_partition_key --> don't know what this is for, see above tuple_routing_in_copy --> copy_partitioning tuple_routing_cache_size --> copy_partitioning_cache_size This makes sense. I'll add that to my todo list. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
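[Editor's note: putting the GUC-based interface discussed above together, a session using the patch might look like this. The GUC names and the error-table location follow the patch description in the thread and may not match the final code.]

```sql
SET error_logging = on;                       -- log bad rows instead of aborting the COPY
SET error_logging_skip_tuples = off;          -- keep the bad tuples in the error table
SET error_logging_schema_name = 'public';
SET error_logging_relation_name = 'copy_errors';

COPY foo FROM '/tmp/input_file.txt';          -- bad rows land in public.copy_errors
SELECT * FROM public.copy_errors;             -- inspect the rejected rows
```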
Re: [HACKERS] COPY enhancements
Hi David, My C isn't all that great, to put it mildly, but I noticed C++-style // comments like this, which we don't use. Ok, I will fix that. Are these dependent on each other, or could they be reviewed, etc. separately? The code implementing each functionality is independent (separate methods or files) but error logging also catches potential routing problems and logs faulty tuples (the call to routing is basically in the try/catch of error logging). So, yes, they can be reviewed separately, but if both are integrated it makes sense to review them together. Cheers Emmanuel
[HACKERS] COPY enhancements
Hi all, Finally, the error logging and autopartitioning patches for COPY that I presented at PGCon are here! Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY The attached patch contains both contribs as well as unit tests. I will shortly submit the new patch at commitfest.postgresql.org. Thanks in advance for your feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com aster-copy-improvements-patch-8.5v3.txt.gz Description: GNU Zip compressed data
Re: [HACKERS] Durability?
Tom Lane wrote: Emmanuel Cecchet writes: I got an error like this: ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440 CONTEXT: writing block 529 of relation 1663/233690/1247 WARNING: could not write block 529 of 1663/233690/1247 DETAIL: Multiple failures --- write error might be permanent. The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing. It looks like you've got a corrupted page in shared buffers, and every time the system tries to flush it to disk for a checkpoint, it fails. What I'd try for getting out of this is to kill -9 some backend in order to force a database restart. Of course, if you want to investigate what caused it, you should dig around in shared memory first and try to get a copy of that buffer's contents. Will the database be able to restart with a corrupted WAL? If the database restarts, what transactions will be missing: - just the block that couldn't be flushed? - all transactions that were committed after the faulty block? - more? Thanks Emmanuel
[HACKERS] Durability?
Hi, I got an error like this: ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440 CONTEXT: writing block 529 of relation 1663/233690/1247 WARNING: could not write block 529 of 1663/233690/1247 DETAIL: Multiple failures --- write error might be permanent. The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing. I am not sure I clearly understand the consequences of such an error, since Postgres continues to accept new transactions. If my WAL is corrupted, are my transactions still durable? If this is a violation of durability, is there a way to force Postgres to terminate on such an error? Thanks in advance for the clarification. Emmanuel
[HACKERS] Unique namespace per session for temp tables?
Hi all, Following the issue with temp tables and 2PC, we thought that each session in Postgres was supposed to generate a unique namespace for its temporary tables. But if we execute the following scenario: * First session*. BEGIN; CREATE TEMP TABLE foo (x int) ON COMMIT DROP; PREPARE TRANSACTION 't1'; *Disconnect*. *Wait* (at least until backend process for first session exits). *Reconnect new session*. CREATE TEMP TABLE foo (x int); <-- blocks until t1 commits The issue we see is that a totally unrelated session (which should have a unique temporary schema), reuses the temporary schema that is locked away in the prepared transaction. Isn't that breaking the assumption that each session should have a unique schema for its temp tables? Thanks in advance for your insights. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: Emmanuel Cecchet writes: But when the transaction prepares, we know that. What would prevent us to do at prepare time the same cleanup that commit does? The entire point of PREPARE is that it's *not* committed yet. Agreed, but all objects that were created and dropped in the transaction are going to be cleaned up whether the transaction commits or rolls back. It seems that releasing these resources at PREPARE time would help for these 'temporary' objects that only have a transaction scope, right? regards, manu
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: Emmanuel Cecchet writes: Tom Lane wrote: True, but the problem is that the tuple might still be live to (some snapshots in) that transaction, so we can't inject a duplicate tuple without risking confusing it. In this particular case that isn't an issue because the transaction is done executing, but the tqual.c rules don't know that. Please excuse my ignorance. I am not sure I see how the tuple could still be live to some snapshots after the transaction has prepared. Well, it couldn't be, because there are no snapshots in that transaction anymore. The problem is that the *other* transaction doesn't have a good way to know that. It just sees an open transaction with conflicting unique-index changes. But when the transaction prepares, we know that. What would prevent us to do at prepare time the same cleanup that commit does? regards, manu (indentation (C) tom lane)
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: True, but the problem is that the tuple might still be live to (some snapshots in) that transaction, so we can't inject a duplicate tuple without risking confusing it. In this particular case that isn't an issue because the transaction is done executing, but the tqual.c rules don't know that. Please excuse my ignorance. I am not sure I see how the tuple could still be live to some snapshots after the transaction has prepared. What could still happen to objects that were only visible to a transaction after it has prepared? An example would definitely help. Is it possible in Postgres for a transaction to see an object that was created inside another transaction before it commits (assuming at least 'read committed' of course)? Thanks again, Emmanuel
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: Emmanuel Cecchet writes: Take PG 8.3.0 and try: BEGIN; CREATE TEMP TABLE foo (x int) ON COMMIT DROP; PREPARE TRANSACTION 't1'; [BEGIN;] <-- doesn't really matter if you start a new transaction or not CREATE TEMP TABLE foo (x int); <-- blocks until t1 commits I have been tracking down the problem and it looks like PostPrepare_Locks is holding the locks on 'foo' for some reason I don't really get. AFAIK that doesn't really have anything to do with the temp-ness of the table; it'd be the same with a regular table. The problem is you have an in-doubt tuple in pg_class for pg_temp_NNN.foo, and you are trying to create another one for the same schema/relname, and so the unique index check is blocking to see what happens to the other transaction that's creating/deleting the conflicting tuple. You are right (of course!), I tried: BEGIN; CREATE TABLE foo (x int); DROP TABLE foo; PREPARE TRANSACTION 't1'; [BEGIN;] CREATE TABLE foo (x int); <-- blocks There should not be a doubt about table foo because whether the transaction commits or rolls back, that table will not exist anymore (we could actually get rid of it at prepare time). I guess Postgres does not handle the special case of tables (temp or not) whose lifespan is limited to the scope of a transaction and therefore cannot optimize that case. Is that correct? Thanks for your help. Emmanuel
[HACKERS] Locks on temp table and PREPARE
Hi, As we discussed during PGCon, we are using temp tables in 2PC transactions. The temp tables are dropped before PREPARE (or have an ON COMMIT DROP option) and never cross transaction boundaries. In 8.3.1, a patch was introduced to disallow temp tables in 2PC transactions and we tried to provide a fix for it (see the long thread with Heikki on this list). I am still working on a cleaner patch to allow temp tables to be used in 2PC transactions but I did hit a new problem that I don't know how to solve cleanly. Take PG 8.3.0 and try: BEGIN; CREATE TEMP TABLE foo (x int) ON COMMIT DROP; PREPARE TRANSACTION 't1'; [BEGIN;] <-- doesn't really matter if you start a new transaction or not CREATE TEMP TABLE foo (x int); <-- blocks until t1 commits I have been tracking down the problem and it looks like PostPrepare_Locks is holding the locks on 'foo' for some reason I don't really get. Any suggestion on what should be done differently for temp tables there? Thanks, Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wrong stats for empty tables
From: Tom Lane [...@sss.pgh.pa.us] Subject: Re: [HACKERS] Wrong stats for empty tables "Emmanuel Cecchet" writes: > Is this a bug? No, it's intentional. So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example: manu=# create table father (id int, val int, tex varchar(100), primary key(id)); manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2)); insert some data manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; QUERY PLAN Sort (cost=37.81..37.82 rows=5 width=230) Sort Key: father.id -> Hash Join (cost=23.44..37.75 rows=5 width=230) Hash Cond: (father.id = other.id1) -> Seq Scan on father (cost=0.00..13.10 rows=310 width=226) -> Hash (cost=23.38..23.38 rows=5 width=8) -> Seq Scan on other (cost=0.00..23.38 rows=5 width=8) Filter: (id2 = 2) (8 rows) manu=# create table child1() inherits(father); manu=# create table child2() inherits(father); manu=# create table child3() inherits(father); manu=# create table child4() inherits(father); manu=# create table child5() inherits(father); manu=# create table child6() inherits(father); manu=# create table child7() inherits(father); manu=# create index i1 on child1(id); manu=# create index i2 on child2(id); manu=# create index i3 on child3(id); manu=# create index i4 on child4(id); manu=# create index i5 on child5(id); manu=# create index i6 on child6(id); manu=# create index i7 on child7(id); manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; QUERY PLAN Sort (cost=140.00..140.16 rows=62 width=230) Sort Key: public.father.id -> Hash Join (cost=23.44..138.16 rows=62 width=230) Hash Cond: (public.father.id = other.id1) -> Append (cost=0.00..104.80 rows=2480 width=226) -> Seq Scan on father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child1 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child2 father 
(cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child3 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child4 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child5 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child6 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child7 father (cost=0.00..13.10 rows=310 width=226) -> Hash (cost=23.38..23.38 rows=5 width=8) -> Seq Scan on other (cost=0.00..23.38 rows=5 width=8) Filter: (id2 = 2) (16 rows) I must admit that I do not see what the original intention behind this behavior was. Emmanuel
[HACKERS] Wrong stats for empty tables
Hi, Here is an example showing the problem: Welcome to psql 8.3.6, the PostgreSQL interactive terminal. manu=# create table foo (x int); CREATE TABLE manu=# explain select * from foo; QUERY PLAN --- Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# analyze foo; ANALYZE manu=# explain select * from foo; QUERY PLAN --- Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# insert into foo values (1); INSERT 0 1 manu=# analyze foo; ANALYZE manu=# explain select * from foo; QUERY PLAN --- Seq Scan on foo (cost=0.00..1.01 rows=1 width=4) (1 row) Now a possible cause for this might be the relpages attribute in pg_class (the default value 0 does not seem to be interpreted correctly): manu=# create table bar(x int); CREATE TABLE manu=# explain select * from bar; QUERY PLAN --- Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# select relpages from pg_class where relname='bar'; relpages -- 0 (1 row) manu=# update pg_class set relpages=1 where relname='bar'; UPDATE 1 manu=# explain select * from bar; QUERY PLAN --- Seq Scan on bar (cost=0.00..0.00 rows=1 width=4) (1 row) This is a real problem if you have a lot of empty child tables. Postgres will not correctly optimize queries in the presence of empty child tables. Is this a bug? Thanks for your help, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Partitioning feature ...
Yes, there is a good reason. As a trigger can update the tuple value, this can change the routing decision. If you have a user trigger that tries to change the key value after the partition choice has been made, this will lead to an integrity constraint violation, which is probably not what the user expects. Note that user triggers with partitions will be tricky anyway (regardless of how partitioning is implemented, that is with triggers or not). If 2 partitions have user triggers that update the key value to bounce the tuple to the other partition, you may end up with an infinite loop. I am not sure what the semantics of statement triggers (still user triggers) should be on partitioned tables. We will probably have to come up with restrictions on triggers so that they can only be applied to the parent table and not to child tables to prevent nasty issues. Emmanuel Tom Lane wrote: In the case of the FK triggers, it's intentional (and maybe even documented) that users should be able to place their own triggers before or after the FK triggers. Is there a good reason why partitioning triggers should be different? If there is, maybe the feature shouldn't be implemented via triggers in the first place. regards, tom lane -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Partitioning feature ...
I agree with Jaime that system triggers should execute independently of user triggers. In the particular case of partitioning, the system trigger should execute after the user triggers. However, as the partitioning trigger is a row level trigger, it is not clear what is going to happen with user triggers that work at the statement level. Emmanuel Jaime Casanova wrote: On Mon, Mar 30, 2009 at 6:51 AM, Kedar Potdar wrote: As triggers are executed in order of their names, we've prefixed the trigger names with "zz". This should work fine as long as no-one uses a trigger name which starts with "zz". this seems quite fragile... why do system-generated triggers have to be executed following the same rules (talking about order of execution) as user triggers? can't we simply execute them first or last, or maybe be clever and mark one to be executed first and others last? -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
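[Editor's note: the name-ordering rule under discussion is simply that PostgreSQL fires same-event triggers on a table in alphabetical order of trigger name, which is both why the "zz" prefix works and why it is fragile. Table and function names below are invented for illustration.]

```sql
-- both are BEFORE INSERT row triggers on the same table:
CREATE TRIGGER aa_user_check BEFORE INSERT ON part_table
    FOR EACH ROW EXECUTE PROCEDURE user_check();
CREATE TRIGGER zz_partition_route BEFORE INSERT ON part_table
    FOR EACH ROW EXECUTE PROCEDURE partition_route();
-- aa_user_check fires first and zz_partition_route last...
-- until a user creates a trigger named, say, zzz_audit
```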
Re: [HACKERS] Partitioning feature ...
Hi Kedar, First of all, congratulations on the excellent work. I have some comments and questions. In get_relevent_partition (btw, relevant is spelled with an a) you are maintaining 2 lists. I guess this is only useful for multi-column partitions, right? If you have a single column partition (without subpartitions), I think you could directly return on the first match (without maintaining any list) since you guarantee that there is no overlap between partitions. A simple but effective optimization for inserts consists of caching the last partition used (consecutive inserts often go to the same partition) and trying it first before going through the whole loop. The update trigger should first check if the tuple needs to be moved. If the updated tuple still matches the constraints of the partition, it will not have to be moved, which will save a lot of overhead. The COPY operation should probably be optimized to use the same code as the insert trigger for partitioned tables. I guess some code could be factorized in COPY to make the inserts more efficient. The current trigger approach should prevent other triggers from being added to the table, or you should make sure that the partition trigger is always the one to execute last. As we don't have automatic partition creation, it would be interesting to have an optional mechanism to deal with tuples that don't match any partition (very useful when you do bulk insert and some new data require a new partition). Having a simple overflow partition or an error logging mechanism would definitely help to identify these tuples and prevent things like large COPY operations from failing. Looking forward to your responses, Emmanuel We are implementing the table partitioning feature to support Range and Hash partitions. Please find attached the WIP patch and test cases. 
The syntax used conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases. -- Sub partitioning We are maintaining a system catalog (pg_partition) for partition meta-data. System will look up this table to find the appropriate partition to operate on. System internally uses low-level 'C' triggers for row movement. Regards, -- Kedar. -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
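[Editor's note: the "check if the tuple needs to be moved" optimization suggested above could be sketched in PL/pgSQL as follows. Table, column and constraint values are invented for illustration, and the actual patch uses C triggers; this is only the idea, not the implementation.]

```sql
CREATE FUNCTION logs_q1_update() RETURNS trigger AS $$
BEGIN
    -- fast path: the updated row still satisfies this partition's
    -- constraint, so no row movement is needed
    IF NEW.ts >= '2009-01-01' AND NEW.ts < '2009-04-01' THEN
        RETURN NEW;
    END IF;
    -- slow path: drop the old row here and re-route through the parent,
    -- whose insert trigger picks the right partition
    DELETE FROM logs_q1 WHERE ctid = OLD.ctid;
    INSERT INTO logs SELECT NEW.*;
    RETURN NULL;  -- suppress the local UPDATE
END $$ LANGUAGE plpgsql;
```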
Re: [HACKERS] Regclass and quoted table names
marcin mank wrote: Use plain oids or regclass values, not a text column, if you are trying to store table identities. wouldn't oids change on dump/reload? I don't know. I'd also be interested to know if there is a difference if we use pg_restore with a binary format or an SQL dump, or if that does not influence the way OIDs are created at all. manu
Re: [HACKERS] Regclass and quoted table names
Tom Lane wrote: Emmanuel Cecchet writes: It looks like the behavior of regclass is not consistent when table names are quoted. The name is returned without the quotes if the name is lower case, possibly with trailing digits; otherwise it is returned with quotes. It's intentional that it quotes only when needed. This is problematic in situations where the output of the cast is involved in some later join which returns incorrect results because of the extra double quotes surrounding the table name. Is there a way to override the default behavior to have a consistently quoted or non-quoted result? Thanks, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
[HACKERS] Regclass and quoted table names
Hi all, It looks like the behavior of regclass is not consistent when table names are quoted. The name is returned without the quotes if the name is lower case, possibly with trailing digits; otherwise it is returned with quotes. See some examples here: tpch=# CREATE VIEW test AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "test1" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "'test2'" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "testcolumnVIEW" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "testcolumnview" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW testcolumnVIEW2 AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "1test" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "1test2abc" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "1test2" AS SELECT * FROM customer; CREATE VIEW tpch=# select c.oid , c.oid::regclass from pg_class c where c.relname like '%test%'; oid | oid ---+-- 16410 | test 16413 | test1 16416 | "'test2'" 16419 | "testcolumnVIEW" 16422 | testcolumnview 16425 | testcolumnview2 16428 | "1test2abc" 16431 | "1test2" 16434 | "1test" (9 rows) Is this a bug? manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
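[Editor's note: while regclass itself offers no switch for this, one workaround for a consistently quoted result is to build the quoted form by hand from pg_class, doubling any embedded quotes; this sidesteps the cast's quote-only-when-needed behavior.]

```sql
-- always-quoted variant of the query above:
SELECT c.oid,
       '"' || replace(c.relname, '"', '""') || '"' AS quoted_name
  FROM pg_class c
 WHERE c.relname LIKE '%test%';
```

Note this ignores the schema; regclass output may also be schema-qualified depending on search_path.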
Re: [HACKERS] Table Partitioning Feature
Hi Amit, I will be traveling until next Tuesday and will have no access to email so don't be surprised if I don't follow up this week. The overall approach seems sound. The metadata table should also help DDL to find out overlapping ranges or duplicate list entries. So far, I have not tried to use the SPI interface from a C trigger so I don't see any disadvantage yet. We would have to assess the performance to make sure it's not going to be a show stopper. I think that the main issue of the trigger approach is that other triggers might interfere. The 'partition trigger' must be the last of the 'before insert' triggers, and if the destination (child) table has a trigger, we must ensure that this trigger is not going to require a new routing. Another issue is the result returned by insert/copy commands: if all tuples are moved to other tables, the result will be 0. We might want to have stats that would collect where tuples were moved for a particular command (I don't know yet what would be the best place to collect these stats but they could probably be updated by the trigger). Also, would the trigger be attached to all tables in the hierarchy or only to the top parent? What kind of query would you use with more than 1 level of inheritance (e.g. parent=year, child=month, grand-child=day)? It looks like we have to parse the leaves of the graph, but intermediate nodes would help accelerate the search. An alternative approach (I haven't assessed the feasibility yet) would be to try to call the query planner. If we ask to select the partition value of the tuple, the query planner should return the table it is going to scan (as in EXPLAIN SELECT * FROM t WHERE key=$1). Let me know what you think, Emmanuel We are considering the following approach: 1.
metadata table pg_partitions is defined as follows:

CATALOG(pg_partitions,2336) BKI_WITHOUT_OIDS
{
    Oid     partrelid;   /* partition table Oid */
    Oid     parentrelid; /* parent table Oid */
    int4    parttype;    /* type of partition: list, hash, range */
    Oid     partkey;     /* partition key Oid */
    Oid     keytype;     /* type of partition key */
    int4    keyorder;    /* order of the key in multi-key partitions */
    text    min;         /* min and max for range partitions */
    text    max;
    text[]  list;
    int     hash;        /* hash value */
} FormData_pg_partitions;

2. C triggers will fire a query on this table to get the relevant partition of the inserted/updated data using the SPI interface. The query will look something like (for range partitioning):

select min(partrelid) from pg_partitions
where parentrelid = 2934  -- we know this value
  and ( ($1 between to_int(min) and to_int(max) and keyorder = 1)
     OR ($2 between to_date(min) and to_date(max) and keyorder = 2) )
group by parentrelid
having count(*) =

$1, $2, ... are the placeholders for the actual partition key values of the trigger tuple. Since we know the types of the partition keys and the parentrelid, this kind of query string can be saved in another table, say pg_part_map, and its plan can be parsed once and saved in cache to be reused. Do you see any issue with using the SPI interface within triggers? The advantage of this kind of approach is that the trigger code can be made generic for any kind of partitioned table. Thanks, Amit Persistent Systems, www.persistentsys.com On 1/23/09, Emmanuel Cecchet wrote: Amit, You might want to put this on the http://wiki.postgresql.org/wiki/Table_partitioning wiki page. What does your timeline look like for this implementation? I would be happy to contribute C triggers to your implementation. From what I understood in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00269.php, you already have an implementation that parses the grammar and generates rules as if someone had written them. Is this code available?
Regarding the use of triggers to push/move data to partitions, what if someone declares triggers on partitions? Especially if you have subpartitions, let's consider the case where there is a trigger on the parent, child and grandchild. If I do an insert in the parent, the user trigger on the parent will be executed, then the partition trigger that decides to move to the grandchild. Are we going to bypass the child trigger? If we also want fast COPY operations on partitioned tables, we could have an optimized implementation that could bypass triggers and move the tuple directly to the appropriate child table. Thanks for this big contribution, Emmanuel Hi, We are implementing a table partitioning feature to support the attached commands. The syntax conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases.
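The planner-based alternative Emmanuel mentions could be probed roughly like this (a sketch, assuming a hypothetical parent table t whose children carry CHECK constraints on key, and relying on constraint exclusion):

```sql
-- With constraint exclusion enabled, the plan for a query on the parent
-- restricted to one key value only scans the matching child, so the plan
-- itself reveals the routing decision:
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM t WHERE key = 42;
-- The resulting plan should scan only the child whose CHECK constraint
-- covers key = 42.
```

Parsing EXPLAIN output at insert time would of course be far more expensive than a cached metadata lookup, which is presumably why the pg_partitions query was proposed as the primary mechanism.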
Re: [HACKERS] Table Partitioning Feature
Hi Amit, I overlooked the fact that you dropped composite partitions and the subpartition template from the proposal presented in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php. Is it because this is too hard to support, or do you not see any immediate need for it? Thanks, Emmanuel Hi Emmanuel, Please find my comments inlined: On 1/23/09, Emmanuel Cecchet wrote: Amit, You might want to put this on the http://wiki.postgresql.org/wiki/Table_partitioning wiki page. Sure. What does your timeline look like for this implementation? The implementation is planned as follows: - Partition table commands ++ An intermediate patch at the end of Feb ++ Final patch in mid March - Global Index: Mid March - Optimizer changes for partitioned table: May I would be happy to contribute C triggers to your implementation. From what I understood in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00269.php, you already have an implementation that parses the grammar and generates rules as if someone had written them. Is this code available? We have just started with the implementation, I will post the grammar rules next week. Regarding the use of triggers to push/move data to partitions, what if someone declares triggers on partitions? Especially if you have subpartitions, let's consider the case where there is a trigger on the parent, child and grandchild. If I do an insert in the parent, the user trigger on the parent will be executed, then the partition trigger that decides to move to the grandchild. Are we going to bypass the child trigger? We are not supporting sub-partitioning - there is just one level of partitioning. If we also want fast COPY operations on partitioned tables, we could have an optimized implementation that could bypass triggers and move the tuple directly to the appropriate child table. We will definitely consider implementing fast COPY after we are done with the planned tasks.
Thanks, Amit Thanks for this big contribution, Emmanuel Hi, We are implementing table partitioning feature to support - the attached commands. The syntax conforms to most of the suggestion mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases. -- sub partitioning We are using pgsql triggers to push/move data to appropriate partitions, but we will definitely consider moving to C language triggers as suggested by manu. - Global non-partitioned indexes (that will extend all the partitions). - Foreign key support for tables referring to partitioned tables. Please feel free to post your comments and suggestions. Thanks, Amit Persistent Systems ---- -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] Table Partitioning Feature
Amit, You might want to put this on the http://wiki.postgresql.org/wiki/Table_partitioning wiki page. What does your timeline look like for this implementation? I would be happy to contribute C triggers to your implementation. From what I understood in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00269.php, you already have an implementation that parses the grammar and generates rules as if someone had written them. Is this code available? Regarding the use of triggers to push/move data to partitions, what if someone declares triggers on partitions? Especially if you have subpartitions, let's consider the case where there is a trigger on the parent, child and grandchild. If I do an insert in the parent, the user trigger on the parent will be executed, then the partition trigger that decides to move to the grandchild. Are we going to bypass the child trigger? If we also want fast COPY operations on partitioned tables, we could have an optimized implementation that could bypass triggers and move the tuple directly to the appropriate child table. Thanks for this big contribution, Emmanuel Hi, We are implementing a table partitioning feature to support the attached commands. The syntax conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases. -- sub partitioning We are using pgsql triggers to push/move data to appropriate partitions, but we will definitely consider moving to C language triggers as suggested by manu. - Global non-partitioned indexes (that will extend all the partitions). - Foreign key support for tables referring to partitioned tables. Please feel free to post your comments and suggestions.
Thanks, Amit Persistent Systems -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] incoherent view of serializable transactions
Hi Kevin, The link didn't seem to work for me, but I think I found the article you meant: "Serializable Isolation for Snapshot Databases" by Michael J. Cahill, et al. An interesting work. If nothing else, it will help flesh out the documentation of anomalies. If the PostgreSQL community ever does want to better approach true serializable behavior, this should provide a good theoretical basis. Sorry for the broken link. Yes, this is the paper. Note that the paper was not necessarily enthusiastically received by the community when presented at the conference. While this is an interesting academic paper, its practicality left a majority of the audience perplexed. There was an interesting comment by Oracle folks: Oracle does not provide serializability but only snapshot isolation, and still users prefer to 'downgrade' to read committed for better performance. The Oracle folks' experience seemed to indicate that there was no need for serializability (well, that's also less work for them ;-)) in their customer base. Having a foot in both academia and industry, I understand the intellectual interest for serializability on the academic side, but I still have not seen a single use case in industry where this was a requirement (but my db experience is probably narrow). Have nice serializable holidays ;-) manu -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [Fwd: Re: [HACKERS] Transactions and temp tables]
Hi Heikki, The point of using temp tables was performance. Using regular tables in our case would hurt performance too much. Well, if we cannot get a temporary fix in 8.4, we will maintain a separate patch to get that functionality just for temp tables that are created and dropped in the same transaction. Hopefully we will be able to come up with a working solution in 8.5. Thanks for your help, Emmanuel Emmanuel Cecchet wrote: I just saw that this new patch was not considered because the previous version ended up being rejected. Note that this version of the patch aims at supporting ONLY temp tables that are created AND dropped in the same transaction. We need to be able to use temp tables in transactions that are doing 2PC, but the temp table lifespan does not need to cross transaction boundaries. Please let me know if this patch could be integrated in 8.4. IMHO, this is just getting too kludgey. We came up with pretty good ideas on how to handle temp tables properly, by treating them the same as non-temp tables. That should eliminate all the problems the latest patch had, and also the issues with sequences, and allow all access to temp tables, not just a limited subset. I don't think it's worthwhile to apply the kludge as a stopgap measure; let's do it properly in 8.5. As a workaround, you can use a regular table instead of a temporary one. If you create and drop the regular table in the same transaction (that's the same limitation that the latest patch has), you won't end up with a bogus table in your database if the connection is dropped unexpectedly. If your application uses multiple connections simultaneously, you'll need a little bit of code in the application so that you don't try to create a table with the same name in all backends. You could also create a different schema for each connection, and do "set search_path = semitempschemaX, public", so that you can use the same table name and still have separate tables for each connection.
(sorry for the late reply) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
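Heikki's schema-per-connection workaround might look like the following (a sketch under the stated assumptions; the schema, table, and transaction names are mine, not from the thread):

```sql
-- One-time setup for, say, connection 3:
CREATE SCHEMA semitempschema3;

-- At session start:
SET search_path = semitempschema3, public;

-- Inside the 2PC transaction, create and drop the "semi-temp" table,
-- mirroring the create-and-drop-in-one-transaction limitation:
BEGIN;
CREATE TABLE scratch (id int, payload text);
-- ... populate and use scratch ...
DROP TABLE scratch;
PREPARE TRANSACTION 'txn-42';
```

Because scratch is a regular table, it participates in two-phase commit normally, and since it is dropped before PREPARE TRANSACTION, no bogus table survives a crashed connection.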
Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I didn't have many reliability issues with ensemble, appia or spread, so far. Although, I admit I didn't ever run any of these in production. Performance is certainly an issue, yes. May I suggest another reading; even though a bit dated, most of the results still apply: http://jmob.objectweb.org/jgroups/JGroups-middleware-2004.pdf The baseline is that if you use UDP multicast, you need a dedicated switch and the tuning is a nightmare. I discussed these issues with the developers of Spread and they have no real magic. TCP seems a more reliable alternative (especially for predictable performance) but the TCP timeouts are also tricky to tune depending on the platform. We worked quite a bit with Nuno around Appia in the context of Sequoia, and performance can be outstanding when properly tuned or absolutely awful if some default values are wrong. The chaotic behavior of GCS under stress quickly compromises the reliability of the replication system, and admission control on UDP multicast has no good solution so far. This is just a heads-up on what is awaiting you in production when the system is stressed. There is no good solution so far besides a good admission control on top of the GCS (in the application). I am now off for the holidays. Cheers, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] incoherent view of serializable transactions
Jeff Davis wrote: On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote: I still don't get why people would use SERIALIZABLE since there is no efficient implementation of it. If they need true serializable transactions, and don't care about efficiency. Are there such people, who really understand the thing and don't complain about the performance? ;-) Happy holidays, Emmanuel
Re: [HACKERS] incoherent view of serializable transactions
Kevin Grittner wrote: This isn't some hypothetical "maybe some day some product might implement this, but it'll never catch on" sort of thing -- Microsoft and Sybase SQL Server had this from version 1. I used it from 1990 until the conversion to PostgreSQL over the last couple years. Have you ever used serializable transactions with Sybase? The locking is actually based on memory pages and you end up with deadlocks if you don't pad your data structures to prevent false sharing. Oracle also provides SI like Postgres and I don't think they are doing that badly. I'm going on second-hand information here, but I'm told that IBM DB2 has used similar techniques to provide true serializable transactions for even longer. I'm somewhat mystified at the reaction this topic gets here. :-) I am somewhat mystified by the interest some people still have in serializable transactions. Why don't users program the application to deal with a lower isolation level (actually I think they do)? But I am probably missing the point, which was to fix the doc? Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I'm not quite sure what you mean by "certification protocol", there's no such thing in Postgres-R (as proposed by Kemme). Although, I remember having heard that term in the context of F. Pedone's work. Can you point me to some paper explaining this certification protocol? What Bettina calls the Lock Phase in http://www.cs.mcgill.ca/~kemme/papers/vldb00.pdf is actually a certification. You can find more references to certification protocols in http://gorda.di.uminho.pt/download/reports/gapi.pdf I would also recommend the work of Sameh on Tashkent and Tashkent+ that was based on Postgres: http://labos.epfl.ch/webdav/site/labos/users/157494/public/papers/tashkent.eurosys2006.pdf and http://infoscience.epfl.ch/record/97654/files/tashkentPlus.eurosys2007.final.pdf Certification-based approaches already face multiple reliability issues in improving write performance compared to statement-based replication, but this is very dependent on the capacity of the system to limit the conflicting window for concurrent transactions. What do you mean by "reliability issues"? These approaches usually require an atomic broadcast primitive that is typically fragile (limited scalability, hard-to-tune failure timeouts, ...). Most prototype implementations have the load balancer and/or the certifier as a SPOF (single point of failure). Building reliability for these components will come with a significant performance penalty. The writeset extraction mechanisms have had too many limitations so far to allow the use of certification-based replication in production (AFAIK). What limitations are you speaking of here? Oftentimes DDL support is very limited. Non-transactional objects like sequences are not captured. Session or environment variables are not necessarily propagated. Support of temp tables varies between databases, which makes it hard to support them properly in a generic way.
Well, I guess everyone has a story on some limitations they have found with some database replication technology, especially when a user expects a cluster to behave like a single database instance. Happy holidays, Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] incoherent view of serializable transactions
Kevin, If you want to know how to build SERIALIZABLE with a database that provides SI (Snapshot Isolation), read http://portal.acm.org/citation.cfm?doid=1376616.137669 Note that in practice, READ COMMITTED is the most widely used isolation level and its limitations are relatively well understood by the average programmer, who can program the application accordingly. I still don't get why people would use SERIALIZABLE since there is no efficient implementation of it. My 2 cents. Emmanuel Kevin Grittner wrote: As I've understood them, the limitations of the PostgreSQL implementation of SERIALIZABLE transactions (at least the only example given in the documentation) revolve around a rather unlikely situation: Given concurrent transactions T1 and T2 and non-overlapping sets of data A and B, T1 reads data including A and uses the data to modify B while T2 reads data including B and uses that data to modify A, where the modifications performed by either would affect the modifications made by the other, if they were visible. For reasons I'll omit here, that scenario didn't worry me for my current uses of PostgreSQL. I've found another form of deviation from the standard SERIALIZABLE behavior, though, which does worry me. Although the above appears to be the only situation where the end result after everything commits is inconsistent with standard SERIALIZABLE behavior, the PostgreSQL implementation allows transactions to view the data in states which would never be possible during the application of the transactions in series in the order they will appear to have been applied after the commit. Imagine, as an example, a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Somewhere mid-afternoon that date is updated, all subsequent receipts fall into the new day, and a report is run listing the receipts for the day and giving the deposit total.
Under a standard-compliant implementation of SERIALIZABLE, this is straightforward: a transaction which is inserting a receipt selects the deposit date to use in its transaction, and any SELECT of receipts for a date prior to the current deposit date will see the accurate, final data. Under the PostgreSQL implementation, although data eventually gets to a coherent state, there can be a window of time where a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE.

-- setup
create table ctl (k text not null primary key, deposit_date date not null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;

At this point, SERIALIZABLE transactions appear to have worked, with receipt 3 happening before the update of deposit_date; however, there was a window of time when the update to deposit_date was visible and receipt 3 was not. This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be noted in the documentation. I don't know if there's any way to fix this without killing performance. -Kevin -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
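One commonly suggested mitigation for this particular anomaly (my addition, not proposed in the thread) is to force a write conflict on the control row, so the receipt-inserting transaction and the date-update transaction cannot both proceed in a way that opens the visibility window:

```sql
-- In every transaction that inserts a receipt, lock the control row first:
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for update;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);
commit;
```

With the FOR UPDATE lock in place, the concurrent UPDATE of ctl.deposit_date blocks until the receipt insert commits (or aborts), so no report can observe the new deposit date while receipt 3 is still invisible. The cost is reduced concurrency on the control row.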
Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I am happy to see that Postgres-R is alive again. The paper was written in 07 (and published in 08, the review process is longer than a CommitFest ;-)) and at the time of the writing there was no version of Postgres-R available, hence the 'obsolete' mention referring to past versions. I think that it is legitimate for users to expect more guarantees from a replicated database than from a single database. Not knowing what happens when a failure occurs at commit time while some nodes are still active in a cluster is not intuitive for users. I did not look at the source, but if Postgres-R continues to elaborate on Bettina's ideas with writeset extraction and a certification protocol, I think that it will be a bad idea to try to mix it with Sync Rep (mentioned in another thread). If you delay commits, you will increase the window for transactions to conflict and therefore induce a higher abort rate (thus less scalability). Certification-based approaches already face multiple reliability issues in improving write performance compared to statement-based replication, but this is very dependent on the capacity of the system to limit the conflicting window for concurrent transactions. The writeset extraction mechanisms have had too many limitations so far to allow the use of certification-based replication in production (AFAIK). Good luck with Postgres-R. Emmanuel Emmanuel Cecchet wrote: What the application is going to see is a failure when the postmaster it is connected to is going down. If this happens at commit time, I think that there is no guarantee for the application to know what happened: 1. failure occurred before the request reached the postmaster: no instance committed 2. failure occurred during commit: might be committed on either node 3. failure occurred while sending back the ack of commit to the client: both instances have committed But for the client, it will all look the same: an error on commit.
This is very much the same for a single-node database system, so I think current application developers are used to that behavior. A distributed database system just needs to make sure that committed transactions can and will eventually get committed on all nodes. So in case the client doesn't receive a COMMIT acknowledgment due to any kind of failure, it can only be sure the distributed database is in a consistent state. The client cannot tell if it applied the transaction or not. Much like for single-node systems. I agree that a distributed database system could theoretically do better, but that would require changes to the client-side connection library as well, letting the client connect to two or even more nodes of the distributed database system. This is just to point out that despite all your efforts, the client might think that some transactions have failed (error on commit) but they are actually committed. As pointed out above, that would currently be an erroneous conclusion. If you don't put some state in the driver that is able to check at failover time if the commit operation succeeded or not, it does not really matter what happens for in-flight transactions (or in-commit transactions) at failure time. Sure it does. The database system still needs to guarantee consistency. Hm.. well, you're right if there's only one single standby left (as is obviously the case for the proposed Sync Rep). Ensuring consistency is pretty simple in such a case. But imagine having two or more standby servers. Those would need to agree on a set of in-flight transactions from the master they both need to apply. Actually, if there was a way to query the database about the status of a particular transaction by providing a cluster-wide unique id, that would help a lot. You're certainly aware that Postgres-R features such a global transaction id... And I guess Sync Rep could easily add such an identifier as well.
I wrote a paper on the issues with database replication at SIGMOD earlier this year (http://infoscience.epfl.ch/record/129042). Even though it was targeted at middleware replication, I think that some of it is still relevant for the problem at hand. Interesting read, thanks for the pointer. It's pretty obvious that I don't consider Postgres-R to be obsolete... I've linked your paper from www.postgres-r.org [1]. Regarding the wording, if experts can't agree, you can be sure that users won't either. Most of them don't have a clue about the different flavors of replication. So as long as you state clearly how it behaves and define all the terms you use, that should be fine. I mostly agree with that, without repeating my concerns again here. Regards Markus Wanner [1]: Referenced Papers from Postgres-R website: http://www.postgres-r.org/documentation/references -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting
Re: [HACKERS] Partitioning wiki page
Hi all, I did not find the option to remove the page so I just left the links section. Feel free to remove the page so that there is no trace left of this in the history. I don't think it is worth keeping anything since no one found value in it anyway. Until I figure out a way to come up with a constructive proposal that does not offend anyone, I think that it is best that I do not contribute anything to the wiki anymore. Best, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Partitioning wiki page
Hi Robert, Thanks all for the time you are spending commenting on this during this busy Commit Fest. I am not a fan of the proposed syntax. It is conceptually similar to what we already do with constraints, but invents a whole new syntax to no obvious benefit that I can see. Actually, I did not invent a new syntax but tried to map the Oracle syntax, which seems to be a requirement that appeared on multiple occasions in the discussions. The question is more whether Postgres wants to have its own syntax (and there might be good reasons for that) or if we want to use a syntax similar to other databases for easier migration (after all, Postgres is very late on that topic compared to other databases). I think we would do well to look at what other systems besides Oracle do, as well as considering any newer ideas Oracle may have introduced. Perhaps this would be a good thing to add to the Wiki page - instead of saying, this is the design, say, here are some different possibilities, what do we like? Maybe I was not clear on the intent of the wiki page. I was not trying to pretend that this was the design but I was just trying to do my homework on what are the issues that we will have to address (that's basically the section headers) and I just filled the sections with what I understood so far from the discussions and the Oracle and MySQL documentation. So yes, there is more work to do on understanding what other systems besides Oracle & MySQL do, and then figuring out what we want to propose in Postgres. Oracle's new interval partitioning sounds great, but it ignores the reality that most data varies considerably over time, either growing or fluctuating. I much prefer the idea of a size-equalized partitioning scheme, as implemented by Coppereye's Greenwich.
That method gives equal-sized partitions. Sometimes (though certainly not always), the structure of the underlying data makes interval partitioning a win, as when for example you are accumulating transactions that are billed at the end of each month. If you do a lot of queries on the open transactions for the current month, you want to make sure that there's a partition break at the start of the month so that you're not unnecessarily scanning some of the previous month's entries. Partitions of similar size are usually achieved by hash partitioning (in both Oracle and MySQL). There are many use cases where range/interval partitioning makes a lot of sense. One use case that was mentioned to me is multi-level hierarchies like year->month->week->day that would require more than 2 levels of partitioning. I think we can think of a way of having an arbitrary depth of partitions even though partition management might become slightly more complex. I will try to integrate the comments as I see them on the mailing list but feel free also to update the wiki page with your thoughts and use cases. Thanks again for the feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
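For reference, the constraint-based mechanism Robert alludes to ("what we already do with constraints") is range partitioning via inheritance plus CHECK constraints; the syntax debate is about formalizing something like this (a sketch; the table and column names are hypothetical):

```sql
-- Parent table and one range partition per month, pre-declarative style:
CREATE TABLE measurements (logdate date NOT NULL, value int);

CREATE TABLE measurements_2009_01 (
    CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2009-02-01')
) INHERITS (measurements);

-- With constraint_exclusion enabled, a query such as
--   SELECT * FROM measurements WHERE logdate = DATE '2009-01-15';
-- only scans measurements_2009_01.
```

A dedicated PARTITION BY syntax would generate the child tables and CHECK constraints automatically, which is where the Oracle-compatibility versus Postgres-native-syntax question arises.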