Re: [HACKERS] Partitioning option for COPY
Hannu Krosing wrote:
After triggers can't change the tuple, thus can't change routing.
An after trigger can always issue an update of the tuple, but that should be trapped by the regular mechanism that will deal with updates (when we have it available). Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement trigger modifies the routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables?
What statement triggers do you mean? I don't think we have ON COPY triggers?
I mean CREATE TRIGGER /name/ { BEFORE | AFTER } /event/ ON /table/ FOR EACH STATEMENT
Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
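For reference, a statement-level trigger of the kind being discussed looks like this (the table and function names are illustrative, not taken from any patch):

```sql
-- A FOR EACH STATEMENT trigger fires once per statement, not per row.
-- The open question above is what this should mean for child tables
-- when COPY routes rows of one statement into several partitions.
CREATE TABLE parent (i int);

CREATE FUNCTION note_stmt() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE 'statement trigger fired on %', TG_TABLE_NAME;
  RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_after_stmt
  AFTER INSERT ON parent
  FOR EACH STATEMENT EXECUTE PROCEDURE note_stmt();
```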
Re: [HACKERS] Partitioning option for COPY
Robert Haas wrote: On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing wrote: I'd propose that triggers on both parent table and selected child are executed. I was thinking we should make the partitioning decision FIRST, before any triggers are fired, and then fire only those triggers relevant to the selected partition. If the BEFORE triggers on the partition modify the tuple in a way that makes it incompatible with the table constraints on that partition, the insert (or update) fails. Firing triggers on more than one table is pretty substantially incompatible with what we do elsewhere and I'm not clear what we get in exchange. What is the use case for this? I don't have a use case for this but I was puzzled with that when I had to implement trigger support in COPY with partitioning. I came to the same conclusion as you and made the operation fail if the trigger was trying to move the tuple to another partition. However, I had a problem with after row triggers that I had to call synchronously to be able to detect the change. We will need something to tell us that an after row trigger did not mess with the routing decision. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hannu Krosing wrote:
On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning
I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row.
A simple update to the row can cause it to move between partitions, no?
Yes. I am not sure what the semantics should be if you have triggers defined on both the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table?
I'd propose that triggers on both the parent table and the selected child are executed:
1. first you execute before triggers on the parent table, which may change which partition the row belongs to
2. then you execute before triggers on the selected child table
2.1 if this changes the child table selection, repeat from 2.
3. save the tuple in the child table
4. execute after triggers of the final selected child table
What if that trigger changes the child table selection again?
5. execute after triggers of the parent table
Same here: what if the trigger changes the child table selection? Do we re-execute triggers on the new child table? It is also debatable whether we should execute an after trigger on a table where nothing was really inserted.
The order of 4 and 5 is selected arbitrarily; the others are determined by flow.
Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement trigger modifies the routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables?
You cannot know in advance where the tuples are going to go and fire the before statement triggers. 
If you had to fire after statement triggers, in which order would you fire them?
If the new implementation hides the child tables...
If you hide child tables, you suddenly need a lot of new syntax to "unhide" them, so that partitions can be manipulated. Currently it is easy to do it with INHERIT / NO INHERIT.
Agreed, but I think that we will discover some restrictions that will apply to child tables.
Emmanuel
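To make the proposed flow above concrete, here is a sketch against today's inheritance-based setup (all names illustrative); only step 1 behaves this way in current PostgreSQL — the re-routing of steps 2–5 is the proposal, which no released server implements:

```sql
CREATE TABLE parent (i int);
CREATE TABLE child1 (CHECK (i <= 10)) INHERITS (parent);
CREATE TABLE child2 (CHECK (i > 10)) INHERITS (parent);

-- Step 1: a BEFORE trigger on the parent can rewrite the key and thus
-- change which partition the row belongs to.
CREATE FUNCTION shift_key() RETURNS trigger AS $$
BEGIN
  NEW.i := NEW.i + 5;  -- e.g. i = 8 becomes 13, moving the row to child2
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_before BEFORE INSERT ON parent
  FOR EACH ROW EXECUTE PROCEDURE shift_key();
-- Steps 2-5 (child BEFORE triggers, re-routing on key change, the
-- insert, then child and parent AFTER triggers) are the proposal
-- under discussion, not existing behavior.
```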
Re: [HACKERS] Syntax for partitioning
Greg Smith wrote: I just made a few updates to http://wiki.postgresql.org/wiki/Table_partitioning , merging in the stuff that had been on the ToDo page and expanding the links to discussion on this list a bit. The number of submitted patches over the last couple of years that handle some subset of the desired feature set here is really remarkable when you see them all together.
Should we add the 'WITH (...) TABLESPACE tbs' options to the syntax since they are supported? Do we support ALTER ... SET TABLESPACE?
Emmanuel
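Both options are indeed supported for plain tables today, so carrying them over to the partition syntax would be consistent (the tablespace names here are illustrative and must already exist):

```sql
-- Storage parameters and tablespace at creation time:
CREATE TABLE child_2009q4 (i int)
  WITH (fillfactor = 70) TABLESPACE fast_disk;

-- Moving a table afterwards is also supported:
ALTER TABLE child_2009q4 SET TABLESPACE archive_disk;
```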
Re: [HACKERS] Syntax for partitioning
Hi,
Sorry for commenting only now, but I think that we need to be able to store the partitions in different tablespaces. Even if the create table originally creates all partitions in the same tablespace, individual partitions should be allowed to be moved to different tablespaces using alter table or alter partition. I think that other databases allow the user to define a tablespace for each partition in the create table statement. In a warehouse, you might want to split your partitions over different volumes and, over time, move older partitions to storage with higher compression if that data is not to be accessed frequently anymore. Altering tablespaces for partitions is important in that context. Are you also planning to provide partitioning extensions to 'create table as'?
Thanks
Emmanuel
Here is a WIP partitioning patch. The new syntax is:
1. CREATE TABLE parent (...);
2. ALTER TABLE parent PARTITION BY { RANGE | LIST } ( key );
3. CREATE TABLE child (...);
4. ALTER TABLE child INHERIT parent AS PARTITION VALUES ...;
We can also use "CREATE TABLE PARTITION BY" as 1+2+3+4 and "CREATE PARTITION" as 3+4. I think "INHERIT AS PARTITION" will typically be used rarely, but such orthogonality seems cleaner. The most complex logic of the patch is in ATExecAddInherit(). It scans existing partitions and generates a CHECK constraint for the new partition. Any comments on the design? If there are no objections, I'd like to stop adding features in this CommitFest and go for the remaining auxiliary work -- pg_dump, object dependency checking, documentation, etc.
- Catalog changes -
In addition to pg_partition, I added a pg_inherits.inhvalues field. The type of the field is "anyarray" and it stores partition values. For a range partition, the upper bound value is stored in the array. For a list partition, the list values are stored in it. These separated value fields will be useful to implement partition triggers in the future. In contrast, reverse engineering of check constraints is messy. 
CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
{
	Oid			inhrelid;
	Oid			inhparent;
	int4		inhseqno;
	anyarray	inhvalues;		/* values for partition */
} FormData_pg_inherits;

CREATE TABLE pg_partition (
  partrelid oid REFERENCES pg_class (oid),    -- partitioned table oid
  partopr   oid REFERENCES pg_operator (oid), -- operator to compare keys
  partkind  "char", -- kind of partition: 'R' (range) or 'L' (list)
  partkey   text,   -- expression tree of partition key
  PRIMARY KEY (partrelid)
) WITHOUT OIDS;

-- Limitations and Restrictions --
* We can create a new partition as long as the partitioning keys do not conflict with existing partitions. In particular, we cannot add any partitions if we have overflow partitions, because a new partition always splits the overflow partition.
* We cannot reuse an existing check constraint as a partition constraint. ALTER TABLE INHERIT AS PARTITION brings on a table scan to add a new CHECK constraint.
* No partition triggers nor planner and executor improvements. They would come in future development.
Regards,
--- ITAGAKI Takahiro NTT Open Source Software Center ----
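As a rough illustration of what ATExecAddInherit() would generate, here is the equivalent done by hand with today's mechanisms (table names and bounds are illustrative; the patch would derive the CHECK constraint from the stored partition values instead):

```sql
CREATE TABLE sales (sale_date date, amount numeric);
CREATE TABLE sales_2009q4 (LIKE sales);

-- Attach the child and add the range constraint the patch would build
-- automatically from pg_inherits.inhvalues:
ALTER TABLE sales_2009q4 INHERIT sales;
ALTER TABLE sales_2009q4 ADD CONSTRAINT sales_2009q4_range
  CHECK (sale_date >= '2009-10-01' AND sale_date < '2010-01-01');
```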
Re: [HACKERS] Partitioning option for COPY
Itagaki Takahiro wrote:
Emmanuel Cecchet wrote: I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing.
Infinite loops are not a partition-related problem, no? We can also find infinite loops in user defined functions, recursive queries, etc. I think the only thing we can do for it is to *stop* loops instead of preventing them, like max_stack_depth.
I was thinking of a trigger on child1 updating the partition key, forcing the tuple to move to child2. And then a trigger on child2 updating the key again to move the tuple back to child1. You end up with an infinite loop.
With the current proposed implementation, would it be possible to define a view using child tables?
No, if you mean using a partition-view. I'm thinking we are moving our implementation of partitioning from view-based to a built-in feature. Do you have any use-cases that require view-based partitioning? Was the inheritance-based partitioning not enough for it?
Never mind, I was thinking about the implications of materialized views, but Postgres does not have materialized views! I have other questions related to create table but I will post them in the 'syntax for partitioning' thread.
Thanks
Emmanuel
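A sketch of the loop described here, assuming a hypothetical router that re-fires child triggers after each move (today such an insert would simply fail the CHECK constraint rather than loop; all names illustrative):

```sql
CREATE TABLE parent (i int);
CREATE TABLE child1 (CHECK (i <= 10)) INHERITS (parent);
CREATE TABLE child2 (CHECK (i > 10)) INHERITS (parent);

CREATE FUNCTION push_up() RETURNS trigger AS $$
BEGIN NEW.i := 11; RETURN NEW; END;  -- forces the row into child2
$$ LANGUAGE plpgsql;

CREATE FUNCTION push_down() RETURNS trigger AS $$
BEGIN NEW.i := 1; RETURN NEW; END;   -- forces the row back into child1
$$ LANGUAGE plpgsql;

CREATE TRIGGER t1 BEFORE INSERT ON child1
  FOR EACH ROW EXECUTE PROCEDURE push_up();
CREATE TRIGGER t2 BEFORE INSERT ON child2
  FOR EACH ROW EXECUTE PROCEDURE push_down();
-- With re-routing after every trigger, t1 and t2 would bounce the
-- tuple between the partitions forever unless the router stops it.
```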
Re: [HACKERS] Partitioning option for COPY
Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. I am not sure what the semantic should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? If the new implementation hides the child tables, it might be safer to not allow triggers on child tables altogether and use the parent table as the single point of entry to access the partition (and define triggers). With the current proposed implementation, would it be possible to define a view using child tables? Emmanuel
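Today such a view is straightforward because the children are ordinary tables (names illustrative); the open question is whether it stays possible once an implementation hides them:

```sql
-- Two partitions exposed directly through a view:
CREATE TABLE child_2009q3 (i int);
CREATE TABLE child_2009q4 (i int);

CREATE VIEW recent_data AS
  SELECT * FROM child_2009q3
  UNION ALL
  SELECT * FROM child_2009q4;
```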
Re: [HACKERS] Partitioning option for COPY
Simon Riggs wrote: I was unaware you were developing these ideas and so was unable to provide comments until now. The first patch was published to this list on September 10 (almost 2.5 months ago) along with the wiki page describing the problem and the solution. What should I have done to raise awareness further? /E
Re: [HACKERS] Partitioning option for COPY
Robert Haas wrote:
On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet wrote: I think you should read the thread and the patch before making any false statements like you did in your email. 1. The patch does not use any trigger for routing.
Whoa, whoa! I don't think that Simon said that it did. But even if I am wrong and he did...
Quote from Simon's email: "It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff." You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers.
...I certainly don't think this comment is justified. This is a technical discussion about the best way of solving a certain problem, and I don't believe that any of the discussion up to this point has been anything other than civil. I can tell that you are frustrated that your patch is not getting the support you would like to see it get, but launching ad hominem attacks on Simon or anyone else is not going to help.
We certainly don't live in the same civilization then. I am not frustrated if my patch does not get in because of technical considerations, and I am happy so far with Jan's feedback that helped a lot. I think there is a misunderstanding between what Simon wants ('Anyway, I want data routing, as is the intention of this patch.') and what this patch is about. This patch is just supposed to load tuples into a hierarchy of tables, as this is a recurrent use case in data warehouse scenarios. It is not supposed to solve data routing in general (otherwise it would be integrated as standard in COPY and not as an option). But it looks like it is a waste of everybody's time to continue this discussion further. Just move the patch to the rejected patches and let's wait for Itagaki's implementation. 
Emmanuel
Re: [HACKERS] Partitioning option for COPY
Simon,
I think you should read the thread and the patch before making any false statements like you did in your email.
1. The patch does not use any trigger for routing.
2. This is just an option for COPY that is useful for loading operations in the data warehouse world. It is not meant to implement full partitioning, as explained many times already in this thread.
3. This patch elaborates on existing mechanisms and cannot rely on a meta-data representation of partitions, which does not exist yet and will probably not exist in 8.5.
You should justify your statements when you say 'potentially buggy in its approach to developing a cache and using trigger-like stuff'. I understand that you don't like it because this is not what you want, but this is not my fault. This is not an implementation of partitioning, just as COPY does not do update/delete/alter/... And yes, the use case is 'narrow', like any option in COPY. It is like complaining that the CSV option is not useful because you want to load binary dumps. If Itagaki gets the support of the community to get his implementation accepted, I will gladly use it. Contributing? If Aster is willing to contribute a code monkey to implement your specs, why not, but you will have to convince them. You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers.
Emmanuel
On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote: Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch.
We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY?
Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. 
Also, it would be nicer if the router could work not only in COPY but also in INSERT.
As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY, since we will not have it anywhere otherwise.
BTW, I'm working on metadata for partitioning now. Your "partitioning" option in COPY could be replaced with the catalog.
This implementation is only for the current 8.5 and it will not be needed anymore once we get fully functional partitioning in Postgres, which seems to be for a future version.
Yes, the trigger way of doing this is a bad way. I regret to say that the way proposed here isn't much better, AFAICS. Let me explain why I think that, but -1 to anyone applying this patch. This patch proposes keeping a cache of last visited partitions to reduce the overhead of data routing. What I've requested is that partitioning work by using a data structure held in relcache for inheritance parents. This differs in 3 ways from this patch:
a) it has a clearly defined location for the cached metadata, with clearly identified and well worked out mechanisms for cache invalidation
b) the cache can be built once when it is first needed, not slowly grown as parts of the metadata are used
c) it would be available for all parts of the server, not just COPY.
The easiest way to build that metadata is when structured partitioning info is available, i.e. the best next action is to complete and commit Itagaki's partitioning syntax patch. Then we can easily build the metadata for partitioning, which can then be used in COPY for data routing. Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. ISTM that with the right metadata in the right place, a cleaner and easier solution is still possible for 8.5. 
The code within COPY should really just reduce to a small piece of code to derive the correct relation for the desired row and then use that during heap_insert(). I have just discussed partitioning with Itagaki-san at JPUG, so I know his plans. Itagaki-san and Manu, please can you work together to make this work for 8.5? --- A more detailed explanation of Partitioning Metadata: Partitioning Metadata is information held on the relcache for a table that has child partitions. Currently, a table does not cache info about its children, which prevents various optimisations. We would have an extra pointer on the Relation struct that points to a PartitioningMetadata struct. We can fill in this information when we construct the relcache for a relation, or we can populate it on demand the first time we attempt to use that information (if it exists). We want to hold an array of partition boundary values.
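For context, the usage the patch proposed looked like this (the partitioning option was never committed to core PostgreSQL; file path and table names are illustrative — only the option spelling comes from the patch's own examples):

```sql
CREATE TABLE logs (ts date, msg text);
CREATE TABLE logs_2009_11
  (CHECK (ts >= '2009-11-01' AND ts < '2009-12-01')) INHERITS (logs);

-- With the option, COPY consults the children's CHECK constraints to
-- route each row into the matching partition:
COPY logs FROM '/tmp/logs.dat' WITH (partitioning);
```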
Re: [HACKERS] Partitioning option for COPY
Stephan Szabo wrote:
On Sun, 22 Nov 2009, Emmanuel Cecchet wrote: As I explained to Tom, if the after row trigger is called asynchronously I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger) it works fine. Also calling the after row trigger synchronously allows me to detect any potential problem between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution.
Well, I think there are still some issues there that at least need to be better documented. For example,

create or replace function fi() returns trigger as '
begin
  if (NEW.p is not null) then
    if (select count(*) from i where i.i = NEW.p) = 0 then
      raise exception ''No parent'';
    end if;
  end if;
  return NEW;
end;
' language 'plpgsql';

create or replace function fc() returns trigger as '
begin
  if (NEW.p is not null) then
    if (select count(*) from c where c.i = NEW.p) = 0 then
      raise exception ''No parent'';
    end if;
  end if;
  return NEW;
end;
' language 'plpgsql';

create or replace function fp() returns trigger as '
begin
  if (NEW.p is not null) then
    if (select count(*) from p where p.i = NEW.p) = 0 then
      raise exception ''No parent'';
    end if;
  end if;
  return NEW;
end;
' language 'plpgsql';

drop table i;
drop table c;
drop table p cascade;
create table i(i int, p int);
create trigger tri after insert on i for each row execute procedure fi();
create table c(i int, p int);
create trigger trc after insert on c for each row execute procedure fc();
create table p(i int, p int);
create table p1 (check (i > 0 and i <= 10)) inherits (p);
create table p2 (check (i > 10 and i <= 20)) inherits (p);
create table p3 (check (i > 20 and i <= 30)) inherits (p);
create trigger trp1 after insert on p1 for each row execute procedure fp();
create trigger trp2 after insert on p2 for each row execute procedure fp();
create trigger trp3 after insert on p3 for each row execute procedure fp();
insert into i values (1,3),(2,1),(3,NULL);
copy c from stdin;
1	3
2	1
3	\N
\.
copy p from stdin with (partitioning);
1	3
2	1
3	\N
\.

gives me a successful load into i and c, but not into p with the current patch AFAICS, while a load where the 3 row is first does load.
Well, if you don't insert anything in p (the table; try to avoid using the same name for the table and the column in an example), copy will insert (1,3) in p1 and then the trigger will evaluate select count(*) from p where p.i = NEW.p => NEW.p is 3 and the only p.i available is 1. This should return 0 rows and raise the exception. This seems normal to me. The only reason it works for i is because you inserted the values before the copy. Am I missing something?
Emmanuel
Re: [HACKERS] Partitioning option for COPY
Jan Urbański wrote:
o) my main concern is still valid: the design was never agreed upon. The approach of using inheritance info for automatic partitioning is, at least IMHO, too restricted. Regular INSERTs won't get routed to child tables. Data from writable CTEs won't get routed. People wanting to do partitioning on something else than constraints are stuffed.
Well, this patch does not claim to implement partitioning for Postgres; it just offers partitioning as an option for COPY (and COPY only) based on the existing mechanism in Postgres. I have already participated in lengthy and relatively sterile discussions on how to implement full-blown partitioning, but we never reached the beginning of an agreement and it was decided that a step-by-step approach would be better. I will propose another implementation of partitioning in COPY once Postgres has another representation than constraints on child tables to implement it.
I strongly suspect the patch will get rejected on the grounds of lack of community agreement on partitioning, but I'd hate to see your work wasted. It's not too late to open a discussion on how automatic partitioning could work (or start working out a common proposal with the people discussing in the "Syntax for partitioning" thread).
This is not my call. Right now the syntax for partitioning does not change anything in Postgres; it just adds syntactic sugar on top of the existing implementation. It will not route anything or answer any of the needs you mentioned in your previous point.
Marking as Waiting on Author, although I'd really like to see a solid design being agreed upon, and then the code.
You are asking the wrong person if you want me to lead the partitioning design discussions. I already tried once and I was unsuccessful. As nothing has changed, I don't see why I would be more successful this time. 
Emmanuel
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote:
Jan Urbański writes: that got broken by the WHEN triggers patch (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the TriggerEnabled function signature; the code currently does not compile.
[ squint... ] What is that patch doing touching the innards of trigger.c in the first place? I can't see any reason for trigger.c to be associated with partitioning.
The problem I had is that if I used the standard trigger mechanism for after row inserts on a child table, where the trigger is called asynchronously, I had a relcache leak on the child table. I tried to ask for help on that earlier on but it got lost with other discussions on the patch. So I tried to call the after trigger synchronously on the child table and it worked. So the patch is just adding a synchronous call to after row insert triggers that is invoked when the tuple is moved to a child table (this also allows detecting triggers that are messing with the routing). I would be happy to follow any recommendation for a more elegant solution to the problem.
Emmanuel
Re: [HACKERS] Union test case broken in make check?
Tom Lane wrote: Kenneth Marshall writes: Without an order by, the order is not defined. Yeah, but with the same data and the same software it should generally give the same result; as evidenced by the fact that these same regression tests have worked for most people for years. There's something odd happening on Emmanuel's machine. Maybe he's changed the hashing algorithms or some planner cost parameters? I did not change anything to Postgres and I can reproduce the problem with a vanilla checkout of HEAD. However, I think I started to see the problem since my last VMWare and OS upgrade (unfortunately I did both simultaneously). For info, I am using VMWare Workstation v6.5.3 build-185404 on Vista 6.0.60002 SP2. The VM is an Ubuntu 9.04 (Jaunty) with a 2.6.28-16 SMP kernel on x86_64 (2 cores available for the VM, 4 cores total). gcc is 4.3.3 (Ubuntu 4.3.3-5ubuntu4). Let me know if you need additional info about my setup. Emmanuel
Re: [HACKERS] Union test case broken in make check?
Then I guess that we need to fix the test.
Emmanuel
Kenneth Marshall wrote:
Without an order by, the order is not defined. The answers are the same but the test gives a false failure because of the lack of ordering.
Regards, Ken
On Thu, Nov 19, 2009 at 07:54:30PM -0500, Emmanuel Cecchet wrote:
Tom Lane wrote:
Andrew Dunstan writes:
Emmanuel Cecchet wrote: Is it just me or the union test case fails in CVS head?
The buildfarm is pretty much all green: <http://www.pgbuildfarm.org/cgi-bin/show_status.pl> So it looks like it's you :-)
When in doubt, try "make distclean" and a full rebuild before assuming you've got a problem worth tracking down ...
Well, I did:
1. make distclean
2. configure with CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-tcl --without-openssl
3. make (everything normal)
4. make check
And it still fails for me. I am attaching my regression.diffs if someone thinks it is worth tracking down ...
Emmanuel

*** /home/manu/workspace/PG-HEAD/src/test/regress/expected/union.out	2009-02-09 16:18:28.0 -0500
--- /home/manu/workspace/PG-HEAD/src/test/regress/results/union.out	2009-11-19 19:37:32.0 -0500
***************
*** 198,208 ****
  WHERE f1 BETWEEN 0 AND 100;
  five
  ---
- -1004.3
- -34.84
- -1.2345678901234e-200
  0
  123456
  (5 rows)
  SELECT CAST(f1 AS char(4)) AS three FROM VARCHAR_TBL
--- 198,208 ----
  WHERE f1 BETWEEN 0 AND 100;
  five
  ---
  0
  123456
+ -34.84
+ -1.2345678901234e-200
+ -1004.3
  (5 rows)
  SELECT CAST(f1 AS char(4)) AS three FROM VARCHAR_TBL
***************
*** 263,278 ****
  SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
  q2
  --
- 4567890123456789
  123
  (2 rows)
  SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
  q2
  --
  4567890123456789
  4567890123456789
- 123
  (3 rows)
  SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
--- 263,278 ----
  SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
  q2
  --
  123
+ 4567890123456789
  (2 rows)
  SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
  q2
  --
+ 123
  4567890123456789
  4567890123456789
  (3 rows)
  SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
***************
*** 305,320 ****
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
  q1
  --
- 4567890123456789
  123
  (2 rows)
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
  q1
  --
  4567890123456789
  4567890123456789
- 123
  (3 rows)
  --
--- 305,320 ----
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
  q1
  --
  123
+ 4567890123456789
  (2 rows)
  SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
  q1
  --
+ 123
  4567890123456789
  4567890123456789
  (3 rows)
  --
***************
*** 341,348 ****
  SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
- 4567890123456789
  123
  456
  4567890123456789
  123
--- 341,348 ----
  SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
  123
+ 4567890123456789
  456
  4567890123456789
  123
***************
*** 353,367 ****
  SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
  q1
  --
- 4567890123456789
  123
  (2 rows)
  (((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
- 4567890123456789
  123
  456
  4567890123456789
  123
--- 353,367 ----
  SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
  q1
  --
  123
+ 4567890123456789
  (2 rows)
  (((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
  q1
  ---
  123
+ 45678901234
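Fixing the test along the lines Emmanuel suggests just means making the row order deterministic, e.g. for two of the queries appearing in the diff:

```sql
-- Adding an explicit sort makes the expected output independent of
-- the plan chosen (hash- vs. sort-based set operations):
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl ORDER BY 1;
```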
Re: [HACKERS] Union test case broken in make check?
Tom Lane wrote:
Andrew Dunstan writes:
Emmanuel Cecchet wrote: Is it just me or the union test case fails in CVS head?
The buildfarm is pretty much all green: <http://www.pgbuildfarm.org/cgi-bin/show_status.pl> So it looks like it's you :-)
When in doubt, try "make distclean" and a full rebuild before assuming you've got a problem worth tracking down ...
Well, I did:
1. make distclean
2. configure with CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-tcl --without-openssl
3. make (everything normal)
4. make check
And it still fails for me. I am attaching my regression.diffs if someone thinks it is worth tracking down ...
Emmanuel
[HACKERS] Union test case broken in make check?
Hi, Is it just me or does the union test case fail in CVS head? manu -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Emmanuel Cecchet writes: Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Cache? Why do you need a cache for COPY? Repeated bulk loads into the same table within a single session don't seem to me to be a case that is common enough to justify a cache. Actually the cache is only activated if you use the partitioning option. It is just a list of oids of child tables where tuples were inserted. It is common to have multiple COPY operations in the same session when you are doing bulk loading in a warehouse. (BTW, the quoted code seems to be busily reinventing OID Lists. Don't do that.) Yes, I understood that I should use an OidList instead. But I was trying to understand what I did wrong here (besides reinventing the oid list ;-)). Why do I get this segfault if I use memory from CacheMemoryContext? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Emmanuel Cecchet writes: Tom Lane wrote: This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Well, (a) I doubt you really were creating the list in CacheMemoryContext, else it'd have not gotten clobbered; (b) creating statement-local data structures in CacheMemoryContext is entirely unacceptable anyway, because then they represent a permanent memory leak. Well I thought that this code would do it:

child_table_lru = (OidLinkedList *) MemoryContextAlloc(
    CacheMemoryContext, sizeof(OidLinkedList));
...
/* Add the new entry in head of the list */
new_head = (OidCell *) MemoryContextAlloc(
    CacheMemoryContext, sizeof(OidCell));

The right context for statement-lifetime data structures is generally the CurrentMemoryContext the statement code is called with. Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Jan Urbański writes:

Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821    child_relation_id = child_oid_cell->oid_value;
(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f

This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch committers
The following email expresses my personal opinion and does not reflect the opinion of my employers. Bruce Momjian wrote: I also think the bad economy is making it harder for people/companies to devote time to community stuff when paid work is available. Actually the bad economy should be a booster for open source projects. There should be more developers with time to acquire new skills on projects that will get them a better job when the economy comes back. I think the problems are more rooted in the developer community itself. The pg-hackers mailing list is probably the least socially skilled developer community I have ever seen in all the open source projects I have been involved with. A very high standard is set for contributions, which is good for the quality of the code, but the lack of development process and clear decision chain turns every new contribution into an endless source of frustration. For a patch to be committed, a vague consensus has to arise among the strong technical voice(s) (usually convincing Tom is enough). If a more complex feature needs to be implemented, the lack of decision process results in a first long round of emails until everybody gets tired of it. Then sometime later someone tries to re-activate the debate for another round, and so on (partitioning is a good example). You lose potential committers at each of these rounds. The way I see it, most companies try to push their agenda, contribute their patches back to the community if that works, and just go with their own fork and closed implementation if it is too much work or burden. Whatever the economy, very few people can commit an indefinite amount of time to get a feature integrated in Postgres. Now you should probably ask yourself what you should do as a community to get more committers. As was said at PGCon, getting a patch in is going to be hard and painful. How do you make it less hard and less painful?
On the other hand, how do we, simple developers, become good enough to reach the status of committers? How can we endure the constant bashing in the early stages of our learning phase (especially if we are not paid to do it)? How do we justify to our employers that they should persist through this long process without visibility, so that eventually their contribution will make it back to the community? How do we make it easier for companies to contribute code? The lightness of the development process (no project manager, no steering committee, no voting, no product management, ...) is both a strength and a weakness that makes Postgres what it is today. The commitfest started to address some of the most obvious issues but there is probably much more that can be done by looking at what is happening in other open source communities. Even if the economy is hard, I found time to invest my own 2 cents in this email ;-) Emmanuel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. Also, it would be nicer if the router worked not only in COPY but also in INSERT. As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY since we will not have it anywhere otherwise. BTW, I'm working on partitioning metadata now. Your "partitioning" option in COPY could be replaced with the catalog. This implementation is only for the current 8.5 and will not be needed anymore once we get fully functional partitioning in Postgres, which seems to be slated for a future version. Best regards, Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Partitioning option for COPY
Hi all, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. The documentation and test suite sample are provided as well. More details are on the wiki page at http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY. Ignore the error logging related comments that do not apply here. Looking forward to your feedback Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com

Index: src/test/regress/parallel_schedule
===
RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/parallel_schedule,v
retrieving revision 1.57
diff -c -r1.57 parallel_schedule
*** src/test/regress/parallel_schedule	24 Aug 2009 03:10:16 -	1.57
--- src/test/regress/parallel_schedule	11 Nov 2009 03:17:48 -
***************
*** 47,53 ****
  # execute two copy tests parallel, to check that copy itself
  # is concurrent safe.
  # --
! test: copy copyselect
  # --
  # Another group of parallel tests
--- 47,53 ----
  # execute two copy tests parallel, to check that copy itself
  # is concurrent safe.
  # --
! test: copy copyselect copy_partitioning
  # --
  # Another group of parallel tests

Index: src/backend/utils/adt/ruleutils.c
===
RCS file: /home/manu/cvsrepo/pgsql/src/backend/utils/adt/ruleutils.c,v
retrieving revision 1.314
diff -c -r1.314 ruleutils.c
*** src/backend/utils/adt/ruleutils.c	5 Nov 2009 23:24:25 -	1.314
--- src/backend/utils/adt/ruleutils.c	11 Nov 2009 03:17:48 -
***************
*** 218,224 ****
  static Node *processIndirection(Node *node, deparse_context *context, bool printit);
  static void printSubscripts(ArrayRef *aref, deparse_context *context);
! static char *generate_relation_name(Oid relid, List *namespaces);
  static char *generate_function_name(Oid funcid, int nargs, List *argnames, Oid *argtypes, bool *is_variadic);
  static char *generate_operator_name(Oid operid, Oid arg1, Oid arg2);
--- 218,224 ----
  static Node *processIndirection(Node *node, deparse_context *context, bool printit);
  static void printSubscripts(ArrayRef *aref, deparse_context *context);
! char *generate_relation_name(Oid relid, List *namespaces);
  static char *generate_function_name(Oid funcid, int nargs, List *argnames, Oid *argtypes, bool *is_variadic);
  static char *generate_operator_name(Oid operid, Oid arg1, Oid arg2);
***************
*** 6347,6353 ****
   * We will forcibly qualify the relation name if it equals any CTE name
   * visible in the namespace list.
   */
! static char *
  generate_relation_name(Oid relid, List *namespaces)
  {
  	HeapTuple	tp;
--- 6347,6353 ----
   * We will forcibly qualify the relation name if it equals any CTE name
   * visible in the namespace list.
   */
! char *
  generate_relation_name(Oid relid, List *namespaces)
  {
  	HeapTuple	tp;

Index: doc/src/sgml/ref/copy.sgml
===
RCS file: /home/manu/cvsrepo/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.92
diff -c -r1.92 copy.sgml
*** doc/src/sgml/ref/copy.sgml	21 Sep 2009 20:10:21 -	1.92
--- doc/src/sgml/ref/copy.sgml	11 Nov 2009 03:17:48 -
***************
*** 41,46 ****
--- 41,47 ----
  ESCAPE 'escape_character'
  FORCE_QUOTE { ( column [, ...] ) | * }
  FORCE_NOT_NULL ( column [, ...] )
+ PARTITIONING [ boolean ]
***************
*** 282,287 ****
--- 283,301 ----
+ PARTITIONING
+
+ In PARTITIONING mode, COPY to a parent table will automatically move
+ each row to the child table that has the matching constraints. This
+ feature can be used with ERROR_LOGGING to capture rows that do not
+ match any constraint in the table hierarchy. See the notes below for
+ the limitations.
+
***************
*** 384,389 ****
--- 398,421 ----
  VACUUM to recover the wasted space.
+
+ PARTITIONING mode scans each child table constraint in the hierarchy
+ to find a match. As an optimization, a cache of the last child tables
+ where tuples have been routed is kept and tried first. The size of the
+ cache is set by the copy_partitioning_cache_size session variable. If
+ the size is set to 0, the cache is disabled; otherwise the indicated
+ number of child tables is kept in the cache (at most).
+
+ PARTITIONING mode assumes that every child table has at least one
+ constraint defined, otherwise an error is thrown. If child tables have
+ overlapping constraints, the row is i
Re: [HACKERS] DISTINCT ON
Tom Lane wrote: Greg Stark writes: On Wed, Nov 4, 2009 at 3:17 AM, Emmanuel Cecchet wrote: SELECT DISTINCT ON ('1'::varchar, '1'::varchar) a FROM (SELECT 1 AS a) AS a ORDER BY '1'::varchar, '1'::varchar, '2'::varchar; This sounds familiar. What version of Postgres are you testing this on? Presumably something before this bug http://archives.postgresql.org/pgsql-sql/2008-07/msg00123.php got fixed http://archives.postgresql.org/pgsql-committers/2008-07/msg00341.php I am using 8.3.6 and it looks like the fix was only integrated in 8.4. So using 8.4 should solve the problem. Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DISTINCT ON
Hi all, It looks like Postgres has a restriction in DISTINCT ON queries where the DISTINCT ON expressions must match the left side of the ORDER BY list. The issue is that if a DISTINCT ON ... has multiple instances of a particular expression, this check doesn't seem to fire correctly. For example, this query returns an error (but I guess it shouldn't): SELECT DISTINCT ON ('1'::varchar, '1'::varchar) a FROM (SELECT 1 AS a) AS a ORDER BY '1'::varchar, '1'::varchar, '2'::varchar; And this query doesn't return an error (but I guess it should): SELECT DISTINCT ON ('1'::varchar, '2'::varchar, '1'::varchar) a FROM (SELECT 1 AS a) AS a ORDER BY '1'::varchar, '2'::varchar, '2'::varchar; Am I misunderstanding something or is there a bug? Thanks for the help Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Tom, Emmanuel Cecchet writes: Tom Lane wrote: There aren't any. You can *not* put a try/catch around arbitrary code without a subtransaction. Don't even think about it Well then why are the tests provided with the patch working? Because they carefully exercise only a tiny fraction of the system. The key word in my sentence above is "arbitrary". You don't know what a datatype input function might try to do, let alone triggers or other functions that COPY might have to invoke. They might do things that need to be cleaned up after, and subtransaction rollback is the only mechanism we have for that. Is it possible to use the try/catch mechanism to detect errors and log them and only abort the command at the end of the processing? This would have the advantage of being able to log all errors without the overhead of subtransactions and even add some additional information on where the error happened (parsing, constraints, trigger, ...) for further automated treatment. The result would still be clean since we would abort the COPY command in case of an error as we do currently (but we would not stop on the first error). I guess that it would make more sense to log to a file than to a table in that case. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Tom Lane wrote: Emmanuel Cecchet writes: - speed with error logging best effort: no use of sub-transactions but errors that can safely be trapped with pg_try/catch (no index violation, There aren't any. You can *not* put a try/catch around arbitrary code without a subtransaction. Don't even think about it. Well then why are the tests provided with the patch working? I hear you when you say that it is not a generally applicable idea, but it seems that at least a couple of errors can be trapped with this mechanism. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Tom Lane wrote: Ultimately, there's always going to be a tradeoff between speed and flexibility. It may be that we should just say "if you want to import dirty data, it's gonna cost ya" and not worry about the speed penalty of subtransaction-per-row. But that still leaves us with the 2^32 limit. I wonder whether we could break down COPY into sub-sub transactions to work around that... Regarding that tradeoff between speed and flexibility, I think we could propose multiple options:
- maximum speed: current implementation, fails on first error
- speed with error logging: the copy command fails if there is an error but continues to log all errors
- speed with error logging, best effort: no use of sub-transactions, but errors that can safely be trapped with pg_try/catch (no index violation, no before insert trigger, etc...) are logged and the command can complete
- pre-loading (2-phase copy): phase 1 copies good tuples into a [temp] table and bad tuples into an error table; phase 2 pushes good tuples to the destination table. Note that if phase 2 fails, it could be retried since the temp table would be dropped only on success of phase 2.
- slow but flexible: have every row in a sub-transaction -> are there any real benefits compared to pg_loader?
Tom was also suggesting 'refactoring COPY into a series of steps that the user can control'. What would these steps be? Would that be per row and would it allow discarding a bad tuple? Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Greg Smith wrote: Absolutely, that's the whole point of logging to a file in the first place. What needs to happen here is that when one is aborted, you need to make sure that fact is logged, and with enough information (the pid?) to tie it to the COPY that failed. Then someone can crawl the logs to figure out what happened and what data did and didn't get loaded. Ideally you'd want to have that as database table information instead, to allow automated recovery and re-commit in the cases where the error wasn't inherent in the data but instead some other type of database failure. If one is aborted, nothing gets loaded anyway. Now if you want a separate log of the successful commands, I don't think that should apply just to COPY but to any operation (insert, update, DDL, ...) if you want to build some automatic retry mechanism. But this seems independent from COPY to me. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Robert Haas wrote: On Wed, Oct 7, 2009 at 11:39 AM, Emmanuel Cecchet wrote: Robert Haas wrote: On Wed, Oct 7, 2009 at 9:12 AM, Emmanuel Cecchet wrote: Hi all, I think there is a misunderstanding about what the current patch is about. The patch includes 2 things: - error logging in a table for bad tuples in a COPY operation (see http://wiki.postgresql.org/wiki/Error_logging_in_COPY for an example; the error message, command and so on are automatically logged) - auto-partitioning in a hierarchy of child table if the COPY targets a parent table. The patch does NOT include: - logging errors into a file (a feature we can add later on (next commit fest?)) My failure to have read the patch is showing here, but it seems to me that error logging to a table could be problematic: if the transaction aborts, we'll lose the log. If this is in fact a problem, we should be implementing logging to a file (or stdout) FIRST. I don't think this is really a problem. You can always do a SELECT in the error table after the COPY operation if you want to diagnose what happened before the transaction rollbacks. Uhhh if the transaction has aborted, no new commands (including SELECT) will be accepted until you roll back. You are suggesting then that it is the COPY command that aborts the transaction. That would only happen if you had set a limit on the number of errors that you want to accept in a COPY command (in which case you know that there is something really wrong with your input file and you don't want the server to process that file). manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
The roadmap I would propose for the current list of enhancements to COPY is as follows:
1. new syntax for COPY options (already committed)
2. error logging in a table
3. auto-partitioning (just relies on basic error logging, so can be scheduled anytime after 2)
4. error logging in a file
manu
Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan writes: Emmanuel Cecchet wrote: If you prefer to postpone the auto-partitioning to the next commit fest, I can strip it from the current patch and re-submit it for the next fest (but it's just 2 isolated methods really easy to review). I certainly think this should be separated out. In general it is not a good idea to roll distinct features together. It complicates both the reviewing process and the discussion. I think though that Greg was suggesting that we need some more thought about the overall road map. Agglomerating "independent" features onto COPY one at a time is going to lead to a mess, unless they fit into an overall design plan. I don't disagree with that. But even if we get a roadmap of some set of features we want to implement, rolling them all together isn't a good way to go. cheers andrew -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Robert Haas wrote: On Wed, Oct 7, 2009 at 9:12 AM, Emmanuel Cecchet wrote: Hi all, I think there is a misunderstanding about what the current patch is about. The patch includes 2 things: - error logging in a table for bad tuples in a COPY operation (see http://wiki.postgresql.org/wiki/Error_logging_in_COPY for an example; the error message, command and so on are automatically logged) - auto-partitioning in a hierarchy of child tables if the COPY targets a parent table. The patch does NOT include: - logging errors into a file (a feature we can add later on (next commit fest?)) My failure to have read the patch is showing here, but it seems to me that error logging to a table could be problematic: if the transaction aborts, we'll lose the log. If this is in fact a problem, we should be implementing logging to a file (or stdout) FIRST. I don't think this is really a problem. You can always do a SELECT on the error table after the COPY operation if you want to diagnose what happened before the transaction rolls back. I think that using a file to process the bad tuples is actually less practical than a table if you want to re-insert these tuples in the database. But anyway, the current implementation captures the tuple with all the needed information for logging and delegates the logging to a specific method. If we want to log to a file (or stdout), we can just provide another method to log the already captured info in a file. I think that the file approach is a separate feature but can be easily integrated in the current design. There is just extra work to make sure concurrent COPY commands writing to the same error file use proper locking. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
Hi all, I think there is a misunderstanding about what the current patch is about. The patch includes 2 things:
- error logging in a table for bad tuples in a COPY operation (see http://wiki.postgresql.org/wiki/Error_logging_in_COPY for an example; the error message, command and so on are automatically logged)
- auto-partitioning in a hierarchy of child tables if the COPY targets a parent table.
The patch does NOT include:
- logging errors into a file (a feature we can add later on (next commit fest?))
I can put the auto-partitioning in a separate patch if that helps but this patch relies on error logging to log possible errors in the routing of tuples. I think that the way to move forward is first to have a basic error logging facility that logs into a database table (like the current patch does) and then add enhancements. I don't think we should try to do the file logging at the same time. If you prefer to postpone the auto-partitioning to the next commit fest, I can strip it from the current patch and re-submit it for the next fest (but it's just 2 isolated methods really easy to review). Emmanuel Robert Haas wrote: On Wed, Oct 7, 2009 at 3:17 AM, Greg Smith wrote: I know this patch is attracting more reviewers lately, is anyone tracking the general architecture of the code yet? Emmanuel's work is tough to review just because there's so many things mixed together, and there's other inputs I think should be considered at the same time while we're all testing in there (such as the COPY patch Andrew Dunstan put together). I hadn't realized this was an issue, but I think it's a good point: a patch that does one thing well is much more likely to get accepted than a patch that does two things well, let alone two things poorly. It's just much easier to review and verify. Or maybe the name of the patch should have tipped me off: "COPY enhancements" vs. "make COPY have feature X".
What I'd like to see is for everything to get broken more into component chunks that can get committed and provide something useful one at a time, because I doubt taskmaster Robert is going to let this one linger around with scope creep for too long before being pushed out to the next CommitFest. I can't decide whether to feel good or bad about that appellation, so I'm going with both. But in all seriousness, if this patch needs substantial reworking (which it sounds like it does) we should postpone it to the next CF; we are quickly running out of days, and it's not fair to reviewers or committers to ask for new reviews of substantially revised code with only a week to go. ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY enhancements
I just realized that I forgot to CC the list when I answered to Josh... resending! Josh, >> I think that this was the original idea but we should probably rollback >> the error logging if the command has been rolled back. It might be more >> consistent to use the same hi_options as the copy command. Any idea what >> would be best? >> > > Well, if we're logging to a file, you wouldn't be *able* to roll them > back. Also, presumbly, if you abort a COPY because of errors, you > probably want to keep the errors around for later analysis. No? > You are right about the file (though this functionality is not implemented yet). I don't have any strong opinion about the right behavior if COPY aborts. The error log table would contain tuples that were not inserted anyway. Now knowing whether the bad tuples come from a successful COPY command or not will be left to the user. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Hi Selena, This is my first pass at the error logging portion of this patch. I'm going to take a break and try to go through the partitioning logic as well later this afternoon. caveat: I'm not familiar with most of the code paths that are being touched by this patch. Overall: * I noticed '\see' included in the comments in your code. These should be removed. Should we avoid any doxygen tags in general in the Postgres code? * The regression is failing, as Jeff indicated, and I didn't figure out why yet either. Hopefully I will have a closer look this afternoon. Let me know if the response I sent to Jeff works for you. Comments: * copy.c: Better formatting, maybe rewording needed for comment starting on line 1990. Some of the bad formatting has been left on purpose to minimize the size of the patch, otherwise there would have been many tab/whitespace changes due to the indentation in the PG_TRY blocks. I suggest that whoever is going to commit the code runs pg_indent, or I can fix the formatting once the reviewing is done. ** Maybe say: "Check that errorData->sqlerrcode only logged tuples that are malformed. This ensures that we let other errors pass through." Ok. * copy.c: line: 2668 -> need to fix that comment :) (/* Regular code */) -- this needs to be more descriptive of what is actually happening. We fell through the partitioning logic, right, and back to the default behavior? Yes. * copy.c: line 2881, maybe say instead: "Mark that we have not read a line yet. This is necessary so that we can perform error logging on complete lines only." Ok. Formatting: * copy.c: whitespace is maybe a little off at lines 2004-2009 * NEW FILES: src/backend/utils/misc/errorlogging.c & errorlogging.h need headers I was not sure what is auto-generated by SVN commit scripts. I'd be happy to add headers. Is there a specification somewhere or should I just copy/paste from another file? Code: * copy.c: line 1990 -> cur_lineno_str[11] & related: why is this conversion necessary?
(sorry if that is a dumb question) This conversion is necessary if you log in the error table the index of the row that is causing the error (this is what is logged by default in ERROR_LOGGING_KEY). * copy.c: line 2660 -> what if we are error logging for copy? Does this get logged properly? Yes, this is in a try/catch block that performs error logging. * errorlogging.c: Noticed the comment at 503 -> 'note that we always enable wal logging'.. Thinking this through - the reasoning is that you won't be rolling back the error logging no matter what, right? I think that this was the original idea but we should probably rollback the error logging if the command has been rolled back. It might be more consistent to use the same hi_options as the copy command. Any idea what would be best? * src/include/commands/copy.c and copy.h: struct CopyStateData was moved from copy.c to copy.h; this made sense to me, just noting. That move made it a little tricky to find the changes that were made. There were 10 items added. Yes, patch can be a little bit lost when you move a big data structure like this one. ** super nit pick: 'readlineOk' uses camel-case instead of underscores like the rest of the new variables Yes, queryDesc also has camel-case. I will fix readlineOk. * errorlogging.c: could move currentCommand pg_stat call in Init routine into MalformedTupleIs, or maybe move the setting of that parameter into the Init routine for consistency's sake. The advantage of calling pg_stat in InitializeErrorLogging is that it is evaluated only once for the entire copy command (and it's not going to change during the execution of the command). I am not sure to understand what your second suggestion is since currentCommand is set and initialized in Init. Documentation: * doc/src/sgml/ref/copy.sgml: line 313: 'failif' needs a space ok ** Also: The error table log examples have relations shown in a different order than the actual CREATE TABLE declaration in the code. 
The order of the columns does not really matter but for consistency sake we can put the same order. * src/test/regress/sql/copyselect.sql: Format of options changed.. added parenthesis. Is this change documented elsewhere? (sorry if I just missed this in the rest of the thread/patch) Yes this has already been committed by Tom. The new format of options has been introduced just before this patch. I am attaching a revised version of the patch. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com aster-copy-newsyntax-patch-8.5v6context.txt.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
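A plausible explanation for the cur_lineno_str[11] buffer size discussed above: a 32-bit row counter renders as at most 10 decimal digits (4294967295) plus the terminating NUL. This sketch is an assumption about the patch's intent; the function name is invented here, not taken from the patch.

```c
#include <stdio.h>
#include <stdint.h>

/* Assumed rationale for an 11-byte buffer: a 32-bit line counter needs
 * at most 10 decimal digits plus the NUL terminator.  Illustrative
 * helper, not the patch's actual code. */
static void lineno_to_str(uint32_t lineno, char dest[11])
{
    snprintf(dest, 11, "%u", (unsigned) lineno);
}
```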
Re: [HACKERS] COPY enhancements
The problem comes from the foo_malformed_terminator.data file. It is supposed to have a malformed terminator that was not caught by the patch. The second line should look like: 2 two^M If it does not, you can edit it with emacs: go to the end of the second line and press Ctrl+q followed by Ctrl+m, and that will do the trick. I am attaching a copy of the file, hoping that the attachment will arrive with the malformed terminator in your inbox! manu Jeff Davis wrote: On Fri, 2009-09-25 at 10:01 -0400, Emmanuel Cecchet wrote: Robert, Here is the new version of the patch that applies to CVS HEAD as of this morning. I just started looking at this now. It seems to fail "make check", diffs attached. I haven't looked into the cause of the failure yet. Regards, Jeff Davis -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com 1 one 2 two 3 three 4 four 5 five
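Since mailers tend to normalize line endings (as this thread worries), the file can also be recreated programmatically instead of hand-editing in emacs. This is a sketch assuming the tab-separated layout shown in the message, with the carriage return ("^M") inserted at the end of the second line:

```c
#include <stdio.h>
#include <string.h>

/* Write a data file whose second line ends in a carriage return before
 * the newline -- the "^M" terminator described above.  Returns 1 if the
 * file was written and the CR is really present on read-back, else 0. */
static int write_malformed_file(const char *path)
{
    FILE *f = fopen(path, "wb");        /* "wb": no newline translation */
    if (f == NULL)
        return 0;
    fputs("1\tone\n", f);
    fputs("2\ttwo\r\n", f);             /* the malformed terminator */
    fputs("3\tthree\n", f);
    fclose(f);

    /* Verify the CR survived a round trip through the filesystem. */
    char buf[64];
    f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    int ok = fgets(buf, sizeof buf, f) != NULL   /* line 1 */
          && fgets(buf, sizeof buf, f) != NULL   /* line 2 */
          && strchr(buf, '\r') != NULL;          /* CR still there? */
    fclose(f);
    return ok;
}
```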
Re: [HACKERS] Join optimization for inheritance tables
Hi Simon, Thanks for the insight. I might take that on as a long-term project. I have to discuss it with my colleagues at Aster. It would certainly help to put together a wiki page with the key insights on the design of such an implementation, to be able to better scope the project and agree on the right approach. manu On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote: If the partitioning implementation does not make progress (and does not make it for 8.5) Manu, not sure if you are referring to Kedar's patch I reviewed earlier in July, but that patch didn't implement anything like the right internals for this to work. As Tom says, what we need is a single internal data structure that describes all the partitions. That can then be used to automatically route inserts, perform the push-down merge joins, and speed up constraint exclusion. We should build this once and store it in the relcache for the parent relation, and so would need some changes in relcache invalidation as well to make sure child DDL changes invalidate it. -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
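The "single internal data structure that describes all the partitions" idea above can be sketched in outline: one descriptor per parent relation, consulted both for routing inserts and for planner-side exclusion. Every name and type here is illustrative — nothing reflects actual PostgreSQL internals, present or proposed:

```c
/* Hypothetical per-parent partition descriptor, as discussed in this
 * thread: built once, cached (e.g. in the relcache), and invalidated
 * when a child's DDL changes.  Ranges are [lower, upper). */
typedef struct PartitionBound {
    int lower;          /* inclusive lower bound of the range */
    int upper;          /* exclusive upper bound of the range */
    int child_relid;    /* stand-in for the child table's OID */
} PartitionBound;

typedef struct PartitionDesc {
    int            nparts;
    PartitionBound bounds[8];
} PartitionDesc;

/* Route a partitioning-key value to a child; -1 if no child accepts it.
 * The same structure could drive constraint exclusion and push-down
 * merge joins without re-deriving ranges from CHECK constraints. */
static int route_tuple(const PartitionDesc *desc, int key)
{
    for (int i = 0; i < desc->nparts; i++)
        if (key >= desc->bounds[i].lower && key < desc->bounds[i].upper)
            return desc->bounds[i].child_relid;
    return -1;
}
```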
Re: [HACKERS] COPY enhancements
Robert, Here is the new version of the patch that applies to CVS HEAD as of this morning. Emmanuel On Fri, Sep 18, 2009 at 12:14 AM, Emmanuel Cecchet wrote: Here is a new version of error logging and autopartitioning in COPY based on the latest COPY patch that provides the new syntax for copy options (this patch also includes the COPY option patch). New features compared to the previous version: - removed the GUC variables for error logging and used copy options instead (renamed options as suggested by Josh) - added documentation and examples (see copy.sgml) - partitioning also activated by a copy option rather than a GUC variable (renamed as well) - added a new ERROR_LOGGING_MAX_ERRORS option that allows aborting the COPY operation once a certain number of bad rows has been encountered - updated unit tests I also tried to update the wiki pages, but it's late and the doc is probably better for now. This addresses most of the comments so far except for the format of the error table (lack of a natural key) and a possible pg_error_table as a default name. Emmanuel Emmanuel Cecchet wrote: Hi all, Finally the error logging and autopartitioning patches for COPY that I presented at PGCon are here! Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY The attached patch contains both contribs as well as unit tests. I will shortly submit the new patch at commitfest.postgresql.org. Thanks in advance for your feedback, Emmanuel This no longer applies. ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com aster-copy-newsyntax-patch-8.5v5context.txt.gz Description: GNU Zip compressed data
Re: [HACKERS] COPY enhancements
Yes, I have to update the patch following what Tom already integrated of the COPY patch. I will get a new version posted as soon as I can. Emmanuel Robert Haas wrote: On Fri, Sep 18, 2009 at 12:14 AM, Emmanuel Cecchet wrote: Here is a new version of error logging and autopartitioning in COPY based on the latest COPY patch that provides the new syntax for copy options (this patch also includes the COPY option patch). New features compared to the previous version: - removed the GUC variables for error logging and used copy options instead (renamed options as suggested by Josh) - added documentation and examples (see copy.sgml) - partitioning also activated by a copy option rather than a GUC variable (renamed as well) - added a new ERROR_LOGGING_MAX_ERRORS option that allows aborting the COPY operation once a certain number of bad rows has been encountered - updated unit tests I also tried to update the wiki pages, but it's late and the doc is probably better for now. This addresses most of the comments so far except for the format of the error table (lack of a natural key) and a possible pg_error_table as a default name. Emmanuel Emmanuel Cecchet wrote: Hi all, Finally the error logging and autopartitioning patches for COPY that I presented at PGCon are here! Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY The attached patch contains both contribs as well as unit tests. I will shortly submit the new patch at commitfest.postgresql.org. Thanks in advance for your feedback, Emmanuel This no longer applies. ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Join optimization for inheritance tables
Tom, If the partitioning implementation does not make progress (and does not make it for 8.5), don't you think that this could be an interesting contribution to consider for this release? I have put on the wiki (http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables) the results obtained so far and the use case where it is most used. As I understand it, partitioning will certainly lead to some significant changes/enhancements to the planner. Do you think it is realistic to get that for 8.5? manu Herodotos Herodotou writes: This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together. I looked over this patch a bit. I am of the opinion that this is a lot of work towards a dead-end direction :-(. The patch contains a large amount of rather inefficient code that attempts to reverse-engineer knowledge about whether an inheritance tree forms a range-partitioned table. We already know that reasoning from first principles using a set of arbitrary per-table constraints doesn't scale very well, and this would make it a lot worse. pgsql-hackers have been talking for some time about implementing an explicit partitioning feature, which would give the planner direct knowledge about this type of table structure without expending nearly so much work (and then expending it all over again for the next query). I think the right way to go at it is to implement that first, and then see about teaching the planner to plan like this. We could possibly extract a subset of the patch that just deals with pushing joins down through appends, without the constraint-exclusion logic that tries to eliminate provably empty join pairs. But I'm dubious that that would be worthwhile by itself. I think you need the constraint exclusion to eliminate a good deal of work before this is actually better than doing a single join above the append. 
There are a number of other things I don't like, such as the restrictive and not-obviously-necessary requirement that all the join clauses be simple "Var op Var" clauses. But the main thing is that a large fraction of the patch is doing something I don't think we should try to do. regards, tom lane -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Tom Lane wrote: I wrote: As far as I can tell, the majority opinion is to use "format csv" BTW, if we're going to do that, shouldn't the "binary" option instead be spelled "format binary"? Looking at the doc, it looks like FORMAT should be mandatory and be either text, binary or csv (for now). manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
The easiest for both implementation and documentation might just be to have a matrix of options. Each option has a row and a column in the matrix. The intersection of a row and a column is set to 0 if the options are not compatible and to 1 if they are. This way we are sure to capture all possible combinations. Each time we add a new option, we just have to check in the matrix whether it is compatible with the already existing options. Note that we can also replace the 0 with an index into an error message array. I can provide an implementation of that if this looks interesting to anyone. Emmanuel Robert Haas wrote: On Sun, Sep 20, 2009 at 2:25 PM, Emmanuel Cecchet wrote: Tom Lane wrote: Emmanuel Cecchet writes: Here you will force every format to use the same set of options How does this "force" any such thing? As far as I understand it, every format will have to handle every format option that may exist so that it can either implement it or throw an error. I don't think this is really true. To be honest with you, I think it's exactly backwards. The way the option-parsing logic works, we parse each option individually FIRST. Then at the end we do cross-checks to see whether there is an incompatibility in the combination specified. So if two different formats support the same option, we just change the cross-check to say that foo is OK with either format bar or format baz. On the other hand, if we split the option into bar_foo and baz_foo, then the first loop that does the initial parsing has to support both cases, and then you still need a separate cross-check for each one. That would argue in favor of a format option that defines the format. Right now I find it bogus to have to say (csv on, csv_header on). If csv_header is on, that should imply csv on. The only problem I have is that it is not obvious which options are generic COPY options and which are options of an option (like format options). So maybe a tradeoff is to differentiate format-specific options, as in: (delimiter '.', format csv, format_header, format_escape...) This should also make it clear, to someone developing a new format, which options need to be addressed. I think this is a false dichotomy. It isn't necessarily the case that every format will support a delimiter option either. For example, if we were to add an XML or JSON format (which I'm not at all convinced is a good idea, but I'm sure someone is going to propose it!) it certainly won't support specifying an arbitrary delimiter. IOW, *every* format will have different needs and we can't necessarily know which options will be applicable to those needs. But as long as we agree that we won't use the same option for two different format-specific options with wildly different semantics, I don't think that undecorated names are going to cause us much trouble. It's also less typing. PS: I don't know why, but as I write this message I already feel that Tom hates this new proposal :-D I get those feelings sometimes myself. :-) Anyway, FWIW, I think Tom has analyzed this one correctly... ...Robert -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
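The compatibility matrix proposed at the top of this message might look like the following in outline. This is only a sketch of the idea: the option list is invented for illustration and is not the patch's actual option set.

```c
#include <stdbool.h>

/* Sketch of the proposed compatibility matrix: one row and one column
 * per COPY option; the intersection says whether the pair may be
 * combined.  A 0 cell could instead hold an index into an error
 * message array, as suggested above.  Option names are illustrative. */
enum copy_option {
    OPT_FORMAT_TEXT,
    OPT_FORMAT_CSV,
    OPT_CSV_HEADER,
    OPT_DELIMITER,
    NUM_COPY_OPTIONS
};

static const bool copy_option_compat[NUM_COPY_OPTIONS][NUM_COPY_OPTIONS] = {
    /*               text   csv    header delim */
    /* text   */   { true,  false, false, true  },
    /* csv    */   { false, true,  true,  true  },
    /* header */   { false, true,  true,  true  },
    /* delim  */   { true,  true,  true,  true  },
};

/* True if every pair of options in the requested set is compatible. */
static bool copy_options_valid(const enum copy_option *opts, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (!copy_option_compat[opts[i]][opts[j]])
                return false;
    return true;
}
```

The appeal of the matrix is that adding an option means filling in one new row and column, rather than auditing ad-hoc cross-check code scattered through the parser.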
Re: [HACKERS] generic copy options
Tom Lane wrote: Emmanuel Cecchet writes: Here you will force every format to use the same set of options How does this "force" any such thing? As far as I understand it, every format will have to handle every format option that may exist so that it can either implement it or throw an error. and if someone introduces a new option, you will have to modify all other formats to make sure they throw an error telling the user that this option is not supported. Well, if we do it your way then we will instead need a collection of code to throw errors for combinations like (xml on, csv_header on). I don't really see any improvement there. That would argue in favor of a format option that defines the format. Right now I find it bogus to have to say (csv on, csv_header on). If csv_header is on, that should imply csv on. The only problem I have is that it is not obvious which options are generic COPY options and which are options of an option (like format options). So maybe a tradeoff is to differentiate format-specific options, as in: (delimiter '.', format csv, format_header, format_escape...) This should also make it clear, to someone developing a new format, which options need to be addressed. Emmanuel PS: I don't know why, but as I write this message I already feel that Tom hates this new proposal :-D -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Tom Lane wrote: No, I don't think so. Suppose I write COPY ... (xml_header on) If HEADER isn't actually an option supported by XML format, what I will get here is an "unknown option" error, which conveys just about nothing --- is it really an unsupported combination, or did I just misspell the option name? Well, I don't see why you would write that if the option is not documented. Usually, as a user, when I need to use a command, I look at the doc/man page and use the options that are indicated; I don't try to invent new options. That should prevent the kind of scenario you describe here: If we go the other way then I would expect COPY ... (xml, header on) to draw a specific "HEADER is not supported in XML format" error. Of course, that will require some extra code to make it happen. So you could argue that format-specific option names are easier from the lazy programmer's viewpoint. But I don't believe the argument that they're better from the user's viewpoint. Here you will force every format to use the same set of options, and if someone introduces a new option, you will have to modify all other formats to make sure they throw an error telling the user that this option is not supported. I don't think this is a great design, or that it will be easy to extend. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
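The two error behaviours contrasted in this exchange — a generic "unknown option" for a misspelled prefixed name versus a specific "HEADER is not supported in XML format" for a shared name checked against a per-format support table — can be sketched like so. Format and option names are illustrative only; this is not the patch's code:

```c
#include <string.h>

/* Sketch of undecorated option names plus a per-format support check.
 * A name not in the known list yields the generic error (likely a
 * typo); a known name unsupported by the chosen format yields the
 * specific, more helpful error Tom describes. */
static const char *check_copy_option(const char *format, const char *option)
{
    static const char *known[] = { "header", "delimiter", "quote" };
    int is_known = 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(option, known[i]) == 0)
            is_known = 1;
    if (!is_known)
        return "unknown option";                        /* misspelling? */
    if (strcmp(format, "xml") == 0 && strcmp(option, "header") == 0)
        return "HEADER is not supported in XML format"; /* specific error */
    return "ok";
}
```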
Re: [HACKERS] generic copy options
Tom Lane wrote: Robert Haas writes: ... A related question is whether we should replace the "CSV" option with a "FORMAT" option for which one of the possible choices is "CSV" (much as we did with EXPLAIN). That might be a good idea --- otherwise we'd need some ad-hoc code to check that only one format option has been selected. On the other hand, it wouldn't be all that much code; and it would be a rather larger change from previous behavior than we were contemplating to start with. Comments anyone? That would assume that the semantics of the other options (header, quote, escape, ...) are exactly the same for each format. Otherwise this will be a nightmare to document. If we don't prefix with CSV, I guess that some users will spend some time figuring out that NULL is not a format option but FORCE NOT NULL is. If an option is only supported by one format (let's say XML), we will have to document every time which options are supported by which format, which would be much easier and more intuitive if options were prefixed by the format name. If you look at the COPY documentation in the error logging patch, if we strip the error_logging prefix, it is going to be very confusing. But I am willing to let Tom choose whatever he prefers as his birthday gift ;-) One other minor point is that the patch introduces an empty-list syntax for individual option values, but then treats it the same as specifying nothing: It seemed like a good idea because if you can do force_quote (a, b, c) and force_quote (a, b) you might think that you could also do force_quote (). Particularly for the sake of people generating SQL automatically by some means, it seems like this might simplify life. But possibly it shouldn't evaluate to the same value, so that you can't write OIDS () to mean the same thing as OIDS ON. Yeah, that type of scenario was why I didn't like it. I am not impressed by the empty-list argument, either. We don't support empty lists with that sort of syntax in most other places, so why here? There are counter-precedents even in the syntax of COPY itself: you can't write "()" for an empty column name list, and you can't write "()" for an empty copy_generic_option_list. Well, this one was in Robert's initial patch and I left it as is. I don't have any strong opinion for or against it. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Josh Berkus wrote: Nope, but it was on the checklist and I was being thorough. That's a good thing. I was just seeing if I needed to get involved in performance testing. It would be good to have more people test the autopartitioning feature in COPY. If you want to be involved in performance testing on that, that would be great. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
On that particular patch, as Robert mentioned, only the parsing has changed. In the case of \copy, the parsing is much lighter than before in psql (remains the same in the server). The bigger the COPY operation the less you will see the impact of the parsing since it is done only once for the entire operation. Emmanuel Dan Colish wrote: On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote: Greg Smith wrote: On Thu, 17 Sep 2009, Dan Colish wrote: - Performance appears to be the same although I don't have a good way for testing this at the moment Here's what I do to generate simple COPY performance test cases: CREATE TABLE t (i integer); INSERT INTO t SELECT x FROM generate_series(1,10) AS x; \timing COPY t TO '/some/file' WITH [options]; BEGIN; TRUNCATE TABLE t; COPY t FROM '/some/file' WITH [options]; COMMIT; You can adjust the size of the generated table based on whether you want to minimize (small number) or maximize (big number) the impact of the setup overhead relative to actual processing time. Big numbers make sense if there's a per-row change, small ones if it's mainly COPY setup that's been changed if you want a small bit of data to test against. An example with one column in it is a good test case for seeing whether per-row impact has gone up. You'd want something with a wider row for other types of performance tests. The reason for the BEGIN/COMMIT there is that form utilizes an optimization that lowers WAL volume when doing the COPY insertion, which makes it more likely you'll be testing performance of the right thing. I usually prefer to test with a table that is more varied than anything you can make with generate_series. When I tested my ragged copy patch the other day I copied 1,000,000 rows out of a large table with a mixture of dates, strings, numbers and nulls. But then, it has a (tiny) per field overhead so I wanted to make sure that was well represented in the test. 
You are certainly right about wrapping it in begin/truncate/commit (and when you do, make sure that archiving is not on). You probably want to make sure that the file is not on the same disk as the database, to avoid disk contention. Or, better, make sure that it is in the OS file system cache, or on a RAM disk. cheers andrew If someone with a more significant setup can run tests, that would be ideal. I only have my laptop, which is a single disk and fairly underpowered. That said, here are my results running the script above; it looks like the patch improves performance. I would be really interested to see results on a larger data set and heavier iron. -- --Dan Without Patch: CREATE TABLE INSERT 0 10 Timing is on. COPY 10 Time: 83.273 ms BEGIN Time: 0.412 ms TRUNCATE TABLE Time: 0.357 ms COPY 10 Time: 140.911 ms COMMIT Time: 4.909 ms With Patch: CREATE TABLE INSERT 0 10 Timing is on. COPY 10 Time: 80.205 ms BEGIN Time: 0.351 ms TRUNCATE TABLE Time: 0.346 ms COPY 10 Time: 124.303 ms COMMIT Time: 4.130 ms -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
Here you go. Emmanuel Dan Colish wrote: On Thu, Sep 17, 2009 at 02:56:07PM -0400, Emmanuel Cecchet wrote: Dan, My bad, I copy/pasted the hard link in output/copy.source instead of @abs_build...@. Here is a complete version of the patch with the fix on output/copy.source. Emmanuel Emmanuel, Thanks for getting that back so quickly. As I said before, it applies cleanly and passes regression tests. I'm reading through the changes now. When you get a moment could you send me the patch as a context diff? -- --Dan -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -c -r1.18 copy2.sql *** src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 --- src/test/regress/sql/copy2.sql 17 Sep 2009 20:59:50 - *** *** 73,89 \. -- various COPY options: delimiters, oids, NULL string ! COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. ! COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; 3000;;c;; \. ! COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X --- 73,89 \. -- various COPY options: delimiters, oids, NULL string ! COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. ! COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. ! COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X *** *** 108,120 INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail ! COPY no_oids FROM stdin WITH OIDS; ! COPY no_oids TO stdout WITH OIDS; -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; ! COPY x (b, e) TO stdout WITH NULL 'I''m null'; CREATE TEMP TABLE y ( col1 text, --- 108,120 INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail ! COPY no_oids FROM stdin (OIDS); ! 
COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; ! COPY x (b, e) TO stdout (NULL 'I''m null'); CREATE TEMP TABLE y ( col1 text, *** *** 130,140 COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); ! COPY testnl FROM stdin CSV; 1,"a field with two LFs inside",2 --- 130,152 COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; + -- Test new 8.5 syntax + + COPY y TO stdout (CSV); + COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|'); + COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\'); + COPY y TO stdout (CSV, CSV_FORCE_QUOTE *); + + \COPY y TO stdout (CSV) + \COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|') + \COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\') + \COPY y TO stdout (CSV, CSV_FORCE_QUOTE *) + --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); ! COPY testnl FROM stdin (CSV); 1,"a field with two LFs inside",2 *** *** 143,156 -- test end of copy marker CREATE TEMP TABLE testeoc (a text); ! COPY testeoc FROM stdin CSV; a\. \.b c\.d "\." \. ! COPY testeoc TO stdout CSV; DROP TABLE x, y; DROP FUNCTION fn_x_before(); --- 155,168 -- test end of copy marker CREATE TEMP TABLE testeoc (a text); ! COPY testeoc FROM stdin (CSV); a\. \.b c\.d "\." \. ! COPY testeoc TO stdout (CSV); DROP TABLE x, y; DROP FUNCTION fn_x_before(); Index: src/test/regress/sql/aggregates.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/aggregates.sql,v retrieving revision 1.15 diff -c -r1.15 aggregates.sql *** src/test/regress/sql/aggregates.sql 25 Apr 2009 16:44:56 - 1.15 --- src/test/regress/sql/aggregates.sql 17 Sep 2009 20:59:50 - *** *** 104,110 BIT_OR(i4) AS "?" FROM bitwise_test; ! 
COPY bitwise_test FROM STDIN NULL 'null'; 1 1 1 1 1 B0101 3 3 3 null2 B0100 7 7 7 3 4 B1100 --- 104,110 BIT_OR(i4) AS
Re: [HACKERS] generic copy options
Dan, My bad, I copy/pasted the hard link in output/copy.source instead of @abs_build...@. Here is a complete version of the patch with the fix on output/copy.source. Emmanuel On Thu, Sep 17, 2009 at 11:07:33AM -0400, Emmanuel Cecchet wrote: Tom Lane wrote: I wonder though if we couldn't simplify matters. Offhand it seems to me that psql doesn't need to validate the command's syntax fully. All it really needs to do is find the target filename and replace it with STDIN/STDOUT. Could we have it just treat the remainder of the line literally, and not worry about the details of what the options might be? Let the backend worry about throwing an error if they're bad. New version with the simplified psql. Still supports the 7.3 syntax. I am going to look into porting the other COPY enhancements (error logging and autopartitioning) on this implementation. We might come up with new ideas for the documentation side of things with more options. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com Hi, I've been working on a review of this patch and currently it's failing regression tests. Here's the regression.diff. I'm going to spend some time today trying to figure out if the tests need to change or there is an actual issue in the patch. Just wanted to give a heads up. -- --Dan *** /home/dc0lish/workspace/postgresql/src/test/regress/expected/copy.out 2009-09-17 11:45:04.041818319 -0700 --- /home/dc0lish/workspace/postgresql/src/test/regress/results/copy.out 2009-09-17 11:45:14.215152558 -0700 *** *** 72,88 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax ! copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv); truncate copytest2; ! copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv); select * from copytest except select * from copytest2; style | test | filler ---+--+ (0 rows) truncate copytest2; !
copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); ! copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); select * from copytest except select * from copytest2; style | test | filler ---+--+ --- 72,88 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax ! copy copytest to '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv); truncate copytest2; ! copy copytest2 from '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv); select * from copytest except select * from copytest2; style | test | filler ---+--+ (0 rows) truncate copytest2; ! copy copytest to '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); ! copy copytest2 from '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\'); select * from copytest except select * from copytest2; style | test | filler ---+--+ *** *** 95,111 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax from psql ! \copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv) truncate copytest2; ! \copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv) select * from copytest except select * from copytest2; style | test | filler ---+--+ (0 rows) truncate copytest2; ! \copy copytest to '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\') ! 
\copy copytest2 from '/home/manu/workspace/Postgres8.5-COPY/src/test/regress/results/copytest.csv' (csv, csv_quote '''', csv_escape E'\\') select * from copytest except select * from copytest2; style | test | filler ---+--+ --- 95,111 1,a,1 2,b,2 -- Repeat the above tests with the new 8.5 option syntax from psql ! \copy copytest to '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv) truncate copytest2; ! \copy copytest2 from '/home/dc0lish/workspace/postgresql/src/test/regress/results/copytest.csv' (csv) select * from copytest except select * fro
Re: [HACKERS] generic copy options
Tom Lane wrote: I wonder though if we couldn't simplify matters. Offhand it seems to me that psql doesn't need to validate the command's syntax fully. All it really needs to do is find the target filename and replace it with STDIN/STDOUT. Could we have it just treat the remainder of the line literally, and not worry about the details of what the options might be? Let the backend worry about throwing an error if they're bad. New version with the simplified psql. Still supports the 7.3 syntax. I am going to look into porting the other COPY enhancements (error logging and autopartitioning) on this implementation. We might come up with new ideas for the documentation side of things with more options. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 17 Sep 2009 15:04:06 - @@ -73,17 +73,17 @@ \. -- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; +COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. 
-COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; +COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X @@ -108,13 +108,13 @@ INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail -COPY no_oids FROM stdin WITH OIDS; -COPY no_oids TO stdout WITH OIDS; +COPY no_oids FROM stdin (OIDS); +COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; -COPY x (b, e) TO stdout WITH NULL 'I''m null'; +COPY x (b, e) TO stdout (NULL 'I''m null'); CREATE TEMP TABLE y ( col1 text, @@ -130,11 +130,23 @@ COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; +-- Test new 8.5 syntax + +COPY y TO stdout (CSV); +COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE *); + +\COPY y TO stdout (CSV) +\COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE *) + --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); -COPY testnl FROM stdin CSV; +COPY testnl FROM stdin (CSV); 1,"a field with two LFs inside",2 @@ -143,14 +155,14 @@ -- test end of copy marker CREATE TEMP TABLE testeoc (a text); -COPY testeoc FROM stdin CSV; +COPY testeoc FROM stdin (CSV); a\. \.b c\.d "\." \. -COPY testeoc TO stdout CSV; +COPY testeoc TO stdout (CSV); DROP TABLE x, y; DROP FUNCTION fn_x_before(); Index: src/test/regress/sql/aggregates.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/aggregates.sql,v retrieving revision 1.15 diff -u -r1.15 aggregates.sql --- src/test/regress/sql/aggregates.sql 25 Apr 2009 16:44:56 - 1.15 +++ src/test/regress/sql/aggregates.sql 17 Sep 2009 15:04:06 - @@ -104,7 +104,7 @@ BIT_OR(i4) AS "?" 
FROM bitwise_test; -COPY bitwise_test FROM STDIN NULL 'null'; +COPY bitwise_test FROM STDIN (NULL 'null'); 1 1 1 1 1 B0101 3 3 3 null2 B0100 7 7 7 3 4 B1100 @@ -171,7 +171,7 @@ BOOL_OR(b3)AS "n" FROM bool_test; -COPY bool_test FROM STDIN NULL 'null'; +COPY bool_test FROM STDIN (NULL 'null'); TRUE nullFALSE null FALSE TRUEnullnull null TRUEFALSE null Index: src/test/regress/sql/copyselect.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copyselect.sql,v retrieving revision 1.2 diff -u -r1.2 copyselect.sql --- src/test/regress/sql/copyselect.sql 7 Aug 2008 01:11:52 - 1.2 +++ src/test/regress/sql/copyselect.sql 17 Sep 2009 15:04:06 - @@ -61,7 +61,7 @@ -- -- Test headers, CSV and quotes -- -copy (select t from test1 where id = 1) to stdout csv header force quote t; +copy (se
Re: [HACKERS] generic copy options
Tom Lane wrote: Emmanuel Cecchet writes: Tom Lane wrote: psql has MORE need to support old syntax than the backend does, because it's supposed to work against old servers. Well, I wonder how many users just upgrade psql vs upgrade the server. We have established a project policy that psql backslash commands will support servers at least back to 7.4, and a great deal of work has already been expended in support of that goal. It is not within the charter of this patch to ignore or redefine that policy. Does that mean that we can drop the 7.3 syntax or should we just keep it? For future reference, where can I find the various project policies? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] generic copy options
Pavel Stehule wrote: Well, I wonder how many users just upgrade psql vs upgrade the server. I was thinking that when users perform a database upgrade their applications often remain the same and therefore the server needs to support the old syntax. Unless you are upgrading a machine where a bunch of psql-based scripts are running to update various remote Postgres instances with older versions, I would guess that it is unlikely that someone is going to upgrade psql and keep the old instance of the server on the same machine. I just wonder how many users are using a single psql to manage multiple server instances of different older versions. What about applications that use the current COPY format for fast data import? I think that making incompatible changes to the COPY statement syntax is a very bad idea. The old syntax is still supported in both psql and the server but I am not sure how many applications are relying on psql to perform a copy operation (actually a \copy). manu
Re: [HACKERS] generic copy options
Tom Lane wrote: While I understand the need for the server to still support the syntax, is it necessary for newer versions of psql to support the old syntax? psql has MORE need to support old syntax than the backend does, because it's supposed to work against old servers. Well, I wonder how many users just upgrade psql vs upgrade the server. I was thinking that when users perform a database upgrade their applications often remain the same and therefore the server needs to support the old syntax. Unless you are upgrading a machine where a bunch of psql-based scripts are running to update various remote Postgres instances with older versions, I would guess that it is unlikely that someone is going to upgrade psql and keep the old instance of the server on the same machine. I just wonder how many users are using a single psql to manage multiple server instances of different older versions. I wonder though if we couldn't simplify matters. Offhand it seems to me that psql doesn't need to validate the command's syntax fully. All it really needs to do is find the target filename and replace it with STDIN/STDOUT. Could we have it just treat the remainder of the line literally, and not worry about the details of what the options might be? Let the backend worry about throwing an error if they're bad. As the only difference between \copy and copy seems to be the ability to stream the file from the client, I guess that everything else should be sent as is to the server as you suggest. I'll come up with a patch for that today. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
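The pass-through approach agreed on here is easy to picture. A hypothetical before/after sketch (the table and file names are invented for illustration): psql would only locate the filename, substitute STDIN/STDOUT, and stream the file itself, forwarding the option list verbatim for the backend to validate.

```sql
-- Typed at the psql prompt:
\copy mytable from '/tmp/data.csv' (csv, delimiter ';')

-- What psql would send to the server, while streaming /tmp/data.csv itself:
COPY mytable FROM STDIN (csv, delimiter ';');
```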
Re: [HACKERS] generic copy options
Pavel, I am not sure about the syntax change. Isn't it better to solve this problem properly? This is too simple a solution. I think we should be able to add a new parser for the COPY statement and share this parser between SQL and psql. Refactoring COPY to put new parsers seems to be another project. The idea here is that the parser does not have to be changed again if we add new options, we just have to handle them in the COPY code (which will probably have to be refactored at some point as it was discussed in a previous thread). manu 2009/9/17 Emmanuel Cecchet : Robert Haas wrote: I don't think the way the doc changes are formatted is consistent with what we've done elsewhere. I think that breaking the options out as a separate block could be OK (because otherwise they have to be duplicated between COPY TO and COPY FROM) but it should be done more like the way that the SELECT page is done. I looked at the way it is done in SELECT and there is a section per clause (from clause, where clause, ...). So I am not sure how you want to apply that here besides the copy parameters and the option clause. Also, you haven't documented the syntax 100% correctly: the boolean options work just like the boolean explain options - they take an optional argument which if omitted defaults to true, but you can also specify 0, 1, true, false, on, off. See defGetBoolean. So those should be specified as: BINARY [boolean] OIDS [boolean] CSV [boolean] CSV_HEADER [boolean] See how we did it in sql-explain.html. Ok, fixed. I changed the name of the CSV options to prefix them with csv_ to avoid confusion with any future options. I also had to change the grammar to allow '*' as a parameter (needed for csv_force_quote). You seem to have introduced a LARGE number of unnecessary whitespace changes here which are not going to fly. You need to go through and revert all of those. 
It's hard to tell what you've really changed here, but also every whitespace change that gets committed is a potential merge conflict for someone else; plus pgindent will eventually change it back, thus creating another potential merge conflict for someone else. Sorry, I overlooked a format in Eclipse that formatted the whole file instead of the block I was working on. This should be fixed now. I am not 100% sold on renaming all of the CSV-specific options to add "csv_". I would like to get an opinion from someone else on whether that is a good idea or not. I am fairly certain it is NOT a good idea to support BOTH the old and new option names, as you've done here. If you're going to rename them, you should update gram.y and change the makeDefElem() calls within the copy_opt_list productions to emit the new names. Agreed for the makeDefElem(). For changing the names, I think that names like 'header', 'escape' and 'quote' are too generic to not conflict with something that is not csv. If you think of another format that could be added to copy, it is likely to re-use the same variable names. The only thing that seems odd is that if you use a CSV_* option, you still have to add CSV [on] to the option list which seems kind of redundant. When we decide to drop the old syntax (in 8.6?), we will be able to clean a lot especially in psql. Considering that we are still carrying syntax that was deprecated in 7.3, I don't think it's likely that we'll phase out the present syntax anywhere nearly that quickly. But it's reasonable to ask whether we should think about removing support for the pre-7.3 syntax altogether for 8.5. It doesn't seem to cost us much to keep that support around, but then again it's been deprecated for seven major releases, so it might be about time. While I understand the need for the server to still support the syntax, is it necessary for newer version of psql to support the old syntax? 
I am attaching the new version of the patch with the current modifications addressing your comments. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 17 Sep 2009 03:14:48 - @@ -73,17 +73,17 @@ \. -- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS
Re: [HACKERS] generic copy options
Robert Haas wrote: I don't think the way the doc changes are formatted is consistent with what we've done elsewhere. I think that breaking the options out as a separate block could be OK (because otherwise they have to be duplicated between COPY TO and COPY FROM) but it should be done more like the way that the SELECT page is done. I looked at the way it is done in SELECT and there is a section per clause (from clause, where clause, ...). So I am not sure how you want to apply that here besides the copy parameters and the option clause. Also, you haven't documented the syntax 100% correctly: the boolean options work just like the boolean explain options - they take an optional argument which if omitted defaults to true, but you can also specify 0, 1, true, false, on, off. See defGetBoolean. So those should be specified as: BINARY [boolean] OIDS [boolean] CSV [boolean] CSV_HEADER [boolean] See how we did it in sql-explain.html. Ok, fixed. I changed the name of the CSV options to prefix them with csv_ to avoid confusion with any future options. I also had to change the grammar to allow '*' as a parameter (needed for csv_force_quote). You seem to have introduced a LARGE number of unnecessary whitespace changes here which are not going to fly. You need to go through and revert all of those. It's hard to tell what you've really changed here, but also every whitespace change that gets committed is a potential merge conflict for someone else; plus pgindent will eventually change it back, thus creating another potential merge conflict for someone else. Sorry, I overlooked a format in Eclipse that formatted the whole file instead of the block I was working on. This should be fixed now. I am not 100% sold on renaming all of the CSV-specific options to add "csv_". I would like to get an opinion from someone else on whether that is a good idea or not. I am fairly certain it is NOT a good idea to support BOTH the old and new option names, as you've done here. 
If you're going to rename them, you should update gram.y and change the makeDefElem() calls within the copy_opt_list productions to emit the new names. Agreed for the makeDefElem(). For changing the names, I think that names like 'header', 'escape' and 'quote' are too generic to not conflict with something that is not csv. If you think of another format that could be added to copy, it is likely to re-use the same variable names. The only thing that seems odd is that if you use a CSV_* option, you still have to add CSV [on] to the option list which seems kind of redundant. When we decide to drop the old syntax (in 8.6?), we will be able to clean a lot especially in psql. Considering that we are still carrying syntax that was deprecated in 7.3, I don't think it's likely that we'll phase out the present syntax anywhere nearly that quickly. But it's reasonable to ask whether we should think about removing support for the pre-7.3 syntax altogether for 8.5. It doesn't seem to cost us much to keep that support around, but then again it's been deprecated for seven major releases, so it might be about time. While I understand the need for the server to still support the syntax, is it necessary for newer version of psql to support the old syntax? I am attaching the new version of the patch with the current modifications addressing your comments. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 17 Sep 2009 03:14:48 - @@ -73,17 +73,17 @@ \. 
-- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; +COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. -COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; +COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X @@ -108,13 +108,13 @@ INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail -COPY no_oids FROM stdin WITH OIDS; -COPY no_oids TO stdout WITH OIDS; +COPY no_oids FROM stdin (OIDS); +COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; -COPY x (b, e) TO st
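Robert's point above about defGetBoolean-style options can be illustrated with a short sketch (using the option names from this patch; the behavior is as described in the message, not verified against a server):

```sql
-- With the argument omitted, a boolean option defaults to true,
-- so these three should be equivalent:
COPY y TO stdout (CSV);
COPY y TO stdout (CSV true);
COPY y TO stdout (CSV on);

-- 0/1, true/false, and on/off are all accepted; this form disables the option:
COPY y TO stdout (CSV off);
```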
Re: [HACKERS] generic copy options
Robert, Here is a new version of the patch with an updated doc and psql. I changed the name of the CSV options to prefix them with csv_ to avoid confusion with any future options. I also had to change the grammar to allow '*' as a parameter (needed for csv_force_quote). When we decide to drop the old syntax (in 8.6?), we will be able to clean a lot especially in psql. Emmanuel On Mon, Sep 14, 2009 at 2:51 PM, Emmanuel Cecchet wrote: This looks good. Should I try to elaborate on that for the patch with error logging and autopartitioning in COPY? That makes sense to me. You shouldn't need to do anything else in gram.y; whatever you want to add should just involve changing copy.c. If not, please post the details. We also need to fix the psql end of this, and the docs... any interest in taking a crack at either of those? ...Robert -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com ### Eclipse Workspace Patch 1.0 #P Postgres8.5-COPY Index: src/test/regress/sql/copy2.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copy2.sql,v retrieving revision 1.18 diff -u -r1.18 copy2.sql --- src/test/regress/sql/copy2.sql 25 Jul 2009 00:07:14 - 1.18 +++ src/test/regress/sql/copy2.sql 16 Sep 2009 22:37:31 - @@ -73,17 +73,17 @@ \. -- various COPY options: delimiters, oids, NULL string -COPY x (b, c, d, e) from stdin with oids delimiter ',' null 'x'; +COPY x (b, c, d, e) from stdin (oids, delimiter ',', null 'x'); 50,x,45,80,90 51,x,\x,\\x,\\\x 52,x,\,,\\\,,\\ \. -COPY x from stdin WITH DELIMITER AS ';' NULL AS ''; +COPY x from stdin (DELIMITER ';', NULL ''); 3000;;c;; \. 
-COPY x from stdin WITH DELIMITER AS ':' NULL AS E'\\X'; +COPY x from stdin (DELIMITER ':', NULL E'\\X'); 4000:\X:C:\X:\X 4001:1:empty:: 4002:2:null:\X:\X @@ -108,13 +108,13 @@ INSERT INTO no_oids (a, b) VALUES (20, 30); -- should fail -COPY no_oids FROM stdin WITH OIDS; -COPY no_oids TO stdout WITH OIDS; +COPY no_oids FROM stdin (OIDS); +COPY no_oids TO stdout (OIDS); -- check copy out COPY x TO stdout; COPY x (c, e) TO stdout; -COPY x (b, e) TO stdout WITH NULL 'I''m null'; +COPY x (b, e) TO stdout (NULL 'I''m null'); CREATE TEMP TABLE y ( col1 text, @@ -130,11 +130,23 @@ COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\'; COPY y TO stdout WITH CSV FORCE QUOTE *; +-- Test new 8.5 syntax + +COPY y TO stdout (CSV); +COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\'); +COPY y TO stdout (CSV, CSV_FORCE_QUOTE *); + +\COPY y TO stdout (CSV) +\COPY y TO stdout (CSV, CSV_QUOTE '''', DELIMITER '|') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE (col2), CSV_ESCAPE E'\\') +\COPY y TO stdout (CSV, CSV_FORCE_QUOTE *) + --test that we read consecutive LFs properly CREATE TEMP TABLE testnl (a int, b text, c int); -COPY testnl FROM stdin CSV; +COPY testnl FROM stdin (CSV); 1,"a field with two LFs inside",2 @@ -143,14 +155,14 @@ -- test end of copy marker CREATE TEMP TABLE testeoc (a text); -COPY testeoc FROM stdin CSV; +COPY testeoc FROM stdin (CSV); a\. \.b c\.d "\." \. -COPY testeoc TO stdout CSV; +COPY testeoc TO stdout (CSV); DROP TABLE x, y; DROP FUNCTION fn_x_before(); Index: src/test/regress/sql/aggregates.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/aggregates.sql,v retrieving revision 1.15 diff -u -r1.15 aggregates.sql --- src/test/regress/sql/aggregates.sql 25 Apr 2009 16:44:56 - 1.15 +++ src/test/regress/sql/aggregates.sql 16 Sep 2009 22:37:31 - @@ -104,7 +104,7 @@ BIT_OR(i4) AS "?" 
FROM bitwise_test; -COPY bitwise_test FROM STDIN NULL 'null'; +COPY bitwise_test FROM STDIN (NULL 'null'); 1 1 1 1 1 B0101 3 3 3 null2 B0100 7 7 7 3 4 B1100 @@ -171,7 +171,7 @@ BOOL_OR(b3)AS "n" FROM bool_test; -COPY bool_test FROM STDIN NULL 'null'; +COPY bool_test FROM STDIN (NULL 'null'); TRUE nullFALSE null FALSE TRUEnullnull null TRUEFALSE null Index: src/test/regress/sql/copyselect.sql === RCS file: /home/manu/cvsrepo/pgsql/src/test/regress/sql/copyselect.sql,v retrieving revision 1.2 diff -u -r1.2 copyselect.sql --- src/test/regress/sql/copyselect.sql 7 Aug 2008 01:11:52 - 1.2 +++ src/test/regress/sql/copyselect.sql 1
Re: [HACKERS] generic copy options
Robert Haas wrote: On Mon, Sep 14, 2009 at 2:51 PM, Emmanuel Cecchet wrote: This looks good. Should I try to elaborate on that for the patch with error logging and autopartitioning in COPY? That makes sense to me. You shouldn't need to do anything else in gram.y; whatever you want to add should just involve changing copy.c. If not, please post the details. Ok, I'll keep you posted. We also need to fix the psql end of this, and the docs... any interest in taking a crack at either of those? I can certainly help with the doc. I have never looked at the psql code but that could be a good way to get started on it. If you can point me at where to look, I'll give it a try. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] generic copy options
This looks good. Should I try to elaborate on that for the patch with error logging and autopartitioning in COPY? manu Robert Haas wrote: Here's a half-baked proof of concept for the above approach. This probably needs more testing than I've given it, and I haven't attempted to fix the psql parser or update the documentation, but it's at least an outline of a solution. I did patch all the regression tests to use the new syntax, so you can look at that part of the patch to get a flavor for it. If this is broadly acceptable I can attempt to nail down the details, or someone else is welcome to pick it up. It's on my git repo as well, as usual. -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Greg Smith wrote: On Fri, 11 Sep 2009, Emmanuel Cecchet wrote: I guess the problem with extra or missing columns is to make sure that you know exactly which data belongs to which column so that you don't put data in the wrong columns, which is likely to happen if this is fully automated. Allowing the extra column case is easy: everywhere in copy.c you find the error message "extra data after last expected column", just ignore the overflow fields rather than rejecting the line just based on that. And the default information I mentioned you might want to substitute for missing columns is already being collected by the code block with the comment "Get default info if needed". If I understand it well, you expect the garbage to be after the last column. But what if the extra or missing column is somewhere upfront or in the middle? Sometimes you might have a type conflict problem that will help you detect the problem, sometimes you will just insert garbage. This might call for another mechanism that would log the lines that are automatically 'adjusted' to be able to roll back any mistake that might happen during this automated process. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Patch for automating partitions in PostgreSQL 8.4 Beta 2
Grzegorz Jaskiewicz wrote: Anyone know what's the latest on that patch? To be honest, this was the thing that I was looking forward to most in 8.5 ... (and probably not only me). We are also interested in integrating our autopartitioning patch for COPY with that implementation. I can help with the partitioning implementation and/or testing of that feature since this is of interest for Aster too. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Robert Haas wrote: http://developer.postgresql.org/pgdocs/postgres/sql-explain.html Just out of curiosity, it looks like I could write something like: EXPLAIN (ANALYZE TRUE, COSTS FALSE, VERBOSE TRUE, COSTS TRUE) statement What is the expected behavior if someone specifies the same option multiple times with different values? Does the last value prevail? I know that this example looks stupid, but when you have a lot of options it sometimes happens that you put an option twice with different values (that happens with some JDBC driver options...). manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Greg Smith wrote: The full set of new behavior here I'd like to see allows adjusting:
- Accept or reject rows with extra columns?
- Accept or reject rows that are missing columns at the end?
  -- Fill them with the default for the column (if available) or NULL?
- Save rejected rows?
  -- To a single system table?
  -- To a user-defined table?
  -- To the database logs?
The proposed patch saves all rejected rows (with extra or missing columns) to a user-defined table (that can be created automatically). If you want to handle these bad rows on the fly, I guess you could have a trigger on the error table that does the appropriate processing depending on the data you are processing. In that case, having multiple error tables allows you to plug different triggers to handle possible error cases differently. The other option is to process the error table after COPY to handle the bad rows. I guess the problem with extra or missing columns is to make sure that you know exactly which data belongs to which column so that you don't put data in the wrong columns, which is likely to happen if this is fully automated. I will try to re-read the thread on your proposal to better understand how you figure out which rows are missing or extra. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
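The idea of plugging a trigger into each error table could look roughly like this. All names and the error-table layout below are hypothetical, loosely following the key/label fields discussed in this thread rather than any committed schema:

```sql
-- A user-defined error table that COPY would fill with rejected rows
CREATE TABLE copy_errors (
    key     bigint,   -- position of the bad row in the input file
    label   text,     -- the COPY statement that produced the error
    rawdata text      -- the rejected input line
);

-- Per-error-table processing hooked in as an insert trigger
CREATE FUNCTION handle_copy_error() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'rejected row % from: %', NEW.key, NEW.label;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER copy_errors_trig
    AFTER INSERT ON copy_errors
    FOR EACH ROW EXECUTE PROCEDURE handle_copy_error();
```

A different error table could carry a different trigger function, which is what would let distinct COPY streams get distinct error handling.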
Re: [HACKERS] COPY enhancements
Tom Lane wrote: Robert Haas writes: Or look at your CVS/git checkout. The important point is to look at the grammar, which doesn't have any idea what the specific options are in the list. (Well, okay, it had to have special cases for ANALYZE and VERBOSE because those are reserved words :-(. But future additions will not need to touch the grammar. In the case of COPY that also means not having to touch psql \copy.) I understand the convenience from a developer perspective but I wonder how this improves the user experience. If options are not explicitly part of the grammar:
- you cannot do automated testing based on the grammar
- it seems that it will be harder to document
- it still requires the same amount of work in psql and 3rd party tools to support command-completion and so on
- why would only COPY or EXPLAIN use that syntax? What is the right boundary between an option and something that is part of the grammar?
It looks like passing the current GUC variables as options to COPY. Isn't there a design problem with the parser if it is so hard to add a new option to a command? In all cases, both the client and the server will have to support the new extension (and it will have to be documented), so it should not make a big difference whether it is explicitly part of the command grammar or a set of generic options. manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Tom, I looked at EXPLAIN (http://www.postgresql.org/docs/current/interactive/sql-explain.html) and there is not a single line of what you are talking about. And the current syntax is just:

EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

If I try to decrypt what you said, you are looking at something like:

COPY tablename [ ( column [, ...] ) ] FROM { 'filename' | STDIN } [option1=value[,...]]

That would give something like:

COPY foo FROM 'input.txt' binary=on, oids=on, errors=skip, max_errors=10

If this is not what you are thinking, please provide an example. Emmanuel

Emmanuel Cecchet writes: The new syntax could look like:

COPY tablename [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [, BINARY ] [, OIDS ] [, DELIMITER [ AS ] 'delimiter' ] [, NULL [ AS ] 'null string' ]
    [, CSV [ HEADER ] [ QUOTE [ AS ] 'quote' ] [ ESCAPE [ AS ] 'escape' ] [ FORCE NOT NULL (column [, ...]) ]
    [, ERRORS { SKIP | LOG INTO { tablename | 'filename' } [ LABEL label_name ] [ KEY key_name ] [ MAX ERRORS count ] } ]

Is this what you had in mind? No, because that doesn't do a darn thing to make the option set less hard-wired into the syntax. I was thinking of a strict keyword/value format with non-wired-in keywords ... and only *one* keyword per value. See EXPLAIN. regards, tom lane -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
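For comparison, the EXPLAIN-style format Tom is pointing at wraps a free-form keyword/value list in parentheses rather than using `option=value` pairs; transposed to COPY it would look more like the sketch below (the COPY option names here are purely illustrative, not an agreed-upon set):

```sql
-- EXPLAIN's generic option list, one keyword per value:
EXPLAIN (ANALYZE true, VERBOSE false) SELECT 1;

-- The same shape applied to COPY:
COPY foo FROM 'input.txt' (BINARY on, OIDS on, ERRORS 'skip', MAX_ERRORS 10);
```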
Re: [HACKERS] COPY enhancements
Hi Robert, I like this idea, perhaps not surprisingly (for those not following at home: that was my patch). Unfortunately, it looks to me like there is no way to do this without overhauling the syntax. If the existing syntax required a comma between options (i.e. "copy blah to stdout binary, oids" rather than "copy blah to stdout binary oids", this would be pretty straightforward; but it doesn't even allow one). Well, some options like CSV FORCE ... take a comma-separated list of columns. This would require all options to become reserved keywords or force parentheses around option parameters. I wonder if we should consider allowing COPY options to be comma-separated beginning with 8.5, and then require it in 8.6. Other options include continuing to support the old syntax for the existing options, but allowing some new syntax as well and requiring its use for all new options (this is what we did with EXPLAIN, but there were only two pre-existing options there), and just changing the syntax incompatibly and telling users to suck it up. But I'm not sure I like either of those choices. We could keep the current syntax for backward compatibility only (can be dropped in a future release) and have a new syntax (starting in 8.5). To avoid confusion between both, we could just replace WITH with something else (or just drop it) to indicate that this is the new syntax. The new syntax could look like:

COPY tablename [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [, BINARY ] [, OIDS ] [, DELIMITER [ AS ] 'delimiter' ] [, NULL [ AS ] 'null string' ]
    [, CSV [ HEADER ] [ QUOTE [ AS ] 'quote' ] [ ESCAPE [ AS ] 'escape' ] [ FORCE NOT NULL (column [, ...]) ]
    [, ERRORS { SKIP | LOG INTO { tablename | 'filename' } [ LABEL label_name ] [ KEY key_name ] [ MAX ERRORS count ] } ]

Is this what you had in mind? Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
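Spelled out, the error-handling clause in the proposal above would be used like this. This is only a sketch of the proposed (never-committed) syntax; the table, file, and label names are made up:

```sql
-- Log bad rows into a table, tagging them with a label, stopping after 10 errors:
COPY foo (a, b) FROM 'input.txt',
    CSV HEADER,
    ERRORS LOG INTO foo_errors LABEL nightly_load MAX ERRORS 10;

-- Or simply skip bad rows:
COPY foo (a, b) FROM 'input.txt', CSV, ERRORS SKIP;
```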
Re: [HACKERS] COPY enhancements
Tom Lane wrote: It might be time to switch COPY over to a more easily extensible option syntax, such as we just adopted for EXPLAIN. Could you point me to the thread detailing the extensible option syntax for EXPLAIN? Is that new to 8.5? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] COPY enhancements
Josh Berkus wrote: I am not really sure why you need a natural key. a) because we shouldn't be building any features which teach people bad db design, and b) because I will presumably want to purge records from this table periodically and doing so without a key is likely to result in purging the wrong records. Agreed, but I am not sure that unilaterally imposing a key is going to suit everyone. By default, the partition_key contains the index of the faulty entry and the label contains the COPY command. This could be your key. Well, you still haven't explained the partition_key to me, so I'm not quite clear on that. Help? Please re-read my previous message for the default behavior. If you look at the example at http://wiki.postgresql.org/wiki/Error_logging_in_COPY, input_file.txt has 5 rows out of which only 1 and 5 are correct (2, 3 and 4 are bad). The partition key indicates the row that caused the problem (2 for row 2, 3 for row 3, ...) and the label contains the full COPY statement. If you want to know how it is used in the Aster product, we will have to ask Alex who implemented the feature in the product. The reason why I'd like to have a session_id or pid or similar is so that I can link the copy errors to which backend is erroring in the other system views or in the pg_log. Imagine a system where you have multiple network clients doing COPYs; if one of them starts bugging out and all I have is a tablename, filename and time, I'm not going to be able to figure out which client is causing the problems. The reason I mention this case is that I have a client who has a production application like this right now. All the clients are copying the same file to the same table? I would imagine that every client processes different files and from the file names it would be easy to identify the faulty client. I am not sure how you would use the pid to identify the faulty client more easily?
Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
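[Editor's note: to make the default key/label behavior described above concrete, a query against the error table might look like this. The table and column names (copy_errors, key, label, rawdata) are assumptions based on the wiki description quoted in the thread, not the patch's exact schema.]

```sql
-- assumed error-table columns, per the description above:
--   key     = index of the faulty row in the source file
--   label   = the full COPY statement (the default label)
--   rawdata = the rejected input line (bytea)
SELECT key, label, rawdata
  FROM copy_errors
 ORDER BY key;
-- for the wiki's input_file.txt this would list rows 2, 3 and 4
```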
Re: [HACKERS] COPY enhancements
It can be another GUC variable or an option to COPY. If the COPY command fails, everything gets rolled back (data in the destination table and error table). That would be harder to implement with a file (the rollback part). With a logging file, you wouldn't worry about rollback. You'd just log a statement to the file that it was rolled back. Ok. I) Aster's current implementation has the advantage of being able to log to any user-defined table, giving users the flexibility to log different COPYs to different tables, or even put them on various tablespaces. Is that what we want, though? Clearly it would make things simpler for most (but not all) users to have just a canonical pg_copy_errors table which was in pg_catalog. It would also presumably remove some code from the patch (usually a good thing). What do people think? The code that would be removed would just be the table creation and the GUC variable for the table name. That would not remove much code. J) One option I think we need if we're going to support logging to "any defined logging table" is the option to make that table a temporary table. Actually you can create the table beforehand (check unit tests for an example) and so it is possible to make it a temporary table if you want (I would just have to test it, I did not try it yet). O) Is this capable of dealing with partitioning by more than one column, or by an expression? Yes, we just use a brute force technique where we try all child tables 1-by-1 and rely on the existing Postgres constraint checking mechanism (no new or duplicated code there). Sounds disastrous performance-wise. There's no easy way to test the expression without attempting an actual insert? An option that was proposed at PGCon by someone in the audience (sorry, I don't remember who) was to build a query plan and find out from there which child table should get the tuple. It will be disastrous performance-wise if you always have misses and need to check a lot of constraints (we can always build a worst case scenario test). 
However, in my tests, even with completely random data, routing is faster than a script sorting the input into multiple files + manual inserts into each child table. In practice, even a small cache size usually works very well as soon as you have enough locality in your source data. The problem is that faulty tuples will have to check all constraints to figure out that there is no child table for them. Note that if you have a deep tree hierarchy, you don't necessarily have to insert at the top of the tree, you can start anywhere down the tree. Once we have the full partitioning support (in 8.5?), we can probably reuse some of that mechanism to route the tuple directly to the right child. Thanks for your great feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
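[Editor's note: the brute-force routing described above leans entirely on the existing CHECK-constraint machinery. A minimal setup it would operate on looks like this; table names and constraint ranges are invented for illustration.]

```sql
CREATE TABLE logs (ts date NOT NULL, msg text);
CREATE TABLE logs_q1 (CHECK (ts >= '2009-01-01' AND ts < '2009-04-01')) INHERITS (logs);
CREATE TABLE logs_q2 (CHECK (ts >= '2009-04-01' AND ts < '2009-07-01')) INHERITS (logs);

-- with routing enabled, each input row is tried against the children
-- one by one until an existing CHECK constraint accepts it; a row that
-- matches no child pays for every constraint check before failing:
COPY logs FROM '/tmp/logs.txt';
```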
Re: [HACKERS] COPY enhancements
Josh, See the answers inlined. Thank you for tackling this very long-time TODO. Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Questions & Comments: A) Why would someone want to turn error_logging on, but leave error_logging_skip_tuples off? The pg_log already logs errors which copy throws by default. When error_logging is on and skip_tuples is off, errors are logged in the error table. If skip_tuples is on, tuples are not logged in the error table. B) As I mentioned earlier, we'll want to provide the option of logging to a file instead of to a table. That's not a reason to reject this patch, but probably a TODO for 8.5. Ok, but what should be the format of that file? C) Are we sure we want to handle this via GUCs rather than extensions to COPY syntax? It seems like fairly often users would want to log different COPY sources to different tables/files. I agree that new COPY options could be easier to use; the implementation is just more complex. However, the label allows you to select the tuples related to specific COPY commands. D) These GUCs are userset, I hope? (haven't dug into the code far enough to tell yet). Yes. E) What is error_logging_tuple_label for? You don't explain/give examples. And how is error_logging_tuple_partition_key used? We use the label and partition key in Aster products to easily retrieve which COPY command on which partition generated the bad tuples. By default, the tuple_label contains the COPY command that was executed (see example on Wiki) and the key contains the index of the tuple in the source file (see example on Wiki). F) Rawdata for rejected tuples is presumably BYTEA? Yes. I forgot to put back the table description that can be seen in the unit tests. I have updated the Wiki with the table definition. G) We should probably have a default for error_logging_table_name, such as pg_copy_errors. Does that table get automatically created if it doesn't exist? 
Yes, as indicated on the wiki the table is created automatically (see config variable section). H) Finally, one request of the TODO is some way to halt import after a specified number of bad tuples because it probably means you have the wrong file or wrong table. Do we still want that? We can still do that. It can be another GUC variable or an option to COPY. If the COPY command fails, everything gets rolled back (data in the destination table and error table). That would be harder to implement with a file (the rollback part). Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY M) tuple_routing_in_copy should take "on" or "off", not 0 or 1. Ok. N) Have you measured the overhead & speed of this kind of COPY as opposed to COPY into a single table? Have you checked the overhead if tuple_routing_in_copy is on, but you are not loading into a partitioned table? Yes. There is no noticeable overhead if there is no routing to do (but routing is on). If routing is involved, the overhead depends on how sorted your input data is. If it all goes to the same partition, the caching effect works well and there is no noticeable overhead. The cost is in the constraint check and it depends on the complexity of the constraint. The more constraints you have to check and the more complex they are, the more overhead on each tuple routing. O) Is this capable of dealing with partitioning by more than one column, or by an expression? Yes, we just use a brute force technique where we try all child tables 1-by-1 and rely on the existing Postgres constraint checking mechanism (no new or duplicated code there). Finally, I'm going to suggest different names for the GUCs, as the names you've chosen don't group well and would likely cause confusion. 
Here are my suggestions, which all begin with "copy_" for prefix matching: error_logging --> probably not needed, see above error_logging_skip_tuples --> copy_skip_bad_rows error_logging_schema_name --> copy_logging_schema_name error_logging_relation_name --> copy_logging_table_name error_logging_tuple_label --> don't know what this is for, see above error_logging_tuple_partition_key --> don't know what this is for, see above tuple_routing_in_copy --> copy_partitioning tuple_routing_cache_size --> copy_partitioning_cache_size This makes sense. I'll add that to my todo list. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
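[Editor's note: putting the GUC-based interface discussed above together, a session using the patch might look like this. The GUC names and the error-table location follow the patch description in the thread and may not match the final code.]

```sql
SET error_logging = on;                       -- log bad rows instead of aborting the COPY
SET error_logging_skip_tuples = off;          -- keep the bad tuples in the error table
SET error_logging_schema_name = 'public';
SET error_logging_relation_name = 'copy_errors';

COPY foo FROM '/tmp/input_file.txt';          -- bad rows land in public.copy_errors
SELECT * FROM public.copy_errors;             -- inspect the rejected rows
```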
Re: [HACKERS] COPY enhancements
Hi David, My C isn't all that great, to put it mildly, but I noticed C++-style // comments like this, which we don't use. Ok, I will fix that. Are these dependent on each other, or could they be reviewed, etc. separately? The code implementing each functionality is independent (separate methods or files) but error logging also catches potential routing problems and logs faulty tuples (the call to routing is basically in the try/catch of error logging). So, yes, they can be reviewed separately, but if both are integrated it makes sense to review them together. Cheers Emmanuel
[HACKERS] COPY enhancements
Hi all, Finally, the error logging and autopartitioning patches for COPY that I presented at PGCon are here! Error logging is described here: http://wiki.postgresql.org/wiki/Error_logging_in_COPY Autopartitioning is described here: http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY The attached patch contains both contribs as well as unit tests. I will shortly submit the new patch at commitfest.postgresql.org. Thanks in advance for your feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com aster-copy-improvements-patch-8.5v3.txt.gz Description: GNU Zip compressed data
Re: [HACKERS] Durability?
Tom Lane wrote: Emmanuel Cecchet writes: I got an error like this: ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440 CONTEXT: writing block 529 of relation 1663/233690/1247 WARNING: could not write block 529 of 1663/233690/1247 DETAIL: Multiple failures --- write error might be permanent. The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing. It looks like you've got a corrupted page in shared buffers, and every time the system tries to flush it to disk for a checkpoint, it fails. What I'd try for getting out of this is to kill -9 some backend in order to force a database restart. Of course, if you want to investigate what caused it, you should dig around in shared memory first and try to get a copy of that buffer's contents. Will the database be able to restart with a corrupted WAL? If the database restarts, what transactions will be missing: - just the block that couldn't be flushed? - all transactions that were committed after the faulty block? - more? Thanks Emmanuel
[HACKERS] Durability?
Hi, I got an error like this: ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440 CONTEXT: writing block 529 of relation 1663/233690/1247 WARNING: could not write block 529 of 1663/233690/1247 DETAIL: Multiple failures --- write error might be permanent. The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing. I am not sure I clearly understand the consequences of such an error, since Postgres continues to accept new transactions. If my WAL is corrupted, are my transactions still durable? If this is a violation of durability, is there a way to force Postgres to terminate on such an error? Thanks in advance for the clarification. Emmanuel
[HACKERS] Unique namespace per session for temp tables?
Hi all, Following the issue with temp tables and 2PC, we thought that each session in Postgres was supposed to generate a unique namespace for its temporary tables. But if we execute the following scenario: * First session*. BEGIN; CREATE TEMP TABLE foo (x int) ON COMMIT DROP; PREPARE TRANSACTION 't1'; *Disconnect*. *Wait* (at least until backend process for first session exits). *Reconnect new session*. CREATE TEMP TABLE foo (x int); <-- blocks until t1 commits The issue we see is that a totally unrelated session (which should have a unique temporary schema), reuses the temporary schema that is locked away in the prepared transaction. Isn't that breaking the assumption that each session should have a unique schema for its temp tables? Thanks in advance for your insights. Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: Emmanuel Cecchet writes: But when the transaction prepares, we know that. What would prevent us to do at prepare time the same cleanup that commit does? The entire point of PREPARE is that it's *not* committed yet. Agreed, but all objects that were created and dropped in the transaction are going to be cleaned up whether the transaction commits or rolls back. It seems that releasing these resources at PREPARE time would help for these 'temporary' objects that only have a transaction scope, right? regards, manu
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: Emmanuel Cecchet writes: Tom Lane wrote: True, but the problem is that the tuple might still be live to (some snapshots in) that transaction, so we can't inject a duplicate tuple without risking confusing it. In this particular case that isn't an issue because the transaction is done executing, but the tqual.c rules don't know that. Please excuse my ignorance. I am not sure I see how the tuple could still be live to some snapshots after the transaction has prepared. Well, it couldn't be, because there are no snapshots in that transaction anymore. The problem is that the *other* transaction doesn't have a good way to know that. It just sees an open transaction with conflicting unique-index changes. But when the transaction prepares, we know that. What would prevent us to do at prepare time the same cleanup that commit does? regards, manu (indentation (C) tom lane)
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: True, but the problem is that the tuple might still be live to (some snapshots in) that transaction, so we can't inject a duplicate tuple without risking confusing it. In this particular case that isn't an issue because the transaction is done executing, but the tqual.c rules don't know that. Please excuse my ignorance. I am not sure I see how the tuple could still be live to some snapshots after the transaction has prepared. What could still happen to objects that were only visible to a transaction after it has prepared? An example would definitely help. Is it possible in Postgres for a transaction to see an object that was created inside another transaction before it commits (assuming at least 'read committed' of course)? Thanks again, Emmanuel
Re: [HACKERS] Locks on temp table and PREPARE
Tom Lane wrote: Emmanuel Cecchet writes: Take PG 8.3.0 and try: BEGIN; CREATE TEMP TABLE foo (x int) ON COMMIT DROP; PREPARE TRANSACTION 't1'; [BEGIN;] <-- doesn't really matter if you start a new transaction or not CREATE TEMP TABLE foo (x int); <-- blocks until t1 commits I have been tracking down the problem and it looks like PostPrepare_Locks is holding the locks on 'foo' for some reason I don't really get. AFAIK that doesn't really have anything to do with the temp-ness of the table; it'd be the same with a regular table. The problem is you have an in-doubt tuple in pg_class for pg_temp_NNN.foo, and you are trying to create another one for the same schema/relname, and so the unique index check is blocking to see what happens to the other transaction that's creating/deleting the conflicting tuple. You are right (of course!), I tried: BEGIN; CREATE TABLE foo (x int); DROP TABLE foo; PREPARE TRANSACTION 't1'; [BEGIN;] CREATE TABLE foo (x int); <-- blocks There should not be a doubt about table foo because whether the transaction commits or rolls back, that table will not exist anymore (we could actually get rid of it at prepare time). I guess Postgres does not handle the special case of tables (temp or not) whose lifespan is limited to the scope of a transaction and therefore cannot optimize that case. Is that correct? Thanks for your help. Emmanuel
[HACKERS] Locks on temp table and PREPARE
Hi, As we discussed during PGCon, we are using temp tables in 2PC transactions. The temp tables are dropped before PREPARE (or have an ON COMMIT DROP option) and never cross transaction boundaries. In 8.3.1, a patch was introduced to disallow temp tables in 2PC transactions and we tried to provide a fix for it (see the long thread with Heikki on this list). I am still working on a cleaner patch to allow temp tables to be used in 2PC transactions but I did hit a new problem that I don't know how to solve cleanly. Take PG 8.3.0 and try: BEGIN; CREATE TEMP TABLE foo (x int) ON COMMIT DROP; PREPARE TRANSACTION 't1'; [BEGIN;] <-- doesn't really matter if you start a new transaction or not CREATE TEMP TABLE foo (x int); <-- blocks until t1 commits I have been tracking down the problem and it looks like PostPrepare_Locks is holding the locks on 'foo' for some reason I don't really get. Any suggestion on what should be done differently for temp tables there? Thanks, Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wrong stats for empty tables
From: Tom Lane [...@sss.pgh.pa.us] Subject: Re: [HACKERS] Wrong stats for empty tables "Emmanuel Cecchet" writes: > Is this a bug? No, it's intentional. So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example: manu=# create table father (id int, val int, tex varchar(100), primary key(id)); manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2)); insert some data manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; QUERY PLAN Sort (cost=37.81..37.82 rows=5 width=230) Sort Key: father.id -> Hash Join (cost=23.44..37.75 rows=5 width=230) Hash Cond: (father.id = other.id1) -> Seq Scan on father (cost=0.00..13.10 rows=310 width=226) -> Hash (cost=23.38..23.38 rows=5 width=8) -> Seq Scan on other (cost=0.00..23.38 rows=5 width=8) Filter: (id2 = 2) (8 rows) manu=# create table child1() inherits(father); manu=# create table child2() inherits(father); manu=# create table child3() inherits(father); manu=# create table child4() inherits(father); manu=# create table child5() inherits(father); manu=# create table child6() inherits(father); manu=# create table child7() inherits(father); manu=# create index i1 on child1(id); manu=# create index i2 on child2(id); manu=# create index i3 on child3(id); manu=# create index i4 on child4(id); manu=# create index i5 on child5(id); manu=# create index i6 on child6(id); manu=# create index i7 on child7(id); manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; QUERY PLAN Sort (cost=140.00..140.16 rows=62 width=230) Sort Key: public.father.id -> Hash Join (cost=23.44..138.16 rows=62 width=230) Hash Cond: (public.father.id = other.id1) -> Append (cost=0.00..104.80 rows=2480 width=226) -> Seq Scan on father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child1 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child2 father 
(cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child3 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child4 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child5 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child6 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child7 father (cost=0.00..13.10 rows=310 width=226) -> Hash (cost=23.38..23.38 rows=5 width=8) -> Seq Scan on other (cost=0.00..23.38 rows=5 width=8) Filter: (id2 = 2) (16 rows) I must admit that I do not see what the original intention behind this behavior was. Emmanuel
[HACKERS] Wrong stats for empty tables
Hi, Here is an example showing the problem: Welcome to psql 8.3.6, the PostgreSQL interactive terminal. manu=# create table foo (x int); CREATE TABLE manu=# explain select * from foo; QUERY PLAN --- Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# analyze foo; ANALYZE manu=# explain select * from foo; QUERY PLAN --- Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# insert into foo values (1); INSERT 0 1 manu=# analyze foo; ANALYZE manu=# explain select * from foo; QUERY PLAN --- Seq Scan on foo (cost=0.00..1.01 rows=1 width=4) (1 row) Now a possible cause for this might be the relpages attribute in pg_class (the default value 0 does not seem to be interpreted correctly): manu=# create table bar(x int); CREATE TABLE manu=# explain select * from bar; QUERY PLAN --- Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# select relpages from pg_class where relname='bar'; relpages -- 0 (1 row) manu=# update pg_class set relpages=1 where relname='bar'; UPDATE 1 manu=# explain select * from bar; QUERY PLAN --- Seq Scan on bar (cost=0.00..0.00 rows=1 width=4) (1 row) This is a real problem if you have a lot of empty child tables. Postgres will not correctly optimize queries in the presence of empty child tables. Is this a bug? Thanks for your help, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Partitioning feature ...
Yes, there is a good reason. As a trigger can update the tuple value, this can change the routing decision. If you have a user trigger that tries to change the key value after the partition choice has been made, this will lead to an integrity constraint violation, which is probably not what the user expects. Note that user triggers with partitions will be tricky anyway (regardless of how partitioning is implemented, that is with triggers or not). If 2 partitions have user triggers that update the key value to bounce the tuple to the other partition, you may end up with an infinite loop. I am not sure what the semantics of statement triggers (still user triggers) should be on partitioned tables. We will probably have to come up with restrictions on triggers so that they can only be applied to the parent table and not to child tables to prevent nasty issues. Emmanuel Tom Lane wrote: In the case of the FK triggers, it's intentional (and maybe even documented) that users should be able to place their own triggers before or after the FK triggers. Is there a good reason why partitioning triggers should be different? If there is, maybe the feature shouldn't be implemented via triggers in the first place. regards, tom lane -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Partitioning feature ...
I agree with Jaime that system triggers should execute independently of user triggers. In the particular case of partitioning, the system trigger should execute after the user triggers. However, as the partitioning trigger is a row level trigger, it is not clear what is going to happen with user triggers that work at the statement level. Emmanuel Jaime Casanova wrote: On Mon, Mar 30, 2009 at 6:51 AM, Kedar Potdar wrote: As triggers are executed in order of their names, we've prefixed the trigger names with "zz". This should work fine as long as no-one uses a trigger name which starts with "zz". this seems quite fragile... why do system-generated triggers have to be executed following the same rules (talking about order of execution) as user triggers? can't we simply execute them first or last, or maybe be clever and mark one to be executed first and others last? -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
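[Editor's note: the name-ordering rule under discussion is simply that PostgreSQL fires same-event triggers on a table in alphabetical order of trigger name, which is both why the "zz" prefix works and why it is fragile. Table and function names below are invented for illustration.]

```sql
-- both are BEFORE INSERT row triggers on the same table:
CREATE TRIGGER aa_user_check BEFORE INSERT ON part_table
    FOR EACH ROW EXECUTE PROCEDURE user_check();
CREATE TRIGGER zz_partition_route BEFORE INSERT ON part_table
    FOR EACH ROW EXECUTE PROCEDURE partition_route();
-- aa_user_check fires first and zz_partition_route last...
-- until a user creates a trigger named, say, zzz_audit
```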
Re: [HACKERS] Partitioning feature ...
Hi Kedar, First of all, congratulations on the excellent work. I have some comments and questions. In get_relevent_partition (btw, relevant is spelled with an a) you are maintaining 2 lists. I guess this is only useful for multi-column partitions, right? If you have a single column partition (without subpartitions), I think you could directly return on the first match (without maintaining any list) since you guarantee that there is no overlap between partitions. A simple but effective optimization for inserts consists of caching the last partition used (consecutive inserts often go to the same partition) and trying it first before going through the whole loop. The update trigger should first check if the tuple needs to be moved. If the updated tuple still matches the constraints of the partition, it will not have to be moved, which will save a lot of overhead. The COPY operation should probably be optimized to use the same code as the insert trigger for partitioned tables. I guess some code could be factorized in COPY to make the inserts more efficient. The current trigger approach should prevent other triggers from being added to the table, or you should make sure that the partition trigger is always the one to execute last. As we don't have automatic partition creation, it would be interesting to have an optional mechanism to deal with tuples that don't match any partition (very useful when you do bulk insert and some new data require a new partition). Having a simple overflow partition or an error logging mechanism would definitely help to identify these tuples and prevent things like large COPY operations from failing. Looking forward to your responses, Emmanuel We are implementing the table partitioning feature to support Range and Hash partitions. Please find attached the WIP patch and test cases. 
The syntax used conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases. -- Sub partitioning We are maintaining a system catalog (pg_partition) for partition meta-data. System will look up this table to find the appropriate partition to operate on. System internally uses low-level 'C' triggers for row movement. Regards, -- Kedar. -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
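[Editor's note: the "check if the tuple needs to be moved" optimization suggested above could be sketched in PL/pgSQL as follows. Table, column and constraint values are invented for illustration, and the actual patch uses C triggers; this is only the idea, not the implementation.]

```sql
CREATE FUNCTION logs_q1_update() RETURNS trigger AS $$
BEGIN
    -- fast path: the updated row still satisfies this partition's
    -- constraint, so no row movement is needed
    IF NEW.ts >= '2009-01-01' AND NEW.ts < '2009-04-01' THEN
        RETURN NEW;
    END IF;
    -- slow path: drop the old row here and re-route through the parent,
    -- whose insert trigger picks the right partition
    DELETE FROM logs_q1 WHERE ctid = OLD.ctid;
    INSERT INTO logs SELECT NEW.*;
    RETURN NULL;  -- suppress the local UPDATE
END $$ LANGUAGE plpgsql;
```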
Re: [HACKERS] Regclass and quoted table names
marcin mank wrote: Use plain oids or regclass values, not a text column, if you are trying to store table identities. wouldn't oids change on dump/reload? I don't know. I'd also be interested to know if there is a difference if we use pg_restore with a binary format or an SQL dump, or if that does not influence the way OIDs are created at all. manu
Re: [HACKERS] Regclass and quoted table names
Tom Lane wrote: Emmanuel Cecchet writes: It looks like the behavior of regclass is not consistent when table names are quoted. The name is returned without the quotes if the name is lower case, possibly with trailing digits; otherwise it is returned with quotes. It's intentional that it quotes only when needed. This is problematic in situations where the output of the cast is involved in some later join which returns incorrect results because of the extra double quotes surrounding the table name. Is there a way to override the default behavior to have a consistently quoted or non-quoted result? Thanks, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
[HACKERS] Regclass and quoted table names
Hi all, It looks like the behavior of regclass is not consistent when table names are quoted. The name is returned without the quotes if the name is lower case, possibly with trailing digits; otherwise it is returned with quotes. See some examples here: tpch=# CREATE VIEW test AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "test1" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "'test2'" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "testcolumnVIEW" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "testcolumnview" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW testcolumnVIEW2 AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "1test" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "1test2abc" AS SELECT * FROM customer; CREATE VIEW tpch=# CREATE VIEW "1test2" AS SELECT * FROM customer; CREATE VIEW tpch=# select c.oid , c.oid::regclass from pg_class c where c.relname like '%test%'; oid | oid ---+-- 16410 | test 16413 | test1 16416 | "'test2'" 16419 | "testcolumnVIEW" 16422 | testcolumnview 16425 | testcolumnview2 16428 | "1test2abc" 16431 | "1test2" 16434 | "1test" (9 rows) Is this a bug? manu -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
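[Editor's note: while regclass itself offers no switch for this, one workaround for a consistently quoted result is to build the quoted form by hand from pg_class, doubling any embedded quotes; this sidesteps the cast's quote-only-when-needed behavior.]

```sql
-- always-quoted variant of the query above:
SELECT c.oid,
       '"' || replace(c.relname, '"', '""') || '"' AS quoted_name
  FROM pg_class c
 WHERE c.relname LIKE '%test%';
```

Note this ignores the schema; regclass output may also be schema-qualified depending on search_path.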
Re: [HACKERS] Table Partitioning Feature
Hi Amit, I will be traveling until next Tuesday and will have no access to email so don't be surprised if I don't follow up this week. The overall approach seems sound. The metadata table should also help DDL to find out overlapping ranges or duplicate list entries. So far, I have not tried to use the SPI interface from a C trigger so I don't see any disadvantage yet. We would have to assess the performance to make sure it's not going to be a show stopper. I think that the main issue of the trigger approach is that other triggers might interfere. The 'partition trigger' must be the last of the 'before insert' triggers, and if the destination (child) table has a trigger, we must ensure that this trigger is not going to require a new routing. Another issue is the result returned by insert/copy commands: if all tuples are moved to other tables, the result will be 0. We might want to have stats that would collect where tuples were moved for a particular command (I don't know yet what would be the best place to collect these stats but they could probably be updated by the trigger). Also, would the trigger be attached to all tables in the hierarchy or only to the top parent? What kind of query would you use with more than 1 level of inheritance (e.g. parent=year, child=month, grand-child=day)? It looks like we have to parse the leaves of the graph, but intermediate nodes would help accelerate the search. An alternative approach (I haven't assessed the feasibility yet) would be to try to call the query planner. If we ask to select the partition value of the tuple, the query planner should return the table it is going to scan (as in EXPLAIN SELECT * FROM t WHERE key=$1). Let me know what you think, Emmanuel We are considering the following approach: 1.
metadata table pg_partitions is defined as follows:

CATALOG(pg_partitions,2336) BKI_WITHOUT_OIDS
{
    Oid     partrelid;   /* partition table Oid */
    Oid     parentrelid; /* parent table Oid */
    int4    parttype;    /* type of partition: list, hash, range */
    Oid     partkey;     /* partition key Oid */
    Oid     keytype;     /* type of partition key */
    int4    keyorder;    /* order of the key in multi-key partitions */
    text    min;         /* min and max for range partitions */
    text    max;
    text[]  list;
    int     hash;        /* hash value */
} FormData_pg_partitions;

2. C triggers will fire a query on this table to get the relevant partition of the inserted/updated data using the SPI interface. The query will look something like (for range partitioning):

select min(partrelid) from pg_partitions
where parentrelid = 2934  -- we know this value
  and ( ($1 between to_int(min) and to_int(max) and keyorder = 1)
     OR ($2 between to_date(min) and to_date(max) and keyorder = 2) )
group by parentrelid
having count(*) =

$1, $2, ... are the placeholders for the actual partition key values of the trigger tuple. Since we know the types of the partition keys and the parentrelid, this kind of query string can be saved in another table, say pg_part_map, and its plan can be parsed once and saved in cache to be reused. Do you see any issue with using the SPI interface within triggers? The advantage of this kind of approach is that the trigger code can be made generic for any kind of partitioned table. Thanks, Amit Persistent Systems, www.persistentsys.com On 1/23/09, Emmanuel Cecchet wrote: Amit, You might want to put this on the http://wiki.postgresql.org/wiki/Table_partitioning wiki page. What does your timeline look like for this implementation? I would be happy to contribute C triggers to your implementation. From what I understood in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00269.php, you already have an implementation that parses the grammar and generates rules as if someone had written them. Is this code available?
Regarding the use of triggers to push/move data to partitions, what if someone declares triggers on partitions? Especially if you have subpartitions, let's consider the case where there is a trigger on the parent, child and grandchild. If I do an insert in the parent, the user trigger on the parent will be executed, then the partition trigger that decides to move to the grandchild. Are we going to bypass the child trigger? If we also want fast COPY operations on partitioned tables, we could have an optimized implementation that could bypass triggers and move the tuple directly to the appropriate child table. Thanks for this big contribution, Emmanuel Hi, We are implementing a table partitioning feature to support the attached commands. The syntax conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases.
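The planner-based alternative Emmanuel mentions could be probed roughly like this (a sketch, assuming a hypothetical parent table t whose children carry CHECK constraints on key, and relying on constraint exclusion):

```sql
-- With constraint exclusion enabled, the plan for a query on the parent
-- restricted to one key value only scans the matching child, so the plan
-- itself reveals the routing decision:
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM t WHERE key = 42;
-- The resulting plan should scan only the child whose CHECK constraint
-- covers key = 42.
```

Parsing EXPLAIN output at insert time would of course be far more expensive than a cached metadata lookup, which is presumably why the pg_partitions query was proposed as the primary mechanism.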
Re: [HACKERS] Table Partitioning Feature
Hi Amit, I overlooked the fact that you dropped composite partitions and the subpartition template from the proposal presented in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php. Is it because this is too hard to support, or do you not see any immediate need for it? Thanks, Emmanuel Hi Emmanuel, Please find my comments inlined: On 1/23/09, Emmanuel Cecchet wrote: Amit, You might want to put this on the http://wiki.postgresql.org/wiki/Table_partitioning wiki page. Sure. What does your timeline look like for this implementation? The implementation is planned as follows: - Partition table commands ++ An intermediate patch at the end of Feb ++ Final patch in mid March - Global Index: Mid March - Optimizer changes for partitioned table: May I would be happy to contribute C triggers to your implementation. From what I understood in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00269.php, you already have an implementation that parses the grammar and generates rules as if someone had written them. Is this code available? We have just started with the implementation, I will post the grammar rules next week. Regarding the use of triggers to push/move data to partitions, what if someone declares triggers on partitions? Especially if you have subpartitions, let's consider the case where there is a trigger on the parent, child and grandchild. If I do an insert in the parent, the user trigger on the parent will be executed, then the partition trigger that decides to move to the grandchild. Are we going to bypass the child trigger? We are not supporting sub-partitioning - there is just one level of partitioning. If we also want fast COPY operations on partitioned tables, we could have an optimized implementation that could bypass triggers and move the tuple directly to the appropriate child table. We will definitely consider implementing fast COPY after we are done with the planned tasks.
Thanks, Amit Thanks for this big contribution, Emmanuel Hi, We are implementing table partitioning feature to support - the attached commands. The syntax conforms to most of the suggestion mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases. -- sub partitioning We are using pgsql triggers to push/move data to appropriate partitions, but we will definitely consider moving to C language triggers as suggested by manu. - Global non-partitioned indexes (that will extend all the partitions). - Foreign key support for tables referring to partitioned tables. Please feel free to post your comments and suggestions. Thanks, Amit Persistent Systems ---- -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] Table Partitioning Feature
Amit, You might want to put this on the http://wiki.postgresql.org/wiki/Table_partitioning wiki page. What does your timeline look like for this implementation? I would be happy to contribute C triggers to your implementation. From what I understood in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00269.php, you already have an implementation that parses the grammar and generates rules as if someone had written them. Is this code available? Regarding the use of triggers to push/move data to partitions, what if someone declares triggers on partitions? Especially if you have subpartitions, let's consider the case where there is a trigger on the parent, child and grandchild. If I do an insert in the parent, the user trigger on the parent will be executed, then the partition trigger that decides to move to the grandchild. Are we going to bypass the child trigger? If we also want fast COPY operations on partitioned tables, we could have an optimized implementation that could bypass triggers and move the tuple directly to the appropriate child table. Thanks for this big contribution, Emmanuel Hi, We are implementing a table partitioning feature to support the attached commands. The syntax conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following: -- Specification of partition names is optional. System will be able to generate partition names in such cases. -- sub partitioning We are using pgsql triggers to push/move data to appropriate partitions, but we will definitely consider moving to C language triggers as suggested by manu. - Global non-partitioned indexes (that will extend all the partitions). - Foreign key support for tables referring to partitioned tables. Please feel free to post your comments and suggestions.
Thanks, Amit Persistent Systems -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] incoherent view of serializable transactions
Hi Kevin, The link didn't seem to work for me, but I think I found the article you meant: "Serializable Isolation for Snapshot Databases" by Michael J. Cahill, et al. An interesting work. If nothing else, it will help flesh out the documentation of anomalies. If the PostgreSQL community ever does want to better approach true serializable behavior, this should provide a good theoretical basis. Sorry for the broken link. Yes, this is the paper. Note that the paper was not necessarily enthusiastically received by the community when presented at the conference. While this is an interesting academic paper, its practicality left a majority of the audience perplexed. There was an interesting comment by Oracle folks: Oracle does not provide serializability but only snapshot isolation, and still users prefer to 'downgrade' to read committed for better performance. The Oracle folks' experience seemed to indicate that there was no need for serializability (well, that's also less work for them ;-)) in their customer base. Having a foot in both academia and industry, I understand the intellectual interest for serializability on the academic side, but I still have not seen a single use case in industry where this was a requirement (but my db experience is probably narrow). Have nice serializable holidays ;-) manu -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [Fwd: Re: [HACKERS] Transactions and temp tables]
Hi Heikki, The point of using temp tables was performance. Using regular tables in our case would hurt performance too much. Well, if we cannot get a temporary fix in 8.4, we will maintain a separate patch to get that functionality just for temp tables that are created and dropped in the same transaction. Hopefully we will be able to come up with a working solution in 8.5. Thanks for your help, Emmanuel Emmanuel Cecchet wrote: I just saw that this new patch was not considered because the previous version ended up being rejected. Note that this version of the patch aims at supporting ONLY temp tables that are created AND dropped in the same transaction. We need to be able to use temp tables in transactions that are doing 2PC, but the temp table lifespan does not need to cross transaction boundaries. Please let me know if this patch could be integrated in 8.4. IMHO, this is just getting too kludgey. We came up with pretty good ideas on how to handle temp tables properly, by treating them the same as non-temp tables. That should eliminate all the problems the latest patch had, and also the issues with sequences, and allow all access to temp tables, not just a limited subset. I don't think it's worthwhile to apply the kludge as a stopgap measure; let's do it properly in 8.5. As a workaround, you can use a regular table instead of a temporary one. If you create and drop the regular table in the same transaction (that's the same limitation that the latest patch has), you won't end up with a bogus table in your database if the connection is dropped unexpectedly. If your application uses multiple connections simultaneously, you'll need a little bit of code in the application so that you don't try to create a table with the same name in all backends. You could also create a different schema for each connection, and do "set search_path = semitempschemaX, public", so that you can use the same table name and still have separate tables for each connection.
(sorry for the late reply) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
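Heikki's schema-per-connection workaround might look like the following (a sketch under the stated assumptions; the schema, table, and transaction names are mine, not from the thread):

```sql
-- One-time setup for, say, connection 3:
CREATE SCHEMA semitempschema3;

-- At session start:
SET search_path = semitempschema3, public;

-- Inside the 2PC transaction, create and drop the "semi-temp" table,
-- mirroring the create-and-drop-in-one-transaction limitation:
BEGIN;
CREATE TABLE scratch (id int, payload text);
-- ... populate and use scratch ...
DROP TABLE scratch;
PREPARE TRANSACTION 'txn-42';
```

Because scratch is a regular table, it participates in two-phase commit normally, and since it is dropped before PREPARE TRANSACTION, no bogus table survives a crashed connection.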
Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I didn't have many reliability issues with ensemble, appia or spread, so far. Although, I admit I didn't ever run any of these in production. Performance is certainly an issue, yes. May I suggest another reading; even though a bit dated, most of the results still apply: http://jmob.objectweb.org/jgroups/JGroups-middleware-2004.pdf The baseline is that if you use UDP multicast, you need a dedicated switch and the tuning is a nightmare. I discussed these issues with the developers of Spread and they have no real magic. TCP seems a more reliable alternative (especially for predictable performance) but the TCP timeouts are also tricky to tune depending on the platform. We worked quite a bit with Nuno around Appia in the context of Sequoia, and performance can be outstanding when properly tuned or absolutely awful if some default values are wrong. The chaotic behavior of GCS under stress quickly compromises the reliability of the replication system, and admission control on UDP multicast has no good solution so far. This is just a heads-up on what is awaiting you in production when the system is stressed. There is no good solution so far besides a good admission control on top of the GCS (in the application). I am now off for the holidays. Cheers, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] incoherent view of serializable transactions
Jeff Davis wrote: On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote: I still don't get why people would use SERIALIZABLE since there is no efficient implementation of it. If they need true serializable transactions, and don't care about efficiency. Are there such people, who really understand the thing and don't complain about the performance? ;-) Happy holidays, Emmanuel
Re: [HACKERS] incoherent view of serializable transactions
Kevin Grittner wrote: This isn't some hypothetical "maybe some day some product might implement this, but it'll never catch on" sort of thing -- Microsoft and Sybase SQL Server had this from version 1. I used it from 1990 until the conversion to PostgreSQL over the last couple years. Have you ever used serializable transactions with Sybase? The locking is actually based on memory pages and you end up with deadlocks if you don't pad your data structures to prevent false sharing. Oracle also provides SI like Postgres and I don't think they are doing that badly. I'm going on second-hand information here, but I'm told that IBM DB2 has used similar techniques to provide true serializable transactions for even longer. I'm somewhat mystified at the reaction this topic gets here. :-) I am somewhat mystified by the interest some people still have in serializable transactions. Why don't users program the application to deal with a lower isolation level (actually I think they do)? But I am probably missing the point, which was to fix the doc? Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I'm not quite sure what you mean by "certification protocol", there's no such thing in Postgres-R (as proposed by Kemme). Although, I remember having heard that term in the context of F. Pedone's work. Can you point me to some paper explaining this certification protocol? What Bettina calls the Lock Phase in http://www.cs.mcgill.ca/~kemme/papers/vldb00.pdf is actually a certification. You can find more references to certification protocols in http://gorda.di.uminho.pt/download/reports/gapi.pdf I would also recommend the work of Sameh on Tashkent and Tashkent+ that was based on Postgres: http://labos.epfl.ch/webdav/site/labos/users/157494/public/papers/tashkent.eurosys2006.pdf and http://infoscience.epfl.ch/record/97654/files/tashkentPlus.eurosys2007.final.pdf Certification-based approaches already face multiple reliability issues in improving write performance compared to statement-based replication, but this is very dependent on the capacity of the system to limit the conflicting window for concurrent transactions. What do you mean by "reliability issues"? These approaches usually require an atomic broadcast primitive that is typically fragile (limited scalability, hard-to-tune failure timeouts, ...). Most prototype implementations have the load balancer and/or the certifier as a SPOF (single point of failure). Building reliability for these components will come with a significant performance penalty. The writeset extraction mechanisms have had too many limitations so far to allow the use of certification-based replication in production (AFAIK). What limitations are you speaking of here? Oftentimes DDL support is very limited. Non-transactional objects like sequences are not captured. Session or environment variables are not necessarily propagated. Support of temp tables varies between databases, which makes it hard to support them properly in a generic way.
Well, I guess everyone has a story on some limitations they have found with some database replication technology, especially when a user expects a cluster to behave like a single database instance. Happy holidays, Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
Re: [HACKERS] incoherent view of serializable transactions
Kevin, If you want to know how to build SERIALIZABLE with a database that provides SI (Snapshot Isolation), read http://portal.acm.org/citation.cfm?doid=1376616.137669 Note that in practice, READ COMMITTED is the most widely used isolation level and its limitations are relatively well understood by the average programmer, who can program the application accordingly. I still don't get why people would use SERIALIZABLE since there is no efficient implementation of it. My 2 cents. Emmanuel Kevin Grittner wrote: As I've understood them, the limitations of the PostgreSQL implementation of SERIALIZABLE transactions (at least the only example given in the documentation) revolve around a rather unlikely situation: Given concurrent transactions T1 and T2 and non-overlapping sets of data A and B, T1 reads data including A and uses the data to modify B while T2 reads data including B and uses that data to modify A, where the modifications performed by either would affect the modifications made by the other, if they were visible. For reasons I'll omit here, that scenario didn't worry me for my current uses of PostgreSQL. I've found another form of deviation from the standard SERIALIZABLE behavior, though, which does worry me. Although the above appears to be the only situation where the end result after everything commits is inconsistent with standard SERIALIZABLE behavior, the PostgreSQL implementation allows transactions to view the data in states which would never be possible during the application of the transactions in series in the order they will appear to have been applied after the commit. Imagine, as an example, a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Somewhere mid-afternoon that date is updated, all subsequent receipts fall into the new day, and a report is run listing the receipts for the day and giving the deposit total.
Under a standard-compliant implementation of SERIALIZABLE, this is straightforward: a transaction which is inserting a receipt selects the deposit date to use in its transaction, and any SELECT of receipts for a date prior to the current deposit date will see the accurate, final data. Under the PostgreSQL implementation, although data eventually gets to a coherent state, there can be a window of time where a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE.

-- setup
create table ctl (k text not null primary key, deposit_date date not null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;

At this point, SERIALIZABLE transactions appear to have worked, with receipt 3 happening before the update of deposit_date; however, there was a window of time when the update to deposit_date was visible and receipt 3 was not. This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be noted in the documentation. I don't know if there's any way to fix this without killing performance. -Kevin -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet
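One commonly suggested mitigation for this particular anomaly (my addition, not proposed in the thread) is to force a write conflict on the control row, so the receipt-inserting transaction and the date-update transaction cannot both proceed in a way that opens the visibility window:

```sql
-- In every transaction that inserts a receipt, lock the control row first:
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for update;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);
commit;
```

With the FOR UPDATE lock in place, the concurrent UPDATE of ctl.deposit_date blocks until the receipt insert commits (or aborts), so no report can observe the new deposit date while receipt 3 is still invisible. The cost is reduced concurrency on the control row.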
Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I am happy to see that Postgres-R is alive again. The paper was written in 07 (and published in 08, the review process is longer than a CommitFest ;-)) and at the time of the writing there was no version of Postgres-R available, hence the 'obsolete' mention referring to past versions. I think that it is legitimate for users to expect more guarantees from a replicated database than from a single database. Not knowing what happens when a failure occurs at commit time while some nodes are still active in a cluster is not intuitive for users. I did not look at the source, but if Postgres-R continues to elaborate on Bettina's ideas with writeset extraction and a certification protocol, I think that it will be a bad idea to try to mix it with Sync Rep (mentioned in another thread). If you delay commits, you will increase the window for transactions to conflict and therefore induce a higher abort rate (thus less scalability). Certification-based approaches already face multiple reliability issues in improving write performance compared to statement-based replication, but this is very dependent on the capacity of the system to limit the conflicting window for concurrent transactions. The writeset extraction mechanisms have had too many limitations so far to allow the use of certification-based replication in production (AFAIK). Good luck with Postgres-R. Emmanuel Emmanuel Cecchet wrote: What the application is going to see is a failure when the postmaster it is connected to is going down. If this happens at commit time, I think that there is no guarantee for the application to know what happened: 1. failure occurred before the request reached the postmaster: no instance committed 2. failure occurred during commit: might be committed on either node 3. failure occurred while sending back the ack of commit to the client: both instances have committed But for the client, it will all look the same: an error on commit.
This is very much the same for a single-node database system, so I think current application developers are used to that behavior. A distributed database system just needs to make sure that committed transactions can and will eventually get committed on all nodes. So in case the client doesn't receive a COMMIT acknowledgment due to any kind of failure, it can only be sure the distributed database is in a consistent state. The client cannot tell if it applied the transaction or not. Much like for single-node systems. I agree that a distributed database system could theoretically do better, but that would require changes to the client-side connection library as well, letting the client connect to two or even more nodes of the distributed database system. This is just to point out that despite all your efforts, the client might think that some transactions have failed (error on commit) but they are actually committed. As pointed out above, that would currently be an erroneous conclusion. If you don't put some state in the driver that is able to check at failover time if the commit operation succeeded or not, it does not really matter what happens for in-flight transactions (or in-commit transactions) at failure time. Sure it does. The database system still needs to guarantee consistency. Hm.. well, you're right if there's only one single standby left (as is obviously the case for the proposed Sync Rep). Ensuring consistency is pretty simple in such a case. But imagine having two or more standby servers. Those would need to agree on a set of in-flight transactions from the master they both need to apply. Actually, if there was a way to query the database about the status of a particular transaction by providing a cluster-wide unique id, that would help a lot. You're certainly aware that Postgres-R features such a global transaction id... And I guess Sync Rep could easily add such an identifier as well.
I wrote a paper on the issues with database replication at SIGMOD earlier this year (http://infoscience.epfl.ch/record/129042). Even though it was targeted at middleware replication, I think that some of it is still relevant for the problem at hand. Interesting read, thanks for the pointer. It's pretty obvious that I don't consider Postgres-R to be obsolete... I've linked your paper from www.postgres-r.org [1]. Regarding the wording, if experts can't agree, you can be sure that users won't either. Most of them don't have a clue about the different flavors of replication. So as long as you state clearly how it behaves and define all the terms you use, that should be fine. I mostly agree with that, without repeating my concerns again here. Regards Markus Wanner [1]: Referenced Papers from Postgres-R website: http://www.postgres-r.org/documentation/references -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting
Re: [HACKERS] Partitioning wiki page
Hi all, I did not find the option to remove the page so I just left the links section. Feel free to remove the page so that there is no trace left of this in the history. I don't think it is worth keeping anything since no one found value in it anyway. Until I figure out a way to come up with a constructive proposal that does not offend anyone, I think that it is best that I do not contribute anything to the wiki anymore. Best, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
Re: [HACKERS] Partitioning wiki page
Hi Robert, Thanks all for the time you are spending commenting on this during this busy Commit Fest. I am not a fan of the proposed syntax. It is conceptually similar to what we already do with constraints, but invents a whole new syntax to no obvious benefit that I can see. Actually, I did not invent a new syntax but tried to map the Oracle syntax, which seems to be a requirement that appeared on multiple occasions in the discussions. The question is more whether Postgres wants to have its own syntax (and there might be good reasons for that) or if we want to use a syntax similar to other databases for easier migration (after all, Postgres is very late on that topic compared to other databases). I think we would do well to look at what other systems besides Oracle do, as well as considering any newer ideas Oracle may have introduced. Perhaps this would be a good thing to add to the Wiki page - instead of saying, this is the design, say, here are some different possibilities, what do we like? Maybe I was not clear on the intent of the wiki page. I was not trying to pretend that this was the design but I was just trying to do my homework on what are the issues that we will have to address (that's basically the section headers) and I just filled the sections with what I understood so far from the discussions and the Oracle and MySQL documentation. So yes, there is more work to do on understanding what other systems besides Oracle & MySQL do, and then figuring out what we want to propose in Postgres. Oracle's new interval partitioning sounds great, but it ignores the reality that most data varies considerably over time, either growing or fluctuating. I much prefer the idea of a size-equalized partitioning scheme, as implemented by Coppereye's Greenwich.
That method gives equal-sized partitions. Sometimes (though certainly not always), the structure of the underlying data makes interval partitioning a win, as when for example you are accumulating transactions that are billed at the end of each month. If you do a lot of queries on the open transactions for the current month, you want to make sure that there's a partition break at the start of the month so that you're not unnecessarily scanning some of the previous month's entries. Partitions of similar size are usually achieved by hash partitioning (in both Oracle and MySQL). There are many use cases where range/interval partitioning makes a lot of sense. One use case that was mentioned to me is multi-level hierarchies like year->month->week->day that would require more than 2 levels of partitioning. I think we can think of a way of having an arbitrary depth of partitions even though partition management might become slightly more complex. I will try to integrate the comments as I see them on the mailing list but feel free also to update the wiki page with your thoughts and use cases. Thanks again for the feedback, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
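For reference, the constraint-based mechanism Robert alludes to ("what we already do with constraints") is range partitioning via inheritance plus CHECK constraints; the syntax debate is about formalizing something like this (a sketch; the table and column names are hypothetical):

```sql
-- Parent table and one range partition per month, pre-declarative style:
CREATE TABLE measurements (logdate date NOT NULL, value int);

CREATE TABLE measurements_2009_01 (
    CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2009-02-01')
) INHERITS (measurements);

-- With constraint_exclusion enabled, a query such as
--   SELECT * FROM measurements WHERE logdate = DATE '2009-01-15';
-- only scans measurements_2009_01.
```

A dedicated PARTITION BY syntax would generate the child tables and CHECK constraints automatically, which is where the Oracle-compatibility versus Postgres-native-syntax question arises.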