Re: [HACKERS] GSoC project : K-medoids clustering in Madlib
>> I'm a bit confused as to why this is being proposed as a
>> Postgres-related project.  I don't even know what MADlib is, but I'm
>> pretty darn sure that no part of Postgres uses it.  KNNGist certainly
>> doesn't.
>
> It's a reasonably well established extension for Postgres for
> statistical and machine learning methods.  Rather neat, but as you
> indicate, it's not part of Postgres proper.
>
> http://madlib.net/
>
> https://github.com/madlib/madlib/

It is the extension that is normally referred to when we talk about
data analytics in Postgres. As you said, it is not part of Postgres
proper, but IMO, if we want to extend the data analytics functionality
of Postgres, we need to work on MADlib.

Regards,

Atri

--
Regards,

Atri
l'apprenant
Re: [HACKERS] GSoC project : K-medoids clustering in Madlib
On Tue, Mar 26, 2013 at 10:27 PM, Tom Lane wrote:
> Atri Sharma writes:
>> I suggested a couple of algorithms to be implemented in MADlib (apart
>> from K-medoids). You could pick some (or all) of them, which would
>> require 3 months to be completed.
>
>> As for more information on the index, you can refer to
>
>> http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1
>
>> along with the postgres wiki. The wiki is the standard for anything
>> postgres.
>
>> pg_trgm used KNN, but I believe it uses its own implementation of the
>> algorithm. The idea I proposed aims at writing an implementation in
>> MADlib so that any client program can use the algorithm(s) in their
>> code directly, using MADlib functions.
>
> I'm a bit confused as to why this is being proposed as a
> Postgres-related project.  I don't even know what MADlib is, but I'm
> pretty darn sure that no part of Postgres uses it.  KNNGist certainly
> doesn't.

It's a reasonably well established extension for Postgres for
statistical and machine learning methods. Rather neat, but as you
indicate, it's not part of Postgres proper.

http://madlib.net/

https://github.com/madlib/madlib/

--
fdr
Re: [HACKERS] GSoC project : K-medoids clustering in Madlib
Atri Sharma writes:
> I suggested a couple of algorithms to be implemented in MADlib (apart
> from K-medoids). You could pick some (or all) of them, which would
> require 3 months to be completed.

> As for more information on the index, you can refer to

> http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

> along with the postgres wiki. The wiki is the standard for anything
> postgres.

> pg_trgm used KNN, but I believe it uses its own implementation of the
> algorithm. The idea I proposed aims at writing an implementation in
> MADlib so that any client program can use the algorithm(s) in their
> code directly, using MADlib functions.

I'm a bit confused as to why this is being proposed as a
Postgres-related project.  I don't even know what MADlib is, but I'm
pretty darn sure that no part of Postgres uses it.  KNNGist certainly
doesn't.

			regards, tom lane
Re: [HACKERS] GSoC project : K-medoids clustering in Madlib
I suggested a couple of algorithms to be implemented in MADlib (apart
from K-medoids). You could pick some (or all) of them, which would
require 3 months to be completed.

As for more information on the index, you can refer to

http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

along with the postgres wiki. The wiki is the standard for anything
postgres.

pg_trgm used KNN, but I believe it uses its own implementation of the
algorithm. The idea I proposed aims at writing an implementation in
MADlib so that any client program can use the algorithm(s) in their
code directly, using MADlib functions.

Regards,

Atri

On 3/26/13, viod wrote:
> Hello!
>
> I'm an IT student, and I would like to apply for the 2013 GSoC.
> I've been looking at this mailing list for a while now, and I saw a
> suggestion for GSoC that particularly interested me: implementing
> K-medoids clustering in MADlib, as it is supposed to be more efficient
> than the K-means algorithm.
>
> I didn't know about these algorithms before, but I have documented
> myself, and it looks quite interesting to me, even more so as I
> currently have lessons on the subject (but very, very simplified,
> unfortunately).
>
> I've got a few questions:
> Won't this be a quite short project? I can't get an idea of how long
> it would take me to implement this algorithm in a way that would be
> usable by postgresql, but 3 months looks long for this task, doesn't
> it?
>
> Someone on the IRC channel (can't remember who, sorry) told me it was
> used in the KNN index. I guess this is used by pg_trgm, but are there
> other modules using it currently?
> And could you please give me some links explaining the internals of
> this index? I've been through several articles presenting it, but
> none very satisfying.
>
> Thanks a lot in advance!

--
Regards,

Atri
*l'apprenant*
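[For readers new to the algorithm under discussion, here is a
self-contained toy sketch of the k-medoids iteration (assignment step
plus medoid-update step) in C. It is invented purely for illustration;
MADlib's real methods are exposed as SQL functions running inside the
database, and none of the identifiers below come from MADlib.]

#include <float.h>
#include <stdio.h>

#define N 8    /* points */
#define K 2    /* clusters */

/* squared distance in one dimension */
static double
dist(double a, double b)
{
    double d = a - b;
    return d * d;
}

int
main(void)
{
    double pts[N] = {1.0, 1.2, 0.8, 1.1, 9.0, 9.4, 8.7, 9.2};
    int medoid[K] = {0, 4};     /* indexes of the current medoids */
    int assign[N];
    int changed = 1;

    while (changed)
    {
        changed = 0;

        /* assignment step: attach each point to its nearest medoid */
        for (int i = 0; i < N; i++)
        {
            int best = 0;

            for (int k = 1; k < K; k++)
                if (dist(pts[i], pts[medoid[k]]) <
                    dist(pts[i], pts[medoid[best]]))
                    best = k;
            assign[i] = best;
        }

        /*
         * update step: the new medoid is the cluster member minimizing
         * total distance to its cluster mates -- unlike a k-means
         * centroid, it is always an actual data point.
         */
        for (int k = 0; k < K; k++)
        {
            double bestcost = DBL_MAX;
            int bestidx = medoid[k];

            for (int i = 0; i < N; i++)
            {
                double cost = 0.0;

                if (assign[i] != k)
                    continue;
                for (int j = 0; j < N; j++)
                    if (assign[j] == k)
                        cost += dist(pts[i], pts[j]);
                if (cost < bestcost)
                {
                    bestcost = cost;
                    bestidx = i;
                }
            }
            if (medoid[k] != bestidx)
            {
                medoid[k] = bestidx;
                changed = 1;
            }
        }
    }

    for (int k = 0; k < K; k++)
        printf("medoid %d: %.1f\n", k, pts[medoid[k]]);
    return 0;
}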
Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node
On 25 March 2013 19:14, Heikki Linnakangas wrote:
> Simon, can you comment on this?

Yes, will do.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] regression test failed when enabling checksum
On 26 March 2013 23:23, Jeff Davis wrote:
> Patch attached. Only brief testing done, so I might have missed
> something. I will look more closely later.

Thanks, I'll look at that tomorrow.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] regression test failed when enabling checksum
On 25 March 2013 17:50, Fujii Masao wrote:
> I found that the regression test failed when I created the database
> cluster with checksums and set wal_level to archive. I think that
> there are some bugs in the checksum feature. Attached is the
> regression.diff.

Apologies for not responding to your original email; I must have
missed it.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] regression test failed when enabling checksum
On Tue, 2013-03-26 at 02:50 +0900, Fujii Masao wrote:
> Hi,
>
> I found that the regression test failed when I created the database
> cluster with checksums and set wal_level to archive. I think that
> there are some bugs in the checksum feature. Attached is the
> regression.diff.

Thank you for the report. This was a significant oversight, but simple
to diagnose and fix.

There were several places that were doing something like:

	PageSetChecksumInplace
	if (use_wal)
		log_newpage
	smgrextend

Which is obviously wrong, because log_newpage sets the LSN of the page,
invalidating the checksum. We need to set the checksum after
log_newpage.

Also, I noticed that copy_relation_data was doing smgrread without
validating the checksum (or page header, for that matter), so I also
fixed that.

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

Regards,
	Jeff Davis

*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
*************** end_heap_rewrite(RewriteState state)
*** 273,286 ****
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
- 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
- 
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
--- 273,287 ----
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
+ 
+ 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
+ 
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
*************** raw_heap_insert(RewriteState state, HeapTuple tup)
*** 616,623 ****
  	{
  		/* Doesn't fit, so write out the existing page */
- 		PageSetChecksumInplace(page, state->rs_blockno);
- 
  		/* XLOG stuff */
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
--- 617,622 ----
*************** raw_heap_insert(RewriteState state, HeapTuple tup)
*** 632,637 ****
--- 631,639 ----
  		 * end_heap_rewrite.
  		 */
  		RelationOpenSmgr(state->rs_new_rel);
+ 
+ 		PageSetChecksumInplace(page, state->rs_blockno);
+ 
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
  				   state->rs_blockno, (char *) page, true);
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 51,56 ****
--- 51,57 ----
  #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "commands/typecmds.h"
+ #include "common/relpath.h"
  #include "executor/executor.h"
  #include "foreign/foreign.h"
  #include "miscadmin.h"
*************** copy_relation_data(SMgrRelation src, SMgrRelation dst,
*** 8902,8913 ****
  		smgrread(src, forkNum, blkno, buf);
! 		PageSetChecksumInplace(page, blkno);
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  		/*
  		 * Now write the page.  We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this
--- 8903,8923 ----
  		smgrread(src, forkNum, blkno, buf);
! 		if (!PageIsVerified(page, blkno))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATA_CORRUPTED),
! 					 errmsg("invalid page in block %u of relation %s",
! 							blkno,
! 							relpathbackend(src->smgr_rnode.node,
! 										   src->smgr_rnode.backend,
! 										   forkNum))));
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
+ 		PageSetChecksumInplace(page, blkno);
+ 
  		/*
  		 * Now write the page.  We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this
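[The invariant the patch above enforces -- WAL-log first (which stamps
the page LSN), then checksum the final page image -- can be seen in
isolation with a self-contained toy in C. ToyPage, toy_checksum, and
toy_log_newpage are invented stand-ins for this sketch, not the
backend's Page/PageSetChecksumInplace/log_newpage.]

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    uint64_t lsn;       /* stamped when the page is WAL-logged */
    uint16_t checksum;
    char     data[64];
} ToyPage;

/* trivial additive checksum over the page image, checksum field zeroed */
static uint16_t
toy_checksum(const ToyPage *page)
{
    ToyPage copy;
    const unsigned char *p = (const unsigned char *) &copy;
    uint32_t sum = 0;

    memcpy(&copy, page, sizeof(copy));
    copy.checksum = 0;
    for (size_t i = 0; i < sizeof(copy); i++)
        sum += p[i];
    return (uint16_t) sum;
}

/* WAL-logging changes the page image by setting its LSN */
static void
toy_log_newpage(ToyPage *page, uint64_t next_lsn)
{
    page->lsn = next_lsn;
}

int
main(void)
{
    ToyPage page;

    memset(&page, 0, sizeof(page));
    strcpy(page.data, "hello");

    /* buggy order: checksum first, then WAL-log -- the LSN stamp
     * changes the page after the checksum was computed */
    page.checksum = toy_checksum(&page);
    toy_log_newpage(&page, 1234);
    printf("checksum-then-log: %s\n",
           page.checksum == toy_checksum(&page) ? "valid" : "INVALID");

    /* fixed order: WAL-log first, then checksum the final image */
    toy_log_newpage(&page, 5678);
    page.checksum = toy_checksum(&page);
    printf("log-then-checksum: %s\n",
           page.checksum == toy_checksum(&page) ? "valid" : "INVALID");

    return 0;
}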
Re: [HACKERS] Ignore invalid indexes in pg_dump
On Wed, Mar 27, 2013 at 6:47 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On top of checking indisvalid, I think that some additional checks on
>> indislive and indisready are also necessary.
>
> Those are not necessary, as an index that is marked indisvalid should
> certainly also have those flags set.  If it didn't require making two
> new version distinctions in getIndexes(), I'd be okay with the extra
> checks; but as-is I think the maintenance pain this would add greatly
> outweighs any likely value.
>
> I've committed this in the simpler form that just adds indisvalid
> checks to the appropriate version cases.

Thanks.

--
Michael
Re: [HACKERS] spoonbill vs. -HEAD
Stefan Kaltenbrunner writes:
> hmm - will look into that in a bit - but I also just noticed that on
> the same day spoonbill broke there was also a commit to that file
> immediately before that code adding the fflush() calls.

It's hard to see how those would be related to this symptom.  My bet
is that the new fk-deadlock test exposed some pre-existing issue in
isolationtester.  Not quite clear what yet, though.

A different line of thought is that the cancel was received by the
backend but didn't succeed in cancelling the query for some reason.

			regards, tom lane
Re: [HACKERS] spoonbill vs. -HEAD
On 03/26/2013 09:33 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> On 03/26/2013 08:45 PM, Tom Lane wrote:
>>> It looks from here like the isolationtester client is what's
>>> dropping the ball --- the backend states are unsurprising, and two
>>> of them are waiting for a new client command.  Can you get a stack
>>> trace from the isolationtester process?
>
>> https://www.kaltenbrunner.cc/files/isolationtester.txt
>
> Hmm ... isolationtester.c:584 is in the code that tries to cancel the
> current permutation (test case) after realizing that it's constructed
> an invalid permutation.  It looks like the preceding PQcancel() failed
> for some reason, since the waiting backend is still waiting.  The
> isolationtester code doesn't bother to check for an error result
> there, which is kinda bad, not that it's clear what it could do about
> it.  Could you look at the contents of the local variable "buf" in the
> run_permutation stack frame?  Or else try modifying the code along the
> lines of
>
> -		PQcancel(cancel, buf, sizeof(buf));
> +		if (!PQcancel(cancel, buf, sizeof(buf)))
> +			fprintf(stderr, "PQcancel failed: %s\n", buf);
>
> and see if it prints anything interesting before hanging up.

hmm - will look into that in a bit - but I also just noticed that on
the same day spoonbill broke there was also a commit to that file
immediately before that code adding the fflush() calls.

Stefan
Re: [HACKERS] Ignore invalid indexes in pg_dump
Michael Paquier writes:
> On top of checking indisvalid, I think that some additional checks on
> indislive and indisready are also necessary.

Those are not necessary, as an index that is marked indisvalid should
certainly also have those flags set.  If it didn't require making two
new version distinctions in getIndexes(), I'd be okay with the extra
checks; but as-is I think the maintenance pain this would add greatly
outweighs any likely value.

I've committed this in the simpler form that just adds indisvalid
checks to the appropriate version cases.

			regards, tom lane
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
Tom Lane wrote:

> Using HEAD's pg_dump, I see "pg_dump -s regression" taking 5
> seconds.
> On the other hand, running the same executable against the regression
> database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
> broke something performance-wise.
>
> A quick check with oprofile says it's all AllocSetCheck's fault:
>
> samples  %        image name  symbol name
> 8        83.6059  postgres    AllocSetCheck
> 1140      1.0858  postgres    base_yyparse
> 918       0.8744  postgres    AllocSetAlloc
> 778       0.7410  postgres    SearchCatCache
> 406       0.3867  postgres    pg_strtok
> 394       0.3753  postgres    hash_search_with_hash_value
> 387       0.3686  postgres    core_yylex
> 373       0.3553  postgres    MemoryContextCheck
> 256       0.2438  postgres    nocachegetattr
> 231       0.2200  postgres    ScanKeywordLookup
> 207       0.1972  postgres    palloc
>
> So maybe I'm nuts to care about the performance of an assert-enabled
> backend, but I don't really find a 4X runtime degradation acceptable,
> even for development work.  Does anyone want to fess up to having
> caused this, or do I need to start tracking down what changed?

I checked master HEAD for a dump of regression and got about 4
seconds.  I checked right after my initial push of matview code and
got 2.5 seconds.  I checked just before that and got 1 second.

There was some additional pg_dump work for matviews after the initial
push which may or may not account for the rest of the time.

Investigating now.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] spoonbill vs. -HEAD
Andrew Dunstan writes:
> There is some timeout code already in the buildfarm client. It was
> originally put there to help us when we got CVS hangs, a not
> infrequent occurrence in the early days, so it's currently only used
> if configured for the checkout phase, but it could easily be used to
> create a build timeout which would kill the whole process group if the
> timeout expired.

> It wouldn't work on Windows, and of course it won't solve whatever
> problem caused the hang in the first place, but it still might be
> worth doing.

+1 --- at least then we'd get reports of failures, rather than the
current behavior where the animal just stops reporting.

			regards, tom lane
Re: [HACKERS] patch to add \watch to psql
On 3/24/13 3:10 PM, Tom Lane wrote:
> I also concur with the complaint here
> http://www.postgresql.org/message-id/caazkufzxyj-rt1aec6s0g7zm68tdlfbbm1r6hgrbbxnz80k...@mail.gmail.com
> that allowing a minimum sleep of 0 is rather dangerous

The original "watch" command apparently silently corrects a delay of 0
to 0.1 seconds.

> Another minor question is whether we really need the time-of-day in
> the banner,

That's also part of the original "watch" and occasionally useful, I
think.
Re: [HACKERS] spoonbill vs. -HEAD
On 03/26/2013 02:50 PM, Stefan Kaltenbrunner wrote:
> Hi all!
>
> I finally started to investigate why spoonbill stopped reporting to
> the buildfarm about 2 months ago.
> It seems that the foreign-key locking patch (or some commit very close
> to January 23rd) broke it in a fairly annoying way - running the
> buildfarm script seems to consistently "stall" during the
> isolationtester part of the regression testing, leaving the postgresql
> instance running and causing all future buildfarm runs to fail...

There is some timeout code already in the buildfarm client. It was
originally put there to help us when we got CVS hangs, a not infrequent
occurrence in the early days, so it's currently only used if configured
for the checkout phase, but it could easily be used to create a build
timeout which would kill the whole process group if the timeout
expired.

It wouldn't work on Windows, and of course it won't solve whatever
problem caused the hang in the first place, but it still might be worth
doing.

cheers

andrew
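[The process-group kill Andrew describes can be sketched in a few lines
of C. This is an invented stand-alone illustration, not buildfarm
client code (the real client is Perl); the command and 5-second timeout
below are placeholders.]

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    pid_t child = fork();
    int   status;

    if (child < 0)
        return 1;

    if (child == 0)
    {
        /* run the build step in its own process group, so stray
         * grandchildren (e.g. a leftover postmaster) can be killed too */
        setpgid(0, 0);
        execlp("sh", "sh", "-c", "sleep 60", (char *) NULL);
        _exit(127);
    }

    setpgid(child, child);          /* set it in the parent too, to avoid
                                     * racing against the child's exec */

    for (int waited = 0; waited < 5 /* timeout, seconds */; waited++)
    {
        if (waitpid(child, &status, WNOHANG) == child)
            return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
        sleep(1);
    }

    fprintf(stderr, "timeout: killing process group %d\n", (int) child);
    kill(-child, SIGKILL);          /* negative pid targets the group */
    waitpid(child, &status, 0);
    return 1;
}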
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
Robert Haas writes:
> Well, you could easily change array_ndims() to error out if ARR_NDIM()
> is negative or more than MAXDIM and return NULL only if it's exactly
> 0.  That wouldn't break backward compatibility because it would throw
> an error only if fed a value that shouldn't ever exist in the first
> place, short of a corrupted database.  I imagine the other functions
> are amenable to similar treatment.

I haven't looked at the patch in detail, but I thought one of the key
changes was that '{}' would now be interpreted as a zero-length 1-D
array rather than a zero-D array.  If we do that it seems a bit moot to
argue about whether we should exactly preserve backwards-compatible
behavior in array_ndims(), because the input it's looking at won't be
the same anymore anyway.

In any case, the entire point of this proposal is that the current
behavior around zero-D arrays is *broken* and we don't want to be
backwards-compatible with it anymore.  So if you wish to argue against
that opinion, do so; but it seems a bit beside the point to simply
complain that backwards compatibility is being lost.

			regards, tom lane
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 27 March 2013 06:47, Robert Haas wrote:
> On Tue, Mar 26, 2013 at 9:02 AM, Brendan Jurd wrote:
>> We can't sensibly test for whether an array is empty. I'd call that a
>> functional problem.
>
> Sure you can.  Equality comparisons work just fine.
>
> rhaas=# select '{}'::int4[] = '{}'::int4[];

The good news is, if anybody out there is using that idiom to test for
emptiness, they will not be disrupted by the change.

Cheers,
BJ
Re: [HACKERS] spoonbill vs. -HEAD
Stefan Kaltenbrunner writes:
> On 03/26/2013 08:45 PM, Tom Lane wrote:
>> It looks from here like the isolationtester client is what's dropping
>> the ball --- the backend states are unsurprising, and two of them are
>> waiting for a new client command.  Can you get a stack trace from the
>> isolationtester process?

> https://www.kaltenbrunner.cc/files/isolationtester.txt

Hmm ... isolationtester.c:584 is in the code that tries to cancel the
current permutation (test case) after realizing that it's constructed
an invalid permutation.  It looks like the preceding PQcancel() failed
for some reason, since the waiting backend is still waiting.  The
isolationtester code doesn't bother to check for an error result there,
which is kinda bad, not that it's clear what it could do about it.
Could you look at the contents of the local variable "buf" in the
run_permutation stack frame?  Or else try modifying the code along the
lines of

-		PQcancel(cancel, buf, sizeof(buf));
+		if (!PQcancel(cancel, buf, sizeof(buf)))
+			fprintf(stderr, "PQcancel failed: %s\n", buf);

and see if it prints anything interesting before hanging up.

			regards, tom lane
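[For reference, the full libpq cancel sequence with the result check
Tom suggests looks roughly like this -- a minimal sketch assuming an
already-established connection; the 256-byte buffer size is just a
conventional choice. Link with -lpq.]

#include <stdio.h>
#include <libpq-fe.h>

/* issue a cancel request for the query running on "conn" */
static void
cancel_current_query(PGconn *conn)
{
    PGcancel *cancel = PQgetCancel(conn);
    char      errbuf[256];

    if (cancel == NULL)
    {
        fprintf(stderr, "PQgetCancel failed (bad connection?)\n");
        return;
    }

    /* PQcancel returns nonzero on success, 0 on failure with a message
     * left in errbuf -- the return value worth checking above */
    if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
        fprintf(stderr, "PQcancel failed: %s\n", errbuf);

    PQfreeCancel(cancel);
}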
[HACKERS] Drastic performance loss in assert-enabled build in HEAD
Using HEAD's pg_dump, I see "pg_dump -s regression" taking 5 seconds.
On the other hand, running the same executable against the regression
database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
broke something performance-wise.

A quick check with oprofile says it's all AllocSetCheck's fault:

samples  %        image name  symbol name
8        83.6059  postgres    AllocSetCheck
1140      1.0858  postgres    base_yyparse
918       0.8744  postgres    AllocSetAlloc
778       0.7410  postgres    SearchCatCache
406       0.3867  postgres    pg_strtok
394       0.3753  postgres    hash_search_with_hash_value
387       0.3686  postgres    core_yylex
373       0.3553  postgres    MemoryContextCheck
256       0.2438  postgres    nocachegetattr
231       0.2200  postgres    ScanKeywordLookup
207       0.1972  postgres    palloc

So maybe I'm nuts to care about the performance of an assert-enabled
backend, but I don't really find a 4X runtime degradation acceptable,
even for development work.  Does anyone want to fess up to having
caused this, or do I need to start tracking down what changed?

			regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
On 2013-03-26 13:14:53 -0700, Kevin Grittner wrote:
> Alvaro Herrera wrote:
>
>> Not happy with misc.c as a filename.
>
> We already have two misc.c files:
>
> src/backend/utils/adt/misc.c
> src/interfaces/ecpg/ecpglib/misc.c
>
> I much prefer not to repeat the same filename in different
> directories if we can avoid it.
>
>> How about pg_dump_utils.c or pg_dump_misc.c?
>
> Those seem reasonable to me.

I vote against including pg_ in the filename: for an
implementation-private file it seems duplicative.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
Alvaro Herrera wrote:

> Not happy with misc.c as a filename.

We already have two misc.c files:

src/backend/utils/adt/misc.c
src/interfaces/ecpg/ecpglib/misc.c

I much prefer not to repeat the same filename in different directories
if we can avoid it.

> How about pg_dump_utils.c or pg_dump_misc.c?

Those seem reasonable to me.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] spoonbill vs. -HEAD
On 03/26/2013 08:45 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> I finally started to investigate why spoonbill stopped reporting to
>> the buildfarm about 2 months ago.
>> It seems that the foreign-key locking patch (or some commit very
>> close to January 23rd) broke it in a fairly annoying way - running
>> the buildfarm script seems to consistently "stall" during the
>> isolationtester part of the regression testing, leaving the
>> postgresql instance running and causing all future buildfarm runs to
>> fail...
>
> It looks from here like the isolationtester client is what's dropping
> the ball --- the backend states are unsurprising, and two of them are
> waiting for a new client command.  Can you get a stack trace from the
> isolationtester process?

https://www.kaltenbrunner.cc/files/isolationtester.txt

Stefan
Re: [HACKERS] sql_drop Event Triggerg
Robert Haas escribió:
> On Tue, Mar 26, 2013 at 3:02 PM, Alvaro Herrera wrote:
>> Now there *is* one rather big performance problem in this patch,
>> which is that it turns on collection of object dropped data
>> regardless of there being event triggers that use the info at all.
>> That's a serious drawback and we're going to get complaints about it.
>> So we need to do something to fix that.
>
> Really?  Who is going to care about that?  Surely that overhead is
> quite trivial.

I don't think it is, because it involves syscache lookups for each
object being dropped, many extra pallocs, etc.  Surely that's many
times bigger than the PG_TRY overhead you were worried about.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] GSoC project : K-medoids clustering in Madlib
Hello!

I'm an IT student, and I would like to apply for the 2013 GSoC.
I've been looking at this mailing list for a while now, and I saw a
suggestion for GSoC that particularly interested me: implementing
K-medoids clustering in MADlib, as it is supposed to be more efficient
than the K-means algorithm.

I didn't know about these algorithms before, but I have documented
myself, and it looks quite interesting to me, even more so as I
currently have lessons on the subject (but very, very simplified,
unfortunately).

I've got a few questions:

Won't this be a quite short project? I can't get an idea of how long it
would take me to implement this algorithm in a way that would be usable
by postgresql, but 3 months looks long for this task, doesn't it?

Someone on the IRC channel (can't remember who, sorry) told me it was
used in the KNN index. I guess this is used by pg_trgm, but are there
other modules using it currently?
And could you please give me some links explaining the internals of
this index? I've been through several articles presenting it, but none
very satisfying.

Thanks a lot in advance!
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On Tue, Mar 26, 2013 at 9:02 AM, Brendan Jurd wrote:
> On 26 March 2013 22:57, Robert Haas wrote:
>> They hate it twice as much when the change is essentially cosmetic.
>> There's no functional problems with arrays as they exist today that
>> this change would solve.
>
> We can't sensibly test for whether an array is empty. I'd call that a
> functional problem.

Sure you can.  Equality comparisons work just fine.

rhaas=# select '{}'::int4[] = '{}'::int4[];
 ?column?
----------
 t
(1 row)

rhaas=# select '{}'::int4[] = '{1}'::int4[];
 ?column?
----------
 f
(1 row)

> The NULL return from array_{length,lower,upper,ndims} is those
> functions' way of saying their arguments failed a sanity check. So we
> cannot distinguish in a disciplined way between a valid, empty array,
> and bad arguments. If the zero-D implementation had been more
> polished and say, array_ndims returned zero, we had provided an
> array_empty function, or the existing functions threw errors for
> silly arguments instead of returning NULL, then I'd be more inclined
> to see your point. But as it stands, the zero-D implementation has
> always been half-baked and slightly broken, we just got used to
> working around it.

Well, you could easily change array_ndims() to error out if ARR_NDIM()
is negative or more than MAXDIM and return NULL only if it's exactly 0.
That wouldn't break backward compatibility because it would throw an
error only if fed a value that shouldn't ever exist in the first place,
short of a corrupted database.  I imagine the other functions are
amenable to similar treatment.

And if neither that nor just comparing against an empty array literal
floats your boat, adding an array_is_empty() function would let you
test for this condition without breaking backward compatibility, too.
That's overkill, I think, but it would work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] spoonbill vs. -HEAD
Stefan Kaltenbrunner writes:
> I finally started to investigate why spoonbill stopped reporting to
> the buildfarm about 2 months ago.
> It seems that the foreign-key locking patch (or some commit very close
> to January 23rd) broke it in a fairly annoying way - running the
> buildfarm script seems to consistently "stall" during the
> isolationtester part of the regression testing, leaving the postgresql
> instance running and causing all future buildfarm runs to fail...

It looks from here like the isolationtester client is what's dropping
the ball --- the backend states are unsurprising, and two of them are
waiting for a new client command.  Can you get a stack trace from the
isolationtester process?

			regards, tom lane
Re: [HACKERS] sql_drop Event Triggerg
On Tue, Mar 26, 2013 at 3:02 PM, Alvaro Herrera wrote:
> I tried this and it doesn't work.  The "error pathways" you speak
> about would be the xact.c entry points to commit and abort
> transactions; however, there's a problem with that because some of the
> commands that ProcessUtility() deals with have their own transaction
> management calls internally; so those would call CommitTransaction()
> and the event trigger state would go away, and then when control gets
> back to ProcessUtility there would be nothing to clean up.  I think we
> could ignore the problem, or install smarts in ProcessUtility to avoid
> calling event_trigger.c when one of those commands is involved -- but
> this seems to me a solution worse than the problem.  So all in all I
> feel like the macro on top of PG_TRY is the way to go.

I see. :-(

> Now there *is* one rather big performance problem in this patch, which
> is that it turns on collection of object dropped data regardless of
> there being event triggers that use the info at all.  That's a serious
> drawback and we're going to get complaints about it.  So we need to do
> something to fix that.

Really?  Who is going to care about that?  Surely that overhead is
quite trivial.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] sql_drop Event Triggerg
Tom Lane escribió:
> Alvaro Herrera writes:
>> Now there *is* one rather big performance problem in this patch,
>> which is that it turns on collection of object dropped data
>> regardless of there being event triggers that use the info at all.
>> That's a serious drawback and we're going to get complaints about it.
>> So we need to do something to fix that.
>
>> One idea that comes to mind is to add some more things to the
>> grammar,
>> CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');
>
> Uh ... surely we can just notice whether there's a trigger on the
> object-drop event?  I don't understand why we need *yet more*
> mechanism here.

There's no object-drop event, only ddl_command_end.  From previous
discussion I understood we didn't want a separate event, so that's what
we've been running with.

However, I think previous discussions have conflated many different
things, and we've been slowly fixing them one by one; so perhaps at
this point it does make sense to have a new object-drop event.  Let's
discuss it -- we would define it as taking place just before
ddl_command_end, and firing any time a command (with matching tag?) has
called performDeletion or performMultipleDeletions.  Does that sound
okay?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] plpgsql_check_function - rebase for 9.3
Hello all

2013/3/26 Tom Lane:
> Josh Berkus writes:
>> Where are we with this patch?  I'm a bit unclear from the discussion
>> on whether it passes muster or not.  Things seem to have petered out.
>
> I took another look at this patch tonight.  I think the thing that is
> bothering everybody (including Pavel) is that as submitted, pl_check.c
> involves a huge amount of duplication of knowledge and code from
> pl_exec.c, and to a lesser extent pl_comp.c.  It certainly looks like
> a maintenance disaster in the making.  It doesn't bother me so much
> that pl_check.c knows about each syntactic structure in plpgsql: there
> are already four or five places you have to touch when adding new
> syntax.  Rather, it's that there's so much duplication of knowledge
> about semantic details, which is largely expressed by
> copied-and-pasted code from pl_exec.c.  It seems like a safe bet that
> we'll sometimes miss the need to fix pl_check.c when we fix something
> in pl_exec.c.  There are annoying duplications from pl_comp.c as well,
> eg knowledge of exactly which magic variables are defined in trigger
> functions.
>
> Having said all that, it's not clear that we can really do any better.
> The only obvious alternative is pushing support for a checking mode
> directly into pl_exec.c, which would obfuscate and slow down that code
> to an unacceptable degree if Pavel's results at
> http://www.postgresql.org/message-id/cafj8prakujmvjpjzfsrye7+ub8jf8wtz5rkxk-0ykxme-k8...@mail.gmail.com
> are any indication.  (In that message, Pavel proposes shoveling the
> problem into the compile code instead, but that seems unlikely to work
> IMO: the main problem in pl_check.c as it stands is duplication of
> code from pl_exec.c not pl_comp.c.  So I think that path could only
> lead to duplicating the same code into pl_comp.c.)
>
> So question 1 is do we want to accept that this is the implementation
> pathway we're going to settle for, or are we going to hold out for a
> better one?  I'd be the first in line to hold out if I had a clue how
> to move towards a better implementation, but I have none.  Are we
> prepared to reject this type of feature entirely because of the
> code-duplication problem?  That doesn't seem attractive either.

I wrote lots of versions, and the proposed code is redundant but the
most simple and clean. I am really against pushing the check into
pl_exec, because it significantly decreases code readability and
creates a CPU bottleneck in intensive tests. Moreover, there is too
little room left there for future features - performance advising, more
verbose error messages, etc.

In PL/pgPSM I used a little bit different architecture - necessary for
PSM and maybe better for PL/pgSQL too. It is a two-stage compiler -
parsing to an AST, and then AST compilation. It simplifies gram.y, and
processing order depends on deep iteration over the AST rather than on
bison rules. It can probably have an impact on the speed of very large
procedures - I don't see other disadvantages. With this architecture I
was able to do lots of checks at compile stage without problems.

Most complexity in the current code is related to detecting record
types from expressions without expression evaluation. Maybe this code
can live in core or pl_compile.c. The code for the Describe (F) message
is not very reusable. It is necessary for

DECLARE r RECORD;
FOR r IN SELECT ...
LOOP
  RAISE NOTICE '>>%<<', r.xx;
END LOOP;

> But, even granting that this implementation approach is acceptable,
> the patch does not seem close to being committable as-is: there's
> a lot of fit-and-finish work yet to be done.  To make my point I will
> just quote from one of the regression test additions:
>
> create or replace function f1()
> returns void as $$
> declare a1 int; a2 int;
> begin
>   select 10,20 into a1;
> end;
> $$ language plpgsql;
> -- raise warning
> select plpgsql_check_function('f1()');
>                         plpgsql_check_function
> -------------------------------------------------------------------------
>  warning:0:4:SQL statement:too many attributies for target variables
>  Detail: There are less target variables than output columns in query.
>  Hint: Check target variables in SELECT INTO statement
> (3 rows)
>
> Do we like this output format?  I don't.  The unlabeled, undocumented
> fields crammed into a single line with colon separators are neither
> readable nor useful.  If we actually need these fields, why aren't we
> splitting the output into multiple columns?  (I'm also wondering why
> the patch bothers with an option to emit this same info in XML.
> Surely there is vanishingly small use-case for mechanical examination
> of this output.)

This format can be reduced, redesigned, changed. It is designed like
gcc output and optimized for use from the psql console. I tested
tabular output - in the original CHECK statement implementation - but
it is not very friendly for display on a monitor: it is just too wide.
There are similar arguments for using tabular output for EXPLAIN,
although there are high
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
> I expect to lose this argument, but I think this is a terrible idea.
> Users really hate it when they try to upgrade and find that they, uh,
> can't, because of some application-level incompatibility like this.
> They hate it twice as much when the change is essentially cosmetic.
> There's no functional problems with arrays as they exist today that
> this change would solve.

Sure there is.  How do you distinguish between an array which is NULL
and an array which is empty?

Also, the whole array_dims-is-NULL thing trips up pretty much every
single PG user who uses arrays for the first time.  I'd expect that
when we announce the fix, we'll find that many users were doing the
wrong thing in the first place and didn't know why it wasn't working.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] sql_drop Event Triggerg
Alvaro Herrera writes:
> Now there *is* one rather big performance problem in this patch, which
> is that it turns on collection of object dropped data regardless of
> there being event triggers that use the info at all.  That's a serious
> drawback and we're going to get complaints about it.  So we need to do
> something to fix that.

> One idea that comes to mind is to add some more things to the grammar,
> CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');

Uh ... surely we can just notice whether there's a trigger on the
object-drop event?  I don't understand why we need *yet more*
mechanism here.

			regards, tom lane
Re: [HACKERS] sql_drop Event Triggerg
Robert Haas escribió:
> On Wed, Mar 20, 2013 at 5:42 PM, Alvaro Herrera wrote:
>> Here's a new version of this patch, rebased on top of the new
>> pg_identify_object() stuff.  Note that the regression test doesn't
>> work yet, because I didn't adjust to the new identity output
>> definition (the docs need work, too).  But that's a simple change to
>> do.  I'm leaving that for later.
>
> I think this is getting there.  A few things to think about:

Thanks.

> - pg_event_trigger_dropped_objects seems to assume that
> currentEventTriggerState will be pointing to the same list on every
> call.  But is that necessarily true?  I'm thinking about a case where
> someone opens a cursor in an event trigger and then tries to read from
> that cursor later in the transaction.  I think you might be able to
> crash the server that way.

Well, no, because it uses materialized return mode, so there's no "next
time" --- the list is constructed completely before
pg_event_trigger_dropped_objects returns.  So there's no such hole.

> - I am not wild about the idea of propagating PG_TRY/PG_CATCH blocks
> into yet more places.  On Linux-x86 they are pretty cheap because
> Linux doesn't need a system call to change the signal mask and x86 has
> few registers that must be saved-and-restored, but elsewhere this can
> be a performance problem.  Now maybe ProcessUtility is not a
> sufficiently-frequently called function for this to matter... but I'm
> not sure.  The alternative is to teach the error handling pathways
> about this in somewhat greater detail, since the point of TRY/CATCH is
> to clean up things that the regular error handling stuff doesn't know
> about.

I tried this and it doesn't work.  The "error pathways" you speak about
would be the xact.c entry points to commit and abort transactions;
however, there's a problem with that because some of the commands that
ProcessUtility() deals with have their own transaction management calls
internally; so those would call CommitTransaction() and the event
trigger state would go away, and then when control gets back to
ProcessUtility there would be nothing to clean up.  I think we could
ignore the problem, or install smarts in ProcessUtility to avoid
calling event_trigger.c when one of those commands is involved -- but
this seems to me a solution worse than the problem.  So all in all I
feel like the macro on top of PG_TRY is the way to go.

Now there *is* one rather big performance problem in this patch, which
is that it turns on collection of object dropped data regardless of
there being event triggers that use the info at all.  That's a serious
drawback and we're going to get complaints about it.  So we need to do
something to fix that.

One idea that comes to mind is to add some more things to the grammar,

CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');

or some such, so that when events happen for which any triggers have
that flag enabled, *then* collecting is activated, otherwise not.  This
would be stored in a new column in pg_event_trigger (say "evtsupport",
a char array much like proargmodes).

The sequence of (ahem) events goes like this:

ProcessUtility()
  EventTriggerBeginCompleteQuery()
  EventTriggerDDLCommandStart()
    EventCacheLookup()
    EventTriggerInvoke()
  .. run whatever command we've been handed ...
  EventTriggerDDLCommandEnd()
    EventCacheLookup()
    EventTriggerInvoke()
  EventTriggerEndCompleteQuery()

So EventTriggerBeginCompleteQuery() will have to peek into the event
trigger cache for any ddl_command_end triggers that might apply, and
see if any of them has the flag for "dropped objects".  If it's there,
then enable dropped object collection.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
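[That last step -- peeking at the cache and enabling collection only
when some ddl_command_end trigger asked for it -- can be pictured with
a small stand-alone toy in C. Every identifier below is an invented
stand-in for the real event-trigger cache API, which this proposal has
not yet written.]

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    const char *event;                  /* e.g. "ddl_command_end" */
    bool        wants_dropped_objects;  /* the proposed evtsupport flag */
} CachedEventTrigger;

/* peek at the cache: does any ddl_command_end trigger need the data? */
static bool
need_dropped_object_collection(const CachedEventTrigger *cache, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (strcmp(cache[i].event, "ddl_command_end") == 0 &&
            cache[i].wants_dropped_objects)
            return true;
    }
    return false;
}

int
main(void)
{
    CachedEventTrigger cache[] = {
        {"ddl_command_start", false},
        {"ddl_command_end", true},
    };

    /* collection would be switched on only when this returns true */
    printf("collect dropped objects: %s\n",
           need_dropped_object_collection(cache, 2) ? "yes" : "no");
    return 0;
}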
[HACKERS] spoonbill vs. -HEAD
Hi all!

I finally started to investigate why spoonbill stopped reporting to the
buildfarm about 2 months ago.
It seems that the foreign-key locking patch (or some commit very close
to January 23rd) broke it in a fairly annoying way - running the
buildfarm script seems to consistently "stall" during the
isolationtester part of the regression testing, leaving the postgresql
instance running and causing all future buildfarm runs to fail...

The process listing at that time looks like:

https://www.kaltenbrunner.cc/files/process_listing.txt

pg_stat_activity of the running instance:

https://www.kaltenbrunner.cc/files/pg_stat_activity.txt

pg_locks:

https://www.kaltenbrunner.cc/files/pg_locks.txt

backtraces of the three backends:

https://www.kaltenbrunner.cc/files/bt_20467.txt
https://www.kaltenbrunner.cc/files/bt_20897.txt
https://www.kaltenbrunner.cc/files/bt_24285.txt

Stefan
Re: [HACKERS] adding support for zero-attribute unique/etc keys
On 27/03/13 06:14, Darren Duncan wrote:
> On 2013.03.26 1:40 AM, Albe Laurenz wrote:
>> Darren Duncan wrote:
>>> So, determining if 2 rows are the same involves an iteration of
>>> dyadic logical AND over the predicates for each column comparison.
>>> Now logical AND has an identity value, which is TRUE, because
>>> "TRUE AND p" (and "p AND TRUE") results in "p" for all "p".
>>> Therefore, any 2 rows with zero columns each are the same.
>>> Since any 2 rows with zero columns are the same, the "UNIQUE
>>> predicate" is FALSE any time there is more than 1 row in a table.
>>> Does anyone agree or disagree with this logic?
>>
>> Yes :^)
>>
>> You could use the same kind of argument like this:
>> UNIQUE is true iff any two rows in T satisfy for each column:
>> the column in row 1 is null OR the column in row 2 is null OR
>> the column in row 1 is distinct from the column in row 2.
>> Now you iterate your logical AND over this predicate for all columns
>> and come up with TRUE since there are none.  Consequently UNIQUE is
>> satisfied, no matter how many rows there are.
>>
>> In a nutshell:
>> All members of the empty set satisfy p, but also:
>> all members of the empty set satisfy the negation of p.
>> You can use this technique to make anything plausible.
>
> Consider the context however.  We're talking about a UNIQUE
> constraint, and so what we want to do is prevent the existence of
> multiple tuples in a relation that are the same for some defined
> subset of their attributes.  I would argue that logically, and
> commonsensically, two tuples with no attributes are the same, and
> hence a set of distinct tuples having zero attributes could have no
> more than one member, and so a UNIQUE constraint over zero attributes
> would say the relation can't have more than one tuple.
>
> So unless someone wants to argue that two tuples with no attributes
> are not the same, my interpretation makes more sense and is clearly
> the one to follow. -- Darren Duncan

Hmm, as a user I would like at most one row with empty fields covered
by a unique index.

Logical arguments to the contrary remind me of the joke about the
schoolboy who told his unlearned father that he had learnt logic and
could prove that his father actually had 3 fish in his basket, despite
both seeing only 2 fish.  His unlearned father did not try to argue,
and simply said: well, your mother can have the first fish, I'll have
the second, and his learned son could have the third...
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Sun, Mar 24, 2013 at 12:37 PM, Michael Paquier wrote:
> On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund wrote:
>> On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
>>> Is someone planning to provide additional feedback about this patch
>>> at some point?
>>
>> Yes, now that I have returned from my holidays - or well, am
>> returning from them, I do plan to.  But it should probably get some
>> implementation level review from somebody but Fujii and me...
>
> Yeah, it would be good to have an extra pair of fresh eyes looking at
> those patches.

I probably don't have enough time to review the patch thoroughly.  It
would be quite helpful if someone became another reviewer of this
patch.

> Please find new patches realigned with HEAD.  There were conflicts
> with commits done recently.

ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch.

Regards,

--
Fujii Masao
Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
Heikki Linnakangas wrote:

> This is what I came up with. I created a new file, misc.c (for lack of
> a better name), for things that are shared by pg_dump and pg_restore,
> but not pg_dumpall or other programs. I moved all the parallel stuff
> from dumputils.c to parallel.c, and everything else that's not used
> outside pg_dump and pg_restore to misc.c. I moved exit_horribly() to
> parallel.c, because it needs to do things differently in parallel
> mode.

Not happy with misc.c as a filename.  How about pg_dump_utils.c or
pg_dump_misc.c?

I think the comment at the top should explicitly say that the file is
intended not to be linked into pg_dumpall.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
Merlin Moncure wrote:

> *) For off-cycle release work that would help enable patches with
> complex performance trade-offs (I'm working up another patch that has
> even more compelling benefits and risks in the buffer allocator), we
> desperately need a standard battery of comprehensive performance tests
> and donor machines.

Such a thing would vastly reduce the time needed to work on something
like this with confidence that it would not be a disaster for some
unidentified workload.  Sure, something could still "slip through the
cracks" -- but they would *be* cracks, not a wide gaping hole.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Remove invalid indexes from pg_dump Was: [HACKERS] Support for REINDEX CONCURRENTLY
On Tue, Mar 19, 2013 at 9:19 AM, Michael Paquier wrote:
> If failures happen with CREATE INDEX CONCURRENTLY, the system will be
> left with invalid indexes. I don't think that the user would like to
> see invalid indexes of an existing system being recreated as valid
> after a restore.
> So why not remove invalid indexes from a dump, with something like
> the patch attached?

+1

The patch looks good to me.

> This should perhaps be applied in pg_dump for versions down to 8.2,
> where CREATE INDEX CONCURRENTLY was implemented?

I think so.

Regards,

--
Fujii Masao
Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
On 26.03.2013 09:51, Heikki Linnakangas wrote:
> On 26.03.2013 02:02, Tom Lane wrote:
>> Heikki Linnakangas writes:
>>> On 25.03.2013 15:36, Tom Lane wrote:
>>>> Heikki Linnakangas writes:
>>>>> Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
>>>>> Per warning from -Wmissing-format-attribute.
>>>>
>>>> Hm, this is exactly what I removed yesterday, because it makes the
>>>> build fail outright on old gcc:
>>>
>>> The attached seems to work. With this patch, on_exit_msg_func() is
>>> gone. There's a different implementation of exit_horribly for
>>> pg_dumpall and pg_dump/restore. In pg_dumpall, it just calls
>>> vwrite_msg(). In pg_dump/restore's version, the logic from
>>> parallel_exit_msg_func() is moved directly to exit_horribly().
>>
>> Seems probably reasonable, though if we're taking exit_horribly out
>> of dumputils.c, meseems it ought not be declared in dumputils.h
>> anymore.  Can we put that declaration someplace else, rather than
>> commenting it with an apology?
>
> Ugh, the patch I posted doesn't actually work, because dumputils.c is
> also used in psql and some scripts, so you get a linker error in
> those.  psql and scripts don't use exit_horribly or many of the other
> functions in dumputils.c, so I think we should split dumputils.c into
> two parts anyway.  fmtId and the other functions that are used by psql
> in one file, and the functions that are only shared between pg_dumpall
> and pg_dump in another.  Then there's also functions that are used by
> pg_dump and pg_restore, but not pg_dumpall or psql.  I'll try moving
> things around a bit...

This is what I came up with. I created a new file, misc.c (for lack of
a better name), for things that are shared by pg_dump and pg_restore,
but not pg_dumpall or other programs. I moved all the parallel stuff
from dumputils.c to parallel.c, and everything else that's not used
outside pg_dump and pg_restore to misc.c. I moved exit_horribly() to
parallel.c, because it needs to do things differently in parallel mode.

I still used a function pointer, not for the printf-style message
printing routine, but for making dumputils.c independent of parallel
mode. getThreadLocalPQBuffer() is now a function pointer; the default
implementation just uses a static variable, but when pg_dump/restore
enters parallel mode, it points the function pointer to a version that
uses thread-local storage (on Windows).

- Heikki

*** a/src/bin/pg_dump/Makefile
--- b/src/bin/pg_dump/Makefile
***************
*** 19,25 ****
  include $(top_builddir)/src/Makefile.global
  override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
  OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! 	pg_backup_null.o pg_backup_tar.o parallel.o \
  	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
  KEYWRDOBJS = keywords.o kwlookup.o
--- 19,25 ----
  override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
  OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! 	pg_backup_null.o pg_backup_tar.o parallel.o misc.o \
  	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
  KEYWRDOBJS = keywords.o kwlookup.o
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
***************
*** 14,19 ****
--- 14,20 ----
   *-
   */
  #include "pg_backup_archiver.h"
+ #include "misc.h"
  #include
*** a/src/bin/pg_dump/compress_io.c
--- b/src/bin/pg_dump/compress_io.c
***************
*** 53,59 ****
   */
  #include "compress_io.h"
! #include "dumputils.h"
  #include "parallel.h"
  /*--
--- 53,59 ----
   */
  #include "compress_io.h"
! #include "misc.h"
  #include "parallel.h"
  /*--
*** a/src/bin/pg_dump/dumputils.c
--- b/src/bin/pg_dump/dumputils.c
***************
*** 25,45 ****
  extern const ScanKeyword FEScanKeywords[];
  extern const int	NumFEScanKeywords;
- /* Globals exported by this file */
- int			quote_all_identifiers = 0;
- const char *progname = NULL;
- 
- #define MAX_ON_EXIT_NICELY	20
- 
- static struct
- {
- 	on_exit_nicely_callback function;
- 	void	   *arg;
- } on_exit_nicely_list[MAX_ON_EXIT_NICELY];
- 
- static int	on_exit_nicely_index;
- void		(*on_exit_msg_func) (const char *modulename, const char *fmt, va_list ap) = vwrite_msg;
- 
  #define supports_grant_options(version) ((version) >= 70400)
  static bool parseAclItem(const char *item, const char *type,
--- 25,30 ----
***************
*** 49,116 ****
  static bool parseAclItem(const char *item, const char *type,
  static char *copyAclUserName(PQExpBuffer output, char *input);
  static void AddAcl(PQExpBuffer aclbuf, const char *keyword,
  		   const char *subname);
! static PQExpBuffer getThreadLocalPQExpBuffer(void);
! 
! #ifdef WIN32
! static void shutdown_parallel_dump_utils(int code, void *unused);
! static bool parallel_init_done = false;
! static DWORD tls_index;
! static DWORD mainThreadId;
! 
! static void
! shutdown_parallel_dump_utils(int code, void *unused)
! {
! 	/* Call the cleanup function only from the main thread */
! 	if (main
Re: [HACKERS] adding support for zero-attribute unique/etc keys
On 2013.03.26 1:40 AM, Albe Laurenz wrote:
> Darren Duncan wrote:
>> So, determining if 2 rows are the same involves an iteration of
>> dyadic logical AND over the predicates for each column comparison.
>> Now logical AND has an identity value, which is TRUE, because
>> "TRUE AND p" (and "p AND TRUE") results in "p" for all "p".
>> Therefore, any 2 rows with zero columns each are the same.
>>
>> Since any 2 rows with zero columns are the same, the "UNIQUE
>> predicate" is FALSE any time there is more than 1 row in a table.
>>
>> Does anyone agree or disagree with this logic?
>
> Yes :^)
>
> You could use the same kind of argument like this:
> UNIQUE is true iff any two rows in T satisfy for each column:
>    the column in row 1 is null
>    OR the column in row 2 is null
>    OR the column in row 1 is distinct from the column in row 2
>
> Now you iterate your logical AND over this predicate for all columns
> and come up with TRUE since there are none. Consequently UNIQUE is
> satisfied, no matter how many rows there are.
>
> In a nutshell: All members of the empty set satisfy p, but also: all
> members of the empty set satisfy the negation of p. You can use this
> technique to make anything plausible.

Consider the context, however. We're talking about a UNIQUE constraint, and so what we want to do is prevent the existence of multiple tuples in a relation that are the same for some defined subset of their attributes.

I would argue that logically, and commonsensically, two tuples with no attributes are the same, and hence a set of distinct tuples having zero attributes could have no more than one member, and so a UNIQUE constraint over zero attributes would say the relation can't have more than one tuple.

So unless someone wants to argue that two tuples with no attributes are not the same, my interpretation makes more sense and is clearly the one to follow.

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
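Darren's identity-of-AND argument is easy to see mechanically. The following fragment is purely illustrative (a hypothetical row representation, not anything from the backend): folding AND over zero column comparisons leaves the fold's starting value, TRUE, so any two zero-column rows compare as "the same".

#include <stdbool.h>

/* Hypothetical: a row is just an array of integer column values. */
static bool
rows_equal(int ncols, const int *row1, const int *row2)
{
	bool		same = true;	/* the identity element of logical AND */

	for (int i = 0; i < ncols; i++)
		same = same && (row1[i] == row2[i]);

	/* With ncols == 0 the loop body never runs, so the result is true. */
	return same;
}

Laurenz's counter-fold starts from the same identity for the negated per-column predicate, which is exactly why both vacuous arguments go through; the disagreement is over which predicate the standard's wording commits us to folding.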
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
On Tue, Mar 26, 2013 at 03:06:30PM +0000, Simon Riggs wrote:
> > More generally, the fact that a patch has some user-frobbable knob
> > does not mean that it's actually a good or even usable solution. As
> > everybody keeps saying, testing on a wide range of use-cases would be
> > needed to prove that, and we don't have enough time left for such
> > testing in the 9.3 timeframe. This problem needs to be attacked in
> > an organized and deliberate fashion, not by hacking something up under
> > time pressure and shipping it with minimal testing.
>
> Well, it has been tackled like that and we've *all* got nowhere. No
> worries, I can wait a year for that beer.

This was the obvious result of this discussion --- it is a shame we had to discuss this rather than working on more pressing 9.3 issues. I also think someone saying "I would like to apply this now" is more disruptive than casual discussion about things like buffer count locking.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Mar 26, 2013 at 11:40 AM, Bruce Momjian wrote:
> On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
>> Merlin Moncure writes:
>> > I think there is some very low hanging optimization fruit in the clock
>> > sweep loop. first and foremost, I see no good reason why when
>> > scanning pages we have to spin and wait on a buffer in order to
>> > pedantically adjust usage_count. some simple refactoring there could
>> > set it up so that a simple TAS (or even a TTAS with the first test in
>> > front of the cache line lock as we done automatically in x86 IIRC)
>> > could guard the buffer and, in the event of any lock detected, simply
>> > move on to the next candidate without messing around with that buffer
>> > at all. This could construed as a 'trylock' variant of a spinlock
>> > and might help out with cases where an especially hot buffer is
>> > locking up the sweep. This is exploiting the fact that from
>> > StrategyGetBuffer we don't need a *particular* buffer, just *a*
>> > buffer.
>>
>> Hm. You could argue in fact that if there's contention for the buffer
>> header, that's proof that it's busy and shouldn't have its usage count
>> decremented. So this seems okay from a logical standpoint.
>>
>> However, I'm not real sure that it's possible to do a conditional
>> spinlock acquire that doesn't create just as much hardware-level
>> contention as a full acquire (ie, TAS is about as bad whether it
>> gets the lock or not). So the actual benefit is a bit less clear.
>
> Could we view the usage count, and if it is 5, and if there is someone
> holding the lock, we just ignore the buffer and move on to the next
> buffer? Seems that could be done with no locking.

The idea is that if someone is "holding the lock", we completely ignore the buffer regardless of usage. Quotes there because we test the lock without taking the cacheline lock. Now if the buffer appears unlocked but turns out to be locked once you *do* acquire the cacheline lock in anticipation of refcounting, we again immediately bail and go to the next buffer. I see no reason whatsoever to have the buffer allocator spin and wait on a blocked buffer. This is like jumping onto a merry-go-round being spun by sadistic teenagers.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
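A minimal sketch of the trylock idea, using C11 atomics with hypothetical names; the real buffer headers use PostgreSQL's own spinlock primitives, so this only illustrates the TTAS shape Merlin describes:

#include <stdbool.h>
#include <stdatomic.h>

typedef struct
{
	atomic_bool locked;			/* stand-in for the buffer-header spinlock */
	int			usage_count;
} BufferDesc;

/*
 * TTAS-style conditional acquire: peek at the lock word first (a shared
 * cache-line read), and only then attempt one atomic exchange.  On any
 * sign of contention the caller just moves on -- StrategyGetBuffer()
 * needs *a* buffer, not this particular one.
 */
static bool
buf_trylock(BufferDesc *buf)
{
	if (atomic_load_explicit(&buf->locked, memory_order_relaxed))
		return false;			/* apparently held: skip without a write */
	return !atomic_exchange(&buf->locked, true);
}

static void
buf_unlock(BufferDesc *buf)
{
	atomic_store(&buf->locked, false);
}

/* One clock-sweep step under the proposed policy. */
static bool
sweep_consider(BufferDesc *buf)
{
	if (!buf_trylock(buf))
		return false;			/* contended == busy: leave usage_count alone */
	if (buf->usage_count > 0)
	{
		buf->usage_count--;		/* ordinary clock-sweep decrement */
		buf_unlock(buf);
		return false;
	}
	buf_unlock(buf);
	return true;				/* candidate for eviction */
}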
Re: [HACKERS] Page replacement algorithm in buffer cache
On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote: > Merlin Moncure writes: > > I think there is some very low hanging optimization fruit in the clock > > sweep loop. first and foremost, I see no good reason why when > > scanning pages we have to spin and wait on a buffer in order to > > pedantically adjust usage_count. some simple refactoring there could > > set it up so that a simple TAS (or even a TTAS with the first test in > > front of the cache line lock as we done automatically in x86 IIRC) > > could guard the buffer and, in the event of any lock detected, simply > > move on to the next candidate without messing around with that buffer > > at all. This could construed as a 'trylock' variant of a spinlock > > and might help out with cases where an especially hot buffer is > > locking up the sweep. This is exploiting the fact that from > > StrategyGetBuffer we don't need a *particular* buffer, just *a* > > buffer. > > Hm. You could argue in fact that if there's contention for the buffer > header, that's proof that it's busy and shouldn't have its usage count > decremented. So this seems okay from a logical standpoint. > > However, I'm not real sure that it's possible to do a conditional > spinlock acquire that doesn't create just as much hardware-level > contention as a full acquire (ie, TAS is about as bad whether it > gets the lock or not). So the actual benefit is a bit less clear. Could we view the usage count, and if it is 5, and if there is someone holding the lock, we just ignore the buffer and move on to the next buffer? Seems that could be done with no locking. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Fri, Mar 22, 2013 at 06:06:18PM +0000, Greg Stark wrote:
> On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane wrote:
> > And we definitely looked at ARC
>
> We didn't just look at it. At least one release used it. Then patent
> issues were raised (and I think the implementation had some contention
> problems).

The problem was cache line overhead between CPUs to manage the ARC queues.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ideas for improving Concurrency Tests
On Tue, Mar 26, 2013 at 7:31 AM, Amit Kapila wrote: > Above ideas could be useful to improve concurrency testing and can also be > helpful to generate test cases for some of the complicated bugs for which > there is no direct test. I wonder how much explicit sync points would help with testing though. It seems like they suffer from the problem that you'll only put sync points where you actually expect problems and not where you don't expect them -- which is exactly where problems are likely to occur. Wouldn't it be more useful to implicitly create sync points whenever synchronization events like spinlocks being taken occur? And likewise explicitly listing the timing sequences to test seems unconvincing. If we could arrange for two threads to execute every possible interleaving of code by exhaustively trying every combination that would be far more convincing. Most bugs are likely to hang out in combinations we don't see in practice -- for instance having a tuple deleted and a new one inserted in the same slot in the time a different transaction was context switched out. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
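To make Greg's combinatorial point concrete, here is a toy enumerator of all interleavings of two sessions' steps (a hypothetical harness, nothing to do with isolationtester's actual implementation); with m and n steps there are C(m+n, m) schedules, which is why exhaustive interleaving is only practical for very short sequences:

#include <stdio.h>

/*
 * Print every interleaving of m session-A steps and n session-B steps.
 * Each output line is one schedule a test harness could replay,
 * e.g. "AABAB" means run A1, A2, B1, A3, B2.
 */
static void
schedules(int m, int n, char *buf, int pos)
{
	if (m == 0 && n == 0)
	{
		buf[pos] = '\0';
		puts(buf);
		return;
	}
	if (m > 0)
	{
		buf[pos] = 'A';
		schedules(m - 1, n, buf, pos + 1);
	}
	if (n > 0)
	{
		buf[pos] = 'B';
		schedules(m, n - 1, buf, pos + 1);
	}
}

int
main(void)
{
	char		buf[16];

	schedules(3, 2, buf, 0);	/* prints C(5,3) = 10 schedules */
	return 0;
}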
Re: [HACKERS] odd behavior in materialized view
Fujii Masao wrote: > Ping? ISTM this problem has not been fixed in HEAD yet. It's next on my list. The other reports seemed more serious and more likely to be contentious in terms of the best fix. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
On 26 March 2013 14:44, Tom Lane wrote: > Simon Riggs writes: >> So please, lets go with a simple solution now that allows users to say >> what they want. > > Simon, this is just empty posturing, as your arguments have nothing > whatsoever to do with whether the above description applies to your > patch. Waiting for an auto-tuned solution to *every* problem means we just sit and watch bad things happen, knowing how to fix them for particular cases yet not being able to do anything at all. > More generally, the fact that a patch has some user-frobbable knob > does not mean that it's actually a good or even usable solution. As > everybody keeps saying, testing on a wide range of use-cases would be > needed to prove that, and we don't have enough time left for such > testing in the 9.3 timeframe. This problem needs to be attacked in > an organized and deliberate fashion, not by hacking something up under > time pressure and shipping it with minimal testing. Well, it has been tackled like that and we've *all* got nowhere. No worries, I can wait a year for that beer. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Back-branch security updates coming next week
The core team has received word of a seriously nasty security problem in recent releases of Postgres. We will be wrapping update releases to fix this next week, following the new "usual schedule" of tarball wrap Monday afternoon EDT, public announcement Thursday (4/4). Committers are reminded that it's uncool to commit any potentially destabilizing changes to back branches in the last day or two before a release wrap. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
Simon Riggs writes: > So please, lets go with a simple solution now that allows users to say > what they want. Simon, this is just empty posturing, as your arguments have nothing whatsoever to do with whether the above description applies to your patch. More generally, the fact that a patch has some user-frobbable knob does not mean that it's actually a good or even usable solution. As everybody keeps saying, testing on a wide range of use-cases would be needed to prove that, and we don't have enough time left for such testing in the 9.3 timeframe. This problem needs to be attacked in an organized and deliberate fashion, not by hacking something up under time pressure and shipping it with minimal testing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump in current master segfaults when dumping 9.2/9.1 databases
On 26.03.2013 15:31, Bernd Helmle wrote:
> My current master segfaults with pg_dump when dumping a 9.1 or 9.2
> database:
>
> $ LC_ALL=en_US.utf8 pg_dump -s -p 5448
> pg_dump: column number -1 is out of range 0..22
> zsh: segmentation fault  LC_ALL=en_US.utf8 pg_dump -s -p 5448
>
> The reason seems to be that getTables() in pg_dump.c forgets to select
> relpages in the query for releases >= 90100. The error message comes
> from PQgetvalue(res, i, i_relpages), which complains about i_relpages
> being -1 and will then return NULL...

Thanks, fixed.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump in current master segfaults when dumping 9.2/9.1 databases
My current master segfaults with pg_dump when dumping a 9.1 or 9.2 database:

$ LC_ALL=en_US.utf8 pg_dump -s -p 5448
pg_dump: column number -1 is out of range 0..22
zsh: segmentation fault  LC_ALL=en_US.utf8 pg_dump -s -p 5448

The reason seems to be that getTables() in pg_dump.c forgets to select relpages in the query for releases >= 90100. The error message comes from PQgetvalue(res, i, i_relpages), which complains about i_relpages being -1 and will then return NULL...

--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
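The underlying libpq pattern is easy to demonstrate: PQfnumber() returns -1 for any column the query didn't select, and feeding that -1 to PQgetvalue() produces exactly the "column number -1 is out of range" complaint. A hedged sketch of the defensive check (toy query; the real getTables() query is far larger):

#include <stdio.h>
#include <libpq-fe.h>

static void
show_relpages(PGconn *conn)
{
	/* Drop "relpages" from this SELECT and i_relpages becomes -1. */
	PGresult   *res = PQexec(conn,
					  "SELECT relname, relpages FROM pg_class LIMIT 1");
	int			i_relpages;

	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return;
	}

	i_relpages = PQfnumber(res, "relpages");
	if (i_relpages < 0)
	{
		/* the cheap guard that would have turned the crash into an error */
		fprintf(stderr, "relpages missing from result set\n");
		PQclear(res);
		return;
	}

	if (PQntuples(res) > 0)
		printf("relpages = %s\n", PQgetvalue(res, 0, i_relpages));
	PQclear(res);
}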
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
On Tue, Mar 26, 2013 at 7:30 AM, Simon Riggs wrote:
> On 26 March 2013 11:33, Robert Haas wrote:
>> On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs wrote:
>>> On 26 March 2013 01:35, Greg Stark wrote:
>>>> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs wrote:
>>>>> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
>>>>> that point.
>>>>
>>>> Are you saying you're only interested in working on it now? That after
>>>> 9.3 is release you won't be interested in working on it any more?
>>>>
>>>> As you said we've been eyeing this particular logic since 2004, why
>>>> did it suddenly become more urgent now? Why didn't you work on it 9
>>>> months ago at the beginning of the release cycle?
>>>
>>> I'm not sure why your comments are so confrontational here, but I
>>> don't think it helps much. I'm happy to buy you a beer too.
>>>
>>> As I explained clearly in my first post, this idea came about trying
>>> to improve on the negative aspects of the checksum patch. People were
>>> working on ideas 9 months ago to resolve this, but they have come to
>>> nothing. I regret that; Merlin and others have worked hard to find a
>>> way: Respect to them.
>>>
>>> My suggestion is to implement a feature that takes 1 day to write and
>>> needs little testing to show it works.
>>
>> Any patch in this area isn't likely to take much testing to establish
>> whether it improves some particular case. The problem is what happens
>> to all of the other cases - and I don't believe that part needs little
>> testing, hence the objections (with which I agree) to doing anything
>> about this now.
>>
>> If we want to change something in this area, we might consider
>> resurrecting the patch I worked on for this last year, which had, I
>> believe, a fairly similar mechanism of operation to what you're
>> proposing, and some other nice properties as well:
>>
>> http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
>> http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com
>>
>> ...but I think the main reason why that never went anywhere is because
>> we never really had any confidence that the upsides were worth the
>> downsides. Fundamentally, postponing hint bit setting (or hint bit
>> I/O) increases the total amount of work done by the system. You still
>> end up writing the hint bits eventually, and in the meantime you do
>> more CLOG lookups. Now, as a compensating benefit, you can spread the
>> work of writing the hint-bit updated pages out over a longer period of
>> time, so that no single query carries too much of the burden of
>> getting the bits set. The worst-case-latency vs. aggregate-throughput
>> tradeoff is one with a long history and I think it's appropriate to
>> view this problem through that lens also.
>
> I hadn't realised so many patches existed that were similar. Hackers
> is bigger these days.
>
> Reviewing the patch, I'd say the problem is that it is basically
> implementing a new automatic heuristic. We simply don't have any
> evidence that any new heuristic will work for all cases, so we do
> nothing.
>
> Whether we apply my patch, yours or Merlin's, my main thought now is
> that we need a user parameter to control it so it can be adjusted
> according to need and not touched at all if there is no problem.

After a night thinking about this, I'd like to make some points:

*) My patch deliberately did not 'set bits without dirty' -- with
checksums in mind as you noted (thanks for that). I think the upside
of marking pages in that fashion is overrated anyway.

*) Any strategy that does not approximate hint bit behavior IMNSHO is
a non-starter. By that I mean when your $condition is met so that
hint bits are not being written out, scans need to bail out of
HeapTupleSatisfiesMVCC processing with a cheap check. If you don't do
that and rely on the transam.c guard, you've already missed the boat:
even without the clog lookup, the extra processing there will, I can
assure you, show up in profiling of repeated scans (until vacuum).

*) The case of sequential tuples with the same xid is far and away the
most important one. In OLTP workloads hint bit i/o is minor compared
to everything else going on. Also, OLTP workloads are probably better
handled with a hint bit check just before eviction via bgwriter vs
during scan.

*) The budget for extra work inside HeapTupleSatisfiesMVCC is
exceptionally low. For this reason, I think your idea would be better
framed at the page level and the bailout should be measured in the
number of pages, not tuples (that way the page can send in a single
boolean to control hint bit behavior).

*) The upside of optimizing xmax processing is fairly low for most
workloads I've seen.

*) The benchmarking Amit and Hari did needs analysis.

*) For off-cycle release work that would help enable patches with
complex performance trade-offs (I'm working up another patch that has
even more comp
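Merlin's page-level framing might look roughly like the sketch below. Everything here is hypothetical (names, structure, and the budget policy); the point is only that the decision is made once per page so the per-tuple visibility path pays a single branch on a precomputed boolean:

#include <stdbool.h>

/* Hypothetical per-scan state: the hint-bit budget is counted in pages. */
typedef struct
{
	int			hint_page_budget;	/* pages still allowed to set hint bits */
} ScanState;

/* Decided once when a scan lands on a new page. */
static bool
page_may_set_hints(ScanState *scan)
{
	if (scan->hint_page_budget <= 0)
		return false;			/* budget spent: scans bail out cheaply */
	scan->hint_page_budget--;
	return true;
}

/*
 * Per-tuple path: the only extra work inside the visibility check is
 * testing the boolean the page handed in.
 */
static void
maybe_set_hint(bool page_sets_hints, unsigned short *t_infomask,
			   unsigned short hint_bit)
{
	if (page_sets_hints)
		*t_infomask |= hint_bit;
}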
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 26 March 2013 22:57, Robert Haas wrote: > They hate it twice as much when the change is essentially cosmetic. > There's no functional problems with arrays as they exist today that > this change would solve. > We can't sensibly test for whether an array is empty. I'd call that a functional problem. The NULL return from array_{length,lower,upper,ndims} is those functions' way of saying their arguments failed a sanity check. So we cannot distinguish in a disciplined way between a valid, empty array, and bad arguments. If the zero-D implementation had been more polished and say, array_ndims returned zero, we had provided an array_empty function, or the existing functions threw errors for silly arguments instead of returning NULL, then I'd be more inclined to see your point. But as it stands, the zero-D implementation has always been half-baked and slightly broken, we just got used to working around it. Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
On 26 March 2013 11:33, Robert Haas wrote: > On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs wrote: >> On 26 March 2013 01:35, Greg Stark wrote: >>> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs wrote: I'll bet you all a beer at PgCon 2014 that this remains unresolved at that point. >>> >>> Are you saying you're only interested in working on it now? That after >>> 9.3 is release you won't be interested in working on it any more? >>> >>> As you said we've been eyeing this particular logic since 2004, why >>> did it suddenly become more urgent now? Why didn't you work on it 9 >>> months ago at the beginning of the release cycle? >> >> I'm not sure why your comments are so confrontational here, but I >> don't think it helps much. I'm happy to buy you a beer too. >> >> As I explained clearly in my first post, this idea came about trying >> to improve on the negative aspects of the checksum patch. People were >> working on ideas 9 months ago to resolve this, but they have come to >> nothing. I regret that; Merlin and others have worked hard to find a >> way: Respect to them. >> >> My suggestion is to implement a feature that takes 1 day to write and >> needs little testing to show it works. > > Any patch in this area isn't likely to take much testing to establish > whether it improves some particular case. The problem is what happens > to all of the other cases - and I don't believe that part needs little > testing, hence the objections (with which I agree) to doing anything > about this now. > > If we want to change something in this area, we might consider > resurrecting the patch I worked on for this last year, which had, I > believe, a fairly similar mechanism of operation to what you're > proposing, and some other nice properties as well: > > http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com > http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com > > ...but I think the main reason why that never went anywhere is because > we never really had any confidence that the upsides were worth the > downsides. Fundamentally, postponing hint bit setting (or hint bit > I/O) increases the total amount of work done by the system. You still > end up writing the hint bits eventually, and in the meantime you do > more CLOG lookups. Now, as a compensating benefit, you can spread the > work of writing the hint-bit updated pages out over a longer period of > time, so that no single query carries too much of the burden of > getting the bits set. The worst-case-latency vs. aggregate-throughput > tradeoff is one with a long history and I think it's appropriate to > view this problem through that lens also. I hadn't realised so many patches existed that were similar. Hackers is bigger these days. Reviewing the patch, I'd say the problem is that it is basically implementing a new automatic heuristic. We simply don't have any evidence that any new heuristic will work for all cases, so we do nothing. Whether we apply my patch, yours or Merlin's, my main thought now is that we need a user parameter to control it so it can be adjusted according to need and not touched at all if there is no problem. My washing machine has a wonderful feature "15 min wash" and it works great for the times I know I need it; but in general, the auto wash mode works fine since often you don't care that it takes 90 minutes. 
It's much easier to see that the additional user option is beneficial, but much harder to start arguing that the default wash cycle should be 85 or 92 minutes. It'd be great if the washing machine could work out that I need my clothes quickly and that on-this-day-only I don't care about the thoroughness of the wash, but it can't. I don't think the washing machine engineers are idiots for not being able to work that out, but if they only offered a single option because they thought they knew better than me, I'd be less than impressed. In the same way, we need some way to say "big queries shouldn't do cleanup" even if autovacuum ends up doing more I/O over time (though in fact I doubt this is the case, detailed argument on other post). So please, lets go with a simple solution now that allows users to say what they want. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
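For what it's worth, the kind of knob being argued for could be as small as the following sketch. The GUC name and the counting policy are invented here purely for illustration -- this is not a description of Simon's actual patch:

#include <stdbool.h>

/* Hypothetical GUC: how many hint bits a single query may set; -1 = no limit. */
int			hint_bit_write_limit = -1;

static int	hint_bits_set_this_query = 0;

/* Consulted wherever a read-only scan would otherwise set a hint bit. */
static bool
may_set_hint_bit(void)
{
	if (hint_bit_write_limit < 0)
		return true;			/* knob untouched: exactly today's behavior */
	if (hint_bits_set_this_query >= hint_bit_write_limit)
		return false;			/* "big queries shouldn't do cleanup" */
	hint_bits_set_this_query++;
	return true;
}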
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
2013/3/26 Robert Haas :
> On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus wrote:
>> On 03/20/2013 04:45 PM, Brendan Jurd wrote:
>>> Incompatibility:
>>> This patch introduces an incompatible change in the behaviour of the
>>> aforementioned array functions -- instead of returning NULL for empty
>>> arrays they return meaningful values. Applications relying on the old
>>> behaviour to test for emptiness may be disrupted. One can
>>
>> As a heavy user of arrays, I support this patch as being worth the
>> breakage of backwards compatibility. However, that means it certainly
>> will need to wait for 9.4 to provide adequate warning.
>
> I expect to lose this argument, but I think this is a terrible idea.
> Users really hate it when they try to upgrade and find that they, uh,
> can't, because of some application-level incompatibility like this.
> They hate it twice as much when the change is essentially cosmetic.
> There's no functional problems with arrays as they exist today that
> this change would solve.
>
> The way to make a change like this without breaking things for users
> is to introduce a new type with different behavior and gradually
> deprecate the old one. Now, maybe it doesn't seem worth doing that
> for a change this small. But if so, I think that's evidence that this
> isn't worth changing in the first place, not that it's worth changing
> without regard for backwards-compatibility.
>
> Personally, I think if we're going to start whacking around the
> behavior here and risk inconveniencing our users, we ought to think a
> little larger. The fundamental thing that's dictating the current
> behavior is that we have arrays of between 1 and 6 dimensions all
> rolled up under a single data type. But in many cases, if not nearly
> all cases, what people want is, specifically, a one-dimensional array.
> If we were going to actually bite the bullet and create separate data
> types for each possible number of array dimensions... and maybe fix
> some other problems at the same time... then the effort involved in
> ensuring backward-compatibility might seem like time better spent.

I understand, but I don't agree. We have to fix the impractical design of arrays early. An ARRAY is a 1st-class type, so the varchar2 trick is not possible here.

If we don't want to use a GUC, what do you think about a compatibility extension? We could overload the system functions' behaviour. That could solve the problem with updates and migrations.

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus wrote: > On 03/20/2013 04:45 PM, Brendan Jurd wrote: >> Incompatibility: >> This patch introduces an incompatible change in the behaviour of the >> aforementioned array functions -- instead of returning NULL for empty >> arrays they return meaningful values. Applications relying on the old >> behaviour to test for emptiness may be disrupted. One can > > As a heavy user of arrays, I support this patch as being worth the > breakage of backwards compatibility. However, that means it certainly > will need to wait for 9.4 to provide adequate warning. I expect to lose this argument, but I think this is a terrible idea. Users really hate it when they try to upgrade and find that they, uh, can't, because of some application-level incompatibility like this. They hate it twice as much when the change is essentially cosmetic. There's no functional problems with arrays as they exist today that this change would solve. The way to make a change like this without breaking things for users is to introduce a new type with different behavior and gradually deprecate the old one. Now, maybe it doesn't seem worth doing that for a change this small. But if so, I think that's evidence that this isn't worth changing in the first place, not that it's worth changing without regard for backwards-compatibility. Personally, I think if we're going to start whacking around the behavior here and risk inconveniencing our users, we ought to think a little larger. The fundamental thing that's dictating the current behavior is that we have arrays of between 1 and 6 dimensions all rolled up under a single data type. But in many cases, if not nearly all cases, what people want is, specifically, a one-dimensional array. If we were going to actually bite the bullet and create separate data types for each possible number of array dimensions... and maybe fix some other problems at the same time... then the effort involved in ensuring backward-compatibility might seem like time better spent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interesting post-mortem on a near disaster with git
On Monday 25 March 2013 19:35:12, Daniel Farina wrote:
> On Mon, Mar 25, 2013 at 11:07 AM, Stefan Kaltenbrunner wrote:
>>> Back when we used CVS for quite a few years I kept 7 day rolling
>>> snapshots of the CVS repo, against just such a difficulty as this. But
>>> we seem to be much better organized with infrastructure these days so I
>>> haven't done that for a long time.
>>
>> well there is always room for improvement(and for learning from others)
>> - but I agree that this proposal seems way overkill. If people think we
>> should keep online "delayed" mirrors we certainly have the resources to
>> do that on our own if we want though...
>
> What about rdiff-backup? I set it up for personal use years ago (via
> the handy open source bash script backupninja) and it is a pretty
> nice no-frills point-in-time, self-expiring, file-based automatic
> backup program that works well with file synchronization like rsync
> (I rdiff-backup to one disk and rsync the entire rdiff-backup output
> to another disk). I've enjoyed using it quite a bit during my own
> personal-computer emergencies, the maintenance required from me has
> been zero, and I have used it from time to time to restore, proving
> it even works. Hardlinks can be used to tag versions of a
> file-directory tree recursively relatively compactly.
>
> It won't be as compact as a git-aware solution (since git tends to
> rewrite entire files, which will confuse file-based incremental
> differential backup), but the amount of data we are talking about is
> pretty small, and as far as a lowest-common-denominator tradeoff for
> use in emergencies, I have to give it a lot of praise. The main
> advantage it has here is it implements point-in-time recovery
> operations that are easy to use and actually seem to work. That said,
> I've mostly done targeted recoveries rather than trying to recover
> entire trees.

I have the same set up, and the same feedback.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs wrote: > On 26 March 2013 01:35, Greg Stark wrote: >> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs wrote: >>> I'll bet you all a beer at PgCon 2014 that this remains unresolved at >>> that point. >> >> Are you saying you're only interested in working on it now? That after >> 9.3 is release you won't be interested in working on it any more? >> >> As you said we've been eyeing this particular logic since 2004, why >> did it suddenly become more urgent now? Why didn't you work on it 9 >> months ago at the beginning of the release cycle? > > I'm not sure why your comments are so confrontational here, but I > don't think it helps much. I'm happy to buy you a beer too. > > As I explained clearly in my first post, this idea came about trying > to improve on the negative aspects of the checksum patch. People were > working on ideas 9 months ago to resolve this, but they have come to > nothing. I regret that; Merlin and others have worked hard to find a > way: Respect to them. > > My suggestion is to implement a feature that takes 1 day to write and > needs little testing to show it works. Any patch in this area isn't likely to take much testing to establish whether it improves some particular case. The problem is what happens to all of the other cases - and I don't believe that part needs little testing, hence the objections (with which I agree) to doing anything about this now. If we want to change something in this area, we might consider resurrecting the patch I worked on for this last year, which had, I believe, a fairly similar mechanism of operation to what you're proposing, and some other nice properties as well: http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com ...but I think the main reason why that never went anywhere is because we never really had any confidence that the upsides were worth the downsides. Fundamentally, postponing hint bit setting (or hint bit I/O) increases the total amount of work done by the system. You still end up writing the hint bits eventually, and in the meantime you do more CLOG lookups. Now, as a compensating benefit, you can spread the work of writing the hint-bit updated pages out over a longer period of time, so that no single query carries too much of the burden of getting the bits set. The worst-case-latency vs. aggregate-throughput tradeoff is one with a long history and I think it's appropriate to view this problem through that lens also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On 30.12.2012 08:31, Daniel Farina wrote:
> A version implementing that is attached, except I generate an
> additional 64-bit session not exposed to the client to prevent even
> casual de-leaking of the session state. That may seem absurd, until
> someone writes a tool that de-xors things and relies on it and then
> nobody feels inclined to break it. It also keeps the public session
> number short.
>
> I also opted to save the underestimate since I'm adding a handful of
> fixed width fields to the file format anyway.

This patch needs documentation. At a minimum, the new calls_underest field needs to be listed in the description of the pg_stat_statements view.

Pardon for not following the discussion: What exactly does the calls_underest field mean? I couldn't decipher it from the patch. What can an admin do with the value? How does it compare with just bumping up pg_stat_statements.max?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay
On 26 March 2013 01:35, Greg Stark wrote: > On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs wrote: >> I'll bet you all a beer at PgCon 2014 that this remains unresolved at >> that point. > > Are you saying you're only interested in working on it now? That after > 9.3 is release you won't be interested in working on it any more? > > As you said we've been eyeing this particular logic since 2004, why > did it suddenly become more urgent now? Why didn't you work on it 9 > months ago at the beginning of the release cycle? I'm not sure why your comments are so confrontational here, but I don't think it helps much. I'm happy to buy you a beer too. As I explained clearly in my first post, this idea came about trying to improve on the negative aspects of the checksum patch. People were working on ideas 9 months ago to resolve this, but they have come to nothing. I regret that; Merlin and others have worked hard to find a way: Respect to them. My suggestion is to implement a feature that takes 1 day to write and needs little testing to show it works. I'm happy to pursue that path now, or later. Deciding we need an all-singing, all-dancing solution that will take our best men (another) 6 months of hard arguing and implementation is by far the best way I know of killing anything and I won't be pursuing that route. If we did have 6 months funding for any-feature-you-like, I wouldn't spend it all on this. My bet that nobody else will have enough patience, time and skill, let alone unpaid leave to follow that path, stands. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] adding support for zero-attribute unique/etc keys
Darren Duncan wrote:
>> The standard defines UNIQUE on the basis of the "UNIQUE predicate":
>>
>>    <unique predicate> ::= UNIQUE <table subquery>
>>
>> and states:
>>
>> 1) Let T be the result of the <table subquery>.
>> 2) If there are no two rows in T such that the value of each column
>>    in one row is non-null and is not distinct
>>    from the value of the corresponding column in the other row,
>>    then the result of the <unique predicate> is
>>    *True*; otherwise, the result of the <unique predicate> is *False*.
>>
>> Since an imagined zero-column query would have an empty set of
>> result columns, you could with equal force argue that these columns
>> satisfy the condition or not, because the members of the empty
>> set have all the properties you desire.
>>
>> So I see no compelling argument that such a UNIQUE constraint
>> would force a single-row table.
>
> I do see that compelling argument, and it has to do with identities.
>
> The above definition of "UNIQUE predicate" says that the UNIQUE predicate is
> FALSE iff, for every pair of rows in T, the 2 rows of any pair are the same.

I don't understand that sentence. I would say that it is FALSE iff
there exist two rows in T that satisfy:
a) each column in both rows is not-null
b) each column in one of the rows is not distinct from
   the corresponding column in the other row

> Further, 2 rows are the same iff, for every corresponding column, the values in
> both rows are the same. Further, 2 such values are the same iff they are both
> not null and are mutually not distinct.
>
> So, determining if 2 rows are the same involves an iteration of dyadic logical
> AND over the predicates for each column comparison. Now logical AND has an
> identity value, which is TRUE, because "TRUE AND p" (and "p AND TRUE") results
> in "p" for all "p". Therefore, any 2 rows with zero columns each are the same.
>
> Since any 2 rows with zero columns are the same, the "UNIQUE predicate" is FALSE
> any time there is more than 1 row in a table.
>
> Hence, a UNIQUE constraint over zero columns signifies a row-comparison
> predicate that unconditionally results in TRUE, and so no two rows at all would
> be allowed in the table with that constraint at once, thus restricting the table
> to at most one row.
>
> Does anyone agree or disagree with this logic?

Yes :^)

You could use the same kind of argument like this:
UNIQUE is true iff any two rows in T satisfy for each column:
   the column in row 1 is null
   OR the column in row 2 is null
   OR the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate for all columns
and come up with TRUE since there are none. Consequently UNIQUE is
satisfied, no matter how many rows there are.

In a nutshell: All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.
You can use this technique to make anything plausible.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
On 26.03.2013 02:02, Tom Lane wrote:
> Heikki Linnakangas writes:
>> On 25.03.2013 15:36, Tom Lane wrote:
>>> Heikki Linnakangas writes:
>>>> Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt. Per warning from
>>>> -Wmissing-format-attribute.
>>>
>>> Hm, this is exactly what I removed yesterday, because it makes the
>>> build fail outright on old gcc:
>>
>> The attached seems to work. With this patch, on_exit_msg_func() is
>> gone. There's a different implementation of exit_horribly for
>> pg_dumpall and pg_dump/restore. In pg_dumpall, it just calls
>> vwrite_msg(). In pg_dump/restore's version, the logic from
>> parallel_exit_msg_func() is moved directly to exit_horribly().
>
> Seems probably reasonable, though if we're taking exit_horribly out of
> dumputils.c, meseems it ought not be declared in dumputils.h anymore.
> Can we put that declaration someplace else, rather than commenting it
> with an apology?

Ugh, the patch I posted doesn't actually work, because dumputils.c is also used in psql and some scripts, so you get a linker error in those.

psql and scripts don't use exit_horribly or many of the other functions in dumputils.c, so I think we should split dumputils.c into two parts anyway. fmtId and the other functions that are used by psql in one file, and the functions that are only shared between pg_dumpall and pg_dump in another. Then there's also functions that are used by pg_dump and pg_restore, but not pg_dumpall or psql. I'll try moving things around a bit...

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Ideas for improving Concurrency Tests
Ideas for improving Concurrency testing

1. Synchronization points in server code

To have better control for concurrency testing, define synchronization points in server code which can be used as follows:

heap_truncate(..)
{
    SYNC_POINT(procid, 'before_heap_open')
    rel = heap_open(rid, AccessExclusiveLock);
    relations = lappend(relations, rel);
}

exec_simple_query(..)
{
    ...
    finish_xact_command();
    SYNC_POINT(procid, 'after_finish_xact_command')

    /*
     * If there were no parsetrees, return EmptyQueryResponse message.
     */
    if (!parsetree_list)
        NullCommand(dest);
    ...
}

When code reaches a sync point it can either emit a signal or wait for a signal.

Signal
    A value of a shared memory variable that will be interpreted by
    different sync points based on its value.

Emit a signal
    Assign the value (the signal) to the shared memory variable ("set a
    flag") and broadcast a global condition to wake those waiting for a
    signal.

Wait for a signal
    Loop over waiting for the global condition until the global value
    matches the wait-for signal.

To activate synchronization points, appropriate actions can be set. For example:

SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';
SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';

The above commands activate the synchronization points named 'before_heap_open' and 'after_finish_xact_command'.

session "s1"
step s11 {SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';}
step s12 {Truncate tbl;}

session "s2"
step s21 {SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';}
step s22 {Insert into tbl values(1);}

The first activation requests the synchronization point to wait for another backend to emit the signal 'commit', and the second activation requests the synchronization point to emit the signal 'commit' when the process's execution runs through the synchronization point. The test defined above will make the Truncate wait for the Insert to finish.

2. Enhance Isolation Framework

Currently, at most one step can be waiting at a time. Enhance the concurrency test framework (isolation tester) to make multiple sessions wait and then allow them to be released serially. This might help in generating complex deadlock scenarios.

The above ideas could be useful to improve concurrency testing and can also be helpful to generate test cases for some of the complicated bugs for which there is no direct test.

This work is not a patch for 9.3; I just wanted initial feedback.

Feedback/Suggestions?

Reference : http://dev.mysql.com/doc/internals/en/debug-sync-facility.html

With Regards,
Amit Kapila.
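A hedged sketch of what the SYNC_POINT machinery could amount to, written here with plain pthreads and hypothetical names for brevity; a real backend implementation would live in shared memory and use PostgreSQL's own locking and latch facilities rather than pthread condition variables:

#include <pthread.h>
#include <string.h>

/* Hypothetical shared state: one published signal plus a wakeup condition. */
typedef struct
{
	pthread_mutex_t lock;
	pthread_cond_t	wakeup;
	char			signal[64];	/* last emitted signal name */
} SyncPointShared;

static SyncPointShared sync_shared = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, ""
};

/* Emit a signal: publish its name and wake every waiter. */
void
sync_point_emit(const char *signal)
{
	pthread_mutex_lock(&sync_shared.lock);
	strncpy(sync_shared.signal, signal, sizeof(sync_shared.signal) - 1);
	pthread_cond_broadcast(&sync_shared.wakeup);
	pthread_mutex_unlock(&sync_shared.lock);
}

/* Wait for a signal: sleep until the published name matches ours. */
void
sync_point_wait_for(const char *signal)
{
	pthread_mutex_lock(&sync_shared.lock);
	while (strcmp(sync_shared.signal, signal) != 0)
		pthread_cond_wait(&sync_shared.wakeup, &sync_shared.lock);
	pthread_mutex_unlock(&sync_shared.lock);
}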