Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> I reviewed > your patch. It seems to be in good shape, and worked as > expected. I > suppressed a compiler warning in the patch and cleaned up > whitespaces in it. > Patch attached. Thanks for the review! I saw that you also changed the writing: LogicalTapeWrite(state->tapeset, tapenum, tuple, HEAPTUPLESIZE); LogicalTapeWrite(state->tapeset, tapenum, tuple->t_data, tuple->t_len-HEAPTUPLESIZE); and the reading: tuple->t_len = tuplen - HEAPTUPLESIZE; if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen)) elog(ERROR, "unexpected end of data"); into (writing): LogicalTapeWrite(state->tapeset, tapenum, tuple, tuple->t_len); (reading): if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen)) Are we sure it's 100% equivalent? I remember I had issues with the fact that tuple->t_data wasn't at HEAPTUPLESIZE distance from tuple: http://osdir.com/ml/pgsql-hackers/2010-02/msg00744.html > I think we need some documentation for the change. The > only downside > of the feature is that sorted cluster requires twice disk > spaces of > the target table (temp space for disk sort and the result > table). > Could I ask you to write documentation about the new > behavior? > Also, code comments can be improved; especially we need > better > description than "copy&paste from > FormIndexDatum". I'll try to improve the comments and add doc changes (but my English will have to be double checked...) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bitmap indexes - performance
> In > principle a bitmap index scan should be significantly faster if the > index can > return the bitmap more or less "natively" rather than having > to construct > it. The problem I'm seeing is that even on a 20M rows table, doing a select * from t where c1=10 and c2=1 where c1 and c2 are low cardinality columns, leads to a *very* fast bitmap index scan, even with btree indexes (200ms per index on my PC). The rest of the time is spent in actually retrieving heap rows; and of course no index type is going to help with that. Now: if an index search on such a big table takes so little time, what kind of improvement are we trying to get? The btree indexes on c1 and c2 are about 340MB eaxh: maybe I'm experiencing some caching weirdness? Or it's normal that an index search on such a big table is that fast (again, not counting the heap scan step, which will be required no matter the index type)? I'll try to re-test it... > In particular, I recall some discussions about developing > a "streaming > API" whereby an index AM could return a bitmap page-by-page or > so, > rather than having to construct the whole thing in-memory > before > anything could happen. This would be a huge win for AND/OR > cases, > and even for a simple indexscan it would eliminate the existing > startup > cost penalty for a bitmap scan. Streaming like this would > also > eliminate the problem of having to lossify large bitmaps in order > to > not overrun memory. One of the improvements I was going to try was to avoid calling tid_set_bit (or whatever is the function, I don't remember now) for every row, and call something like tid_set_bits_in_page where a whole page was passed in: this would remove a lot of the hash_* calls that are made in each and every tid_set_bit call (now that's something btree can't do, but bitmap indexes can do "easily"). But I stopped before implementing it, because, as I said, I don't think the improvement would still be worth it (even calling tid_set_bit 1/20th of the needed times didn't help that much; we're still talking about going from 200ms to 180ms on a query that takes seconds to execute). But I'm going to give more "tested" numbers... Talking about bitmap indexes I don't think we should mention memory... I mean: bitmap indexes are supposed to be used on huge tables, and I don't think that 100MB (which holds a lot of rows in a tbm...) to spare as work_mem would be a big problem... As for the "startup cost": again, I wouldn't see that as a big improvement, as we're talking mostly OLAP scenarios, where most likely there will be some other "blocking" operator (group by, sort, sub select etc) that will "remove" any improvements in startup time... To sum up: IMHO nor improvements in memory usage nor in startup time would be good reasons to switch to bitmap indexes... but bulk index creation time (10 minutes to index what it takes 60 minutes with btree... and maybe more if tables are bigger) and (maybe) index disk space might be... but I'm not 100% convinced... I'm trying to find more docs that explain the "improvements" of bitmap indexes in other products... but most of what I've found talks about bitmapAND/OR which is something that is very cool, but that postgres already does even with btree indexes... or index creation time/size, which are, for the moment, the only things that I'm pretty confident the patch would actually provide. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] bitmap indexes - performance
Using as a starting point the old bitmap patch in: http://archives.postgresql.org/message-id/20081101000154.go27...@fune I re-applied and re-worked the patch to see what kind of improvements over btrees bitmaps actually provided. Using a 20M rows table of 10/100/1000 random values, I've found that: 1) bulk index creation time is roughly 6 times better 2) index size is 6-15 times smaller (depending on column cardinality) 3) there's almost no difference in query times (but I have to make more tests) 4) I can't say anything about the insertion performance, but I guess bitmap will perform way worse than btree Are these improvements (index creation time, index size) worth enough to keep on working on this? I mean: given that bitmaps don't give any benefits in query times, but only benefits related to disk size and bulk index creation times, and will have horrible performance for insertions/deletions: would this job be worthed? In case it is: I will try to clean up the patch and post it... As a side note: I guess that most of the bitmap indexes performance improvements in the SELECT area are already implemented in postgres in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that say that bitmap indexes are faster for selects, unless of course they are ANDed/ORed together (which is something postgres already does for regular btree indexes) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] access method: are disk pages mandatory?
in bufpage.h: "all blocks written out by an access method must be disk pages" but in http://www.postgresql.org/docs/8.4/static/storage-page-layout.html "Actually, index access methods need not use this page format. All the existing index methods do use this basic format, but the data kept on index metapages usually doesn't follow the item layout rules." I'm not getting it: am I supposed to use the "disk page format" when writing an index access method, or it's just a "good practice" because it makes the handling easier? Given the docs it looks "recommended", but the comment on the code sounds more "mandatory". -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
> Why are you cloning over http? Me too I've used http, since I'm behind a proxy and I couldn't find a "simple" way of having the git:// method working behind a proxy... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git: Unable to get pack file
> I've re-run git repack on > it, please try again. At least that file is > accessible from here > now.. It worked, thank you very much -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Git: Unable to get pack file
Hi, I tried getting the source using: git clone http://git.postgresql.org/git/postgresql.git postgresql-git but after a while (252MB) I always get: [...] Getting pack 61e1395a5bdacda95de5432123a0f8124fff05e6 which contains 476418893d3a2f366f47dbe4ce6d7522cc427545 error: Unable to get pack file http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack couldn't connect to host error: Unable to find 476418893d3a2f366f47dbe4ce6d7522cc427545 under http://git.postgresql.org/git/postgresql.git Cannot obtain needed blob 476418893d3a2f366f47dbe4ce6d7522cc427545 while processing commit ee6ddc7575ab1ce16d8f4eb2cdbcc525d43a6437. fatal: Fetch failed. I'm behind proxy/firewall, so I used the "http://"; protocol. wget http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack works... What am I doing wrong? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch
> Yes. There's not going to be any more commitfests for this release, so > the next commitfest is for 9.1. Perfect! Where could I find such information? I mean: how could I know it? > (don't worry about the lack of enthusiasm for the patch, people are just > very busy with 9.0 and don't have the energy to think about 9.1 material > at this point) I understand! Since it's going to be posted in a CommitFest, I'll try to clean up the patch a little. Thank you Greg and Heikki. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch
> As outlined in the "Submission timing" section, you're > asking about something during the wrong time to be doing so--that's why > you're > not getting any real feedback. Add your patch to the next CommitFest by > linking > to your message at https://commitfest.postgresql.org/ Ok! But there's something I don't understand: I didn't add the patch to the next CommitFest because I thought it could never be added in 9.0 (because it adds a new "feature" which has never been discussed). Hence I thought it should have been "discussed" (not properly "reviewed") out of a CommitFest. The "Submission timing" section talks about "beta phase", not "alpha phase", so I'm stll confused... In other words: should patches that won't be included in the next release (because it's too late) still added to the next CommitFest? I thought a very "rough" discussion was the way to go in these cases, but I'm not familiar at all with the process... I'll wait for an answer before adding the patch to the CommitFest (and in case, I'll add more comments and docs to it) Thank you very much! Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch
I really thought this would have caused some interest, since - this item is in the TODO list - the improvement for CLUSTER in some scenarios is 800%, and maybe more (if I didn't do anything wrong, of course...) Could at least the message: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php be added to the TODO page, under "Improve CLUSTER performance by sorting to reduce random I/O" ? It would be sad if the patch got lost... Leonardo > Attached the updated patch (should solve a bug) and a script. > The sql scripts generates a 2M rows table ("orig"); then the > table is copied and the copy clustered using seq + sort (since > "set enable_seqscan=false;"). > Then the table "orig" is copied again, and the copy clustered > using regular index scan (set enable_indexscan=true; set > enable_seqscan=false). > Then the same thing is done on a 5M rows table, and on a 10M > rows table. > > On my system (Sol10 on a dual Opteron 2.8) single disc: > > > 2M: seq+sort 11secs; regular index scan: 33secs > 5M: seq+sort 39secs; regular index scan: 105secs > 10M:seq+sort 83secs; regular index scan: 646secs > > > Maybe someone could suggest a better/different test? > > > Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
>Perhaps you could supply a .sql file containing a testcase > illustrating the performance benefits you tested with your patch Sure. Attached the updated patch (should solve a bug) and a script. The sql scripts generates a 2M rows table ("orig"); then the table is copied and the copy clustered using seq + sort (since "set enable_seqscan=false;"). Then the table "orig" is copied again, and the copy clustered using regular index scan (set enable_indexscan=true; set enable_seqscan=false). Then the same thing is done on a 5M rows table, and on a 10M rows table. On my system (Sol10 on a dual Opteron 2.8) single disc: 2M: seq+sort 11secs; regular index scan: 33secs 5M: seq+sort 39secs; regular index scan: 105secs 10M:seq+sort 83secs; regular index scan: 646secs Maybe someone could suggest a better/different test? Leonardo sorted_cluster20100210.patch Description: Binary data cluster_tests.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
I think I've found the problem: tuple->t_data wasn't at HEAPTUPLESIZE distance from tuple. I guess the code makes that assumption somewhere, so I had to do: tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE); Now that test works! Hope I don't find any more problems... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> I think you're confusing HeapTuple and HeapTupleHeader. SortTuple->tuple > field should point to a HeapTupleHeader, not a HeapTuple. Mmh, I don't get it: that would mean I might be using more info than required, but I still don't understand why it's not working... at the end, CLUSTER is going to need full HeapTuple(s) (if I'm not mistaken). SortTuple->tuple can be any of MinimalTuple, IndexTuple, even a Datum. So I added my own read/write_rawtuple (I'm not that good, they were in the original patch and I copy&pasted them). I can write and read those tuples (using some Asserts I think everything goes as expected in write/readtup_rawheap). But when calling tuplesort_getrawtuple, after some "right" tuples, the call to tuplesort_gettuple_common fails because the lines: /* pull next preread tuple from list, insert in heap */ newtup = &state->memtuples[tupIndex]; give a wrong initialized "newtup" (in other words, newtup is random memory). Since write/readtup_rawheap seems fine, I can't understand what's going on... I write the HeapTuples with: (in writetup_rawheap): HeapTupletuple = (HeapTuple) stup->tuple; tuple->t_len += HEAPTUPLESIZE; /* write out the header as well */ LogicalTapeWrite(state->tapeset, tapenum, tuple, HEAPTUPLESIZE); LogicalTapeWrite(state->tapeset, tapenum, tuple->t_data, tuple->t_len-HEAPTUPLESIZE); then I read them with: (in readtup_rawheap): HeapTupletuple = (HeapTuple) palloc(tuplen); USEMEM(state, GetMemoryChunkSpace(tuple)); tuple->t_len = tuplen - HEAPTUPLESIZE; if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, HEAPTUPLESIZE-sizeof(tuplen)) != HEAPTUPLESIZE-sizeof(tuplen)) elog(ERROR, "unexpected end of data"); if (LogicalTapeRead(state->tapeset, tapenum, tuple->t_data, tuple->t_len) != tuple->t_len) elog(ERROR, "unexpected end of data"); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
Hi all, while testing the seq scan + sort CLUSTER implementation, I've found a bug in write/readtup_rawheap. The functions are needed by the sort part. The code in http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php didn't copy the whole tuple, but only the HeapTuple "header": the t_data part wasn't copied (at least, that's my understanding), The original code is at the bottom of the email. It didn't work (the table wasn't fully clustered at the end of the CLUSTER command). So I modified write/readtup_rawheap: they are supposed to store/retrieve both the "fixed" part of HeapTupleData, plus the "variable" part t_data. But a get a lot of: WARNING: problem in alloc set TupleSort: detected write past chunk end in block 0x96853f0, chunk 0x968723c WARNING: problem in alloc set TupleSort: detected write past chunk end in block 0x96853f0, chunk 0x96872c8 warnings when calling "tuplesort_end" and some of the data gets garbled after the CLUSTER command. What am I doing wrong? Using the debugger, data coming out of readtup_rawheap seems fine... I *really* need your help here... static void writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup) { HeapTupletuple = (HeapTuple) stup->tuple; tuple->t_len += sizeof(HeapTupleData); /* write out the header as well */ LogicalTapeWrite(state->tapeset, tapenum, tuple, sizeof(HeapTupleData)); LogicalTapeWrite(state->tapeset, tapenum, tuple->t_data, tuple->t_len-sizeof(HeapTupleData)); if (state->randomAccess)/* need trailing length word? */ LogicalTapeWrite(state->tapeset, tapenum, tuple, sizeof(tuple->t_len)); FREEMEM(state, GetMemoryChunkSpace(tuple)); heap_freetuple(tuple); } static void readtup_rawheap(Tuplesortstate *state, SortTuple *stup, int tapenum, unsigned int tuplen) { HeapTupletuple = (HeapTuple) palloc(sizeof(HeapTupleData)); tuple->t_data = (HeapTupleHeader) palloc(tuplen-sizeof(HeapTupleData)); USEMEM(state, GetMemoryChunkSpace(tuple)); tuple->t_len = tuplen - sizeof(HeapTupleData); if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, sizeof(HeapTupleData)-sizeof(tuplen)) != sizeof(HeapTupleData)-sizeof(tuplen)) elog(ERROR, "unexpected end of data"); if (LogicalTapeRead(state->tapeset, tapenum, tuple->t_data, tuple->t_len) != tuple->t_len) elog(ERROR, "unexpected end of data"); if (state->randomAccess)/* need trailing length word? */ if (LogicalTapeRead(state->tapeset, tapenum, &tuplen, sizeof(tuplen)) != sizeof(tuplen)) elog(ERROR, "unexpected end of data"); stup->tuple = tuple; /* set up first-column key value */ if (state->indexInfo->ii_Expressions == NULL) { /* no expression index, just get the key datum value */ stup->datum1 = heap_getattr((HeapTuple) stup->tuple, state->indexInfo->ii_KeyAttrNumbers[0], state->tupDesc, &stup->isnull1); } else { [...] expression index part, removed for clarity } } Original code: static void writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup) { HeapTuple tuple = (HeapTuple) stup->tuple; tuple->t_len += HEAPTUPLESIZE; /* write out the header as well */ LogicalTapeWrite(state->tapeset, tapenum, tuple, tuple->t_len); if (state->randomAccess)/* need trailing length word? */ LogicalTapeWrite(state->tapeset, tapenum, tuple, sizeof(tuple->t_len)); FREEMEM(state, GetMemoryChunkSpace(tuple)); heap_freetuple(tuple); } static void readtup_rawheap(Tuplesortstate *state, SortTuple *stup, int tapenum, unsigned int tuplen) { HeapTuple tuple = (HeapTuple) palloc(tuplen); USEMEM(state, GetMemoryChunkSpace(tuple)); tuple->t_len = tuplen - HEAPTUPLESIZE; if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen)) elog(ERROR, "unexpected end of data"); if (state->randomAccess)/* need trailing length word? */ if (LogicalTapeRead(state->tapeset, tapenum, &tuplen, sizeof(tuplen)) != sizeof(tuplen)) elog(ERROR, "unexpected end of data"); stup->tuple = tuple; /* set up first-column key value */ stup->datum1 = heap_getattr(tuple, state->scanKeys[0].sk_attno, state->tupDesc, &stup->isnull1); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
Not even a comment? As I said, performance results on my system were very good > I know you're all very busy getting 9.0 out, but I think the results in > heap scanning + sort instead of index scanning for CLUSTER are > very good... I would like to know if I did something wrong/I should > improve something in the patch... I haven't tested it with index > expressions yet (but the tests in "make check" work). > > Thanks > > Leonardo > > > > Hi all, > > > > attached a patch to do seq scan + sorting instead of index scan > > > > on CLUSTER (when that's supposed to be faster). > > > > As I've already said, the patch is based on: > > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php > > > > Of course, the code isn't supposed to be ready to be merged: I > > would like to write more comments and add some test cases to > > cluster.sql (plus change all the things you are going to tell me I > > have to change...) > > > > I would like your opinions on code correctness and the decisions > > I took, especially: > > > > 1) function names ("cost_index_scan_vs_seqscansort" I guess > > it's awful...) > > > > 2) the fact that I put in Tuplesortstate an EState variable, so that > > MakeSingleTupleTableSlot wouldn't have to be called for every > > row in the expression indexes case > > > > 3) the expression index case is not "optimized": I preferred to > > call FormIndexDatum once for the first key value in > > copytup_rawheap and another time to get all the remaining values > > in comparetup_rawheap. I liked the idea of re-using > > FormIndexDatum in that case, instead of copying&pasting only > > the relevant code: but FormIndexDatum returns all the values, > > > > even when I might need only the first one > > > > > > 4) I refactored the code to deform and rewrite tuple into the function > > "deform_and_rewrite_tuple", because now that part can be called > > by the regular index scan or by the new seq-scan + sort (but I > > could copy&paste those lines instead of refactoring them into a new > > function) > > > > Suggestions and comments are not just welcome, but needed! sorted_cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] IndexBuildHeapScan and RIDs order
Hi, I was looking at the code for bitmap index: http://archives.postgresql.org/pgsql-hackers/2008-10/msg01691.php and I couldn't understand why during "bmbuild" (the ambuild call for bitmap indexes) the code checks for not-ordered ItemPointerData(s). In which cases the ItemPointerData(s) given by IndexBuildHeapScan are not in order (when allow_sync=false)? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
I know you're all very busy getting 9.0 out, but I think the results in heap scanning + sort instead of index scanning for CLUSTER are very good... I would like to know if I did something wrong/I should improve something in the patch... I haven't tested it with index expressions yet (but the tests in "make check" work). Thanks Leonardo > Hi all, > > attached a patch to do seq scan + sorting instead of index scan > > on CLUSTER (when that's supposed to be faster). > > As I've already said, the patch is based on: > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php > > Of course, the code isn't supposed to be ready to be merged: I > would like to write more comments and add some test cases to > cluster.sql (plus change all the things you are going to tell me I > have to change...) > > I would like your opinions on code correctness and the decisions > I took, especially: > > 1) function names ("cost_index_scan_vs_seqscansort" I guess > it's awful...) > > 2) the fact that I put in Tuplesortstate an EState variable, so that > MakeSingleTupleTableSlot wouldn't have to be called for every > row in the expression indexes case > > 3) the expression index case is not "optimized": I preferred to > call FormIndexDatum once for the first key value in > copytup_rawheap and another time to get all the remaining values > in comparetup_rawheap. I liked the idea of re-using > FormIndexDatum in that case, instead of copying&pasting only > the relevant code: but FormIndexDatum returns all the values, > > even when I might need only the first one > > > 4) I refactored the code to deform and rewrite tuple into the function > "deform_and_rewrite_tuple", because now that part can be called > by the regular index scan or by the new seq-scan + sort (but I > could copy&paste those lines instead of refactoring them into a new > function) > > Suggestions and comments are not just welcome, but needed! sorted_cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
Hi all, attached a patch to do seq scan + sorting instead of index scan on CLUSTER (when that's supposed to be faster). As I've already said, the patch is based on: http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php Of course, the code isn't supposed to be ready to be merged: I would like to write more comments and add some test cases to cluster.sql (plus change all the things you are going to tell me I have to change...) I would like your opinions on code correctness and the decisions I took, especially: 1) function names ("cost_index_scan_vs_seqscansort" I guess it's awful...) 2) the fact that I put in Tuplesortstate an EState variable, so that MakeSingleTupleTableSlot wouldn't have to be called for every row in the expression indexes case 3) the expression index case is not "optimized": I preferred to call FormIndexDatum once for the first key value in copytup_rawheap and another time to get all the remaining values in comparetup_rawheap. I liked the idea of re-using FormIndexDatum in that case, instead of copying&pasting only the relevant code: but FormIndexDatum returns all the values, even when I might need only the first one 4) I refactored the code to deform and rewrite tuple into the function "deform_and_rewrite_tuple", because now that part can be called by the regular index scan or by the new seq-scan + sort (but I could copy&paste those lines instead of refactoring them into a new function) Suggestions and comments are not just welcome, but needed! Leonardo sorted_cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> Consider multi-column indexes, ie: > CREATE INDEX i_foo ON foo (length(a), length(b)); Ok, I've never thought of expression indexes that way (in the (expr1,expr2,exprN) form): that is a good example. > Maybe you're confusing expression indexes with partial indexes? No no, that was exactly what I needed to know. Thank you very much Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> If we ever get another index type that supports ordered > scans, it'll be time enough to worry about cases like this. Ok > BTW, I think you could use tuplesort_begin_index_btree() rather than > touching _bt_mkscankey_nodata directly. well I created my own tuplesort_begin_rawheap method (copied from: http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php ). From there I call _bt_mkscankey_nodata (as tuplesort_begin_index_btree() does), plus I set up everything else I'm going to need in tuplesort. Another question: why is IndexInfo.ii_Expressions a list? How can an index have more than one expression? Sorry if it's a stupid question, but I'm not familiar with index expressions. I think I'm almost there (some very stupid tests pass). I'll try to submit a patch soon to understand if I'm going in the right direction. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> Rule it out. Note you should be looking at pg_am.amcanorder, not > hardwiring knowledge of particular index types. Sorry, I replied "ok" too fast... I can look at pg_am.amcanorder, but I would still need the ScanKey to be used by tuplesort; and I can't find any other way of doing it than calling _bt_mkscankey_nodata, which is btree-specific. I guess either: - add another function to the list of "Index Access Method Functions", something that returns the ScanKey in case pg_am.amcanorder is true or - hardwiring the fact that the only way to seq scan + sort in CLUSTER is using a btree... hence the call to _bt_mkscankey_nodata But maybe there's another way of doing it, I don't know the code enough Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> Note you should be looking at pg_am.amcanorder, not > hardwiring knowledge of particular index types. Ok. Would it make sense to use FormIndexDatum to get the index value to be used by tuplesort? I'm having trouble avoiding the call to _bt_mkscankey_nodata to get the scanKeys... that relates to btree only, I can't find the "general" function... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
So, if I'm not mistaken: hash indexes -> can't be used in CLUSTER gin indexes -> can't be used in CLUSTER that leaves: btree -> ok expression btree -> I have to find a way to compute the expression for each tuple: hints? gist -> how can I get something "comparable" by tuplesort? Or should I rule it out from the seq scan + sort path? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> > So my proposal would be: do the test seq_scan vs sort/index_scan only for > > regular btree index, and integrate that test in the planner. > > Keep in mind that this patch was after the deadline for 9.0, so there > is probably not a huge rush to get this done. That's true; I'll try to get the whole thing done then... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> Well, the expression cases would be more likely to cost more if > implemented as a sort, but that doesn't mean that a sort couldn't be a > win. Besides, even if you blow off the expression case, what about > nulls first/last, nondefault opclasses, etc? Ok, let's split the problem in 2 parts: a) "choosing the right path" b) "using seq scan + sort in case the planner (somehow) said it's faster" You're right: for a) nondefault opclasses and other things would make the SPI version "wrong". So let's stick for the moment with the cost_sort create_index_path etc calls for a). I think that the calls to create_index_path would also deal with every other possible index type (something that's impossible with the SPI version). For b): I'm using the code in http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php That doesn't deal with expression indexes, nor with anything that is not a btree index. But it's much faster on what people use the most (and not slower on anything else). So my proposal would be: do the test seq_scan vs sort/index_scan only for regular btree index, and integrate that test in the planner. That doesn't mean that sometime in the future we're not going to have full support for all index types in seq scan + sort CLUSTER... but I would like to have something that works faster on what people use, and not slower in the other cases without waiting ages to have the "whole" thing... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> > I meant to add only ASC/DESC; I would leave all other cases > > (non-btrees, custom expression btrees) to use the old index-scan method. > > That hardly seems acceptable. Well I brought up that in an earlier post: http://old.nabble.com/Re%3A-About-%22Our-CLUSTER-implementation-is-pessimal%22-patch-p27179107.html I hoped that since people mostly (>95%?) use plain btree indexes, a patch that helped CLUSTER with using such indexes would be fine (at least at first...). I guess that a patch that deals with all other types of indexes would be way more complicated (not at the "planning" stage, but in the scan+sort phase)? > I hardly think that keeping yourself at arm's length from the planner > by getting cozy with SPI internals is a net improvement in modularity. So you think that code snippet that I sent earlier (the function that uses create_index_path etc) could be put in planner.c (almost) as it is? It looked clumsy to me (I liked the SPI code better) Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> By the time you make this actually work in all cases, it's probably > going to be more of a mess than the other way; I meant to add only ASC/DESC; I would leave all other cases (non-btrees, custom expression btrees) to use the old index-scan method. > not to mention that it > doesn't work *at all* without violating SPI internals. You lost me there... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
>one idea could be to actually prepare a query using SPI for "select * from >table order by " and then peek inside > to see which plan was generated. I like that!!! Here's a first attempt, it looks like it's working... (I still have to skip non-btree indexes and expression indexes, plus add a ASC/DESC to the select) static bool use_index_scan(Relation OldHeap, Oid indexOid) { HeapTupleindexTuple; Form_pg_index indexForm; struct _SPI_plan *spiPlan; char st[(NAMEDATALEN+1)*INDEX_MAX_KEYS+NAMEDATALEN+100]; int i; TupleDescoldTupDesc; bool retval = true; PlannedStmt *plan; CachedPlanSource *cps; oldTupDesc = RelationGetDescr(OldHeap); sprintf(st, "select * from %s order by ", OldHeap->rd_rel->relname.data); indexTuple = SearchSysCache(INDEXRELID, ObjectIdGetDatum(indexOid),0, 0, 0); if (!HeapTupleIsValid(indexTuple)) elog(ERROR, "cache lookup failed for index %u", indexOid); indexForm = (Form_pg_index) GETSTRUCT(indexTuple); for (i = 0; i < indexForm->indnatts; i++) { strcat(st, oldTupDesc->attrs[indexForm->indkey.values[i]-1]->attname.data); if (i+1 < indexForm->indnatts) { strcat(st, ","); } } ReleaseSysCache(indexTuple); if (SPI_connect() != SPI_OK_CONNECT) return false; spiPlan = SPI_prepare(st, 0, NULL); if (spiPlan == NULL) { SPI_finish(); return false; } cps = (CachedPlanSource*)(list_head(spiPlan->plancache_list)->data.ptr_value); plan = (PlannedStmt*)(list_head(cps->plan->stmt_list)->data.ptr_value); if (IsA(plan->planTree, Sort)) { retval = false; } SPI_freeplan(spiPlan); SPI_finish(); return retval; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> * Do we need to disable sort-path for tables clustered on a gist index? Yes; as I said in a previous mail, only plain btree indexes (that is, not custom expression indexes) would have that option (at least in a first version...) > * I'd prefer to separate cost calculation routines from create_index_path() >and cost_sort(), rather than using a dummy planner. How? Copy&paste (removing unnecessary code) of the existing create_index_path() and cost_sort()? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
Anyone? I'd like some feedback before moving on to do the seq scan + sort in those CLUSTER cases where "use_index_scan" returns false... - Messaggio originale - > Da: Leonardo F > A: pgsql-hackers@postgresql.org > Inviato: Mer 20 gennaio 2010, 18:48:00 > Oggetto: Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch > > > I read the thread "Our CLUSTER implementation is pessimal" > > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php . > > > > I was going to try to add the proper cost_index/cost_sort calls to decide > which > > "path" should be executed, as in: > > > > http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php > > I think I got something up and running to check if a table scan + sort is > supposed > to be faster than an index scan for a certain CLUSTER operation. > > The way I did it is (I guess...) wrong: I created the elements needed by > get_relation_info, create_seqscan_path, create_index_path, cost_sort. > > It has been, obviously, a trial and error approach: I added the member values > as > soon as one function call crashed... and I bet I didn't get all the corner > cases. > Is there any better way of doing it? > > Leonardo > > (this is called in copy_heap_data to decide which path to choose:) > > static bool use_index_scan(Oid tableOid, Oid indexOid) > { > RelOptInfo *rel; > PlannerInfo *root; > Query *query; > PlannerGlobal *glob; > Path *seqAndSortPath; > IndexPath *indexPath; > RangeTblEntry *rte; > > rel = makeNode(RelOptInfo); > rel->reloptkind = RELOPT_BASEREL; > rel->relid = 1; > rel->rtekind = RTE_RELATION; > > /* needed by get_relation_info */ > glob = makeNode(PlannerGlobal); > > /* needed by get_relation_info: */ > query = makeNode(Query); > query->resultRelation = 0; > > root = makeNode(PlannerInfo); > > root->parse = query; > root->glob = glob; > > get_relation_info(root, tableOid, false, rel); > seqAndSortPath = create_seqscan_path(NULL, rel); > > rel->rows = rel->tuples; > > rte = makeNode(RangeTblEntry); > rte->rtekind = RTE_RELATION; > rte->relid = tableOid; > > root->simple_rel_array_size = 2; > root->simple_rte_array = (RangeTblEntry **) > palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *)); > root->simple_rte_array[1] = rte; > > root->total_table_pages = rel->pages; > > indexPath = create_index_path(root, > (IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL, > ForwardScanDirection, NULL); > cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost, > rel->tuples, > rel->width, -1); > > return indexPath->path.total_cost < seqAndSortPath->total_cost; > } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> I read the thread "Our CLUSTER implementation is pessimal" > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php . > > I would like to try/integrate that patch as we use CLUSTER a lot on our > system. > > I was going to try to add the proper cost_index/cost_sort calls to decide > which > "path" should be executed, as in: > > http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php I think I got something up and running to check if a table scan + sort is supposed to be faster than an index scan for a certain CLUSTER operation. The way I did it is (I guess...) wrong: I created the elements needed by get_relation_info, create_seqscan_path, create_index_path, cost_sort. It has been, obviously, a trial and error approach: I added the member values as soon as one function call crashed... and I bet I didn't get all the corner cases. Is there any better way of doing it? Leonardo (this is called in copy_heap_data to decide which path to choose:) static bool use_index_scan(Oid tableOid, Oid indexOid) { RelOptInfo *rel; PlannerInfo *root; Query *query; PlannerGlobal *glob; Path *seqAndSortPath; IndexPath *indexPath; RangeTblEntry *rte; rel = makeNode(RelOptInfo); rel->reloptkind = RELOPT_BASEREL; rel->relid = 1; rel->rtekind = RTE_RELATION; /* needed by get_relation_info */ glob = makeNode(PlannerGlobal); /* needed by get_relation_info: */ query = makeNode(Query); query->resultRelation = 0; root = makeNode(PlannerInfo); root->parse = query; root->glob = glob; get_relation_info(root, tableOid, false, rel); seqAndSortPath = create_seqscan_path(NULL, rel); rel->rows = rel->tuples; rte = makeNode(RangeTblEntry); rte->rtekind = RTE_RELATION; rte->relid = tableOid; root->simple_rel_array_size = 2; root->simple_rte_array = (RangeTblEntry **) palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *)); root->simple_rte_array[1] = rte; root->total_table_pages = rel->pages; indexPath = create_index_path(root, (IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL, ForwardScanDirection, NULL); cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost, rel->tuples, rel->width, -1); return indexPath->path.total_cost < seqAndSortPath->total_cost; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
> All issues addressed, with one tiny nit-pick -- the get_bit and > set_bit methods are not part of the SQL standard. Damn! I completely forgot to mention that I had no idea if what I wrote in the docs made any sense... Well thank you for your thorough review. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
New version of the patch, let me know if I can fix/change something else. Leonardo getsetbit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
> In the > documentation, the get_bit and set_bit methods are added to a list > where we state "The following SQL-standard functions work on bit > strings as well as character strings"; however they are not > SQL-standard functions and are implemented on binary strings, not > character strings. Ok, I didn't change that, but I guess I can fix it. > The tests don't include any examples where the bit > string lines up with a byte boundary or is operating on the first or > last bit of a byte, and the overlay function is not tested with > values which change the length of the bit string. (All of these > cases seem to work in the current version, but seem like the sort of > thing which could get broken in revision.) Ok, I'll add some corner-cases tests. > There is a peculiar omission from the patch -- it adds supportfor the > overlay function to bit strings but not binary strings. Sincethe > standard drops the bit string as a standard type in the 2003standard, > in favor of boolean type and binary string types, it seemsodd to omit > binary string support (which is defined in the standard)but include > bit string support (which isn't). What the patch adds seemsuseful, > and I don't think the omission should be considered fatal, butit > would be nice to see us also cover binary strings if we can. Mmh, don't know how complicated that would be, but I can give it a try: I guess it could be a simple SQL replacement (as for char strings and bit strings?) > Well, there was one thing which is beyond my ken > in this regard: I'm not sure that the cases from bits8 to (unsigned > char *) are needed or appropriate, although on my machine they don't > seem to hurt. Perhaps someone with a better grasp of such issues can > comment. That's some kind of copy&paste from varlena.c byteaGetBit: I don't know if they can be avoided. > I'm not sure the leading whitespace in pg_proc.h is as it should be. Do you mean the tab in: _null_(tab)"select ... ? Yes, I think I should remove it. > I'd prefer to see the comments for get_bit and set_bit mention that > the location is specified left-to-right in a zero-based fashion > consistent with the other get_bit and set_bit functions, but > inconsistent with the standard substring, position, overlay > functions. Personally, I find it unfortunate that our extensions > introduced this inconsistency, but let's keep it consistent within > the similarly named functions. Ok; I found them bizarre too, but I kept it consistent. > It seems odd to me that you specify the bit to set as a 32 bit > integer, and that that is what get_bit returns, but again, this is > consistent with what we do for the bytea functions, so it's probably > the right thing to match that. Same as above: I tried to be consistent. > Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is > out-of-bounds seems somewhat bizarre to me -- I'd probably have > chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again, > it matches the bytea implementation. Again, same as above: consistency; but I can change it if you want > This last point is probably more a matter of C coding style than > anything, and I'm way to rusty in C to want to be the final arbiter > of something like this, but it seems to me that oldByte and newByte > local variables are just cluttering things up rather than clarifying: > > intoldByte, > newByte; > [...] > oldByte = ((unsigned char *) r)[byteNo]; > > if (newBit == 0) > newByte = oldByte & (~(1 << bitNo)); > else > newByte = oldByte | (1 << bitNo); > > ((unsigned char *) r)[byteNo] = newByte; > > vs > > if (newBit == 0) > ((unsigned char *) r)[byteNo] &= (~(1 << bitNo)); > else > ((unsigned char *) r)[byteNo] |= (1 << bitNo); > I agree, you version is cleaner. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
> This patch no longer applies. Could you rebase it? Done (I think). Added a couple of simple tests for bit overlay. I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has to be changed. Leonardo Index: src/backend/utils/adt/varbit.c === RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varbit.c,v retrieving revision 1.63 diff -c -r1.63 varbit.c *** src/backend/utils/adt/varbit.c 7 Jan 2010 20:17:43 - 1.63 --- src/backend/utils/adt/varbit.c 18 Jan 2010 09:05:50 - *** *** 1606,1608 --- 1606,1704 } PG_RETURN_INT32(0); } + + + /* bitsetbit + * Given an instance of type 'bit' creates a new one with + * the Nth bit set to the given value. + */ + Datum + bitsetbit(PG_FUNCTION_ARGS) + { + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + int32 n = PG_GETARG_INT32(1); + int32 newBit = PG_GETARG_INT32(2); + VarBit *result; + int len, + bitlen; + bits8 *r, + *p; + int byteNo, + bitNo; + int oldByte, + newByte; + + bitlen = VARBITLEN(arg1); + if (n < 0 || n >= bitlen) + ereport(ERROR, + (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), +errmsg("index %d out of valid range, 0..%d", + n, bitlen - 1))); + /* +* sanity check! +*/ + if (newBit != 0 && newBit != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("new bit must be 0 or 1"))); + + len = VARSIZE(arg1); + result = (VarBit *) palloc(len); + SET_VARSIZE(result, len); + VARBITLEN(result) = bitlen; + + p = VARBITS(arg1); + r = VARBITS(result); + + memcpy(r, p, VARBITBYTES(arg1)); + + byteNo = n / BITS_PER_BYTE; + bitNo = BITS_PER_BYTE - 1 - (n % BITS_PER_BYTE); + + /* +* Update the byte. +*/ + oldByte = ((unsigned char *) r)[byteNo]; + + if (newBit == 0) + newByte = oldByte & (~(1 << bitNo)); + else + newByte = oldByte | (1 << bitNo); + + ((unsigned char *) r)[byteNo] = newByte; + + PG_RETURN_VARBIT_P(result); + } + + /* bitgetbit + * returns the value of the Nth bit (0 or 1). + */ + Datum + bitgetbit(PG_FUNCTION_ARGS) + { + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + int32 n = PG_GETARG_INT32(1); + int bitlen; + int byteNo, + bitNo; + int byte; + + bitlen = VARBITLEN(arg1); + + if (n < 0 || n >= bitlen) + ereport(ERROR, + (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), +errmsg("index %d out of valid range, 0..%d", + n, bitlen - 1))); + + byteNo = n / BITS_PER_BYTE; + bitNo = BITS_PER_BYTE - 1 - n % BITS_PER_BYTE; + + byte = ((unsigned char *) VARBITS(arg1))[byteNo]; + + if (byte &(1 << bitNo)) + PG_RETURN_INT32(1); + else + PG_RETURN_INT32(0); + } + Index: src/include/catalog/pg_proc.h === RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.562 diff -c -r1.562 pg_proc.h *** src/include/catalog/pg_proc.h 15 Jan 2010 09:19:07 - 1.562 --- src/include/catalog/pg_proc.h 18 Jan 2010 09:05:58 - *** *** 2405,2410 --- 2405,2418 DATA(insert OID = 1699 ( substring PGNSP PGUID 12 1 0 0 f f f t f i 2 0 1560 "1560 23" _null_ _null_ _null_ _null_ bitsubstr_no_len _null_ _null_ _null_ )); DESCR("return portion of bitstring"); + DATA(insert OID = 3030 ( overlay PGNSP PGUID 14 1 0 0 f f f t f i 4 0 1560 "1560 1560 23 23" _null_ _null_ _null_ _null_ "select pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + $4))" _null_ _null_ _null_ )); + DESCR("substitute portion of bit string"); + DATA(insert OID = 3031 ( overlay PGNSP PGUID 14 1 0 0 f f f t f i 3 0 1560 "1560 1560 23" _null_ _null_ _null_ _null_"select pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + pg_catalog.length($2)))" _null_ _null_ _null_ )); + DESCR("substitute portion of bit string"); + DATA(insert OID = 3032 ( get_bitPGNSP PGUID 12 1 0 0 f f f t f i 2 0 23 "1560 23" _null_ _null_ _null_ _null_ bitgetbit _null_ _null_ _null
Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> Yeah, I think you could do that, I agree it feels better that way. > You'll still need new copytup and comparetup functions, though, to deal > with HeapTupleHeaders instead of MinimalTuples, or modify the existing > ones to handle both. You meant HeapTuple, not HeapTupleHeaders, right? Mmh, didn't think of those two functions; I might as well start with Gregory Stark's patch (that is: using HeapTuple) > And some way to indicate that you want to preserve > the visibility information when you create the tuplesort, maybe a new > parameter to tuplesort_begin_heap(). I guess that using Gregory Stark's patch there's no need for it, since it uses HeapTuples, right? A patch that: 1) uses always the old CLUSTER method for non-btree indexes and for expression indexes 2) add a whole set of new functions to tuplesort (as in Gregory Stark's patch) would be rejected "for sure"? Or can be thought as a "better than nothing, works in 90% cases" patch? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] About "Our CLUSTER implementation is pessimal" patch
Hi, I read the thread "Our CLUSTER implementation is pessimal" http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php . I would like to try/integrate that patch as we use CLUSTER a lot on our system. I was going to try to add the proper cost_index/cost_sort calls to decide which "path" should be executed, as in: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php I don't think it will be easy without help... I'll ask here a lot I'm afraid... About that patch: 1) would it be possible to use the tuplesort_*tupleslot set of functions instead of writing new ones for HeapTuple? That is: is it that difficult/impossible/nonsense to construct TupleTableSlot from HeapTuple and use those? 2) The patch doesn't check "HeapTupleSatisfiesVacuum" before passing it to tuplesort_putrawtuple: would it be reasonable to check the "isdead" flag before calling tuplesort_putrawtuple for each tuple? Sorry if I talked nonsense... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values
> What we can do in the back branches is make the code treat any > negative value as meaning two-arg form. To throw an error we'd > need to refactor the pg_proc representation ... I was going to fix that myself, but I think it has just been done. How can I keep up with "who's doing what"? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values
> Thanks! Please add your patch here: > > https://commitfest.postgresql.org/action/commitfest_view/open > Ok; but what about what I said about the difference between bit/string substring? That affects overlay behaviour for bit... I've even got "ERROR: invalid memory alloc request size 4244635647" with: SELECT substring(B'0001' from 5 for -2); (this is 8.4.2, windows version, not modified...) test=# SELECT substring(B'0001' from 1 for -1); substring -- 0001 (1 row) test=# SELECT substring('0001' from 1 for -1); ERROR: negative substring length not allowed -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch: Allow substring/replace() to get/set bit values
Hi all, attached a patch that adds the following functions for bit string: - overlay - get_bit - set_bit Some info: 1) overlay is implemented as calls to substring; given the different way substring behaves when used with strings vs bit strings: test=# SELECT substring(B'0001' from 1 for -1); substring -- 0001 (1 row) test=# SELECT substring('0001' from 1 for -1); ERROR: negative substring length not allowed I don't think that this overlay implementation is what we want? Example: test=# SELECT overlay(B'' placing B'01' from 1 for 2); overlay - 0111 (1 row) (looks ok) test=# SELECT overlay(B'' placing B'01' from 0 for 2); overlay --- 0 (1 row) This happens because substring(bit, pos, -1) means substring(bit, pos, length(bit string)), and < -1 values for bit substring parameters are allowed: is this a bug in bit substring??? 2) I tried implementing bit_get and bit_set as calls to overlay/substring: DATA(insert OID = 3032 ( get_bit PGNSP PGUID 14 1 0 0 f f f t f i 2 0 23 "1560 23" _null_ _null_ _null_ _null_ "select (pg_catalog.substring($1, $2, 1))::int4" _null_ _null_ _null_ )); DESCR("get bit"); DATA(insert OID = 3033 ( set_bit PGNSP PGUID 14 1 0 0 f f f t f i 3 0 1560 "1560 23 23" _null_ _null_ _null_ _null_ "select pg_catalog.overlay($1, $3::bit, $2, 1)" _null_ _null_ _null_ )); DESCR("set bit"); but this doesn't give any check on the values provided: that the bit looked for is in fact in the right range, and that the bit in set_bit is in fact a bit (I don't like the idea of writing "select set_bit(B'01010111', B'1')" instead of "select set_bit(B'01010111', 1)" ). So I coded them in proper C internal functions. Leonardo patch_bit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
> > Is anybody interested? Otherwise the entry could be removed from the TODO > list... > > Even if not, you can still submit a patch. There are a lot more users > of PG than there are people who read -hackers. Ok, I'll try and submit a patch. Thank you very much. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
> > To sum up: > > > > 1) a new function, "get_bit", that calls substring > > 2) a new function, "overlay", that replaces bits (starting at a certain > position) > > 3) a new function, "set_bit", that calls overlay > > That seems reasonable to me. Not sure what others think. Is anybody interested? Otherwise the entry could be removed from the TODO list... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
> As you say, there's really no point in changing the internal > representation, and if you don't find replace() useful either, then > why are you even working on this at all? I would like a get_bit / set_bit for bit strings, as I find them useful. get_bit could be a simple call to substring, but there's no way of doing a set_bit on a bit string as far as I know. I don't like the "replace" syntax for bit strings since it won't give you the same functionality of set_bit, plus I don't really see how someone would want to look for a bit string and replace it with another bit string. But I see that someone might want to overlay a bit string with another (this is different from "replace" since you have to tell the position where the replacing would start, instead of looking for a bit string). To sum up: 1) a new function, "get_bit", that calls substring 2) a new function, "overlay", that replaces bits (starting at a certain position) 3) a new function, "set_bit", that calls overlay > Since the latest discussion > of this is more than five years old, it's unclear that anyone even > cares any more. It seems to me that making replace overlay a > substring of bits could be a reasonable thing to do, but if nobody > actually wants it, then the simplest thing to do is remove this from > the TODO and call it good. I understand: it would be both a useful feature to me and a way to start coding postgres. But, of course, if there's no interest, I'll pass... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
> You might want to search the archives (or the wiki history, or the CVS > history if it's been there since before we moved the TODO list to the > wiki) for discussion of why that item was added to the TODO in the > first place. I read the thread: http://archives.postgresql.org/pgsql-hackers/2004-02/msg00478.php 1) it is true that "getbit sounds a lot like what substring() does", but the same could be said for binary string substring/get_byte; so IMHO get/set_bit should be present for bit string 2) it is not very clear to me how "setbit could actually be handled by replace()" (maybe "overlay" style?) 3) since I'm looking at byte string get/set_bit to understand how that works, I'm having a hard time understanding why the bit indexes in get/set_bit are low-first based: select get_bit(E'\\376\\376'::bytea, s) as b,s from generate_series(0,15,1) as s b s 0 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 0 8 1 9 1 10 1 11 1 12 1 13 1 14 1 15 I understand this is the internal representation, but still: if someone asked me what the 8th bit in 11101110 is, I would have said 1, not 0 (assuming the first bit has index '0'). Actually, David Helgason's patch (http://archives.postgresql.org/pgsql-hackers/2004-01/msg00498.php) goes in this direction: note the bitNo = 7 - (n % 8); part. Using that algorithm would mean get/set_bit in bit string would behave differently from what they do in binary string (IMHO it's the binary string implementation that is "wrong"). Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
Re-reading the docs it looks like the only thing missing is get/set_bit for bit string. Substring is already implemented for bit string, and I don't really know if replace is useful at all. (sorry if the other mail came with a different sender name) Leonardo > I would like to work on "Allow substring/replace() to get/set bit values", > since > it looks like a simple task. > > The item is not marked as "easy" on the TODO though. Before proceding to a > discussion on how this functions should be implemented (I got from the > messages > on the mailing list that bit substring/replace functions should do it) I > would > like to know if it's a complicated task. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers