from:"Leonardo F"

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-07-06 Thread Leonardo F

> I reviewed 
> your patch. It seems to be in good shape, and worked as
> expected. I 
> suppressed a compiler warning in the patch and cleaned up
> whitespaces in it. 
> Patch attached.


Thanks for the review!

I saw that you also changed the writing:

 LogicalTapeWrite(state->tapeset, tapenum,
 tuple, HEAPTUPLESIZE);
 LogicalTapeWrite(state->tapeset, tapenum, tuple->t_data, 
tuple->t_len-HEAPTUPLESIZE);


and the reading:

 tuple->t_len = tuplen - HEAPTUPLESIZE;
 if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, 
tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen))
  elog(ERROR, "unexpected end of data");



into (writing):

 LogicalTapeWrite(state->tapeset, tapenum,
tuple, tuple->t_len);


(reading):

if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, 
tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen))



Are we sure it's 100% equivalent?

I remember I had issues with the fact that tuple->t_data wasn't
at HEAPTUPLESIZE distance from tuple:

http://osdir.com/ml/pgsql-hackers/2010-02/msg00744.html


> I think we need some documentation for the change. The 
> only downside
> of the feature is that sorted cluster requires twice disk 
> spaces of
> the target table (temp space for disk sort and the result 
> table).
> Could I ask you to write documentation about the new 
> behavior?
> Also, code comments can be improved; especially we need 
> better
> description than "copy&paste from 
> FormIndexDatum".


I'll try to improve the comments and add doc changes (but my English
will have to be double checked...)





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] bitmap indexes - performance

2010-07-01 Thread Leonardo F

> In 
> principle a bitmap index scan should be significantly faster if the
> index can 
> return the bitmap more or less "natively" rather than having
> to construct 
> it.  

The problem I'm seeing is that even on a 20M rows table, doing a

select * from t where c1=10 and c2=1

where c1 and c2 are low cardinality columns, leads to a *very*
fast bitmap index scan, even with btree indexes (200ms per index
on my PC).
The rest of the time is spent in actually retrieving heap rows; and
of course no index type is going to help with that.

Now: if an index search on such a big table takes so little time,
what kind of improvement are we trying to get?
The btree indexes on c1 and c2 are about 340MB eaxh: maybe
I'm experiencing some caching weirdness? Or it's normal that an
index search on such a big table is that fast (again, not counting
the heap scan step, which will be required no matter the index
type)? I'll try to re-test it...

> In particular, I recall some discussions about developing 
> a "streaming
> API" whereby an index AM could return a bitmap page-by-page or 
> so,
> rather than having to construct the whole thing in-memory 
> before
> anything could happen.  This would be a huge win for AND/OR 
> cases,
> and even for a simple indexscan it would eliminate the existing 
> startup
> cost penalty for a bitmap scan.  Streaming like this would 
> also
> eliminate the problem of having to lossify large bitmaps in order 
> to
> not overrun memory.

One of the improvements I was going to try was to avoid calling

tid_set_bit (or whatever is the function, I don't remember now) for
every row, and call something like tid_set_bits_in_page where
a whole page was passed in: this would remove a lot of the hash_*
calls that are made in each and every tid_set_bit call (now that's
something btree can't do, but bitmap indexes can do "easily").
But I stopped before implementing it, because, as I said, I don't
think the improvement would still be worth it (even calling 
tid_set_bit 1/20th of the needed times didn't help that much; we're
still talking about going from 200ms to 180ms on a query that
takes seconds to execute). But I'm going to give more "tested"
numbers...

Talking about bitmap indexes I don't think we should mention
memory... I mean: bitmap indexes are supposed to be used on
huge tables, and I don't think that 100MB (which holds a lot of
rows in a tbm...) to spare as work_mem would be a big problem... 
As for the "startup cost": again, I wouldn't see that as a big
improvement, as we're talking mostly OLAP scenarios, where
most likely there will be some other "blocking" operator (group by,
sort, sub select etc) that will "remove" any improvements in
startup time...

To sum up: IMHO nor improvements in memory usage nor 
in startup time would be good reasons to switch to bitmap
indexes... but bulk index creation time (10 minutes to index
what it takes 60 minutes with btree... and maybe more if tables
are bigger) and (maybe) index disk space might be...
but I'm not 100% convinced...

I'm trying to find more docs that explain the "improvements" of
bitmap indexes in other products... but most of what I've found
talks about bitmapAND/OR which is something that is very
cool, but that postgres already does even with btree indexes...
or index creation time/size, which are, for the moment, the only
things that I'm pretty confident the patch would actually provide.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] bitmap indexes - performance

2010-07-01 Thread Leonardo F

Using as a starting point the old bitmap patch in:

http://archives.postgresql.org/message-id/20081101000154.go27...@fune


I re-applied and re-worked the patch to see what kind of improvements over
btrees bitmaps actually provided.

Using a 20M rows table of 10/100/1000 random values, I've found that:

1) bulk index creation time is roughly 6 times better
2) index size is 6-15 times smaller (depending on column cardinality)
3) there's almost no difference in query times (but I have to make more
tests)
4) I can't say anything about the insertion performance, but I guess
bitmap will perform way worse than btree

Are these improvements (index creation time, index size) worth enough
to keep on working on this?

I mean: given that bitmaps don't give any benefits in query times, but
only benefits related to disk size and bulk index creation times, and
will have horrible performance for insertions/deletions: would this job be
worthed?

In case it is: I will try to clean up the patch and post it...


As a side note: I guess that most of the bitmap indexes performance 
improvements in the SELECT area are already implemented in postgres
in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
say that bitmap indexes are faster for selects, unless of course they
are ANDed/ORed together (which is something postgres already does
for regular btree indexes)




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] access method: are disk pages mandatory?

2010-06-23 Thread Leonardo F

in bufpage.h:

"all blocks written out by an access method must be disk pages"

but in 
http://www.postgresql.org/docs/8.4/static/storage-page-layout.html

"Actually, index access methods need not use this page format. All the 
existing index methods do use this basic format, but the data kept on
index metapages usually doesn't follow the item layout rules."

I'm not getting it: am I supposed to use the "disk page format" when
writing an index access method, or it's just a "good practice" because
it makes the handling easier? Given the docs it looks "recommended",
but the comment on the code sounds more "mandatory".




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Error with GIT Repository

2010-06-11 Thread Leonardo F

> Why are you cloning over http? 

Me too I've used http, since I'm behind a proxy and I couldn't
find a "simple" way of having the git:// method working behind
a proxy...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Git: Unable to get pack file

2010-06-09 Thread Leonardo F

> I've re-run git repack on 
> it, please try again. At least that file is
> accessible from here 
> now..


It worked, thank you very much





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Git: Unable to get pack file

2010-06-08 Thread Leonardo F

Hi,


I tried getting the source using:

git clone http://git.postgresql.org/git/postgresql.git postgresql-git

but after a while (252MB) I always get:

[...]
Getting pack 61e1395a5bdacda95de5432123a0f8124fff05e6
which contains 476418893d3a2f366f47dbe4ce6d7522cc427545
error: Unable to get pack file 
http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack
couldn't connect to host
error: Unable to find 476418893d3a2f366f47dbe4ce6d7522cc427545 under 
http://git.postgresql.org/git/postgresql.git
Cannot obtain needed blob 476418893d3a2f366f47dbe4ce6d7522cc427545
while processing commit ee6ddc7575ab1ce16d8f4eb2cdbcc525d43a6437.
fatal: Fetch failed.



I'm behind proxy/firewall, so I used the "http://"; protocol.

wget 
http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack

works... 

What am I doing wrong?




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch

2010-02-15 Thread Leonardo F

> Yes. There's not going to be any more commitfests for this release, so
> the next commitfest is for 9.1.


Perfect! Where could I find such information? I mean: how could I know
it?

> (don't worry about the lack of enthusiasm for the patch, people are just
> very busy with 9.0 and don't have the energy to think about 9.1 material
> at this point)


I understand! Since it's going to be posted in a CommitFest, I'll try to clean
up the patch a little.

Thank you Greg and Heikki.


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch

2010-02-15 Thread Leonardo F

> As outlined in the "Submission timing" section, you're 
> asking about something during the wrong time to be doing so--that's why 
> you're 
> not getting any real feedback.  Add your patch to the next CommitFest by 
> linking 
> to your message at https://commitfest.postgresql.org/ 


Ok!
But there's something I don't understand: I didn't add the patch to the next
CommitFest because I thought it could never be added in 9.0 (because it adds a
new "feature" which has never been discussed). Hence I thought it should have
been "discussed" (not properly "reviewed") out of a CommitFest.
The "Submission timing" section talks about "beta phase", not "alpha phase", so
I'm stll confused...
In other words: should patches that won't be included in the next release
(because it's too late) still added to the next CommitFest? I thought a very 
"rough"
discussion was the way to go in these cases, but I'm not familiar at all with 
the
process... I'll wait for an answer before adding the patch to the CommitFest 
(and
in case, I'll add more comments and docs to it)

Thank you very much!


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [FWD] About "Our CLUSTER implementation is pessimal" patch

2010-02-15 Thread Leonardo F

I really thought this would have caused some interest, since

- this item is in the TODO list
- the improvement for CLUSTER in some scenarios is 800%,
and maybe more (if I didn't do anything wrong, of course...)

Could at least the message:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php
be added to the TODO page, under 
"Improve CLUSTER performance by sorting to reduce
random I/O" ?
It would be sad if the patch got lost... 


Leonardo


> Attached the updated patch (should solve a bug) and a script.
> The sql scripts generates a 2M rows table ("orig"); then the
> table is copied and the copy clustered using seq + sort (since 
> "set enable_seqscan=false;").
> Then the table "orig" is copied again, and the copy clustered
> using regular index scan (set enable_indexscan=true; set 
> enable_seqscan=false).
> Then the same thing is done on a 5M rows table, and on a 10M
> rows table.
> 
> On my system (Sol10 on a dual Opteron 2.8) single disc:
> 
> 
> 2M:  seq+sort 11secs; regular index scan: 33secs
> 5M:  seq+sort 39secs; regular index scan: 105secs
> 10M:seq+sort 83secs; regular index scan: 646secs
> 
> 
> Maybe someone could suggest a better/different test?
> 
> 
> Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-02-10 Thread Leonardo F

>Perhaps you could supply a .sql file containing a testcase 
> illustrating the performance benefits you tested with your patch

Sure.


Attached the updated patch (should solve a bug) and a script.
The sql scripts generates a 2M rows table ("orig"); then the
table is copied and the copy clustered using seq + sort (since 
"set enable_seqscan=false;").
Then the table "orig" is copied again, and the copy clustered
using regular index scan (set enable_indexscan=true; set 
enable_seqscan=false).
Then the same thing is done on a 5M rows table, and on a 10M
rows table.

On my system (Sol10 on a dual Opteron 2.8) single disc:


2M:  seq+sort 11secs; regular index scan: 33secs
5M:  seq+sort 39secs; regular index scan: 105secs
10M:seq+sort 83secs; regular index scan: 646secs


Maybe someone could suggest a better/different test?


Leonardo



  

sorted_cluster20100210.patch
Description: Binary data


cluster_tests.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-02-10 Thread Leonardo F

I think I've found the problem:

tuple->t_data wasn't at HEAPTUPLESIZE distance from tuple.
I guess the code makes that assumption somewhere, so I had
to do:

tuple->t_data = (HeapTupleHeader) ((char *) tuple + 
 HEAPTUPLESIZE);

Now that test works! Hope I don't find any more problems...



Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-02-10 Thread Leonardo F

> I think you're confusing HeapTuple and HeapTupleHeader. SortTuple->tuple
> field should point to a HeapTupleHeader, not a HeapTuple.


Mmh, I don't get it: that would mean I might be using more info than required,
but I still don't understand why it's not working... at the end, CLUSTER is 
going
to need full HeapTuple(s) (if I'm not mistaken).
SortTuple->tuple can be any of MinimalTuple, IndexTuple, even a Datum.
So I added my own read/write_rawtuple (I'm not that good, they were in the
original patch and I copy&pasted them).

I can write and read those tuples (using some Asserts I think everything goes

as expected in write/readtup_rawheap). But when calling tuplesort_getrawtuple,
after some "right" tuples, the call to tuplesort_gettuple_common fails because
the lines:

/* pull next preread tuple from list, insert in heap */
newtup = &state->memtuples[tupIndex];


give a wrong initialized "newtup" (in other words, newtup is random memory).
Since write/readtup_rawheap seems fine, I can't understand what's going on...

I write the HeapTuples with:

(in writetup_rawheap):
HeapTupletuple = (HeapTuple) stup->tuple;
tuple->t_len += HEAPTUPLESIZE; /* write out the header as well */

LogicalTapeWrite(state->tapeset, tapenum,
  tuple, HEAPTUPLESIZE);
LogicalTapeWrite(state->tapeset, tapenum, tuple->t_data, 
   tuple->t_len-HEAPTUPLESIZE);



then I read them with:
(in readtup_rawheap):
HeapTupletuple = (HeapTuple) palloc(tuplen);
USEMEM(state, GetMemoryChunkSpace(tuple));

tuple->t_len = tuplen - HEAPTUPLESIZE;
if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, 
  HEAPTUPLESIZE-sizeof(tuplen)) != HEAPTUPLESIZE-sizeof(tuplen))
  elog(ERROR, "unexpected end of data");
if (LogicalTapeRead(state->tapeset, tapenum, tuple->t_data, tuple->t_len) != 
tuple->t_len)
  elog(ERROR, "unexpected end of data");




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-02-10 Thread Leonardo F

Hi all,


while testing the seq scan + sort CLUSTER implementation, I've found a
bug in write/readtup_rawheap.

The functions are needed by the sort part.
The code in 

http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

didn't copy the whole tuple, but only the HeapTuple "header": the t_data
part wasn't copied (at least, that's my understanding), The original code
is at the bottom of the email. It didn't work (the table wasn't fully clustered
at the end of the CLUSTER command).

So I modified write/readtup_rawheap: they are supposed to store/retrieve
both the "fixed" part of HeapTupleData, plus the "variable" part t_data.
But a get a lot of:
WARNING:  problem in alloc set TupleSort: detected write past chunk end
  in block 0x96853f0, chunk 0x968723c
WARNING:  problem in alloc set TupleSort: detected write past chunk end 
  in block 0x96853f0, chunk 0x96872c8
warnings when calling "tuplesort_end"  and some of the data gets garbled
after the CLUSTER command.


What am I doing wrong? Using the debugger, data coming out of
readtup_rawheap seems fine...

I *really* need your help here...


static void
writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup)
{
HeapTupletuple = (HeapTuple) stup->tuple;
tuple->t_len += sizeof(HeapTupleData); /* write out the header as well */

LogicalTapeWrite(state->tapeset, tapenum,
tuple, sizeof(HeapTupleData));
LogicalTapeWrite(state->tapeset, tapenum, tuple->t_data, 
 tuple->t_len-sizeof(HeapTupleData));
if (state->randomAccess)/* need trailing length word? */
  LogicalTapeWrite(state->tapeset, tapenum,
 tuple, sizeof(tuple->t_len));

FREEMEM(state, GetMemoryChunkSpace(tuple));
heap_freetuple(tuple);
}

static void
readtup_rawheap(Tuplesortstate *state, SortTuple *stup,
int tapenum, unsigned int tuplen)
{
HeapTupletuple = (HeapTuple) palloc(sizeof(HeapTupleData));
tuple->t_data = (HeapTupleHeader) palloc(tuplen-sizeof(HeapTupleData));

USEMEM(state, GetMemoryChunkSpace(tuple));

tuple->t_len = tuplen - sizeof(HeapTupleData);
if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, 
sizeof(HeapTupleData)-sizeof(tuplen))
   != sizeof(HeapTupleData)-sizeof(tuplen))
  elog(ERROR, "unexpected end of data");
if (LogicalTapeRead(state->tapeset, tapenum, tuple->t_data, tuple->t_len) != 
tuple->t_len)
  elog(ERROR, "unexpected end of data");
if (state->randomAccess)/* need trailing length word? */
 if (LogicalTapeRead(state->tapeset, tapenum, &tuplen,
   sizeof(tuplen)) != sizeof(tuplen))
  elog(ERROR, "unexpected end of data");

stup->tuple = tuple;
/* set up first-column key value */
if (state->indexInfo->ii_Expressions == NULL)
{
/* no expression index, just get the key datum value */
stup->datum1 = heap_getattr((HeapTuple) stup->tuple,
state->indexInfo->ii_KeyAttrNumbers[0],
state->tupDesc,
&stup->isnull1);
}
else
{
  [...] expression index part, removed for clarity 
}
}




Original code:

static void
writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup)
{
HeapTuple   tuple = (HeapTuple) stup->tuple;
tuple->t_len += HEAPTUPLESIZE; /* write out the header as well */

LogicalTapeWrite(state->tapeset, tapenum,
tuple, tuple->t_len);

if (state->randomAccess)/* need trailing length word? */
 LogicalTapeWrite(state->tapeset, tapenum,
   tuple, sizeof(tuple->t_len));

FREEMEM(state, GetMemoryChunkSpace(tuple));
heap_freetuple(tuple);
}

static void
readtup_rawheap(Tuplesortstate *state, SortTuple *stup,
int tapenum, unsigned int tuplen)
{
HeapTuple   tuple = (HeapTuple) palloc(tuplen);

USEMEM(state, GetMemoryChunkSpace(tuple));

tuple->t_len = tuplen - HEAPTUPLESIZE;
if (LogicalTapeRead(state->tapeset, tapenum, &tuple->t_self, 
  tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen))
  elog(ERROR, "unexpected end of data");
if (state->randomAccess)/* need trailing length word? */
if (LogicalTapeRead(state->tapeset, tapenum, &tuplen,
sizeof(tuplen)) != sizeof(tuplen))
elog(ERROR, "unexpected end of data");

stup->tuple = tuple;
/* set up first-column key value */
stup->datum1 = heap_getattr(tuple,
state->scanKeys[0].sk_attno,
state->tupDesc,
&stup->isnull1);
}





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-02-09 Thread Leonardo F

Not even a comment? As I said, performance results on my system
were very good

> I know you're all very busy getting 9.0 out, but I think the results in
> heap scanning + sort instead of index scanning for CLUSTER are
> very good... I would like to know if I did something wrong/I should
> improve something in the patch... I haven't tested it with index
> expressions yet (but the tests in "make check" work).
> 
> Thanks
> 
> Leonardo
> 
> 
> > Hi all,
> > 
> > attached a patch to do seq scan + sorting instead of index scan 
> > 
> > on CLUSTER (when that's supposed to be faster).
> > 
> > As I've already said, the patch is based on: 
> > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php
> > 
> > Of course, the code isn't supposed to be ready to be merged: I
> > would like to write more comments and add some test cases to
> > cluster.sql (plus change all the things you are going to tell me I
> > have to change...)
> > 
> > I would like your opinions on code correctness and the decisions
> > I took, especially:
> > 
> > 1) function names ("cost_index_scan_vs_seqscansort" I guess
> > it's awful...)
> > 
> > 2) the fact that I put in Tuplesortstate an EState variable, so that 
> > MakeSingleTupleTableSlot wouldn't have to be called for every
> > row in the expression indexes case
> > 
> > 3) the expression index case is not "optimized": I preferred to
> > call FormIndexDatum once for the first key value in 
> > copytup_rawheap and another time to get all the remaining values
> > in comparetup_rawheap. I liked the idea of re-using
> > FormIndexDatum in that case, instead of copying&pasting only
> > the relevant code: but FormIndexDatum returns all the values,
> > 
> > even when I might need only the first one
> > 
> > 
> > 4) I refactored the code to deform and rewrite tuple into the function
> > "deform_and_rewrite_tuple", because now that part can be called
> > by the regular index scan or by the new seq-scan + sort (but I
> > could copy&paste those lines instead of refactoring them into a new
> > function)
> > 
> > Suggestions and comments are not just welcome, but needed!



  

sorted_cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] IndexBuildHeapScan and RIDs order

2010-02-08 Thread Leonardo F

Hi,

I was looking at the code for bitmap index:

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01691.php

and I couldn't understand why during "bmbuild" (the ambuild call for
bitmap indexes) the code checks for not-ordered ItemPointerData(s).

In which cases the ItemPointerData(s) given by IndexBuildHeapScan

are not in order (when allow_sync=false)?



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-02-01 Thread Leonardo F

I know you're all very busy getting 9.0 out, but I think the results in
heap scanning + sort instead of index scanning for CLUSTER are
very good... I would like to know if I did something wrong/I should
improve something in the patch... I haven't tested it with index
expressions yet (but the tests in "make check" work).

Thanks

Leonardo


> Hi all,
> 
> attached a patch to do seq scan + sorting instead of index scan 
> 
> on CLUSTER (when that's supposed to be faster).
> 
> As I've already said, the patch is based on: 
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php
> 
> Of course, the code isn't supposed to be ready to be merged: I
> would like to write more comments and add some test cases to
> cluster.sql (plus change all the things you are going to tell me I
> have to change...)
> 
> I would like your opinions on code correctness and the decisions
> I took, especially:
> 
> 1) function names ("cost_index_scan_vs_seqscansort" I guess
> it's awful...)
> 
> 2) the fact that I put in Tuplesortstate an EState variable, so that 
> MakeSingleTupleTableSlot wouldn't have to be called for every
> row in the expression indexes case
> 
> 3) the expression index case is not "optimized": I preferred to
> call FormIndexDatum once for the first key value in 
> copytup_rawheap and another time to get all the remaining values
> in comparetup_rawheap. I liked the idea of re-using
> FormIndexDatum in that case, instead of copying&pasting only
> the relevant code: but FormIndexDatum returns all the values,
> 
> even when I might need only the first one
> 
> 
> 4) I refactored the code to deform and rewrite tuple into the function
> "deform_and_rewrite_tuple", because now that part can be called
> by the regular index scan or by the new seq-scan + sort (but I
> could copy&paste those lines instead of refactoring them into a new
> function)
> 
> Suggestions and comments are not just welcome, but needed!


  

sorted_cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-28 Thread Leonardo F

Hi all,

attached a patch to do seq scan + sorting instead of index scan 

on CLUSTER (when that's supposed to be faster).

As I've already said, the patch is based on: 
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

Of course, the code isn't supposed to be ready to be merged: I
would like to write more comments and add some test cases to
cluster.sql (plus change all the things you are going to tell me I
have to change...)

I would like your opinions on code correctness and the decisions
I took, especially:

1) function names ("cost_index_scan_vs_seqscansort" I guess
it's awful...)

2) the fact that I put in Tuplesortstate an EState variable, so that 
MakeSingleTupleTableSlot wouldn't have to be called for every
row in the expression indexes case

3) the expression index case is not "optimized": I preferred to
call FormIndexDatum once for the first key value in 
copytup_rawheap and another time to get all the remaining values
in comparetup_rawheap. I liked the idea of re-using
FormIndexDatum in that case, instead of copying&pasting only
the relevant code: but FormIndexDatum returns all the values,

even when I might need only the first one


4) I refactored the code to deform and rewrite tuple into the function
"deform_and_rewrite_tuple", because now that part can be called
by the regular index scan or by the new seq-scan + sort (but I
could copy&paste those lines instead of refactoring them into a new
function)

Suggestions and comments are not just welcome, but needed!


Leonardo


  

sorted_cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-27 Thread Leonardo F

> Consider multi-column indexes, ie:
> CREATE INDEX i_foo ON foo (length(a), length(b));


Ok, I've never thought of expression indexes that way 
(in the (expr1,expr2,exprN) form): that is a good example.

> Maybe you're confusing expression indexes with partial indexes? 

No no, that was exactly what I needed to know.


Thank you very much

Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-26 Thread Leonardo F

> If we ever get another index type that supports ordered
> scans, it'll be time enough to worry about cases like this.


Ok

> BTW, I think you could use tuplesort_begin_index_btree() rather than
> touching _bt_mkscankey_nodata directly.  


well I created my own tuplesort_begin_rawheap method (copied from:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php ).
From there I call _bt_mkscankey_nodata (as tuplesort_begin_index_btree()
does), plus I set up everything else I'm going to need in tuplesort.

Another question: 

why is IndexInfo.ii_Expressions a list? How can an index have more than
one expression? Sorry if it's a stupid question, but I'm not familiar with
index expressions.

I think I'm almost there (some very stupid tests pass). I'll try to submit a
patch soon to understand if I'm going in the right direction.



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-25 Thread Leonardo F

> Rule it out.  Note you should be looking at pg_am.amcanorder, not
> hardwiring knowledge of particular index types. 

Sorry, I replied "ok" too fast...


I can look at pg_am.amcanorder, but I would still need the ScanKey to be used

by tuplesort; and I can't find any other way of doing it than calling 
_bt_mkscankey_nodata, which is btree-specific.

I guess either:

- add another function to the list of "Index Access Method Functions", something
that returns the ScanKey in case pg_am.amcanorder is true

or

- hardwiring the fact that the only way to seq scan + sort in CLUSTER is using
a btree... hence the call to _bt_mkscankey_nodata

But maybe there's another way of doing it, I don't know the code enough


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-22 Thread Leonardo F

> Note you should be looking at pg_am.amcanorder, not
> hardwiring knowledge of particular index types.


Ok. 

Would it make sense to use

FormIndexDatum

to get the index value to be used by tuplesort?

I'm having trouble avoiding the call to _bt_mkscankey_nodata
to get the scanKeys... that relates to btree only, I can't find 
the "general" function...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-22 Thread Leonardo F

So, if I'm not mistaken:


hash indexes -> can't be used in CLUSTER
gin indexes -> can't be used in CLUSTER

that leaves:

btree -> ok
expression btree -> I have to find a way to compute the expression for
  each tuple: hints?
gist -> how can I get something "comparable" by tuplesort? Or should I rule it
   out from the seq scan + sort path?



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-22 Thread Leonardo F

> > So my proposal would be: do the test seq_scan vs sort/index_scan only for
> > regular btree index, and integrate that test in the planner.
> 
> Keep in mind that this patch was after the deadline for 9.0, so there
> is probably not a huge rush to get this done.


That's true; I'll try to get the whole thing done then... 




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-21 Thread Leonardo F

> Well, the expression cases would be more likely to cost more if
> implemented as a sort, but that doesn't mean that a sort couldn't be a
> win.  Besides, even if you blow off the expression case, what about
> nulls first/last, nondefault opclasses, etc?


Ok, let's split the problem in 2 parts: 

a) "choosing the right path"
b) "using seq scan + sort in case the planner (somehow) said it's faster"

You're right: for a) nondefault opclasses and other things would make the
SPI version "wrong". So let's stick for the moment with the cost_sort 
create_index_path etc calls for a). I think that the calls to create_index_path
would also deal with every other possible index type (something that's
impossible with the SPI version).

For b):

I'm using the code in 

http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

That doesn't deal with expression indexes, nor with anything that is not
a btree index. But it's much faster on what people use the most (and not
slower on anything else).

So my proposal would be: do the test seq_scan vs sort/index_scan only for
regular btree index, and integrate that test in the planner.

That doesn't mean that sometime in the future we're not going to have full
support for all index types in seq scan + sort CLUSTER... but I would like
to have something that works faster on what people use, and not slower
in the other cases without waiting ages to have the "whole" thing...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-21 Thread Leonardo F

> > I meant to add only ASC/DESC; I would leave all other cases
> > (non-btrees, custom expression btrees) to use the old index-scan method.
> 
> That hardly seems acceptable.


Well I brought up that in an earlier post:

http://old.nabble.com/Re%3A-About-%22Our-CLUSTER-implementation-is-pessimal%22-patch-p27179107.html

I hoped that since people mostly (>95%?) use plain btree indexes,
a patch that helped CLUSTER with using such indexes would be fine
(at least at first...). I guess that a patch that deals with all other types of
indexes would be way more complicated (not at the "planning" stage,
but in the scan+sort phase)?

> I hardly think that keeping yourself at arm's length from the planner
> by getting cozy with SPI internals is a net improvement in modularity.


So you think that code snippet that I sent earlier (the function that uses
create_index_path etc) could be put in planner.c (almost) as it is? It looked
clumsy to me (I liked the SPI code better)


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-21 Thread Leonardo F

> By the time you make this actually work in all cases, it's probably
> going to be more of a mess than the other way; 

I meant to add only ASC/DESC; I would leave all other cases
(non-btrees, custom expression btrees) to use the old index-scan method.

> not to mention that it
> doesn't work *at all* without violating SPI internals.


You lost me there...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-21 Thread Leonardo F

>one idea could be to actually prepare a query using SPI for "select * from 
>table order by " and then peek inside
> to see which plan was generated. 

I like that!!!
Here's a first attempt, it looks like it's working...
(I still have to skip non-btree indexes and expression indexes, plus
add a ASC/DESC to the select)

static bool use_index_scan(Relation OldHeap, Oid indexOid)
{
HeapTupleindexTuple;
Form_pg_index indexForm;
struct _SPI_plan *spiPlan;
char st[(NAMEDATALEN+1)*INDEX_MAX_KEYS+NAMEDATALEN+100];
int i;
TupleDescoldTupDesc;
bool retval = true;
PlannedStmt *plan;
CachedPlanSource *cps;

oldTupDesc = RelationGetDescr(OldHeap);

sprintf(st, "select * from %s order by ", OldHeap->rd_rel->relname.data);
indexTuple = SearchSysCache(INDEXRELID, ObjectIdGetDatum(indexOid),0, 0, 0);

if (!HeapTupleIsValid(indexTuple))
   elog(ERROR, "cache lookup failed for index %u", indexOid);

indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

for (i = 0; i < indexForm->indnatts; i++)
{
  strcat(st, oldTupDesc->attrs[indexForm->indkey.values[i]-1]->attname.data);
  if (i+1 < indexForm->indnatts)
  {
   strcat(st, ",");
   }
}

ReleaseSysCache(indexTuple);

if (SPI_connect() != SPI_OK_CONNECT)
   return false;

spiPlan = SPI_prepare(st, 0, NULL);
if (spiPlan == NULL)
{
  SPI_finish(); 
  return false;
}

cps = (CachedPlanSource*)(list_head(spiPlan->plancache_list)->data.ptr_value);
plan = (PlannedStmt*)(list_head(cps->plan->stmt_list)->data.ptr_value);
if (IsA(plan->planTree, Sort))
{
   retval = false;
}

SPI_freeplan(spiPlan);
SPI_finish();

return retval;
}





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-21 Thread Leonardo F

> * Do we need to disable sort-path for tables clustered on a gist index?

Yes; as I said in a previous mail, only plain btree indexes (that is, not
custom expression indexes) would have that option (at least in a first
version...)

> * I'd prefer to separate cost calculation routines from create_index_path()
>and cost_sort(), rather than using a dummy planner.


How? Copy&paste (removing unnecessary code) of the existing 
create_index_path() and cost_sort()?


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-21 Thread Leonardo F

Anyone? I'd like some feedback before moving on to do the seq scan + sort in 
those
CLUSTER cases where "use_index_scan" returns false...



- Messaggio originale -
> Da: Leonardo F 
> A: pgsql-hackers@postgresql.org
> Inviato: Mer 20 gennaio 2010, 18:48:00
> Oggetto: Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
> 
> > I read the thread "Our CLUSTER implementation is pessimal" 
> > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
> > 
> > I was going to try to add the proper cost_index/cost_sort calls to decide 
> which 
> > "path" should be executed, as in:
> > 
> > http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php
> 
> I think I got something up and running to check if a table scan + sort is 
> supposed
> to be faster than an index scan for a certain CLUSTER operation.
> 
> The way I did it is (I guess...) wrong: I created the elements needed by
> get_relation_info, create_seqscan_path, create_index_path, cost_sort.
> 
> It has been, obviously, a trial and error approach: I added the member values 
> as
> soon as one function call crashed... and I bet I didn't get all the corner 
> cases.
> Is there any better way of doing it?
> 
> Leonardo
> 
> (this is called in copy_heap_data to decide which path to choose:)
> 
> static bool use_index_scan(Oid tableOid, Oid indexOid)
> {
> RelOptInfo *rel;
> PlannerInfo *root;
> Query *query;
> PlannerGlobal *glob;
> Path *seqAndSortPath;
> IndexPath *indexPath;
> RangeTblEntry *rte;
> 
> rel = makeNode(RelOptInfo);
> rel->reloptkind = RELOPT_BASEREL;
> rel->relid = 1;
> rel->rtekind = RTE_RELATION;
> 
> /* needed by get_relation_info */
> glob = makeNode(PlannerGlobal);
> 
> /* needed by get_relation_info: */
> query = makeNode(Query);
> query->resultRelation = 0;
> 
> root = makeNode(PlannerInfo);
> 
> root->parse = query;
> root->glob = glob;
> 
> get_relation_info(root, tableOid, false, rel);
> seqAndSortPath = create_seqscan_path(NULL, rel);
> 
> rel->rows = rel->tuples;
> 
> rte = makeNode(RangeTblEntry);
> rte->rtekind = RTE_RELATION;
> rte->relid = tableOid;
> 
> root->simple_rel_array_size = 2;
> root->simple_rte_array = (RangeTblEntry **)
> palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
> root->simple_rte_array[1] = rte;
> 
> root->total_table_pages = rel->pages;
> 
> indexPath = create_index_path(root, 
> (IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL, 
> ForwardScanDirection, NULL);
> cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost, 
> rel->tuples, 
> rel->width, -1);
> 
> return indexPath->path.total_cost < seqAndSortPath->total_cost;
> }




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-20 Thread Leonardo F

> I read the thread "Our CLUSTER implementation is pessimal" 
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
> 
> I would like to try/integrate that patch as we use CLUSTER a lot on our 
> system.
> 
> I was going to try to add the proper cost_index/cost_sort calls to decide 
> which 
> "path" should be executed, as in:
> 
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php

I think I got something up and running to check if a table scan + sort is 
supposed
to be faster than an index scan for a certain CLUSTER operation.

The way I did it is (I guess...) wrong: I created the elements needed by
get_relation_info, create_seqscan_path, create_index_path, cost_sort.

It has been, obviously, a trial and error approach: I added the member values as
soon as one function call crashed... and I bet I didn't get all the corner 
cases.
Is there any better way of doing it?

Leonardo

(this is called in copy_heap_data to decide which path to choose:)

static bool use_index_scan(Oid tableOid, Oid indexOid)
{
RelOptInfo *rel;
PlannerInfo *root;
Query *query;
PlannerGlobal *glob;
Path *seqAndSortPath;
IndexPath *indexPath;
RangeTblEntry *rte;

rel = makeNode(RelOptInfo);
rel->reloptkind = RELOPT_BASEREL;
rel->relid = 1;
rel->rtekind = RTE_RELATION;

/* needed by get_relation_info */
glob = makeNode(PlannerGlobal);

/* needed by get_relation_info: */
query = makeNode(Query);
query->resultRelation = 0;

root = makeNode(PlannerInfo);

root->parse = query;
root->glob = glob;

get_relation_info(root, tableOid, false, rel);
seqAndSortPath = create_seqscan_path(NULL, rel);

rel->rows = rel->tuples;

rte = makeNode(RangeTblEntry);
rte->rtekind = RTE_RELATION;
rte->relid = tableOid;

root->simple_rel_array_size = 2;
root->simple_rte_array = (RangeTblEntry **)
palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
root->simple_rte_array[1] = rte;

root->total_table_pages = rel->pages;

indexPath = create_index_path(root, 
(IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL, 
ForwardScanDirection, NULL);
cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost, rel->tuples, 
rel->width, -1);

return indexPath->path.total_cost < seqAndSortPath->total_cost;
}





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Leonardo F

> All issues addressed, with one tiny nit-pick -- the get_bit and
> set_bit methods are not part of the SQL standard. 


Damn! I completely forgot to mention that I had no idea if what I wrote
in the docs made any sense...

Well thank you for your thorough review.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Leonardo F

New version of the patch, let me know if I can fix/change something else.



Leonardo



  

getsetbit.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-19 Thread Leonardo F

> In the
> documentation, the get_bit and set_bit methods are added to a list
> where we state "The following SQL-standard functions work on bit
> strings as well as character strings"; however they are not
> SQL-standard functions and are implemented on binary strings, not
> character strings. 

Ok, I didn't change that, but I guess I can fix it.


> The tests don't include any examples where the bit
> string lines up with a byte boundary or is operating on the first or
> last bit of a byte, and the overlay function is not tested with
> values which change the length of the bit string. (All of these
> cases seem to work in the current version, but seem like the sort of
> thing which could get broken in revision.)


Ok, I'll add some corner-cases tests.

> There is a peculiar omission from the patch -- it adds supportfor the
> overlay function to bit strings but not binary strings. Sincethe
> standard drops the bit string as a standard type in the 2003standard,
> in favor of boolean type and binary string types, it seemsodd to omit
> binary string support (which is defined in the standard)but include
> bit string support (which isn't). What the patch adds seemsuseful,
> and I don't think the omission should be considered fatal, butit
> would be nice to see us also cover binary strings if we can.


Mmh, don't know how complicated that would be, but I can give it a try:
I guess it could be a simple SQL replacement (as for char strings and bit
strings?)

> Well, there was one thing which is beyond my ken
> in this regard: I'm not sure that the cases from bits8 to (unsigned
> char *) are needed or appropriate, although on my machine they don't
> seem to hurt.  Perhaps someone with a better grasp of such issues can
> comment.


That's some kind of copy&paste from varlena.c byteaGetBit: I don't know
if they can be avoided.

> I'm not sure the leading whitespace in pg_proc.h is as it should be.


Do you mean the tab in:
 
_null_(tab)"select ...

?

Yes, I think I should remove it.

> I'd prefer to see the comments for get_bit and set_bit mention that
> the location is specified left-to-right in a zero-based fashion
> consistent with the other get_bit and set_bit functions, but
> inconsistent with the standard substring, position, overlay
> functions.  Personally, I find it unfortunate that our extensions
> introduced this inconsistency, but let's keep it consistent within
> the similarly named functions.

Ok; I found them bizarre too, but I kept it consistent.

> It seems odd to me that you specify the bit to set as a 32 bit
> integer, and that that is what get_bit returns, but again, this is
> consistent with what we do for the bytea functions, so it's probably
> the right thing to match that.


Same as above: I tried to be consistent.

> Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is
> out-of-bounds seems somewhat bizarre to me -- I'd probably have
> chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again,
> it matches the bytea implementation.


Again, same as above: consistency; but I can change it if you want

> This last point is probably more a matter of C coding style than
> anything, and I'm way to rusty in C to want to be the final arbiter
> of something like this, but it seems to me that oldByte and newByte
> local variables are just cluttering things up rather than clarifying:
> 
> intoldByte,
> newByte;
> [...]
> oldByte = ((unsigned char *) r)[byteNo];
> 
> if (newBit == 0)
> newByte = oldByte & (~(1 << bitNo));
> else
> newByte = oldByte | (1 << bitNo);
> 
> ((unsigned char *) r)[byteNo] = newByte;
> 
> vs
> 
> if (newBit == 0)
> ((unsigned char *) r)[byteNo] &= (~(1 << bitNo));
> else
> ((unsigned char *) r)[byteNo] |= (1 << bitNo);
> 


I agree, you version is cleaner.


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-18 Thread Leonardo F

> This patch no longer applies.  Could you rebase it?


Done (I think). Added a couple of simple tests for bit overlay.

I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has 
to be changed.


Leonardo


  Index: src/backend/utils/adt/varbit.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varbit.c,v
retrieving revision 1.63
diff -c -r1.63 varbit.c
*** src/backend/utils/adt/varbit.c  7 Jan 2010 20:17:43 -   1.63
--- src/backend/utils/adt/varbit.c  18 Jan 2010 09:05:50 -
***
*** 1606,1608 
--- 1606,1704 
}
PG_RETURN_INT32(0);
  }
+ 
+ 
+ /* bitsetbit
+  * Given an instance of type 'bit' creates a new one with
+  * the Nth bit set to the given value.
+  */
+ Datum
+ bitsetbit(PG_FUNCTION_ARGS)
+ {
+   VarBit *arg1 = PG_GETARG_VARBIT_P(0);
+   int32   n = PG_GETARG_INT32(1);
+   int32   newBit = PG_GETARG_INT32(2);
+   VarBit *result;
+   int len,
+   bitlen;
+   bits8  *r,
+  *p;
+   int byteNo,
+   bitNo;
+   int oldByte,
+   newByte;
+ 
+   bitlen = VARBITLEN(arg1);
+   if (n < 0 || n >= bitlen)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg("index %d out of valid range, 0..%d",
+   n, bitlen - 1)));
+   /*
+* sanity check!
+*/
+   if (newBit != 0 && newBit != 1)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("new bit must be 0 or 1")));
+ 
+   len = VARSIZE(arg1);
+   result = (VarBit *) palloc(len);
+   SET_VARSIZE(result, len);
+   VARBITLEN(result) = bitlen;
+ 
+   p = VARBITS(arg1);
+   r = VARBITS(result);
+ 
+   memcpy(r, p, VARBITBYTES(arg1));
+ 
+   byteNo = n / BITS_PER_BYTE;
+   bitNo = BITS_PER_BYTE - 1 - (n % BITS_PER_BYTE);
+ 
+   /*
+* Update the byte.
+*/
+   oldByte = ((unsigned char *) r)[byteNo];
+ 
+   if (newBit == 0)
+   newByte = oldByte & (~(1 << bitNo));
+   else
+   newByte = oldByte | (1 << bitNo);
+ 
+   ((unsigned char *) r)[byteNo] = newByte;
+ 
+   PG_RETURN_VARBIT_P(result);
+ }
+ 
+ /* bitgetbit
+  * returns the value of the Nth bit (0 or 1).
+  */
+ Datum
+ bitgetbit(PG_FUNCTION_ARGS)
+ {
+   VarBit *arg1 = PG_GETARG_VARBIT_P(0);
+   int32   n = PG_GETARG_INT32(1);
+   int bitlen;
+   int byteNo,
+   bitNo;
+   int byte;
+ 
+   bitlen = VARBITLEN(arg1);
+ 
+   if (n < 0 || n >= bitlen)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg("index %d out of valid range, 0..%d",
+   n, bitlen - 1)));
+ 
+   byteNo = n / BITS_PER_BYTE;
+   bitNo = BITS_PER_BYTE - 1 - n % BITS_PER_BYTE;
+ 
+   byte = ((unsigned char *) VARBITS(arg1))[byteNo];
+ 
+   if (byte &(1 << bitNo))
+   PG_RETURN_INT32(1);
+   else
+   PG_RETURN_INT32(0);
+ }
+ 
Index: src/include/catalog/pg_proc.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.562
diff -c -r1.562 pg_proc.h
*** src/include/catalog/pg_proc.h   15 Jan 2010 09:19:07 -  1.562
--- src/include/catalog/pg_proc.h   18 Jan 2010 09:05:58 -
***
*** 2405,2410 
--- 2405,2418 
  DATA(insert OID = 1699 (  substring   PGNSP PGUID 12 1 0 0 f 
f f t f i 2 0 1560 "1560 23" _null_ _null_ _null_ _null_ bitsubstr_no_len 
_null_ _null_ _null_ ));
  DESCR("return portion of bitstring");
  
+ DATA(insert OID = 3030 (  overlay PGNSP PGUID 14 1 0 0 f 
f f t f i 4 0 1560 "1560 1560 23 23" _null_ _null_ _null_ _null_ "select 
pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + 
$4))" _null_ _null_ _null_ ));
+ DESCR("substitute portion of bit string");
+ DATA(insert OID = 3031 (  overlay PGNSP PGUID 14 1 0 0 f 
f f t f i 3 0 1560 "1560 1560 23" _null_ _null_ _null_ _null_"select 
pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + 
pg_catalog.length($2)))" _null_ _null_ _null_ ));
+ DESCR("substitute portion of bit string");
+ DATA(insert OID = 3032 (  get_bitPGNSP PGUID 12 1 0 0 f f f t 
f i 2 0 23 "1560 23" _null_ _null_ _null_ _null_ bitgetbit _null_ _null_ _null

Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-15 Thread Leonardo F

> Yeah, I think you could do that, I agree it feels better that way.
> You'll still need new copytup and comparetup functions, though, to deal
> with HeapTupleHeaders instead of MinimalTuples, or modify the existing
> ones to handle both. 

You meant HeapTuple, not HeapTupleHeaders, right?

Mmh, didn't think of those two functions; I might as well start with Gregory
Stark's patch (that is: using HeapTuple)

> And some way to indicate that you want to preserve
> the visibility information when you create the tuplesort, maybe a new
> parameter to tuplesort_begin_heap().

I guess that using Gregory Stark's patch there's no need for it, since it uses
HeapTuples, right?

A patch that:

1) uses always the old CLUSTER method for non-btree indexes and for
expression indexes
2) add a whole set of new functions to tuplesort (as in Gregory Stark's patch)

would be rejected "for sure"? Or can be thought as a "better than nothing,
works in 90% cases" patch?


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-01-15 Thread Leonardo F

Hi,

I read the thread "Our CLUSTER implementation is pessimal" 
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .

I would like to try/integrate that patch as we use CLUSTER a lot on our system.

I was going to try to add the proper cost_index/cost_sort calls to decide which 
"path" should be executed, as in:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php

I don't think it will be easy without help... I'll ask here a lot I'm afraid...

About that patch:

1) would it be possible to use the tuplesort_*tupleslot set of functions 
instead of writing new ones for HeapTuple? That is: is it that 
difficult/impossible/nonsense to construct TupleTableSlot from HeapTuple and 
use those?

2) The patch doesn't check "HeapTupleSatisfiesVacuum" before passing it to 
tuplesort_putrawtuple: would it be reasonable to check the "isdead" flag before 
calling tuplesort_putrawtuple for each tuple?


Sorry if I talked nonsense...



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values

2010-01-08 Thread Leonardo F

> What we can do in the back branches is make the code treat any
> negative value as meaning two-arg form.  To throw an error we'd
> need to refactor the pg_proc representation ...


I was going to fix that myself, but I think it has just been done.

How can I keep up with "who's doing what"?




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values

2010-01-07 Thread Leonardo F

> Thanks!  Please add your patch here:
> 
> https://commitfest.postgresql.org/action/commitfest_view/open
> 


Ok; but what about what I said about the difference between bit/string 
substring?
That affects overlay behaviour for bit...


I've even got 

"ERROR:  invalid memory alloc request size 4244635647" 

with:


SELECT substring(B'0001' from 5 for -2);

(this is 8.4.2, windows version, not modified...)



test=# SELECT substring(B'0001' from 1 for -1);
substring 
--
0001
(1 row)


test=# SELECT substring('0001' from 1 for -1);
ERROR:  negative substring length not allowed




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Patch: Allow substring/replace() to get/set bit values

2010-01-07 Thread Leonardo F

Hi all,


attached a patch that adds the following functions for bit string:

- overlay
- get_bit
- set_bit


Some info:

1) overlay is implemented as calls to substring; given the different way 
substring behaves when used with strings vs bit strings:

test=# SELECT substring(B'0001' from 1 for -1);
substring 
--
0001
(1 row)

test=# SELECT substring('0001' from 1 for -1);
ERROR:  negative substring length not allowed



I don't think that this overlay implementation is what we want?

Example:

test=# SELECT overlay(B'' placing B'01' from 1 for 2);
overlay 
-
0111
(1 row)


(looks ok)

test=# SELECT overlay(B'' placing B'01' from 0 for 2);
overlay 
---
0
(1 row)



This happens because substring(bit, pos, -1) means substring(bit, pos, 
length(bit string)),
and < -1 values for bit substring parameters are allowed: is this a bug in bit 
substring???

2) I tried implementing bit_get and bit_set as calls to overlay/substring:

DATA(insert OID = 3032 (  get_bit  PGNSP PGUID 14 1 0 0 f f f t 
f i 2 0 23 "1560 23" _null_ _null_ _null_ _null_ "select 
(pg_catalog.substring($1, $2, 1))::int4" _null_ _null_ _null_ ));
DESCR("get bit");
DATA(insert OID = 3033 (  set_bit  PGNSP PGUID 14 1 0 0 f f f t 
f i 3 0 1560 "1560 23 23" _null_ _null_ _null_ _null_ "select 
pg_catalog.overlay($1, $3::bit, $2, 1)" _null_ _null_ _null_ ));
DESCR("set bit");


but this doesn't give any check on the values provided:
that the bit looked for is in fact in the right range, and that the bit in 
set_bit is in fact a bit
(I don't like the idea of writing "select set_bit(B'01010111', B'1')" instead 
of 
"select set_bit(B'01010111', 1)" ).
So I coded them in proper C internal functions.



Leonardo


  

patch_bit.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-07 Thread Leonardo F



> > Is anybody interested? Otherwise the entry could be removed from the TODO 
> list...
> 
> Even if not, you can still submit a patch.  There are a lot more users
> of PG than there are people who read -hackers.



Ok, I'll try and submit a patch. Thank you very much.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-06 Thread Leonardo F

> > To sum up:
> >
> > 1) a new function, "get_bit", that calls substring
> > 2) a new function, "overlay", that replaces bits (starting at a certain 
> position)
> > 3) a new function, "set_bit", that calls overlay
> 
> That seems reasonable to me.  Not sure what others think.


Is anybody interested? Otherwise the entry could be removed from the TODO 
list...



Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-05 Thread Leonardo F

> As you say, there's really no point in changing the internal
> representation, and if you don't find replace() useful either, then
> why are you even working on this at all?  

I would like a get_bit / set_bit for bit strings, as I find them useful.
get_bit could be a simple call to substring, but there's no way of doing a 
set_bit on a bit string as far as I know.

I don't like the "replace" syntax for bit strings since it won't give you the 
same functionality of set_bit, 
plus I don't really see how someone would want to look for a bit string and 
replace it with another bit string.
But I see that someone might want to overlay a bit string with another (this is 
different from "replace" since you
have to tell the position where the replacing would start, instead of looking 
for a bit string).

To sum up:

1) a new function, "get_bit", that calls substring
2) a new function, "overlay", that replaces bits (starting at a certain 
position)
3) a new function, "set_bit", that calls overlay


> Since the latest discussion
> of this is more than five years old, it's unclear that anyone even
> cares any more.  It seems to me that making replace overlay a
> substring of bits could be a reasonable thing to do, but if nobody
> actually wants it, then the simplest thing to do is remove this from
> the TODO and call it good.

I understand: it would be both a useful feature to me and a way to start coding 
postgres.

But, of course, if there's no interest, I'll pass...





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-05 Thread Leonardo F

> You might want to search the archives (or the wiki history, or the CVS
> history if it's been there since before we moved the TODO list to the
> wiki) for discussion of why that item was added to the TODO in the
> first place.



I read the thread:

http://archives.postgresql.org/pgsql-hackers/2004-02/msg00478.php

1) it is true that "getbit sounds a lot like what substring() does", but the 
same could be said for binary string substring/get_byte; so IMHO get/set_bit 
should be present for bit string
2) it is not very clear to me how "setbit could actually be handled by 
replace()" (maybe "overlay" style?)
3) since I'm looking at byte string get/set_bit to understand how that works, 
I'm having a hard time understanding why the bit indexes in get/set_bit are 
low-first based:

select get_bit(E'\\376\\376'::bytea, s) as b,s from generate_series(0,15,1) as s
b s
0 0
1 1
1 2
1 3
1 4
1 5
1 6
1 7
0 8
1 9
1 10
1 11
1 12
1 13
1 14
1 15


I understand this is the internal representation, but still: if someone asked 
me what the 8th bit in 11101110 is, I would have said 1, not 0 
(assuming the first bit has index '0'). Actually, David Helgason's patch 
(http://archives.postgresql.org/pgsql-hackers/2004-01/msg00498.php) goes in 
this direction: note the 

bitNo = 7 - (n % 8);

part. Using that algorithm would mean get/set_bit in bit string would behave 
differently from what they do in binary string (IMHO it's the binary string 
implementation that is "wrong").



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-04 Thread Leonardo F

Re-reading the docs it looks like the only thing missing is get/set_bit for bit 
string.


Substring is already implemented for bit string, and I don't really know if 
replace is useful at all.


(sorry if the other mail came with a different sender name)


Leonardo



> I would like to work on "Allow substring/replace() to get/set bit values", 
> since 
> it looks like a simple task.
> 
> The item is not marked as "easy" on the TODO though. Before proceding to a 
> discussion on how this functions should be implemented (I got from the 
> messages 
> on the mailing list that bit substring/replace functions should do it) I 
> would 
> like to know if it's a complicated task.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

45 matches

Mail list logo