Re: [PATCHES] hash index improving v3

2008-09-11 Thread Alex Hunsaker
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> Alex,
>
> I meant to check the performance with increasing numbers of collisions,
> not increasing size of the hashed item. In other words, something like
> this:
>
> for ($coll=500; $coll<=100; $coll=$coll*2) {
>   for ($i=0; $i<=100; $i++) {
>     hash(int8 $i);
>   }
>   # add the appropriate number of collisions, distributed evenly to
>   # minimize the packing overrun problem
>   for ($dup=0; $dup<=$coll; $dup++) {
>     hash(int8 MAX_INT + $dup * 100/$coll);
>   }
> }
>
> Ken

*doh* right something like this...

create or replace function create_test_hash() returns bool as $$
declare
    coll integer default 500;
    -- tweak this to where create index gets really slow
    max_coll integer default 100;
begin
    loop
        -- one table per collision count
        execute 'create table test_hash_'|| coll ||'(num int8);';
        -- base keys 0..max_coll
        execute 'insert into test_hash_'|| coll ||' (num) select n
            from generate_series(0, '|| max_coll ||') as n;';
        -- extra keys built from n + 2^32 (4294967296), scaled by max_coll/coll
        execute 'insert into test_hash_'|| coll ||' (num) select
            (n+4294967296) * '|| max_coll ||'/'|| coll ||'::int from
            generate_series(0, '|| coll ||') as n;';

        coll := coll * 2;

        exit when coll >= max_coll;
    end loop;
    return true;
end;
$$ language 'plpgsql';

And then benchmark each table, and for extra credit cluster the table
on the index and benchmark that.

Also, obviously, this is with the modified hashint8 that just ignores the
top 32 bits.

Right?
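
A minimal sketch of why ignoring the top 32 bits forces collisions for
keys that differ by 4294967296 (2^32). Both hash functions below are
hypothetical stand-ins written for illustration, not PostgreSQL's actual
code: hash_ignore_high() mimics a hash that drops the high half, and
hash_fold_high() is an assumed contrast case that mixes the high half in.

```python
MASK32 = 0xFFFFFFFF
OFFSET = 4294967296  # 2**32, the offset used in the test function above


def hash_ignore_high(key: int) -> int:
    """Hypothetical hash that only looks at the low 32 bits of the key."""
    return key & MASK32


def hash_fold_high(key: int) -> int:
    """Hypothetical contrast case: XOR the high 32 bits into the low 32."""
    return (key & MASK32) ^ ((key >> 32) & MASK32)


def colliding_pairs(hash_fn, count: int) -> int:
    """Count how many pairs (n, n + OFFSET) land on the same hash value."""
    return sum(1 for n in range(count) if hash_fn(n) == hash_fn(n + OFFSET))


if __name__ == "__main__":
    print("ignore high bits:", colliding_pairs(hash_ignore_high, 1000))
    print("fold high bits:  ", colliding_pairs(hash_fold_high, 1000))
```

With the truncating stand-in every pair collides, which is the behaviour
the test tables rely on; with the folding stand-in none of them do.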

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] hash index improving v3

2008-09-11 Thread Kenneth Marshall
On Wed, Sep 10, 2008 at 10:17:31PM -0600, Alex Hunsaker wrote:
> On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> > On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> >> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
> >>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> >>> > I think that the glacial speed for generating a big hash index is
> >>> > the same problem that the original code faced.
> >>>
> >>> Yeah sorry, I was not saying it was a new problem with the patch.  Err
> >>> at least not trying to :) *Both* of them had been running at 18+ (I
> >>> finally killed them sometime Sunday or around +32 hours...)
> >>>
> >>> > It would be useful to have an equivalent test for the hash-only
> >>> > index without the modified int8 hash function, since that would
> >>> > be more representative of its performance. The collision rates
> >>> > that I was observing in my tests of the old and new mix() functions
> >>> > were about 2 * (1/1) of what your test generated. You could just
> >>> > test against the integers between 1 and 200.
> >>>
> >>> Sure, but then it's pretty much just a general test of patch vs no
> >>> patch.  i.e. How do we measure how much longer collisions take when
> >>> the new patch makes things faster?  That's what I was trying to
> >>> measure... Though I apologize I don't think that was clearly stated
> >>> anywhere...
> >>
> >> Right, I agree that we need to benchmark the collision processing
> >> time difference. I am not certain that two data points is useful
> >> information. There are 469 collisions with our current hash function
> >> on the integers from 1 to 200. What about testing the performance
> >> at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
> >> Unless you adjust the fill calculation for the CREATE INDEX, I would
> >> stop once the time to create the index spikes. It might also be useful
> >> to see if a CLUSTER affects the performance as well. What do you think
> >> of that strategy?
> >
> > Not sure it will be a good benchmark of collision processing.  Then
> > again you seem to have studied the hash algo closer than me.  I'll go
> > see about doing this.  Stay tuned.
> 
> Assuming I understood you correctly (and I probably didn't), this does
> not work very well because you max out at 27,006 values before you get
> this error:
> ERROR:  index row size 8152 exceeds hash maximum 8144
> HINT:  Values larger than a buffer page cannot be indexed.
> 
> So is a power-of-2 multiple of 500 not simply:
> x = 500;
> while(1)
> {
> print x;
> x *= 2;
> }
> 
> ?
> 
Alex,

I meant to check the performance with increasing numbers of collisions,
not increasing size of the hashed item. In other words, something like
this:

for ($coll=500; $coll<=100; $coll=$coll*2) {
  for ($i=0; $i<=100; $i++) {
    hash(int8 $i);
  }
  # add the appropriate number of collisions, distributed evenly to
  # minimize the packing overrun problem
  for ($dup=0; $dup<=$coll; $dup++) {
    hash(int8 MAX_INT + $dup * 100/$coll);
  }
}
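
The key layout described by the loop above can be sketched as follows.
This is only an illustration under stated assumptions: MAX_INT is taken
to be 2**32, the division is integer division, and low32_hash() is a
hypothetical stand-in for a hash that ignores the top 32 bits. None of
these names come from the patch itself.

```python
from collections import Counter

MAX_INT = 2**32  # assumption: the offset that pushes keys past 32 bits


def low32_hash(key: int) -> int:
    """Hypothetical stand-in: hash on the low 32 bits only."""
    return key & 0xFFFFFFFF


def build_keys(n_base: int, coll: int) -> list[int]:
    """Base keys 0..n_base, plus coll + 1 extra keys spread over them."""
    base = list(range(n_base + 1))
    # Each extra key is MAX_INT plus an evenly spaced base value, so under
    # low32_hash() it falls into the same bucket as that base key.
    extra = [MAX_INT + dup * n_base // coll for dup in range(coll + 1)]
    return base + extra


def bucket_sizes(keys: list[int]) -> Counter:
    """Number of keys that share each hash value."""
    return Counter(low32_hash(k) for k in keys)


if __name__ == "__main__":
    sizes = bucket_sizes(build_keys(100, 500))
    print("distinct hash values:", len(sizes))
    print("largest bucket:", max(sizes.values()))
```

Spreading the extra keys evenly keeps the buckets at nearly the same
size, instead of piling every collision onto a single hash value.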

Ken



Re: [PATCHES] still alive?

2008-09-11 Thread Peter Eisentraut

Bruce Momjian wrote:
> Abhijit Menon-Sen wrote:
> > I thought -patches was supposed to die. What happened?
>
> I was wondering the same thing.  Peter?


Hmm, let's try this:

Anyone who thinks the patches list should remain as separate from 
hackers, shout now (with rationale)!

