Re: [HACKERS] todo: Hash index creation

2007-07-07 Thread Kenneth Marshall
On Thu, Jul 05, 2007 at 12:26:45PM +0100, Heikki Linnakangas wrote:
> Kenneth Marshall wrote:
> >I definitely agree with Tom's assessment. If we cannot need to make the
> >hash index as performant as it is in theory, none of the other refinements
> >are worth it. You would need to use BTree if you were concerned about
> >speed. (and who isn't)
> 
> I just got an idea.
> 
> Hash indexes would take much less space, and be more efficient to 
> search, if we only stored the hash of the key in the index. Such index 
> tuples would be fixed size, so we could get rid of the overhead of the 
> length-field in IndexTuple, as well as the line pointers.
> 
> Of course, the key value would need to be rechecked after fetching the 
> heap tuple, but that shouldn't be a problem assuming there's few collisions.
> 
> Another idea: when searching, we scan the whole bucket page looking for 
> matching tuples. If we stored the tuples ordered by hash value within 
> page, we could do a binary search instead.
> 
> These changes might give hash indexam the edge over b-tree it needs.
> 
> -- 
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
> 
I was thinking about your idea on my commute yesterday and the new
HOT functionality. The first step would be as you suggested, store
only the hash value and the heap bucket page in the index. The check
for collisions within a hash bucket is essentially the same thing
that we do when searching for the heap tuple visible for a specific
transaction. This would mean, that with a HOT type process, the index
would not need to be updated when a non-indexed field is changed. We
treat them as another tuple in the HOT chain.

This would make typical lookups require 2 disk reads per item, the
first to get the heap location and the second to get the heap page
itself. The next step would be to have the hash function generate
the heap location directly instead of a disk read, then retrieving
an item would be one read. This would also provide for a simple
clustering based on the hash index. I am certain that there are
holes in this ideal, but it would be a logical next step.

Regards,
Ken

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] todo: Hash index creation

2007-07-05 Thread Heikki Linnakangas

Kenneth Marshall wrote:

I definitely agree with Tom's assessment. If we cannot need to make the
hash index as performant as it is in theory, none of the other refinements
are worth it. You would need to use BTree if you were concerned about
speed. (and who isn't)


I just got an idea.

Hash indexes would take much less space, and be more efficient to 
search, if we only stored the hash of the key in the index. Such index 
tuples would be fixed size, so we could get rid of the overhead of the 
length-field in IndexTuple, as well as the line pointers.


Of course, the key value would need to be rechecked after fetching the 
heap tuple, but that shouldn't be a problem assuming there's few collisions.


Another idea: when searching, we scan the whole bucket page looking for 
matching tuples. If we stored the tuples ordered by hash value within 
page, we could do a binary search instead.


These changes might give hash indexam the edge over b-tree it needs.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] todo: Hash index creation

2007-07-03 Thread Naz Gassiep




Wow... not sure how I missed that. I *did* create this schema ages ago,
perhaps it wasn't there, or at the time I had no idea what the
implications were. *shrug*
Regards,
- Naz.


Tom Lane wrote:

  Naz Gassiep <[EMAIL PROTECTED]> writes:
  
  
As a result, when creating tables containing large blocks of text I wish
to index, I've been using HASH as an index method. Please can we state
in the manual that HASH index types are in a beta stage of development
or something similar, or perhaps remove the manual entry altogether
until HASH is at a point where it is usable in production.

  
  
Uh, the manual already does say

Note: Testing has shown PostgreSQL's hash indexes to perform no better
than B-tree indexes, and the index size and build time for hash indexes
is much worse. Furthermore, hash index operations are not presently
WAL-logged, so hash indexes might need to be rebuilt with REINDEX after
a database crash. For these reasons, hash index use is presently
discouraged.

under 11.2 Index Types, as well as various derogatory remarks elsewhere.

			regards, tom lane

  





Re: [HACKERS] todo: Hash index creation

2007-07-02 Thread Hannu Krosing
Ühel kenal päeval, E, 2007-07-02 kell 04:27, kirjutas Naz Gassiep:

> I've been warned away from hash indexes before, however I had no idea
> that it's performance was that abysmal that BTREE beat it and I was
> definitely not aware that they were not included in WAL logs. I was told
> it wasn't as good as it could be, but I wasn't told it was pretty much
> an alpha piece of code.
> 
> As a result, when creating tables containing large blocks of text I wish
> to index, I've been using HASH as an index method. 

If you just wish to have smaller indexes, then you can use functional
btree indexes over text hash, like this:

CREATE INDEX largetextindex on mytable(hashtext(largetext));

and use

SELECT * FROM mytable 
 where hashtext(largetext) = hastext('searchvalue') 
   and largetext = 'searchvalue'
;

btw, if the real hash indexes don't get fixes soon, maybe we could
redefine hash index to actually mean usage like this and do the rewrites
in parser?

> Please can we state
> in the manual that HASH index types are in a beta stage of development
> or something similar, or perhaps remove the manual entry altogether
> until HASH is at a point where it is usable in production.
> 
> Regards,
> A very surprised n00b.
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] todo: Hash index creation

2007-07-01 Thread Tom Lane
Naz Gassiep <[EMAIL PROTECTED]> writes:
> As a result, when creating tables containing large blocks of text I wish
> to index, I've been using HASH as an index method. Please can we state
> in the manual that HASH index types are in a beta stage of development
> or something similar, or perhaps remove the manual entry altogether
> until HASH is at a point where it is usable in production.

Uh, the manual already does say

Note: Testing has shown PostgreSQL's hash indexes to perform no better
than B-tree indexes, and the index size and build time for hash indexes
is much worse. Furthermore, hash index operations are not presently
WAL-logged, so hash indexes might need to be rebuilt with REINDEX after
a database crash. For these reasons, hash index use is presently
discouraged.

under 11.2 Index Types, as well as various derogatory remarks elsewhere.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] todo: Hash index creation

2007-07-01 Thread Naz Gassiep

> Actually I think the *most* important thing to work on is to get hash to
> the point where its search speed actually beats btree consistently, so
> that it has an excuse to live.  If that is insoluble we might well end up
> ripping it out entirely.  (The first three TODO items for hash indexes
> are ideas for trying to improve the speed.)
>
> Fixing the WAL support would come after that, and bring it to the point
> where someone could actually recommend it for production use.
>
> After that it would be sensible to work on inessentials like improving
> the build speed.
I've been warned away from hash indexes before, however I had no idea
that it's performance was that abysmal that BTREE beat it and I was
definitely not aware that they were not included in WAL logs. I was told
it wasn't as good as it could be, but I wasn't told it was pretty much
an alpha piece of code.

As a result, when creating tables containing large blocks of text I wish
to index, I've been using HASH as an index method. Please can we state
in the manual that HASH index types are in a beta stage of development
or something similar, or perhaps remove the manual entry altogether
until HASH is at a point where it is usable in production.

Regards,
A very surprised n00b.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] todo: Hash index creation

2007-06-27 Thread Kenneth Marshall
On Wed, Jun 27, 2007 at 08:36:54PM -0400, Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > [EMAIL PROTECTED] wrote:
> >> Is anyone currently working on this TODO item?  
> >> "During index creation, pre-sort the tuples to improve build speed"
> 
> > If you want to work on hash indexes, though, this TODO item seems more 
> > important to me at least:
> >> Add WAL logging for crash recovery
> 
> Actually I think the *most* important thing to work on is to get hash to
> the point where its search speed actually beats btree consistently, so
> that it has an excuse to live.  If that is insoluble we might well end up
> ripping it out entirely.  (The first three TODO items for hash indexes
> are ideas for trying to improve the speed.)
> 
> Fixing the WAL support would come after that, and bring it to the point
> where someone could actually recommend it for production use.
> 
> After that it would be sensible to work on inessentials like improving
> the build speed.
> 
>   regards, tom lane
> 

I definitely agree with Tom's assessment. If we cannot need to make the
hash index as performant as it is in theory, none of the other refinements
are worth it. You would need to use BTree if you were concerned about
speed. (and who isn't)

Ken

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] todo: Hash index creation

2007-06-27 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] wrote:
>> Is anyone currently working on this TODO item?  
>> "During index creation, pre-sort the tuples to improve build speed"

> If you want to work on hash indexes, though, this TODO item seems more 
> important to me at least:
>> Add WAL logging for crash recovery

Actually I think the *most* important thing to work on is to get hash to
the point where its search speed actually beats btree consistently, so
that it has an excuse to live.  If that is insoluble we might well end up
ripping it out entirely.  (The first three TODO items for hash indexes
are ideas for trying to improve the speed.)

Fixing the WAL support would come after that, and bring it to the point
where someone could actually recommend it for production use.

After that it would be sensible to work on inessentials like improving
the build speed.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] todo: Hash index creation

2007-06-27 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:
Is anyone currently working on this TODO item?  


"During index creation, pre-sort the tuples to improve build speed"
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

A few of us would like to tackle it and see if we can add some value here.  


That would be nice!

If you want to work on hash indexes, though, this TODO item seems more 
important to me at least:



Add WAL logging for crash recovery


Until that's done, you can't really use hash indexes for anything else 
than read-only tables.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] todo: Hash index creation

2007-06-27 Thread twraney
Is anyone currently working on this TODO item?  

"During index creation, pre-sort the tuples to improve build speed"
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

A few of us would like to tackle it and see if we can add some value here.  

Tom, 
Shreya  

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match