Re: [HACKERS] GSoC on WAL-logging hash indexes
On 08/20/2014 02:43 AM, Michael Paquier wrote: On Thu, Jun 19, 2014 at 6:40 PM, Vik Fearing vik.fear...@dalibo.com wrote: On 04/30/2014 11:41 PM, Tom Lane wrote: We really oughta fix the WAL situation, not just band-aid around it. After re-reading this thread, it is not clear that anyone is going to work on it so I'll just ask: Is anyone working on this? If not, I'd like to put it on my plate. Vik, did you get time to look at that finally? Yes, I am (very slowly) working on this. I've got a decent learning curve for WAL replay to get over and I figured this can't be urgent considering how many years it's been like this so I'm sort of taking my time. -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Jun 19, 2014 at 6:40 PM, Vik Fearing vik.fear...@dalibo.com wrote: On 04/30/2014 11:41 PM, Tom Lane wrote: We really oughta fix the WAL situation, not just band-aid around it. After re-reading this thread, it is not clear that anyone is going to work on it so I'll just ask: Is anyone working on this? If not, I'd like to put it on my plate. Vik, did you get time to look at that finally? Regards, -- Michael
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 04/30/2014 11:41 PM, Tom Lane wrote: We really oughta fix the WAL situation, not just band-aid around it. After re-reading this thread, it is not clear that anyone is going to work on it so I'll just ask: Is anyone working on this? If not, I'd like to put it on my plate. -- Vik
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. The O(1) cost seems attractive when you consider that it only requires reading one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:26:20AM -0700, Peter Geoghegan wrote: On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. The O(1) cost seems attractive when you consider that it only requires reading one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. -- Peter Geoghegan Hi Peter, I do not think that CPU costs matter as much as the O(1) probe to get a result value, specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. I also think that development of hash indexes has been stalled waiting for WAL logging. For example, hash indexes can almost trivially become more space efficient as they grow in size by utilizing the page number to represent the prefix bits of the hash value for a bucket. My 2 cents. Ken
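[Editor's note] Ken's space-efficiency point can be illustrated with a toy sketch (this is not PostgreSQL code; all names are invented for illustration): if the bucket number is taken from the low-order bits of the hash, every tuple in a bucket shares those bits, so they are implied by the bucket's page number and need not be stored per tuple.

```python
# Sketch: deriving the bucket from the hash's low-order bits means those
# bits never need to be stored in the bucket's tuples; the full hash is
# reconstructible from (bucket number, remaining bits).
def bucket_for(hash_value, n_bucket_bits):
    mask = (1 << n_bucket_bits) - 1
    return hash_value & mask

h = 0b1011_0110
assert bucket_for(h, 4) == 0b0110       # low 4 bits pick the bucket
remaining = h >> 4                      # only these bits need storing
assert (remaining << 4) | 0b0110 == h   # full hash reconstructed
```

As the index grows and more bits are used for bucket addressing, more of each stored hash becomes implicit, which is the space saving Ken alludes to.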
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 5:55 AM, k...@rice.edu k...@rice.edu wrote: I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in-memory tables, but that's all I can find on Google. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:54 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 5:55 AM, k...@rice.edu k...@rice.edu wrote: I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in-memory tables, but that's all I can find on Google. I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Of course our current hash indexes have *more* not less contention than btree, but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
Robert Haas robertmh...@gmail.com writes: I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. The long and the short of it is that there are *lots* of implementation deficiencies in our hash indexes. There's no real way to know whether they'd be competitive if all those things were rectified, except by doing the work to fix 'em. And it's hard to justify putting much effort into hash indexes so long as there's an elephant in the room the size of no WAL support. So I'm in favor of getting that fixed, if we have somebody who's willing to do it. It might lead to good things later; and even if it doesn't, the lack of WAL support is an embarrassment. regards, tom lane
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas robertmh...@gmail.com wrote: I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). The B-Tree root isn't really a central contention point at all. The locking/latching protocol that nbtree uses is remarkably concurrency-friendly. In the real world, there is pretty much no exclusive locking of the root page's buffer. Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. I am not convinced of that. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:01 AM, Tom Lane t...@sss.pgh.pa.us wrote: It might lead to good things later; and even if it doesn't, the lack of WAL support is an embarrassment. I don't think it will, but I do agree that the current state of affairs is an embarrassment. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:26 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. If we don't put in the work to make them useful, then they won't ever become useful. If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment. One advantage of the hash indexes is that the code is simple enough for someone to actually understand it in a summer. Whether it would still be like that after WAL logging was implemented, I don't know. The O(1) cost seems attractive when you consider that it only requires reading one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. Not sure what you mean there. The concurrency issues of the hash index have a lot less that needs to be understood. I think I understand it pretty well (unlike B-tree), I just don't know what to do with that knowledge. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. I understand that those are the only ones worth fighting for. :) Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
All, (dropping pgsql-advocacy off the cc list) On 04/30/2014 10:11 AM, Robert Haas wrote: On Wed, Apr 30, 2014 at 12:54 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 5:55 AM, k...@rice.edu k...@rice.edu wrote: I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in memory tables, but that's all I can find on Google. Hash indexes are more important for MySQL because they have index-organized tables. I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Yes. And being smaller isn't insignificant; think of billion-row tables with fairly random access over the whole table. Also, *theoretically*, a hash index could avoid the rebalancing issues which cause our btree indexes to become bloated and need a REINDEX with certain update patterns. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:02 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas robertmh...@gmail.com wrote: I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). The B-Tree root isn't really a central contention point at all. The locking/latching protocol that nbtree uses is remarkably concurrency-friendly. In the real world, there is pretty much no exclusive locking of the root page's buffer. I've seen the simple pinning and unpinning of the root page (or the fast root, whatever the first page we bother to pin on a regular basis is called) be a point of contention. When one index dominates the entire system workload, that one page also drives contention on the spin lock that protects the lwlock that share-protects whichever buffer mapping partition happens to contain it. Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 2014-04-30 11:10:22 -0700, Jeff Janes wrote: I've seen the simple pinning and unpinning of the root page (or the fast root, whatever the first page we bother to pin on a regular basis is called) be a point of contention. When one index dominates the entire system workload, that one page also drives contention on the spin lock that protects the lwlock that share-protects whichever buffer mapping partition happens to contain it. To quite some degree that's an implementation deficiency of our lwlocks though. I've seen *massive* improvements with my lwlock patch for that problem. Additionally we need to get rid of the spinlock around pin/unpin. That said, even after those optimizations, there remains a significant amount of cacheline bouncing. That's much easier to avoid for something like hash indexes than btrees. I think another advantage is that hash indexes can be *much* smaller than btree when the individual rows are wide. I wonder though if we couldn't solve that better by introducing transforms around the looked-up data, e.g. allowing a hash(indexed_column) to be used *transparently*. If you do that manually today, a lot of work has to be done in every query... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
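[Editor's note] The transform Andres alludes to - probe by hash(indexed_column), then recheck the real value to reject collisions - can be sketched abstractly. This is a hypothetical illustration, not a proposed implementation; the class and function names are invented.

```python
# Sketch: an index that stores only a hash of the (possibly very wide)
# key, so lookups probe by hash and recheck the heap value to discard
# hash collisions.
import zlib

class HashTransformIndex:
    def __init__(self):
        self.buckets = {}  # hash(key) -> list of row ids

    def insert(self, key, rowid):
        self.buckets.setdefault(zlib.crc32(key.encode()), []).append(rowid)

    def lookup(self, key, fetch_row):
        # Probe by hash, then recheck the actual value from the "heap".
        candidates = self.buckets.get(zlib.crc32(key.encode()), [])
        return [rid for rid in candidates if fetch_row(rid) == key]

rows = {1: "alpha", 2: "beta"}      # stand-in for the heap
idx = HashTransformIndex()
for rid, val in rows.items():
    idx.insert(val, rid)
assert idx.lookup("alpha", rows.get) == [1]
assert idx.lookup("gamma", rows.get) == []
```

The point of doing this *transparently* in the server, rather than by hand, is that today the equivalent (an expression index plus an explicit recheck predicate) must be repeated in every query.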
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 2:02 PM, Peter Geoghegan p...@heroku.com wrote: Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. I am not convinced of that. In commit 76837c1507cb5a5a0048046233568447729e66dd, I removed part (basically half) of the heavyweight locking used by hash indexes, and performance improved rather dramatically: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com I think it's quite reasonable to suppose that getting rid of the remaining heavyweight locking will help just as much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:03 AM, Jeff Janes jeff.ja...@gmail.com wrote: If we don't put in the work to make them useful, then they won't ever become useful. If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment. My argument doesn't hinge on there being more important work to do. Rather, I simply don't think that there is ever going to be a compelling reason to use hash indexes in production. Apart from the obvious inflexibility, consider what it takes to make index creation fast - insertion-style building of indexes is much slower. Consider multi-key indexes. Now, I'm not telling anyone what to work on, and if someone wants to make hash indexes WAL-logged to plug that hole, don't let me stop you. It probably makes sense as a project to learn more about Postgres internals. However, it would be unfair to not speak up given my misgivings around the practical utility of hash indexes. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
I think the key question was if someone wanted to develop a good hash index would they start from our current hash index or would they be better off starting from a fresh codebase? If the former then we should add WAL logging and throw GSOC and other people who ask for projects at it. If the latter then we should throw out the current codebase and let people develop extensions that add new hash index code until someone comes up with a good design we want to move to core. But imnsho doing nothing is a bad idea. We should have long ago either added WAL logging or removed the index type. We shouldn't have left a foot-gun that large lying around for so long. Incidentally, something else I've considered would be having a WAL record type saying "relation corrupted" and emitting one the first time a hash index is touched after a checkpoint. That could provide a general mechanism that might be useful for unlogged operations (and might be combinable with the infrastructure for unlogged tables). But it seems like a better tool for other objects than hash indexes. Another quick fix would be having a GUC allow_unrecoverable_relations which defaulted to false. Creating a hash index (and presumably unlogged tables) would error out with a hint to set that to true, so at least users who create them would have to know what they were in for. Right now we're seeing a lot of users who create hash indexes and then complain about corrupted standbys. I could do the legwork on either the GUC or moving hash indexes to an extension if there's a consensus. But I suspect either will be quite controversial...
Re: [HACKERS] GSoC on WAL-logging hash indexes
* Greg Stark (st...@mit.edu) wrote: But imnsho doing nothing is a bad idea. We should have long ago either added WAL logging or removed the index type. We shouldn't have left a foot-gun that large lying around for so long. I certainly encourage you to work on it. :) Incidentally something else I've considered would be having a WAL record type saying relation corrupted and emitting one the first time a hash index is touched after a checkpoint. That could provide a general mechanism that might be useful for unlogged operations (and might be combinable with the infrastructure for unlogged tables). But it seems like a better tool for other objects than hash indexes. Ugh, please do not make unlogged tables suffer for this. I feel it is *quite* clear when you say UNLOGGED or TEMP in the CREATE statement that you know what you're gonna get. Perhaps we should require 'UNLOGGED INDEX' to be passed in when someone creates a 'hash' index instead. Another quick fix would be having a GUC allow_unrecoverable_relations which defaulted to false. Creating a hash index (and presumably unlogged tables) would error out with a hint to set that to true so at least users who create them would have to know what they were in for. -1 Right now we're seeing a lot of users who create hash indexes and then complain about corrupted standbys. Uh, we are? If you're saying that $employer is, great, but please clarify that as the rest of us aren't 'in the loop' as far as that goes and we see the various list/IRC traffic instead, and I can't remember ever seeing such a complaint in either place (but I'll admit, my memory isn't the best). I could do the legwork on either the GUC or moving hash indexes to an extension if there's a consensus. But I suspect either will be quite controversial... -1 on the GUC. I'd be alright with finding a way to make it clear, upon creation of the hash index, that it's not WAL'd (maybe even just throw a WARNING, it'd be better than nothing..). 
Trying to rip out the current hash index wouldn't be great, imv. If you'd like to build an extension which provides hash indexes - I say go for it? Nothing stopping you as far as that's concerned. Thanks, Stephen
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:19 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 11:03 AM, Jeff Janes jeff.ja...@gmail.com wrote: If we don't put in the work to make them useful, then they won't ever become useful. If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment. My argument doesn't hinge on there being more important work to do. Rather, I simply don't think that there is ever going to be a compelling reason to use hash indexes in production. I have an indexed text column with an average length of 50, a stddev length of 15, and a pronounced right skew. Currently, the longest value in it is 811. But inevitably someone will need to insert something longer than 2712 bytes. When that day comes, I will drop the btree index and add a hash index (unless we remove that limitation from btree indexes in the mean time). It lets me sleep at night knowing that I have that option today, even if it would complicate crash recovery. Apart from the obvious inflexibility, consider what it takes to make index creation fast - insertion-style building of indexes is much slower. Consider multi-key indexes. I'm pretty sure hash indexes already implement a bulk creation fast path. In any case, I've never noticed them being slow, and I've tested some pretty big ones. Now, I'm not telling anyone what to work on, and if someone wants to make hash indexes WAL-logged to plug that hole, don't let me stop you. It probably makes sense as a project to learn more about Postgres internals. However, it would be unfair to not speak up given my misgivings around the practical utility of hash indexes. Sure, and we all have our own opinions on that.
Should we summarize them somewhere easier to follow than a long email thread but more detailed than a TODO entry? Whatever happened with the GSOC people? That should be well under way by now, is anyone working on it? Are the discussions of their efforts on-list, or is it between them and their mentors? Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:16 PM, Greg Stark st...@mit.edu wrote: I think the key question was if someone wanted to develop a good hash index would they start from our current hash index or would they be better off starting from a fresh codebase? If it were me I'd start with the current code. It would be nice if one could just fork the code to have a new type of index (say hash2) which is initially identical, but I never figured out how to do that. If the former then we should add WAL logging and throw GSOC and other people who ask for projects at it. If the latter then we should throw out the current codebase and let people develop extensions that add new hash index code until someone comes up with a good design we want to move to core. Extensions have no hooks into the WAL system, and I'm not optimistic that that will ever change. Relegating a fundamentally new index to be an extension virtually *guarantees* that it will never be WAL logged. Incidentally something else I've considered would be having a WAL record type saying relation corrupted and emitting one the first time a hash index is touched after a checkpoint. That could provide a general mechanism that might be useful for unlogged operations (and might be combinable with the infrastructure for unlogged tables). But it seems like a better tool for other objects than hash indexes. +1. I often lament that unlogged tables cannot be used on a standby or a test server that was derived from a hot backup. In the case of unlogged tables, this does mean we would need a way to checkpoint them with a non-shutdown checkpoint, though. I don't know if that would need a different type of unlogged table, or a different type of checkpoint. Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
Stephen Frost wrote: * Greg Stark (st...@mit.edu) wrote: I could do the legwork on either the GUC or moving hash indexes to an extension if there's a consensus. But I suspect either will be quite controversial... If you'd like to build an extension which provides hash indexes- I say go for it? Nothing stopping you as far as that's concerned. Extensions can't write/replay WAL, so it's not clear to me how this would work. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] GSoC on WAL-logging hash indexes
Greg Stark st...@mit.edu writes: But imnsho doing nothing is a bad idea. We should have long ago either added WAL logging or removed the index type. We shouldn't have left a foot-gun that large lying around for so long. We can't remove the hash index type, nor move it to an extension, because it is the operator classes for the built-in hash index AM that tell the planner and executor how to do hashing for arbitrary datatypes. And we certainly do not want to give up hashing-based query plans, whatever you may think of hash indexes. We really oughta fix the WAL situation, not just band-aid around it. regards, tom lane
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/06/2014 09:34 PM, Robert Haas wrote: On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. To some extent, I agree, but only to some extent. To make hash indexes generally usable, we really need to solve both problems. When I got rid of just some of the heavyweight locking in commit 76837c1507cb5a5a0048046233568447729e66dd, the results were pretty dramatic at higher levels of concurrency: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com But there was still an awful lot of contention inside the heavyweight lock manager, and I don't think hash indexes are going to be ready for prime time until we solve that problem. Hmm. You suggested ensuring that a scan always holds at least a pin, and that a split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described; ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and the exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, i.e. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. Vacuum could also be enhanced. It currently takes an exclusive lock on the bucket, then removes any dead tuples and finally squeezes the bucket by moving tuples to earlier pages. But you only really need the exclusive lock for the squeeze phase. You could do the dead-tuple removal without the bucket lock, and only grab it for the squeeze phase. And the squeezing is optional, so you could just skip that if you can't get the lock. But that's a separate patch as well.
One more thing we could do to make hash indexes more scalable, independent of the above: cache the information in the metapage in backend-private memory. Then you (usually) wouldn't need to access the metapage at all when scanning. Store a copy of the bitmask for that bucket in the primary page, and when scanning, check that it matches the cached value. If not, refresh the cached metapage and try again. So, there seem to be a few fairly simple and independent improvements to be made: 1. Replace the heavy-weight lock with pin / vacuum-lock. 2. Make it crash-safe by adding WAL-logging. 3. Only acquire the exclusive lock (vacuum-lock after step 1) in VACUUM for the squeeze phase. 4. Cache the metapage. We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it. This seems like a great GSoC project to me. We have a pretty good idea of what we want to accomplish. It's uncontroversial: I don't think anyone is going to object to improving hash indexes (one could argue that it's a waste of time, but that's different from objecting to the idea). And it consists of a few mostly independent parts, so it's possible to do incrementally, which makes it easier to track progress, and we'll probably have something useful at the end of the summer even if it doesn't all get finished. - Heikki
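[Editor's note] Heikki's metapage-caching suggestion (item 4) might look roughly like this in miniature: keep a backend-private copy of the metapage's bucket mask, stamp each bucket's primary page with the mask in effect when it was last touched by a split, and refresh the cache on a mismatch. The real metapage layout is more involved; every name here is an invented stand-in.

```python
# Sketch: validate a cached metapage against a per-bucket mask copy,
# refreshing and retrying once if a split has happened since we cached.
class MetaCache:
    def __init__(self, mask):
        self.mask = mask

shared_mask = 0b1                 # the "metapage": 2 buckets
page_mask = {0: 0b1, 1: 0b1}      # mask stamped on each bucket's primary page
cache = MetaCache(shared_mask)    # backend-private copy

def bucket_for(h):
    b = h & cache.mask
    if page_mask[b] != cache.mask:   # bucket split since we cached?
        cache.mask = shared_mask     # re-read the "metapage"
        b = h & cache.mask           # recompute with the fresh mask
    return b

# Simulate a split doubling the bucket count behind our back:
shared_mask = 0b11
page_mask = {b: 0b11 for b in range(4)}
assert bucket_for(0b10) == 0b10      # stale cache detected and refreshed
```

In the common case (no split since the cache was filled) the scan never touches the metapage at all, which is the contention win being proposed.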
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 7:07 PM, Greg Stark st...@mit.edu wrote: On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas robertmh...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. The other thought I had was that you can do things lazily in vacuum. So when you probe you need to check multiple pages until vacuum comes along and rehashes everything. I was thinking about this, too. Cleaning up the old bucket after the split is pretty similar to vacuuming. And lo and behold, vacuum also needs to lock the entire bucket. AFAICT, there are two reasons for this. First, when we resume a suspended scan, we assume that the next match on the page, if any, will occur on the same page at an offset greater than the offset where we found the previous match. The code copes with the possibility of concurrent insertions by refinding the last item we returned and scanning forward from there, but assumes the pointer we're refinding can't have moved to a lower offset. I think this could be easily fixed - at essentially no cost - by changing hashgettuple so that, if the forward scan errors out, it tries scanning backward rather than just giving up.
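[Editor's note] The forward-then-backward refind Robert describes can be sketched abstractly; the list offsets and item identifiers below are stand-ins for page offsets and index TIDs, not the real data structures.

```python
# Sketch: refind the last-returned item when resuming a suspended scan.
# Forward from the remembered offset is the existing behaviour; falling
# back to a backward scan handles items moved to a lower offset.
def refind(page_items, last_tid, remembered_off):
    # Forward scan from where we left off (handles concurrent insertions).
    for off in range(remembered_off, len(page_items)):
        if page_items[off] == last_tid:
            return off
    # Proposed addition: the item may now sit at a lower offset.
    for off in range(remembered_off - 1, -1, -1):
        if page_items[off] == last_tid:
            return off
    return None  # item removed entirely

page = ["t1", "t3", "t2", "t4"]
assert refind(page, "t2", 1) == 2   # found ahead, as today
assert refind(page, "t1", 2) == 0   # moved to a lower offset: now found
```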
Second, vacuum compacts each bucket that it modifies using _hash_squeezebucket, which scans from the two ends of the index in toward the middle, filling any free space on earlier pages by pulling tuples from the end of the bucket chain. This is a little thornier. We could change vacuum so that it only removes TIDs from the individual pages, without actually trying to free up pages, but that seems undesirable. However, I think we might be able to get by with making bucket compaction less aggressive, without eliminating it altogether. Suppose that whenever we remove items from a page, we consider consolidating the page with its immediate predecessor and successor in the bucket chain. This means our utilization will be a little over 50% in the worst case where we have a full page, a page with one item, another full page, another page with one item, and so on. But that's not all bad, because compacting a bucket chain to the minimum possible size when it may well be about to suffer more inserts isn't necessarily a good thing anyway. Also, doing this means that we don't need to lock out scans from the entire bucket chain in order to compact. It's sufficient to make sure that nobody's in the middle of scanning the two pages we want to merge. That turns out not to be possible right now. When a scan is suspended, we still hold a pin on the page we're scanning. But when _hash_readnext or _hash_readprev walks the bucket chain, it drops both lock and pin on the current buffer and then pins and locks the new buffer. That, however, seems like it could easily be changed: drop lock on current buffer, acquire pin on new buffer, drop pin on current buffer, lock new buffer. Then, if we want to merge two buffers, it's enough to lock both of them for cleanup. 
To avoid any risk of deadlock, and also to avoid waiting for a long-running suspended scan to wake up and complete, we can do this conditionally; if we fail to get either cleanup lock, then just don't merge the pages; take an exclusive lock only and remove whatever you need to remove, leaving the rest. Merging pages is only a performance optimization, so if it fails now and then, no real harm done. (A side benefit of this approach is that we could opportunistically attempt to compact pages containing dead items any time we can manage a ConditionalLockBufferForCleanup() on the page, sort of like what HOT pruning does for heap blocks. We could even, after pruning away dead items, attempt to merge the page with siblings, so that even without vacuum the bucket chains can gradually shrink if the index tuples are discovered to be pointing to dead heap tuples.) With the above changes, vacuuming a bucket can happen without taking the heavyweight lock in exclusive mode, leaving bucket splitting as the only operation that requires it. And we could perhaps use basically the same algorithm to clean the tuples out of the old bucket after the split. The problem is that, when we're vacuuming, we know that no scan currently in progress can still care about any of the tuples we're removing, whereas after a split, a scan that started before the split may still need the tuples that have been moved to the new bucket.
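The conditional merge described above can be modeled in self-contained C. Page and try_merge are toy stand-ins: the pin count plays the role of ConditionalLockBufferForCleanup (which succeeds only when no one else holds a pin), and tuple counts stand in for real page contents:

```c
#include <stdbool.h>

/* Toy bucket page: pin_count models buffer pins (1 = only us),
 * nitems models the tuples stored on the page. */
typedef struct
{
    int pin_count;
    int nitems;
} Page;

/* Attempt to pull b's tuples into a, freeing b from the bucket chain.
 * Merging is conditional and purely an optimization, so any obstacle
 * just means "skip it this time"; no real harm done. */
static bool
try_merge(Page *a, Page *b, int page_capacity)
{
    /* conditional cleanup lock: fail if another backend holds a pin,
     * e.g. a suspended scan parked on one of the pages */
    if (a->pin_count > 1 || b->pin_count > 1)
        return false;

    if (a->nitems + b->nitems > page_capacity)
        return false;   /* merged page would overflow */

    a->nitems += b->nitems;
    b->nitems = 0;      /* b can now be unlinked from the chain */
    return true;
}
```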
Re: [HACKERS] GSoC on WAL-logging hash indexes
[ I just sent a reply to Greg Stark's email which touches on some of the same points you mentioned here; that was mostly written last night, and finished this morning before seeing this. ] On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. You suggested ensuring that a scan always has at least a pin, and split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described, ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and an exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, ie. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. We could try that. I assume you mean do *just* what you describe here, without the split-in-progress or moved-by-split flags I suggested. The only issue I see with that is that instead of everyone piling up on the heavyweight lock, a wait which is interruptible, they'd all pile up on the buffer content lwlock, a wait which isn't. And splitting a bucket can involve an arbitrary number of I/O operations, so that's kind of unappealing. Even checkpoints would be blocked until the bucket split completed, which seems unfortunate. But I like the idea of teaching each scan to retain a pin on the primary buffer page. If we then do the split the way I proposed before, we can implement the outwait concurrent scans step by taking a cleanup lock on the primary buffer page. In this design, we only need to acquire and release the cleanup lock. Once we get the cleanup lock on the primary buffer page, even for an instant, we know that there are no remaining scans in the bucket that need the pre-split tuples that have now been moved to the new bucket. We can then remove tuples with a lesser lock (or keep the stronger lock if we wish to re-pack). Vacuum could also be enhanced. 
It currently takes an exclusive lock on the bucket, then removes any dead tuples and finally squeezes the bucket by moving tuples to earlier pages. But you only really need the exclusive lock for the squeeze-phase. You could do the dead-tuple removal without the bucket-lock, and only grab it for the squeeze-phase. And the squeezing is optional, so you could just skip that if you can't get the lock. But that's a separate patch as well.

Yeah, I think this would be a useful improvement.

One more thing we could do to make hash indexes more scalable, independent of the above: Cache the information in the metapage in backend-private memory. Then you (usually) wouldn't need to access the metapage at all when scanning. Store a copy of the bitmask for that bucket in the primary page, and when scanning, check that it matches the cached value. If not, refresh the cached metapage and try again.

I don't know whether this would be a useful improvement.

So, there seem to be a few fairly simple and independent improvements to be made:
1. Replace the heavy-weight lock with a pin/vacuum-lock.
2. Make it crash-safe, by adding WAL-logging.
3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for the squeeze phase.
4. Cache the metapage.
We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it.

I'm of the opinion that we ought to start by making splits and vacuuming more concurrent, and then do that stuff.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote:

So, there seem to be a few fairly simple and independent improvements to be made:
1. Replace the heavy-weight lock with a pin/vacuum-lock.
2. Make it crash-safe, by adding WAL-logging.
3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for the squeeze phase.
4. Cache the metapage.
We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it. I'm of the opinion that we ought to start by making splits and vacuuming more concurrent, and then do that stuff.

Priorities are a matter of opinion, but note that for many applications the concurrency of splits and vacuuming is pretty much irrelevant (at least after we do bullet point 3 above). Like, if the index is mostly read-only, or at least doesn't grow much. You'd still want it to be crash-safe (2), and you might care a lot about the performance of scans (1 and 4), but if splits and vacuum hold an exclusive lock for a few seconds, that's OK if you only need to do it once in a blue moon.

- Heikki
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 06, 2014 at 06:14:21PM -0500, Robert Haas wrote: On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. It would be useful to have a storage parameter for the target size of the index, even if it is not exact, to use in the initial index build to avoid the flurry of i/o caused by bucket splits as the index grows. Regards, Ken
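Ken's suggestion amounts to seeding the initial bucket count from the expected index size instead of starting at the default of 2. A hedged sketch in self-contained C; initial_buckets and the tuples-per-bucket parameter are illustrative, and PostgreSQL's real build logic differs in detail:

```c
#include <stdint.h>

/* Pick an initial power-of-two bucket count large enough that
 * expected_tuples fit at roughly tuples_per_bucket per bucket,
 * avoiding the flurry of splits while the index grows to size. */
static uint32_t
initial_buckets(uint64_t expected_tuples, uint32_t tuples_per_bucket)
{
    uint64_t want = expected_tuples / tuples_per_bucket + 1;
    uint32_t nbuckets = 2;   /* today's default starting point */

    while (nbuckets < want)
        nbuckets <<= 1;      /* hash bucket counts stay powers of two */
    return nbuckets;
}
```

Even an inexact target helps: overshooting wastes some space, while undershooting merely means splits resume later than they otherwise would.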
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote: On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. You suggested ensuring that a scan always has at least a pin, and split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described, ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and an exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, ie. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. We could try that. I assume you mean do *just* what you describe here, without the split-in-progress or moved-by-split flags I suggested. Yep. The only issue I see with that is that instead of everyone piling up on the heavyweight lock, a wait which is interruptible, they'd all pile up on the buffer content lwlock, a wait which isn't. And splitting a bucket can involve an arbitrary number of I/O operations, so that's kind of unappealing. Even checkpoints would be blocked until the bucket split completed, which seems unfortunate. Hmm. I doubt that's a big deal in practice, although I agree it's a bit ugly. Once we solve the crash-safety of splits, we actually have the option of doing the split in many small steps, even when there's no crash involved. You could for example grab the vacuum-lock, move all the tuples in the first 5 pages, and then release the lock to give other backends that are queued up a chance to do their scans/insertions. Then re-acquire the lock, and continue where you left off. Or just bail out and let the next vacuum or insertion finish it later. - Heikki
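Heikki's resumable-split idea can be modeled as a toy state machine in C. SplitState and split_step are illustrative names, and simple counters stand in for real pages of tuples:

```c
#include <stdbool.h>

/* Progress of a bucket split, persisted (in reality, on disk and
 * WAL-logged) so that any backend can resume it later. */
typedef struct
{
    int to_move;        /* tuples still awaiting the move to the new bucket */
    int moved;          /* tuples already moved */
    bool in_progress;   /* the split-in-progress flag */
} SplitState;

/* One step performed under the lock: move at most batch tuples, then
 * let the caller release the lock so queued scans and insertions can
 * proceed.  Returns true once the split is complete. */
static bool
split_step(SplitState *s, int batch)
{
    int n = s->to_move < batch ? s->to_move : batch;

    s->to_move -= n;    /* each small move would be WAL-logged */
    s->moved += n;

    if (s->to_move == 0)
    {
        s->in_progress = false;  /* clear the flag; split finished */
        return true;
    }
    return false;   /* bail out; the next vacuum or insert can resume */
}
```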
Re: [HACKERS] GSoC on WAL-logging hash indexes
(de-CC'ing pgsql-advocacy) On 03/06/2014 04:03 AM, Greg Stark wrote: On Mon, Mar 3, 2014 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote: Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: We could just tackle this in the same incomplete, buggy, way that btrees tackled it for years until Heikki fixed them and the way gin and gist still do I believe. Namely just emit xlog records for each page individually and during replay remember when you have an incomplete split and complain if recovery ends with any still incomplete. That would be unfortunate to be adding new cases of this just as Heikki and company are making progress eliminating the ones we already had but that's surely better than having no recovery at all. Grmph. Indeed. Looking at Robert's idea from November: I have thought about doing some work on this, although I don't know when I'll ever have the time. As far as I can see, the basic problem is that we need a better way of splitting buckets. Right now, splitting a bucket means locking it, scanning the entire bucket chain and moving ~1/2 the tuples to the new bucket, and then unlocking it. Since this is a long operation, we have to use heavyweight locks to protect the buckets, which is bad for concurrency. Since it involves a modification to an arbitrarily large number of pages that has to be atomic, it's not possible to WAL-log it sensibly -- and in fact, I think this is really the major obstacle to being able to implementing WAL-logging for hash indexes. I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. 
For crash safety, the key is to make sure you perform the whole operation in small atomic steps, and you have a way of knowing where to continue after crash (this is the same whether you do the cleanup immediately at the end of recovery, which I want to avoid, or lazily afterwards). But you can hold locks across those small atomic steps, to ensure concurrency-safety. I think we could work around this problem as follows. Suppose we have buckets 1..N and the split will create bucket N+1. We need a flag that can be set and cleared on the metapage, call it split-in-progress, and a flag that can optionally be set on particular index tuples, call it moved-by-split. We will allow scans of all buckets and insertions into all buckets while the split is in progress, but (as now) we will not allow more than one split to be in progress at the same time. Hmm, unless I'm reading the code wrong, it *does* allow more than one split to be in progress at the same time today. [description of how to use the moved-by-split to allow scans to run concurrently with the split] I guess that would work, although you didn't actually describe how to continue a split after a crash. But it's a lot simpler if you don't also try to make it more concurrent: --- When you start splitting a bucket, first acquire a heavy-weight lock on the old and new buckets. Allocate the required number of pages, before changing anything on-disk, so that you can easily back out if you run out of disk space. So far, this is how splitting works today. Then you update the metapage to show that the bucket has been split and initialize the new bucket's primary page (as one atomic WAL-logged operation). Also mark the new bucket's primary page with a split-in-progress flag. That's used later by scans to detect if the split was interrupted. Now you scan the old bucket, and move all the tuples belonging to the new bucket. 
That needs to be done as a series of small atomic WAL-logged operations, each involving a small number of old and new pages (one WAL record for each moved tuple is the simplest, but you can do some batching for efficiency). After you're done, clear the split-in-progress flag in the new bucket's primary page (WAL-log that), and release the locks. In a scan, if the bucket you're about to scan has the split-in-progress flag set, that indicates that the split was interrupted by a crash. You won't see the flag as set if a concurrent split is in progress, because you will block on the lock it's holding on the bucket. If you see the flag as set, you share-lock and scan both buckets, the old and the new. If you see the split-in-progress flag in the bucket you're about to insert to, you don't need to do anything special. Just insert the tuple to the new bucket as normal. Before splitting the new bucket again, however, the previous split needs to be finished, or things will get complicated. To do that, acquire the locks on the old and the new bucket, and scan the old bucket for any remaining tuples that belong to the new bucket and move them, and finally clear the split-in-progress flag. --- This is similar to your description, as you scan both buckets if you see an interrupted split, but doesn't require the per-tuple moved-by-split flag, or waiting out scans. Also, I put the split-in-progress flag in the new bucket's primary page instead of the metapage. That allows multiple splits to be in progress at the same time. - Heikki
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. To some extent, I agree, but only to some extent. To make hash indexes generally usable, we really need to solve both problems. When I got rid of just some of the heavyweight locking in commit 76837c1507cb5a5a0048046233568447729e66dd, the results were pretty dramatic at higher levels of concurrency: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com But there was still an awful lot of contention inside the heavyweight lock manager, and I don't think hash indexes are going to be ready for prime time until we solve that problem. This is similar to your description, as you scan both buckets if you see an interrupted split, but doesn't require the per-tuple moved-by-split flag, or waiting out scans. Also, I put the split-in-progress flag in the new bucket's primary page instead of the metapage. That allows multiple splits to be in progress at the same time. Putting the split-in-progress flag in the new bucket's primary page makes a lot of sense. I don't have any problem with dumping the rest of it for a first cut if we have a different long-term plan for how to improve concurrency, but I don't see much point in going to a lot of work to implement a system for WAL logging if we're going to end up having to afterwards throw it out and start from scratch to get rid of the heavyweight locks - and it's not obvious to me how what you have here could be extended to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
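The scan-side rule in Heikki's scheme (scan both buckets when the split-in-progress flag survives a crash) can be sketched in self-contained C. Bucket, count_matches, and scan_bucket are toy stand-ins for real bucket chains:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy bucket: a flat array of hash keys plus the persisted
 * split-in-progress flag from the bucket's primary page. */
typedef struct
{
    const uint32_t *tuples;
    int ntuples;
    bool split_in_progress;
} Bucket;

static int
count_matches(const Bucket *b, uint32_t key)
{
    int hits = 0;

    for (int i = 0; i < b->ntuples; i++)
        if (b->tuples[i] == key)
            hits++;
    return hits;
}

/* If the target (new) bucket still carries split-in-progress, an
 * interrupted split may have left some of its tuples behind in the old
 * sibling, so scan both; otherwise the target alone suffices. */
static int
scan_bucket(const Bucket *target, const Bucket *old_sibling, uint32_t key)
{
    int hits = count_matches(target, key);

    if (target->split_in_progress)
        hits += count_matches(old_sibling, key);
    return hits;
}
```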
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 11:34 AM, Robert Haas robertmh...@gmail.com wrote: Putting the split-in-progress flag in the new bucket's primary page makes a lot of sense. I don't have any problem with dumping the rest of it for a first cut if we have a different long-term plan for how to improve concurrency, but I don't see much point in going to a lot of work to implement a system for WAL logging if we're going to end up having to afterwards throw it out and start from scratch to get rid of the heavyweight locks - and it's not obvious to me how what you have here could be extended to do that. +1 I don't think we have to improve concurrency at the same time as WAL logging, but we at least have to implement WAL logging in a way that doesn't foreclose future improvements to concurrency. I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. I think it would be easier to add bucket splitting to something like this than it would be to add WAL logging and concurrency to what we already have--mostly because I think that route would be more amenable to incremental programming efforts, and also because if people had an actually useful hash index type they would use it and see that it worked well (assuming of course that it does work well), and then be motivated to improve it. If this could be done as an extension I would just go (attempt to) do it. But since WAL isn't pluggable, it can't go in as an extension. Cheers, Jeff
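Jeff's no-split variant is attractive precisely because the lookup logic collapses to a single stable mapping. A minimal sketch, with fixed_bucket as an illustrative name:

```c
#include <stdint.h>

/* With the bucket count fixed by a storage parameter at CREATE INDEX
 * time, the hash-to-bucket mapping never changes for the life of the
 * index: no bucket splits to WAL-log, and no moved tuples to
 * coordinate with concurrent scans. */
static uint32_t
fixed_bucket(uint32_t hashval, uint32_t nbuckets)
{
    return hashval % nbuckets;
}
```

If the parameter was mis-planned, the remedy is the one Jeff gives: build a new index with more buckets, possibly concurrently, and drop the too-small one.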
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, Mar 6, 2014 at 11:34 AM, Robert Haas robertmh...@gmail.com wrote: Putting the split-in-progress flag in the new bucket's primary page makes a lot of sense. I don't have any problem with dumping the rest of it for a first cut if we have a different long-term plan for how to improve concurrency, but I don't see much point in going to a lot of work to implement a system for WAL logging if we're going to end up having to afterwards throw it out and start from scratch to get rid of the heavyweight locks - and it's not obvious to me how what you have here could be extended to do that. +1 I don't think we have to improve concurrency at the same time as WAL logging, but we at least have to implement WAL logging in a way that doesn't foreclose future improvements to concurrency. I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas robertmh...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. The other thought I had was that you can do things lazily in vacuum. So when you probe you need to check multiple pages until vacuum comes along and rehashes everything. -- greg
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com Sadly, no one responded. :-( On Mon, Mar 3, 2014 at 9:39 AM, Tan Tran tankimt...@gmail.com wrote: Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a reach proposal alongside a more straightforward one. Tan Hi Tan, I'm not familiar with the inner workings of the GSoC, but I don't know if this can be relegated to a stretch goal. WAL logging is an all or nothing thing. I think that, to be applied to the codebase (which I assume is the goal of GSoC), all actions need to be implemented. That is probably why this has remained open so long: there is no incremental way to get the code written. (But I would like to see it get done, I don't want to discourage that.) Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Mon, Mar 3, 2014 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote: Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: We could just tackle this in the same incomplete, buggy way that btrees tackled it for years until Heikki fixed them, and the way gin and gist still do, I believe. Namely, just emit xlog records for each page individually, and during replay remember when you have an incomplete split and complain if recovery ends with any still incomplete. That would be unfortunate to be adding new cases of this just as Heikki and company are making progress eliminating the ones we already had, but that's surely better than having no recovery at all. -- greg
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran tankimt...@gmail.com wrote: 2. Proposal As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Roughly, I will: - Devise a way to store and retrieve hashing data within the XLog data structures. - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) in hash.c, branch to code for the various redo operations: creating an index, inserting into an index, deleting an index, and page operations (split, delete, update?). - Code each branch by drawing on examples from btree_redo, gin_redo, and gist_redo, the existing XLog code of the other index types. Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com Sadly, no one responded. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
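The hash_redo skeleton Tan describes would ultimately branch on the WAL record type, as btree_redo and gin_redo do. A hypothetical, self-contained sketch of that dispatch shape follows; the opcode names and the returned descriptions are invented for illustration, and a real redo function would take an XLogRecord and replay page changes rather than return a string:

```c
/* Invented record-type codes; PostgreSQL's actual hash WAL record
 * types do not exist in this form. */
typedef enum
{
    HASH_XLOG_CREATE_INDEX,
    HASH_XLOG_INSERT,
    HASH_XLOG_SPLIT,
    HASH_XLOG_DELETE
} HashXlogOp;

/* Dispatcher shape: one branch per redo operation, each of which would
 * call a dedicated replay routine in the real implementation. */
static const char *
hash_redo_dispatch(HashXlogOp op)
{
    switch (op)
    {
        case HASH_XLOG_CREATE_INDEX:
            return "replay: initialize metapage and first buckets";
        case HASH_XLOG_INSERT:
            return "replay: re-add tuple to bucket page";
        case HASH_XLOG_SPLIT:
            return "replay: redo one step of a bucket split";
        case HASH_XLOG_DELETE:
            return "replay: remove dead tuples from page";
    }
    return "unknown record type";
}
```

As the thread goes on to argue, the split case is the hard one: a split touches an arbitrary number of pages, so it has to be logged as a series of small steps rather than one atomic record.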
Re: [HACKERS] GSoC on WAL-logging hash indexes
Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a reach proposal alongside a more straightforward one. Tan On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran tankimt...@gmail.com wrote: 2. Proposal As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Roughly, I will: - Devise a way to store and retrieve hashing data within the XLog data structures. - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) in hash.c, branch to code for the various redo operations: creating an index, inserting into an index, deleting an index, and page operations (split, delete, update?). - Code each branch by drawing on examples from btree_redo, gin_redo, and gist_redo, the existing XLog code of the other index types. Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com Sadly, no one responded. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company