Re: [HACKERS] GSoC on WAL-logging hash indexes
On 08/20/2014 02:43 AM, Michael Paquier wrote: On Thu, Jun 19, 2014 at 6:40 PM, Vik Fearing vik.fear...@dalibo.com wrote: On 04/30/2014 11:41 PM, Tom Lane wrote: We really oughta fix the WAL situation, not just band-aid around it. After re-reading this thread, it is not clear that anyone is going to work on it so I'll just ask: Is anyone working on this? If not, I'd like to put it on my plate. Vik, did you get time to look at that finally? Yes, I am (very slowly) working on this. I've got a decent learning curve for WAL replay to get over and I figured this can't be urgent considering how many years it's been like this so I'm sort of taking my time. -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Jun 19, 2014 at 6:40 PM, Vik Fearing vik.fear...@dalibo.com wrote: On 04/30/2014 11:41 PM, Tom Lane wrote: We really oughta fix the WAL situation, not just band-aid around it. After re-reading this thread, it is not clear that anyone is going to work on it so I'll just ask: Is anyone working on this? If not, I'd like to put it on my plate. Vik, did you get time to look at that finally? Regards, -- Michael
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 04/30/2014 11:41 PM, Tom Lane wrote: We really oughta fix the WAL situation, not just band-aid around it. After re-reading this thread, it is not clear that anyone is going to work on it so I'll just ask: Is anyone working on this? If not, I'd like to put it on my plate. -- Vik
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. The O(1) cost seems attractive when you consider that it only requires reading one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:26:20AM -0700, Peter Geoghegan wrote: On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. The O(1) cost seems attractive when you consider that it only requires reading one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. -- Peter Geoghegan Hi Peter, I do not think that CPU costs matter as much as the O(1) probe to get a result value, specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. I also think that development of hash indexes has been stalled waiting for WAL logging. For example, hash indexes can almost trivially become more space efficient as they grow in size by utilizing the page number to represent the prefix bits of the hash value for a bucket. My 2 cents. Ken
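[Editor's note] Ken's space-efficiency point can be illustrated with a toy sketch (this is not PostgreSQL code; all names are invented for illustration): if the bucket number is taken from the low-order bits of the hash, every tuple in a bucket shares those bits, so they are implied by the bucket's page number and need not be stored per tuple.

```python
# Sketch: deriving the bucket from the hash's low-order bits means those
# bits never need to be stored in the bucket's tuples; the full hash is
# reconstructible from (bucket number, remaining bits).
def bucket_for(hash_value, n_bucket_bits):
    mask = (1 << n_bucket_bits) - 1
    return hash_value & mask

h = 0b1011_0110
assert bucket_for(h, 4) == 0b0110       # low 4 bits pick the bucket
remaining = h >> 4                      # only these bits need storing
assert (remaining << 4) | 0b0110 == h   # full hash reconstructed
```

As the index grows and more bits are used for bucket addressing, more of each stored hash becomes implicit, which is the space saving Ken alludes to.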
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 5:55 AM, k...@rice.edu k...@rice.edu wrote: I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in-memory tables, but that's all I can find on Google. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:54 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 5:55 AM, k...@rice.edu k...@rice.edu wrote: I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in-memory tables, but that's all I can find on Google. I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Of course our current hash indexes have *more* not less contention than btree, but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
Robert Haas robertmh...@gmail.com writes: I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. The long and the short of it is that there are *lots* of implementation deficiencies in our hash indexes. There's no real way to know whether they'd be competitive if all those things were rectified, except by doing the work to fix 'em. And it's hard to justify putting much effort into hash indexes so long as there's an elephant in the room the size of no WAL support. So I'm in favor of getting that fixed, if we have somebody who's willing to do it. It might lead to good things later; and even if it doesn't, the lack of WAL support is an embarrassment. regards, tom lane
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas robertmh...@gmail.com wrote: I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). The B-Tree root isn't really a central contention point at all. The locking/latching protocol that nbtree uses is remarkably concurrency-friendly. In the real world, there is pretty much no exclusive locking of the root page's buffer. Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. I am not convinced of that. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:01 AM, Tom Lane t...@sss.pgh.pa.us wrote: It might lead to good things later; and even if it doesn't, the lack of WAL support is an embarrassment. I don't think it will, but I do agree that the current state of affairs is an embarrassment. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:26 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. If we don't put in the work to make them useful, then they won't ever become useful. If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment. One advantage of the hash indexes is that the code is simple enough for someone to actually understand it in a summer. Whether it would still be like that after WAL logging was implemented, I don't know. The O(1) cost seems attractive when you consider that it only requires reading one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. Not sure what you mean there. The concurrency issues of the hash index have a lot less that needs to be understood. I think I understand it pretty well (unlike B-tree), I just don't know what to do with that knowledge. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. I understand that those are the only ones worth fighting for. :) Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
All, (dropping pgsql-advocacy off the cc list) On 04/30/2014 10:11 AM, Robert Haas wrote: On Wed, Apr 30, 2014 at 12:54 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 5:55 AM, k...@rice.edu k...@rice.edu wrote: I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in memory tables, but that's all I can find on Google. Hash indexes are more important for MySQL because they have index-organized tables. I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Yes. And being smaller isn't insignificant; think of billion-row tables with fairly random access over the whole table. Also, *theoretically*, a hash index could avoid the rebalancing issues which cause our btree indexes to become bloated and need a REINDEX with certain update patterns. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:02 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas robertmh...@gmail.com wrote: I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). The B-Tree root isn't really a central contention point at all. The locking/latching protocol that nbtree uses is remarkably concurrency-friendly. In the real world, there is pretty much no exclusive locking of the root page's buffer. I've seen the simple pinning and unpinning of the root page (or the fast root, whatever the first page we bother to pin on a regular basis is called) be a point of contention. When one index dominates the entire system workload, that one page also drives contention on the spin lock that protects the lwlock that share-protects whichever buffer mapping partition happens to contain it. Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 2014-04-30 11:10:22 -0700, Jeff Janes wrote: I've seen the simple pinning and unpinning of the root page (or the fast root, whatever the first page we bother to pin on a regular basis is called) be a point of contention. When one index dominates the entire system workload, that one page also drives contention on the spin lock that protects the lwlock that share-protects whichever buffer mapping partition happens to contain it. To quite some degree that's an implementation deficiency of our lwlocks though. I've seen *massive* improvements with my lwlock patch for that problem. Additionally we need to get rid of the spinlock around pin/unpin. That said, even after those optimizations, there remains a significant amount of cacheline bouncing. That's much easier to avoid for something like hash indexes than btrees. I think another advantage is that hash indexes can be *much* smaller than btree when the individual rows are wide. I wonder though if we couldn't solve that better by introducing transforms around the looked-up data, e.g. allowing a hash(indexed_column) to be used *transparently*. If you do that manually today, a lot of work has to be done in every query... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
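[Editor's note] The transform Andres alludes to - probe by hash(indexed_column), then recheck the real value to reject collisions - can be sketched abstractly. This is a hypothetical illustration, not a proposed implementation; the class and function names are invented.

```python
# Sketch: an index that stores only a hash of the (possibly very wide)
# key, so lookups probe by hash and recheck the heap value to discard
# hash collisions.
import zlib

class HashTransformIndex:
    def __init__(self):
        self.buckets = {}  # hash(key) -> list of row ids

    def insert(self, key, rowid):
        self.buckets.setdefault(zlib.crc32(key.encode()), []).append(rowid)

    def lookup(self, key, fetch_row):
        # Probe by hash, then recheck the actual value from the "heap".
        candidates = self.buckets.get(zlib.crc32(key.encode()), [])
        return [rid for rid in candidates if fetch_row(rid) == key]

rows = {1: "alpha", 2: "beta"}      # stand-in for the heap
idx = HashTransformIndex()
for rid, val in rows.items():
    idx.insert(val, rid)
assert idx.lookup("alpha", rows.get) == [1]
assert idx.lookup("gamma", rows.get) == []
```

The point of doing this *transparently* in the server, rather than by hand, is that today the equivalent (an expression index plus an explicit recheck predicate) must be repeated in every query.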
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 2:02 PM, Peter Geoghegan p...@heroku.com wrote: Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. I am not convinced of that. In commit 76837c1507cb5a5a0048046233568447729e66dd, I removed part (basically half) of the heavyweight locking used by hash indexes, and performance improved rather dramatically: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com I think it's quite reasonable to suppose that getting rid of the remaining heavyweight locking will help just as much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:03 AM, Jeff Janes jeff.ja...@gmail.com wrote: If we don't put in the work to make them useful, then they won't ever become useful. If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment. My argument doesn't hinge on there being more important work to do. Rather, I simply don't think that there is ever going to be a compelling reason to use hash indexes in production. Apart from the obvious inflexibility, consider what it takes to make index creation fast - insertion-style building of indexes is much slower. Consider multi-key indexes. Now, I'm not telling anyone what to work on, and if someone wants to make hash indexes WAL-logged to plug that hole, don't let me stop you. It probably makes sense as a project to learn more about Postgres internals. However, it would be unfair to not speak up given my misgivings around the practical utility of hash indexes. -- Peter Geoghegan
Re: [HACKERS] GSoC on WAL-logging hash indexes
I think the key question was if someone wanted to develop a good hash index would they start from our current hash index or would they be better off starting from a fresh codebase? If the former then we should add WAL logging and throw GSOC and other people who ask for projects at it. If the latter then we should throw out the current codebase and let people develop extensions that add new hash index code until someone comes up with a good design we want to move to core. But imnsho doing nothing is a bad idea. We should have long ago either added WAL logging or removed the index type. We shouldn't have left a foot-gun that large lying around for so long. Incidentally, something else I've considered would be having a WAL record type saying "relation corrupted" and emitting one the first time a hash index is touched after a checkpoint. That could provide a general mechanism that might be useful for unlogged operations (and might be combinable with the infrastructure for unlogged tables). But it seems like a better tool for other objects than hash indexes. Another quick fix would be having a GUC allow_unrecoverable_relations which defaulted to false. Creating a hash index (and presumably unlogged tables) would error out with a hint to set that to true, so at least users who create them would have to know what they were in for. Right now we're seeing a lot of users who create hash indexes and then complain about corrupted standbys. I could do the legwork on either the GUC or moving hash indexes to an extension if there's a consensus. But I suspect either will be quite controversial...
Re: [HACKERS] GSoC on WAL-logging hash indexes
* Greg Stark (st...@mit.edu) wrote: But imnsho doing nothing is a bad idea. We should have long ago either added WAL logging or removed the index type. We shouldn't have left a foot-gun that large lying around for so long. I certainly encourage you to work on it. :) Incidentally something else I've considered would be having a WAL record type saying relation corrupted and emitting one the first time a hash index is touched after a checkpoint. That could provide a general mechanism that might be useful for unlogged operations (and might be combinable with the infrastructure for unlogged tables). But it seems like a better tool for other objects than hash indexes. Ugh, please do not make unlogged tables suffer for this. I feel it is *quite* clear when you say UNLOGGED or TEMP in the CREATE statement that you know what you're gonna get. Perhaps we should require 'UNLOGGED INDEX' to be passed in when someone creates a 'hash' index instead. Another quick fix would be having a GUC allow_unrecoverable_relations which defaulted to false. Creating a hash index (and presumably unlogged tables) would error out with a hint to set that to true so at least users who create them would have to know what they were in for. -1 Right now we're seeing a lot of users who create hash indexes and then complain about corrupted standbys. Uh, we are? If you're saying that $employer is, great, but please clarify that as the rest of us aren't 'in the loop' as far as that goes and we see the various list/IRC traffic instead, and I can't remember ever seeing such a complaint in either place (but I'll admit, my memory isn't the best). I could do the legwork on either the GUC or moving hash indexes to an extension if there's a consensus. But I suspect either will be quite controversial... -1 on the GUC. I'd be alright with finding a way to make it clear, upon creation of the hash index, that it's not WAL'd (maybe even just throw a WARNING, it'd be better than nothing..). 
Trying to rip out the current hash index wouldn't be great, imv. If you'd like to build an extension which provides hash indexes - I say go for it? Nothing stopping you as far as that's concerned. Thanks, Stephen
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 11:19 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 30, 2014 at 11:03 AM, Jeff Janes jeff.ja...@gmail.com wrote: If we don't put in the work to make them useful, then they won't ever become useful. If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment. My argument doesn't hinge on there being more important work to do. Rather, I simply don't think that there is ever going to be a compelling reason to use hash indexes in production. I have an indexed text column with an average length of 50, a stddev length of 15, and a pronounced right skew. Currently, the longest value in it is 811. But inevitably someone will need to insert something longer than 2712 bytes. When that day comes, I will drop the btree index and add a hash index (unless we remove that limitation from btree indexes in the mean time). It lets me sleep at night knowing that I have that option today, even if it would complicate crash recovery. Apart from the obvious inflexibility, consider what it takes to make index creation fast - insertion-style building of indexes is much slower. Consider multi-key indexes. I'm pretty sure hash indexes already implement a bulk creation fast path. In any case, I've never noticed them being slow, and I've tested some pretty big ones. Now, I'm not telling anyone what to work on, and if someone wants to make hash indexes WAL-logged to plug that hole, don't let me stop you. It probably makes sense as a project to learn more about Postgres internals. However, it would be unfair to not speak up given my misgivings around the practical utility of hash indexes. Sure, and we all have our own opinions on that.
Should we summarize them somewhere easier to follow than a long email thread but more detailed than a TODO entry? Whatever happened with the GSOC people? That should be well under way by now, is anyone working on it? Are the discussions of their efforts on-list, or is it between them and their mentors? Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:16 PM, Greg Stark st...@mit.edu wrote: I think the key question was if someone wanted to develop a good hash index would they start from our current hash index or would they be better off starting from a fresh codebase? If it were me I'd start with the current code. It would be nice if one could just fork the code to have a new type of index (say hash2) which is initially identical, but I never figured out how to do that. If the former then we should add WAL logging and throw GSOC and other people who ask for projects at it. If the latter then we should throw out the current codebase and let people develop extensions that add new hash index code until someone comes up with a good design we want to move to core. Extensions have no hooks into the WAL system, and I'm not optimistic that that will ever change. Relegating a fundamentally new index to be an extension virtually *guarantees* that it will never be WAL logged. Incidentally something else I've considered would be having a WAL record type saying relation corrupted and emitting one the first time a hash index is touched after a checkpoint. That could provide a general mechanism that might be useful for unlogged operations (and might be combinable with the infrastructure for unlogged tables). But it seems like a better tool for other objects than hash indexes. +1. I often lament that unlogged tables cannot be used on a standby or a test server that was derived from a hot backup. In the case of unlogged tables, this does mean we would need a way to checkpoint them with a non-shutdown checkpoint, though. I don't know if that would need a different type of unlogged table, or a different type of checkpoint. Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
Stephen Frost wrote: * Greg Stark (st...@mit.edu) wrote: I could do the legwork on either the GUC or moving hash indexes to an extension if there's a consensus. But I suspect either will be quite controversial... If you'd like to build an extension which provides hash indexes- I say go for it? Nothing stopping you as far as that's concerned. Extensions can't write/replay WAL, so it's not clear to me how this would work. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] GSoC on WAL-logging hash indexes
Greg Stark st...@mit.edu writes: But imnsho doing nothing is a bad idea. We should have long ago either added WAL logging or removed the index type. We shouldn't have left a foot-gun that large lying around for so long. We can't remove the hash index type, nor move it to an extension, because it is the operator classes for the built-in hash index AM that tell the planner and executor how to do hashing for arbitrary datatypes. And we certainly do not want to give up hashing-based query plans, whatever you may think of hash indexes. We really oughta fix the WAL situation, not just band-aid around it. regards, tom lane
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/06/2014 09:34 PM, Robert Haas wrote: On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. To some extent, I agree, but only to some extent. To make hash indexes generally usable, we really need to solve both problems. When I got rid of just some of the heavyweight locking in commit 76837c1507cb5a5a0048046233568447729e66dd, the results were pretty dramatic at higher levels of concurrency: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com But there was still an awful lot of contention inside the heavyweight lock manager, and I don't think hash indexes are going to be ready for prime time until we solve that problem. Hmm. You suggested ensuring that a scan always holds at least a pin, and that a split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described; ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and the exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, i.e. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. Vacuum could also be enhanced. It currently takes an exclusive lock on the bucket, then removes any dead tuples and finally squeezes the bucket by moving tuples to earlier pages. But you only really need the exclusive lock for the squeeze phase. You could do the dead-tuple removal without the bucket lock, and only grab it for the squeeze phase. And the squeezing is optional, so you could just skip that if you can't get the lock. But that's a separate patch as well.
One more thing we could do to make hash indexes more scalable, independent of the above: cache the information in the metapage in backend-private memory. Then you (usually) wouldn't need to access the metapage at all when scanning. Store a copy of the bitmask for that bucket in the primary page, and when scanning, check that it matches the cached value. If not, refresh the cached metapage and try again. So, there seem to be a few fairly simple and independent improvements to be made: 1. Replace the heavy-weight lock with pin / vacuum-lock. 2. Make it crash-safe by adding WAL-logging. 3. Only acquire the exclusive lock (vacuum-lock after step 1) in VACUUM for the squeeze phase. 4. Cache the metapage. We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it. This seems like a great GSoC project to me. We have a pretty good idea of what we want to accomplish. It's uncontroversial: I don't think anyone is going to object to improving hash indexes (one could argue that it's a waste of time, but that's different from objecting to the idea). And it consists of a few mostly independent parts, so it's possible to do incrementally, which makes it easier to track progress, and we'll probably have something useful at the end of the summer even if it doesn't all get finished. - Heikki
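[Editor's note] Heikki's metapage-caching suggestion (item 4) might look roughly like this in miniature: keep a backend-private copy of the metapage's bucket mask, stamp each bucket's primary page with the mask in effect when it was last touched by a split, and refresh the cache on a mismatch. The real metapage layout is more involved; every name here is an invented stand-in.

```python
# Sketch: validate a cached metapage against a per-bucket mask copy,
# refreshing and retrying once if a split has happened since we cached.
class MetaCache:
    def __init__(self, mask):
        self.mask = mask

shared_mask = 0b1                 # the "metapage": 2 buckets
page_mask = {0: 0b1, 1: 0b1}      # mask stamped on each bucket's primary page
cache = MetaCache(shared_mask)    # backend-private copy

def bucket_for(h):
    b = h & cache.mask
    if page_mask[b] != cache.mask:   # bucket split since we cached?
        cache.mask = shared_mask     # re-read the "metapage"
        b = h & cache.mask           # recompute with the fresh mask
    return b

# Simulate a split doubling the bucket count behind our back:
shared_mask = 0b11
page_mask = {b: 0b11 for b in range(4)}
assert bucket_for(0b10) == 0b10      # stale cache detected and refreshed
```

In the common case (no split since the cache was filled) the scan never touches the metapage at all, which is the contention win being proposed.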
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 7:07 PM, Greg Stark st...@mit.edu wrote: On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas robertmh...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. The other thought I had was that you can do things lazily in vacuum. So when you probe you need to check multiple pages until vacuum comes along and rehashes everything. I was thinking about this, too. Cleaning up the old bucket after the split is pretty similar to vacuuming. And lo and behold, vacuum also needs to lock the entire bucket. AFAICT, there are two reasons for this. First, when we resume a suspended scan, we assume that the next match on the page, if any, will occur on the same page at an offset greater than the offset where we found the previous match. The code copes with the possibility of concurrent insertions by refinding the last item we returned and scanning forward from there, but assumes the pointer we're refinding can't have moved to a lower offset. I think this could be easily fixed - at essentially no cost - by changing hashgettuple so that, if the forward scan errors out, it tries scanning backward rather than just giving up.
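[Editor's note] The forward-then-backward refind Robert describes can be sketched abstractly; the list offsets and item identifiers below are stand-ins for page offsets and index TIDs, not the real data structures.

```python
# Sketch: refind the last-returned item when resuming a suspended scan.
# Forward from the remembered offset is the existing behaviour; falling
# back to a backward scan handles items moved to a lower offset.
def refind(page_items, last_tid, remembered_off):
    # Forward scan from where we left off (handles concurrent insertions).
    for off in range(remembered_off, len(page_items)):
        if page_items[off] == last_tid:
            return off
    # Proposed addition: the item may now sit at a lower offset.
    for off in range(remembered_off - 1, -1, -1):
        if page_items[off] == last_tid:
            return off
    return None  # item removed entirely

page = ["t1", "t3", "t2", "t4"]
assert refind(page, "t2", 1) == 2   # found ahead, as today
assert refind(page, "t1", 2) == 0   # moved to a lower offset: now found
```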
Second, vacuum compacts each bucket that it modifies using _hash_squeezebucket, which scans from the two ends of the index in toward the middle, filling any free space on earlier pages by pulling tuples from the end of the bucket chain. This is a little thornier. We could change vacuum so that it only removes TIDs from the individual pages, without actually trying to free up pages, but that seems undesirable. However, I think we might be able to get by with making bucket compaction less aggressive, without eliminating it altogether. Suppose that whenever we remove items from a page, we consider consolidating the page with its immediate predecessor and successor in the bucket chain. This means our utilization will be a little over 50% in the worst case where we have a full page, a page with one item, another full page, another page with one item, and so on. But that's not all bad, because compacting a bucket chain to the minimum possible size when it may well be about to suffer more inserts isn't necessarily a good thing anyway. Also, doing this means that we don't need to lock out scans from the entire bucket chain in order to compact. It's sufficient to make sure that nobody's in the middle of scanning the two pages we want to merge. That turns out not to be possible right now. When a scan is suspended, we still hold a pin on the page we're scanning. But when _hash_readnext or _hash_readprev walks the bucket chain, it drops both lock and pin on the current buffer and then pins and locks the new buffer. That, however, seems like it could easily be changed: drop lock on current buffer, acquire pin on new buffer, drop pin on current buffer, lock new buffer. Then, if we want to merge two buffers, it's enough to lock both of them for cleanup. 
To avoid any risk of deadlock, and also to avoid waiting for a long-running suspended scan to wake up and complete, we can do this conditionally; if we fail to get either cleanup lock, then just don't merge the pages; take an exclusive lock only and remove whatever you need to remove, leaving the rest. Merging pages is only a performance optimization, so if it fails now and then, no real harm done. (A side benefit of this approach is that we could opportunistically attempt to compact pages containing dead items any time we can manage a ConditionalLockBufferForCleanup() on the page, sort of like what HOT pruning does for heap blocks. We could even, after pruning away dead items, attempt to merge the page with siblings, so that even without vacuum the bucket chains can gradually shrink if the index tuples are discovered to be pointing to dead heap tuples.) With the above changes, vacuuming a bucket can happen without taking the heavyweight lock in exclusive mode, leaving bucket splitting as the only operation that requires it. And we could perhaps use basically the same algorithm to clean the tuples out of the old bucket after the split. The problem is that, when we're vacuuming, we know that no scan currently in progress can still care about any of the tuples we're removing, whereas after a split, a scan that started before the split may still need the tuples that have been moved to the new bucket.
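The conditional merge described above can be modeled in self-contained C. Page and try_merge are toy stand-ins: the pin count plays the role of ConditionalLockBufferForCleanup (which succeeds only when no one else holds a pin), and tuple counts stand in for real page contents:

```c
#include <stdbool.h>

/* Toy bucket page: pin_count models buffer pins (1 = only us),
 * nitems models the tuples stored on the page. */
typedef struct
{
    int pin_count;
    int nitems;
} Page;

/* Attempt to pull b's tuples into a, freeing b from the bucket chain.
 * Merging is conditional and purely an optimization, so any obstacle
 * just means "skip it this time"; no real harm done. */
static bool
try_merge(Page *a, Page *b, int page_capacity)
{
    /* conditional cleanup lock: fail if another backend holds a pin,
     * e.g. a suspended scan parked on one of the pages */
    if (a->pin_count > 1 || b->pin_count > 1)
        return false;

    if (a->nitems + b->nitems > page_capacity)
        return false;   /* merged page would overflow */

    a->nitems += b->nitems;
    b->nitems = 0;      /* b can now be unlinked from the chain */
    return true;
}
```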
Re: [HACKERS] GSoC on WAL-logging hash indexes
[ I just sent a reply to Greg Stark's email which touches on some of the same points you mentioned here; that was mostly written last night, and finished this morning before seeing this. ] On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. You suggested ensuring that a scan always has at least a pin, and split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described, ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and an exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, ie. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. We could try that. I assume you mean do *just* what you describe here, without the split-in-progress or moved-by-split flags I suggested. The only issue I see with that is that instead of everyone piling up on the heavyweight lock, a wait which is interruptible, they'd all pile up on the buffer content lwlock, a wait which isn't. And splitting a bucket can involve an arbitrary number of I/O operations, so that's kind of unappealing. Even checkpoints would be blocked until the bucket split completed, which seems unfortunate. But I like the idea of teaching each scan to retain a pin on the primary buffer page. If we then do the split the way I proposed before, we can implement the outwait concurrent scans step by taking a cleanup lock on the primary buffer page. In this design, we only need to acquire and release the cleanup lock. Once we get the cleanup lock on the primary buffer page, even for an instant, we know that there are no remaining scans in the bucket that need the pre-split tuples that have now been moved to the new bucket. We can then remove tuples with a lesser lock (or keep the stronger lock if we wish to re-pack). Vacuum could also be enhanced. 
It currently takes an exclusive lock on the bucket, then removes any dead tuples and finally squeezes the bucket by moving tuples to earlier pages. But you only really need the exclusive lock for the squeeze-phase. You could do the dead-tuple removal without the bucket-lock, and only grab it for the squeeze-phase. And the squeezing is optional, so you could just skip that if you can't get the lock. But that's a separate patch as well.

Yeah, I think this would be a useful improvement.

One more thing we could do to make hash indexes more scalable, independent of the above: Cache the information in the metapage in backend-private memory. Then you (usually) wouldn't need to access the metapage at all when scanning. Store a copy of the bitmask for that bucket in the primary page, and when scanning, check that it matches the cached value. If not, refresh the cached metapage and try again.

I don't know whether this would be a useful improvement.

So, there seem to be a few fairly simple and independent improvements to be made:
1. Replace the heavy-weight lock with a pin/vacuum-lock.
2. Make it crash-safe, by adding WAL-logging.
3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for the squeeze phase.
4. Cache the metapage.
We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it.

I'm of the opinion that we ought to start by making splits and vacuuming more concurrent, and then do that stuff.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote:

So, there seem to be a few fairly simple and independent improvements to be made:
1. Replace the heavy-weight lock with a pin/vacuum-lock.
2. Make it crash-safe, by adding WAL-logging.
3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for the squeeze phase.
4. Cache the metapage.
We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it. I'm of the opinion that we ought to start by making splits and vacuuming more concurrent, and then do that stuff.

Priorities are a matter of opinion, but note that for many applications the concurrency of splits and vacuuming is pretty much irrelevant (at least after we do bullet point 3 above). Like, if the index is mostly read-only, or at least doesn't grow much. You'd still want it to be crash-safe (2), and you might care a lot about the performance of scans (1 and 4), but if splits and vacuum hold an exclusive lock for a few seconds, that's OK if you only need to do it once in a blue moon.

- Heikki
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 06, 2014 at 06:14:21PM -0500, Robert Haas wrote: On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. It would be useful to have a storage parameter for the target size of the index, even if it is not exact, to use in the initial index build to avoid the flurry of i/o caused by bucket splits as the index grows. Regards, Ken
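Ken's suggestion amounts to seeding the initial bucket count from the expected index size instead of starting at the default of 2. A hedged sketch in self-contained C; initial_buckets and the tuples-per-bucket parameter are illustrative, and PostgreSQL's real build logic differs in detail:

```c
#include <stdint.h>

/* Pick an initial power-of-two bucket count large enough that
 * expected_tuples fit at roughly tuples_per_bucket per bucket,
 * avoiding the flurry of splits while the index grows to size. */
static uint32_t
initial_buckets(uint64_t expected_tuples, uint32_t tuples_per_bucket)
{
    uint64_t want = expected_tuples / tuples_per_bucket + 1;
    uint32_t nbuckets = 2;   /* today's default starting point */

    while (nbuckets < want)
        nbuckets <<= 1;      /* hash bucket counts stay powers of two */
    return nbuckets;
}
```

Even an inexact target helps: overshooting wastes some space, while undershooting merely means splits resume later than they otherwise would.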
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote: On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. You suggested ensuring that a scan always has at least a pin, and split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described, ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and an exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, ie. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. We could try that. I assume you mean do *just* what you describe here, without the split-in-progress or moved-by-split flags I suggested. Yep. The only issue I see with that is that instead of everyone piling up on the heavyweight lock, a wait which is interruptible, they'd all pile up on the buffer content lwlock, a wait which isn't. And splitting a bucket can involve an arbitrary number of I/O operations, so that's kind of unappealing. Even checkpoints would be blocked until the bucket split completed, which seems unfortunate. Hmm. I doubt that's a big deal in practice, although I agree it's a bit ugly. Once we solve the crash-safety of splits, we actually have the option of doing the split in many small steps, even when there's no crash involved. You could for example grab the vacuum-lock, move all the tuples in the first 5 pages, and then release the lock to give other backends that are queued up a chance to do their scans/insertions. Then re-acquire the lock, and continue where you left off. Or just bail out and let the next vacuum or insertion finish it later. - Heikki
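Heikki's resumable-split idea can be modeled as a toy state machine in C. SplitState and split_step are illustrative names, and simple counters stand in for real pages of tuples:

```c
#include <stdbool.h>

/* Progress of a bucket split, persisted (in reality, on disk and
 * WAL-logged) so that any backend can resume it later. */
typedef struct
{
    int to_move;        /* tuples still awaiting the move to the new bucket */
    int moved;          /* tuples already moved */
    bool in_progress;   /* the split-in-progress flag */
} SplitState;

/* One step performed under the lock: move at most batch tuples, then
 * let the caller release the lock so queued scans and insertions can
 * proceed.  Returns true once the split is complete. */
static bool
split_step(SplitState *s, int batch)
{
    int n = s->to_move < batch ? s->to_move : batch;

    s->to_move -= n;    /* each small move would be WAL-logged */
    s->moved += n;

    if (s->to_move == 0)
    {
        s->in_progress = false;  /* clear the flag; split finished */
        return true;
    }
    return false;   /* bail out; the next vacuum or insert can resume */
}
```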
Re: [HACKERS] GSoC on WAL-logging hash indexes
(de-CC'ing pgsql-advocacy) On 03/06/2014 04:03 AM, Greg Stark wrote: On Mon, Mar 3, 2014 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote: Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: We could just tackle this in the same incomplete, buggy, way that btrees tackled it for years until Heikki fixed them and the way gin and gist still do I believe. Namely just emit xlog records for each page individually and during replay remember when you have an incomplete split and complain if recovery ends with any still incomplete. That would be unfortunate to be adding new cases of this just as Heikki and company are making progress eliminating the ones we already had but that's surely better than having no recovery at all. Grmph. Indeed. Looking at Robert's idea from November: I have thought about doing some work on this, although I don't know when I'll ever have the time. As far as I can see, the basic problem is that we need a better way of splitting buckets. Right now, splitting a bucket means locking it, scanning the entire bucket chain and moving ~1/2 the tuples to the new bucket, and then unlocking it. Since this is a long operation, we have to use heavyweight locks to protect the buckets, which is bad for concurrency. Since it involves a modification to an arbitrarily large number of pages that has to be atomic, it's not possible to WAL-log it sensibly -- and in fact, I think this is really the major obstacle to being able to implementing WAL-logging for hash indexes. I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. 
For crash safety, the key is to make sure you perform the whole operation in small atomic steps, and you have a way of knowing where to continue after crash (this is the same whether you do the cleanup immediately at the end of recovery, which I want to avoid, or lazily afterwards). But you can hold locks across those small atomic steps, to ensure concurrency-safety. I think we could work around this problem as follows. Suppose we have buckets 1..N and the split will create bucket N+1. We need a flag that can be set and cleared on the metapage, call it split-in-progress, and a flag that can optionally be set on particular index tuples, call it moved-by-split. We will allow scans of all buckets and insertions into all buckets while the split is in progress, but (as now) we will not allow more than one split to be in progress at the same time. Hmm, unless I'm reading the code wrong, it *does* allow more than one split to be in progress at the same time today. [description of how to use the moved-by-split to allow scans to run concurrently with the split] I guess that would work, although you didn't actually describe how to continue a split after a crash. But it's a lot simpler if you don't also try to make it more concurrent: --- When you start splitting a bucket, first acquire a heavy-weight lock on the old and new buckets. Allocate the required number of pages, before changing anything on-disk, so that you can easily back out if you run out of disk space. So far, this is how splitting works today. Then you update the metapage to show that the bucket has been split and initialize the new bucket's primary page (as one atomic WAL-logged operation). Also mark the new bucket's primary page with a split-in-progress flag. That's used later by scans to detect if the split was interrupted. Now you scan the old bucket, and move all the tuples belonging to the new bucket. 
That needs to be done as a series of small atomic WAL-logged operations, each involving a small number of old and new pages (one WAL record for each moved tuple is the simplest, but you can do some batching for efficiency). After you're done, clear the split-in-progress flag in the new bucket's primary page (WAL-log that), and release the locks. In a scan, if the bucket you're about to scan has the split-in-progress flag set, that indicates that the split was interrupted by a crash. You won't see the flag as set if a concurrent split is in progress, because you will block on the lock it's holding on the bucket. If you see the flag as set, you share-lock and scan both buckets, the old and the new. If you see the split-in-progress flag in the bucket you're about to insert to, you don't need to do anything special. Just insert the tuple to the new bucket as normal. Before splitting the new bucket again, however, the previous split needs to be finished, or things will get complicated. To do that, acquire the locks on the old and the new bucket, and scan the old bucket for any remaining tuples that belong to the new bucket and move them, and finally clear the split-in-progress flag. --- This is similar to your description, as you scan both buckets if you see an interrupted split, but doesn't require the per-tuple moved-by-split flag, or waiting out scans. Also, I put the split-in-progress flag in the new bucket's primary page instead of the metapage. That allows multiple splits to be in progress at the same time. - Heikki
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. To some extent, I agree, but only to some extent. To make hash indexes generally usable, we really need to solve both problems. When I got rid of just some of the heavyweight locking in commit 76837c1507cb5a5a0048046233568447729e66dd, the results were pretty dramatic at higher levels of concurrency: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com But there was still an awful lot of contention inside the heavyweight lock manager, and I don't think hash indexes are going to be ready for prime time until we solve that problem. This is similar to your description, as you scan both buckets if you see an interrupted split, but doesn't require the per-tuple moved-by-split flag, or waiting out scans. Also, I put the split-in-progress flag in the new bucket's primary page instead of the metapage. That allows multiple splits to be in progress at the same time. Putting the split-in-progress flag in the new bucket's primary page makes a lot of sense. I don't have any problem with dumping the rest of it for a first cut if we have a different long-term plan for how to improve concurrency, but I don't see much point in going to a lot of work to implement a system for WAL logging if we're going to end up having to afterwards throw it out and start from scratch to get rid of the heavyweight locks - and it's not obvious to me how what you have here could be extended to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
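The scan-side rule in Heikki's scheme (scan both buckets when the split-in-progress flag survives a crash) can be sketched in self-contained C. Bucket, count_matches, and scan_bucket are toy stand-ins for real bucket chains:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy bucket: a flat array of hash keys plus the persisted
 * split-in-progress flag from the bucket's primary page. */
typedef struct
{
    const uint32_t *tuples;
    int ntuples;
    bool split_in_progress;
} Bucket;

static int
count_matches(const Bucket *b, uint32_t key)
{
    int hits = 0;

    for (int i = 0; i < b->ntuples; i++)
        if (b->tuples[i] == key)
            hits++;
    return hits;
}

/* If the target (new) bucket still carries split-in-progress, an
 * interrupted split may have left some of its tuples behind in the old
 * sibling, so scan both; otherwise the target alone suffices. */
static int
scan_bucket(const Bucket *target, const Bucket *old_sibling, uint32_t key)
{
    int hits = count_matches(target, key);

    if (target->split_in_progress)
        hits += count_matches(old_sibling, key);
    return hits;
}
```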
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 11:34 AM, Robert Haas robertmh...@gmail.com wrote: Putting the split-in-progress flag in the new bucket's primary page makes a lot of sense. I don't have any problem with dumping the rest of it for a first cut if we have a different long-term plan for how to improve concurrency, but I don't see much point in going to a lot of work to implement a system for WAL logging if we're going to end up having to afterwards throw it out and start from scratch to get rid of the heavyweight locks - and it's not obvious to me how what you have here could be extended to do that. +1 I don't think we have to improve concurrency at the same time as WAL logging, but we at least have to implement WAL logging in a way that doesn't foreclose future improvements to concurrency. I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. I think it would be easier to add bucket splitting to something like this than it would be to add WAL logging and concurrency to what we already have--mostly because I think that route would be more amenable to incremental programming efforts, and also because if people had an actually useful hash index type they would use it and see that it worked well (assuming of course that it does work well), and then be motivated to improve it. If this could be done as an extension I would just go (attempt to) do it. But since WAL isn't pluggable, it can't go in as an extension. Cheers, Jeff
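Jeff's no-split variant is attractive precisely because the lookup logic collapses to a single stable mapping. A minimal sketch, with fixed_bucket as an illustrative name:

```c
#include <stdint.h>

/* With the bucket count fixed by a storage parameter at CREATE INDEX
 * time, the hash-to-bucket mapping never changes for the life of the
 * index: no bucket splits to WAL-log, and no moved tuples to
 * coordinate with concurrent scans. */
static uint32_t
fixed_bucket(uint32_t hashval, uint32_t nbuckets)
{
    return hashval % nbuckets;
}
```

If the parameter was mis-planned, the remedy is the one Jeff gives: build a new index with more buckets, possibly concurrently, and drop the too-small one.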
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, Mar 6, 2014 at 11:34 AM, Robert Haas robertmh...@gmail.com wrote: Putting the split-in-progress flag in the new bucket's primary page makes a lot of sense. I don't have any problem with dumping the rest of it for a first cut if we have a different long-term plan for how to improve concurrency, but I don't see much point in going to a lot of work to implement a system for WAL logging if we're going to end up having to afterwards throw it out and start from scratch to get rid of the heavyweight locks - and it's not obvious to me how what you have here could be extended to do that. +1 I don't think we have to improve concurrency at the same time as WAL logging, but we at least have to implement WAL logging in a way that doesn't foreclose future improvements to concurrency. I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas robertmh...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. The other thought I had was that you can do things lazily in vacuum. So when you probe you need to check multiple pages until vacuum comes along and rehashes everything. -- greg
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com Sadly, no one responded. :-( On Mon, Mar 3, 2014 at 9:39 AM, Tan Tran tankimt...@gmail.com wrote: Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a reach proposal alongside a more straightforward one. Tan Hi Tan, I'm not familiar with the inner workings of the GSoC, but I don't know if this can be relegated to a stretch goal. WAL logging is an all or nothing thing. I think that, to be applied to the codebase (which I assume is the goal of GSoC), all actions need to be implemented. That is probably why this has remained open so long: there is no incremental way to get the code written. (But I would like to see it get done, I don't want to discourage that.) Cheers, Jeff
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Mon, Mar 3, 2014 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote: Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: We could just tackle this in the same incomplete, buggy way that btrees tackled it for years until Heikki fixed them, and the way gin and gist still do, I believe. Namely, just emit xlog records for each page individually, and during replay remember when you have an incomplete split and complain if recovery ends with any still incomplete. That would be unfortunate to be adding new cases of this just as Heikki and company are making progress eliminating the ones we already had, but that's surely better than having no recovery at all. -- greg
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran tankimt...@gmail.com wrote: 2. Proposal As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Roughly, I will: - Devise a way to store and retrieve hashing data within the XLog data structures. - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) in hash.c, branch to code for the various redo operations: creating an index, inserting into an index, deleting an index, and page operations (split, delete, update?). - Code each branch by drawing on examples from btree_redo, gin_redo, and gist_redo, the existing XLog code of the other index types. Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com Sadly, no one responded. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
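The hash_redo skeleton Tan describes would ultimately branch on the WAL record type, as btree_redo and gin_redo do. A hypothetical, self-contained sketch of that dispatch shape follows; the opcode names and the returned descriptions are invented for illustration, and a real redo function would take an XLogRecord and replay page changes rather than return a string:

```c
/* Invented record-type codes; PostgreSQL's actual hash WAL record
 * types do not exist in this form. */
typedef enum
{
    HASH_XLOG_CREATE_INDEX,
    HASH_XLOG_INSERT,
    HASH_XLOG_SPLIT,
    HASH_XLOG_DELETE
} HashXlogOp;

/* Dispatcher shape: one branch per redo operation, each of which would
 * call a dedicated replay routine in the real implementation. */
static const char *
hash_redo_dispatch(HashXlogOp op)
{
    switch (op)
    {
        case HASH_XLOG_CREATE_INDEX:
            return "replay: initialize metapage and first buckets";
        case HASH_XLOG_INSERT:
            return "replay: re-add tuple to bucket page";
        case HASH_XLOG_SPLIT:
            return "replay: redo one step of a bucket split";
        case HASH_XLOG_DELETE:
            return "replay: remove dead tuples from page";
    }
    return "unknown record type";
}
```

As the thread goes on to argue, the split case is the hard one: a split touches an arbitrary number of pages, so it has to be logged as a series of small steps rather than one atomic record.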
Re: [HACKERS] GSoC on WAL-logging hash indexes
Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a reach proposal alongside a more straightforward one. Tan On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran tankimt...@gmail.com wrote: 2. Proposal As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Roughly, I will: - Devise a way to store and retrieve hashing data within the XLog data structures. - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) in hash.c, branch to code for the various redo operations: creating an index, inserting into an index, deleting an index, and page operations (split, delete, update?). - Code each branch by drawing on examples from btree_redo, gin_redo, and gist_redo, the existing XLog code of the other index types. Unfortunately, I don't believe that it's possible to do this easily today because of the way bucket splits are handled. I wrote about this previously here, with an idea for solving the problem: http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com Sadly, no one responded. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company