Re: [HACKERS] Hash Indexes
On 25/09/16 18:18, Amit Kapila wrote:
> On Sat, Sep 24, 2016 at 10:49 PM, Greg Stark wrote:
>> On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
>>> But to kick the hash AM as such to the curb is to say "sorry, there will never be O(1) index lookups in Postgres".
>>
>> Well there's plenty of halfway solutions for that. We could move hash indexes to contrib or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential.
>>
>> We definitely shouldn't discourage people from working on hash indexes.
>
> Okay, but to me it appears that naming it experimental_hash or moving it to contrib could discourage people, or at the very least make them less motivated. Thinking along those lines a year or so back would have been a wise direction, but now that a lot of work has already been done for hash indexes (patches to make them WAL-enabled and more concurrent and performant, plus a pageinspect module, are available) and still more is in progress, that sounds like a step backward rather than a step forward.

+1

I think so too - I've seen many email threads over the years on this list that essentially state "we need hash indexes wal logged to make progress with them"... and Amit et al have done this (more than this, obviously - made 'em better too), and I'm astonished that folk are suggesting anything other than 'commit this great patch now!'...

regards

Mark
Re: [HACKERS] Tracking wait event for latches
On Thu, Sep 22, 2016 at 10:49 PM, Robert Haas wrote:
> Finally, extensions got their own category in this taxonomy, though I wonder if it would be better to instead have Activity/ExtensionActivity, Client/ExtensionClient, Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a separate toplevel category.

I don't think that it is necessary to go up to that level. If you look at the latest patch, WaitLatch & friends have been extended with two arguments: classId and eventId, so extensions could just use WAIT_ACTIVITY as classId and WE_EXTENSION as eventId to do the equivalent of ExtensionActivity.

-- Michael
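For illustration, a hypothetical extension snippet assuming the extended WaitLatch() signature described above; the two trailing arguments and the WAIT_ACTIVITY / WE_EXTENSION identifiers come from the patch under discussion and are not part of the existing 9.6 API:

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"

static void
wait_for_something(void)
{
	/* classId/eventId arguments are assumptions about the proposed API */
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					 1000L,				/* timeout in ms */
					 WAIT_ACTIVITY,		/* classId: behaves like Activity */
					 WE_EXTENSION);		/* eventId: generic for extensions */
	ResetLatch(MyLatch);
}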
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Sun, Sep 25, 2016 at 10:30 AM, Amit Kapila wrote:
> On Fri, Sep 23, 2016 at 5:34 PM, Amit Kapila wrote:
>> I think here I am slightly wrong. For the full page writes, it does use the RBM_ZERO_AND_LOCK mode to read the page, and for such a mode we do not do the page verification check but rather blindly set the page to zero and then overwrite it with the full page image. So after my fix, you will not see the checksum failure error. I have a fix ready, but I am still doing some more verification. If everything passes, I will share the patch in a day or so.
>
> Attached patch fixes the problem; now we do perform full page writes for bitmap pages. Apart from that, I have rebased the patch on the latest concurrent index patch [1]. I have updated the README as well to reflect the WAL-logging-related information for the different operations.
>
> With the attached patch, all the review comments or issues found till now are addressed.

I forgot to mention that Ashutosh has tested this patch for a day using Jeff's tool and he didn't find any problem. Also, he has found a way to easily reproduce the problem. Ashutosh, can you share your changes to the script with which you reproduced the problem?

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Hash Indexes
On Sat, Sep 24, 2016 at 10:49 PM, Greg Stark wrote:
> On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
>> But to kick the hash AM as such to the curb is to say
>> "sorry, there will never be O(1) index lookups in Postgres".
>
> Well there's plenty of halfway solutions for that. We could move hash indexes to contrib or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential.
>
> We definitely shouldn't discourage people from working on hash indexes

Okay, but to me it appears that naming it experimental_hash or moving it to contrib could discourage people, or at the very least make them less motivated. Thinking along those lines a year or so back would have been a wise direction, but now that a lot of work has already been done for hash indexes (patches to make them WAL-enabled and more concurrent and performant, plus a pageinspect module, are available) and still more is in progress, that sounds like a step backward rather than a step forward.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] ICU integration
On Sat, Sep 24, 2016 at 10:13 PM, Peter Geoghegan wrote:
> On Fri, Sep 23, 2016 at 7:27 AM, Thomas Munro wrote:
>> It looks like varstr_abbrev_convert calls strxfrm unconditionally (assuming TRUST_STRXFRM is defined). This needs to use ucol_getSortKey instead when appropriate. It looks like it's a bit more helpful than strxfrm about telling you the output buffer size it wants, and it doesn't need nul termination, which is nice. Unfortunately it is like strxfrm in that the output buffer's contents is unspecified if it ran out of space.
>
> One can use the ucol_nextSortKeyPart() interface to just get the first 4/8 bytes of an abbreviated key, reducing the overhead somewhat, so the output buffer size limitation is probably irrelevant. The ICU documentation says something about this being useful for Radix sort, but I suspect it's more often used to generate abbreviated keys. Abbreviated keys were not my original idea. They're really just a standard technique.

Nice! The other advantage of ucol_nextSortKeyPart is that you don't have to convert the whole string to UChar (UTF16) first, as I think you would need to with ucol_getSortKey, because the UCharIterator mechanism can read directly from a UTF8 string.

I see in the documentation that ucol_nextSortKeyPart and ucol_getSortKey don't have compatible output, and this caveat may be related to whether sort key compression is used. I don't understand what sort of compression is involved, but out of curiosity I asked ICU to spit out some sort keys from ucol_nextSortKeyPart so I could see their size. As you say, we can ask it to stop at 4 or 8 bytes, which is very convenient for our purposes, but here I asked for more to get the full output so I could see where the primary weight part ends. The primary weight took one byte for the Latin letters I tried and two for the Japanese characters I tried (except 一 which was just 0xaa).

ucol_nextSortKeyPart(en_US, "a", ...)    -> 29 01 05 01 05
ucol_nextSortKeyPart(en_US, "ab", ...)   -> 29 2b 01 06 01 06
ucol_nextSortKeyPart(en_US, "abc", ...)  -> 29 2b 2d 01 07 01 07
ucol_nextSortKeyPart(en_US, "abcd", ...) -> 29 2b 2d 2f 01 08 01 08
ucol_nextSortKeyPart(en_US, "A", ...)    -> 29 01 05 01 dc
ucol_nextSortKeyPart(en_US, "AB", ...)   -> 29 2b 01 06 01 dc dc
ucol_nextSortKeyPart(en_US, "ABC", ...)  -> 29 2b 2d 01 07 01 dc dc dc
ucol_nextSortKeyPart(en_US, "ABCD", ...) -> 29 2b 2d 2f 01 08 01 dc dc dc dc
ucol_nextSortKeyPart(ja_JP, "一", ...)     -> aa 01 05 01 05
ucol_nextSortKeyPart(ja_JP, "一二", ...)    -> aa d0 0f 01 06 01 06
ucol_nextSortKeyPart(ja_JP, "一二三", ...)   -> aa d0 0f cb b8 01 07 01 07
ucol_nextSortKeyPart(ja_JP, "一二三四", ...)  -> aa d0 0f cb b8 cb d5 01 08 01 08
ucol_nextSortKeyPart(ja_JP, "日", ...)     -> d0 18 01 05 01 05
ucol_nextSortKeyPart(ja_JP, "日本", ...)    -> d0 18 d1 d0 01 06 01 06
ucol_nextSortKeyPart(fr_FR, "cote", ...) -> 2d 45 4f 31 01 08 01 08
ucol_nextSortKeyPart(fr_FR, "côte", ...) -> 2d 45 4f 31 01 44 8e 06 01 09
ucol_nextSortKeyPart(fr_FR, "coté", ...) -> 2d 45 4f 31 01 42 88 01 09
ucol_nextSortKeyPart(fr_FR, "côté", ...) -> 2d 45 4f 31 01 44 8e 44 88 01 0a
ucol_nextSortKeyPart(fr_CA, "cote", ...) -> 2d 45 4f 31 01 08 01 08
ucol_nextSortKeyPart(fr_CA, "côte", ...) -> 2d 45 4f 31 01 44 8e 06 01 09
ucol_nextSortKeyPart(fr_CA, "coté", ...) -> 2d 45 4f 31 01 88 08 01 09
ucol_nextSortKeyPart(fr_CA, "côté", ...) -> 2d 45 4f 31 01 88 44 8e 06 01 0a

I wonder how it manages to deal with fr_CA's reversed secondary weighting rule, which requires you to consider diacritics in reverse order -- apparently abandoned in France but still used in Canada -- using a fixed-size space for state between calls.

-- Thomas Munro http://www.enterprisedb.com
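For anyone wanting to reproduce this kind of output, here is a minimal standalone sketch (not taken from any posted patch; compile with -licuuc -licui18n, error handling omitted) of the ucol_nextSortKeyPart / UCharIterator combination described above:

#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/uiter.h>

/*
 * Print the first max_bytes bytes of a collation sort key for a UTF-8
 * string, reading the input through a UCharIterator so no explicit
 * UTF-16 conversion is needed.
 */
static void
print_key_prefix(const char *locale, const char *utf8, int32_t max_bytes)
{
	UErrorCode	status = U_ZERO_ERROR;
	UCollator  *coll = ucol_open(locale, &status);
	UCharIterator iter;
	uint32_t	state[2] = {0, 0};
	uint8_t		buf[64];
	int32_t		i, n;

	uiter_setUTF8(&iter, utf8, -1);
	n = ucol_nextSortKeyPart(coll, &iter, state, buf, max_bytes, &status);

	printf("ucol_nextSortKeyPart(%s, \"%s\", ...) ->", locale, utf8);
	for (i = 0; i < n; i++)
		printf(" %02x", buf[i]);
	printf("\n");

	ucol_close(coll);
}

int
main(void)
{
	print_key_prefix("fr_CA", "côté", 16);	/* long prefix, as in the list above */
	print_key_prefix("fr_CA", "côté", 8);	/* abbreviated-key-sized prefix */
	return 0;
}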
[HACKERS] Development build with uuid-ossp support - macOS
I am trying to do a macOS build of PostgreSQL (the 9.6 stable branch from GitHub) with the uuid-ossp contrib by typing "make world", but it fails due to an openjade error (I did install openjade via Homebrew, using this setup: https://github.com/petere/homebrew-sgml). Is there a way to build and install PostgreSQL with uuid-ossp without having to build the documentation? I don't really need the documentation for my test.

Thank you, Enrique
Re: [HACKERS] Rebranding OS X as macOS
On Sun, Sep 25, 2016 at 1:47 AM, Tom Lane wrote: > Apple has decided to rename Mac OS X to "macOS", and apparently is now > retroactively referring to old releases that way too. I propose that > we should do likewise, ie run around and clean up our various references > to "Mac OS X", "OS X", or "OSX" to uniformly say "macOS". I think it's > considerably clearer to non-Apple people exactly what that is, plus > there's not so much temptation towards variant spellings. > > We also have various uses of the ancient code name "Darwin". I think > we should change that to "macOS" in user-facing docs, but I'm less > inclined to do so in the code. In particular, renaming the template > file or trying to change the PORTNAME value seems much too messy, > so I think we should stick to "darwin" as the port name.

+1 to all that. I am seeing the same trend regarding the naming.

-- Michael
Re: [HACKERS] Better tracking of free space during SP-GiST index build
On 09/22/2016 07:37 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> ... I've tried increasing the cache size to 768 entries, with the vast majority of them (~600) allocated to leaf pages. Sadly, this seems to only increase the CREATE INDEX duration a bit, without making the index significantly smaller (still ~120MB).
>
> Yeah, that's in line with my results: not much further gain from a larger cache. Though if you were testing with the same IRRExplorer data, it's not surprising that our results would match. Would be good to try some other cases...

Agreed, but I don't have any other data sets at hand. One possibility would be to generate something randomly (e.g. it's not particularly difficult to generate random IP addresses), but I'd much rather use some real-world data sets.

>> One thing I'd change is making the SpGistLUPCache dynamic, i.e. storing the size and lastUsedPagesMap on the meta page. That should allow us to resize the cache and tweak lastUsedPagesMap in the future.
>
> Yeah, probably a good idea. I had thought of bumping SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but keeping it as a separate field won't add noticeable cost, and it might save some trouble.

I see you plan to track only the cache size, while I proposed to also track the map, i.e. the number of pages per category. I think that'd be useful in case we come up with better values (e.g. more entries for leaf pages), or even a somewhat adaptive scheme.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
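Purely as an illustration of that idea, a sketch of what a dynamic, meta-page-resident cache header might look like (the struct and field names are invented here, not taken from the posted patch):

/*
 * Hypothetical layout, NOT the posted patch: keep the cache geometry on the
 * SP-GiST meta page so the total size and the per-category split can be
 * changed without bumping SPGIST_MAGIC_NUMBER again.
 */
typedef struct SpGistLUPCacheDynamic
{
	uint16		totalEntries;			/* overall number of cached pages */
	uint16		entriesPerCategory[4];	/* e.g. inner/leaf x nulls/non-nulls */
	/* totalEntries entries of (blkno, freeSpace) follow on the meta page */
} SpGistLUPCacheDynamic;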
Re: [HACKERS] Hash Indexes
Greg Stark writes:
> On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
>> But to kick the hash AM as such to the curb is to say
>> "sorry, there will never be O(1) index lookups in Postgres".
>
> Well there's plenty of halfway solutions for that. We could move hash indexes to contrib or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential.
>
> We definitely shouldn't discourage people from working on hash indexes, but we probably shouldn't have released ten years' worth of a feature marked "please don't use this" that's guaranteed to corrupt your database and cause weird problems if you use it in any of a number of supported situations (including non-replicated system recovery, which has been a bedrock feature of Postgres for over a decade).

Obviously that has not been a good situation, but we lack a time machine to retroactively make it better, so I don't see much point in fretting over what should have been done in the past.

> Arguably, adding a hashed btree opclass and relegating the existing code to an experimental state would actually encourage development, since a) users would actually be likely to use the hashed btree opclass, so any work on a real hash opclass would have a real userbase ready and waiting for delivery, b) delivering a real hash opclass wouldn't involve convincing users to unlearn a million instructions warning not to use this feature, and c) the fear of breaking existing users' use cases and databases would be less, and pg_upgrade would be an ignorable problem at least until the day comes for the big cutover of the default to the new opclass.

I'm not following your point here. There is no hash-over-btree AM, and nobody (including Andres) has volunteered to create one. Meanwhile, we have a patch in hand to WAL-enable the hash AM. Why would we do anything other than apply that patch and stop saying hash is deprecated?

regards, tom lane
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/24/2016 06:06 AM, Amit Kapila wrote:
> On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra wrote:
>> ... So I'm using 16GB shared buffers (so with scale 300 everything fits into shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout 1h etc. So no, there are no checkpoints during the 5-minute runs, only those triggered explicitly before each run.
>
> Thanks for the clarification. Do you think we should try some different settings for the *_flush_after parameters, as those can help in reducing spikes in writes?

I don't see why those settings would matter. The tests are on unlogged tables, so there's almost no WAL traffic, and checkpoints (triggered explicitly before each run) look like this:

checkpoint complete: wrote 17 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=0.062 s, sync=0.006 s, total=0.092 s; sync files=10, longest=0.004 s, average=0.000 s; distance=309223 kB, estimate=363742 kB

So I don't see how tuning the flushing would change anything, as we're not doing any writes. Moreover, the machine has a bunch of SSD drives (16 or 24, I don't remember at the moment), behind a RAID controller with 2GB of write cache on it.

>>> Also, I think instead of 5 mins, read-write runs should be run for 15 mins to get consistent data.
>>
>> Where does the inconsistency come from?
>
> That's what I am also curious to know. Lack of warmup? Can't say, but at least we should try to rule out the possibilities. I think one way to rule it out is to do slightly longer runs for Dilip's test cases, and for pgbench we might need to drop and re-create the database after each reading.

My point is that it's unlikely to be due to insufficient warmup, because the inconsistencies appear randomly - generally you get a bunch of slow runs, one significantly faster one, then slow ones again. I believe the runs to be sufficiently long.

I don't see why recreating the database would be useful - the whole point is to get the database and shared buffers into a stable state, and then do measurements on it. I don't think bloat is a major factor here - I'm collecting some additional statistics during this run, including pg_database_size, and I can see the size oscillates between 4.8GB and 5.4GB. That's pretty negligible, I believe.

I'll let the current set of benchmarks complete - it's running on 4.5.5 now, I'll do tests on 3.2.80 too. Then we can re-evaluate whether longer runs are needed. Considering how uniform the results from the 10 runs are (at least on 4.5.5), I claim this is not an issue.

> It is quite possible that it is some kernel regression which might be fixed in a later version. Like we are doing most tests on cthulhu, which has a 3.10 kernel, and we generally get consistent results. I am not sure if a later kernel version, say 4.5.5, is a net win, because there is a considerable difference (dip) in performance in that version, though it produces quite stable results.

Well, the thing is - the 4.5.5 behavior is much nicer in general. I'll always prefer lower but more consistent performance (in most cases). In any case, we're stuck with whatever kernel version people are using, and they're likely to use the newer ones.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Hash Indexes
On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
> But to kick the hash AM as such to the curb is to say
> "sorry, there will never be O(1) index lookups in Postgres".

Well there's plenty of halfway solutions for that. We could move hash indexes to contrib or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential.

We definitely shouldn't discourage people from working on hash indexes, but we probably shouldn't have released ten years' worth of a feature marked "please don't use this" that's guaranteed to corrupt your database and cause weird problems if you use it in any of a number of supported situations (including non-replicated system recovery, which has been a bedrock feature of Postgres for over a decade).

Arguably, adding a hashed btree opclass and relegating the existing code to an experimental state would actually encourage development, since a) users would actually be likely to use the hashed btree opclass, so any work on a real hash opclass would have a real userbase ready and waiting for delivery, b) delivering a real hash opclass wouldn't involve convincing users to unlearn a million instructions warning not to use this feature, and c) the fear of breaking existing users' use cases and databases would be less, and pg_upgrade would be an ignorable problem at least until the day comes for the big cutover of the default to the new opclass.

-- greg
[HACKERS] Rebranding OS X as macOS
Apple has decided to rename Mac OS X to "macOS", and apparently is now retroactively referring to old releases that way too. I propose that we should do likewise, ie run around and clean up our various references to "Mac OS X", "OS X", or "OSX" to uniformly say "macOS". I think it's considerably clearer to non-Apple people exactly what that is, plus there's not so much temptation towards variant spellings.

We also have various uses of the ancient code name "Darwin". I think we should change that to "macOS" in user-facing docs, but I'm less inclined to do so in the code. In particular, renaming the template file or trying to change the PORTNAME value seems much too messy, so I think we should stick to "darwin" as the port name.

Barring objections, I'll make this happen (in HEAD only) sometime soon.

regards, tom lane
Re: [HACKERS] store narrow values in hash indexes?
On Sat, Sep 24, 2016 at 10:33:01AM +0530, Amit Kapila wrote:
> On Sat, Sep 24, 2016 at 1:02 AM, Robert Haas wrote:
>> Currently, hash indexes always store the hash code in the index, but not the actual Datum. It's recently been noted that this can make a hash index smaller than the corresponding btree index would be if the column is wide. However, if the index is being built on a fixed-width column with a typlen <= sizeof(Datum), we could store the original value in the hash index rather than the hash code without using any more space. That would complicate the code, but I bet it would be faster: we wouldn't need to set xs_recheck, we could rule out hash collisions without visiting the heap, and we could support index-only scans in such cases.
>
> What exactly do you mean by Datum? Is it for data types that fit into 64 bits, like integer? I think if we are able to support index-only scans for hash indexes for some data types, that will be a huge plus. Surely, there is some benefit without index-only scans as well, which is that we can avoid the recheck, but I am not sure if that alone can give us any big performance boost. As you say, it might lead to some complication in the code, but I think it is worth trying.
>
> Won't it add some requirements for pg_upgrade as well?

Yes, pg_upgrade will mark the indexes as invalid and supply a script to reindex them.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
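As a sketch of the test being proposed (hypothetical helper, not code from any posted patch; it assumes a backend context where Form_pg_attribute and Datum are available):

/*
 * Decide whether a hash index on attribute "att" could store the original
 * value instead of the 32-bit hash code.  A pass-by-value type whose fixed
 * length fits in a Datum needs no more space than the hash code does.
 */
static bool
hash_can_store_original(Form_pg_attribute att)
{
	return att->attbyval &&
		att->attlen > 0 &&
		att->attlen <= (int) sizeof(Datum);
}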
Re: [HACKERS] raised checkpoint limit & manual checkpoint
>> I would suggest that a good complementary feature would be to allow a manual checkpoint to run over a period of time, say something like:
>>
>> CHECKPOINT OVER '10 hours';
>>
>> That would target to complete after this period (whether it succeeds or not is another issue) instead of going as fast as possible, thus avoiding some performance degradation.
>
> Doesn't it somewhat overlap with the existing parameter checkpoint_completion_target?

More or less. The difference is that throttled checkpoints are currently started *automatically* when a certain amount of work has been done or some time has passed, but you cannot start them manually.

> You can use checkpoint_completion_target to throttle the checkpoints.

Nearly, yes; however it does not give any control over when a throttled checkpoint is started. I'm arguing that since the configuration allows delaying checkpoints up to a day, the ability to control when to actually start one seems to make sense.

> The option you are suggesting seems more straightforward, but how will the user decide how long they want the checkpoint to take?

In the hypothetical use case I have in mind, the user would happen to know their load well enough to choose. Say the system supports a load linked to office hours; you would know that you want it done before the next morning.

-- Fabien
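For concreteness, a sketch contrasting the existing spread-checkpoint knobs with the syntax floated in this thread (the CHECKPOINT OVER form is hypothetical and exists only as a proposal here):

-- Existing behaviour: automatic checkpoints are spread over a fraction of
-- checkpoint_timeout, controlled by checkpoint_completion_target.
ALTER SYSTEM SET checkpoint_timeout = '1d';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();

-- Today's manual checkpoint: runs as fast as possible, no throttling.
CHECKPOINT;

-- Proposed syntax from this thread (hypothetical, not implemented):
-- CHECKPOINT OVER '10 hours';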
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Sep 22, 2016 at 8:57 PM, Robert Haas wrote:
> On Thu, Sep 22, 2016 at 3:51 PM, Heikki Linnakangas wrote:
>> It'd be good if you could overlap the final merges in the workers with the merge in the leader. ISTM it would be quite straightforward to replace the final tape of each worker with a shared memory queue, so that the leader could start merging and returning tuples as soon as it gets the first tuple from each worker. Instead of having to wait for all the workers to complete first.
>
> If you do that, make sure to have the leader read multiple tuples at a time from each worker whenever possible. It makes a huge difference to performance. See bc7fcab5e36b9597857fa7e3fa6d9ba54aaea167.

That requires some kind of mutual exclusion mechanism, like an LWLock. It's possible that merging everything lazily is actually the faster approach, given this, and given the likely bottleneck on I/O at this stage. It's also certainly simpler to not overlap things. This is something I've read about before [1], with "eager evaluation" sorting not necessarily coming out ahead IIRC.

[1] http://digitalcommons.ohsu.edu/cgi/viewcontent.cgi?article=1193&context=csetech

-- Peter Geoghegan
Re: [HACKERS] Possibly too stringent Assert() in b-tree code
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila wrote: >>> I think you have a valid point. It seems we don't need to write WAL >>> for reuse page (aka don't call _bt_log_reuse_page()), if the page is >>> new, as the only purpose of that log is to handle conflict based on >>> transaction id stored in special area which will be anyway zero. > >> +1. > > This is clearly an oversight in Simon's patch fafa374f2, which introduced > this code without any consideration for the possibility that the page > doesn't have a valid special area. We could prevent the crash by > doing nothing if PageIsNew, a la > > if (_bt_page_recyclable(page)) > { > /* > * If we are generating WAL for Hot Standby then create a > * WAL record that will allow us to conflict with queries > * running on standby. > */ > - if (XLogStandbyInfoActive() && RelationNeedsWAL(rel)) > + if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) && > + !PageIsNew(page)) > { > BTPageOpaque opaque = (BTPageOpaque) > PageGetSpecialPointer(page); > > _bt_log_reuse_page(rel, blkno, opaque->btpo.xact); > } > > /* Okay to use page. Re-initialize and return it */ > > but I'm not very clear on whether this is a safe fix, mainly because > I don't understand what the reuse WAL record really accomplishes. > Maybe we need to instead generate a reuse record with some special > transaction ID indicating worst-case assumptions? > I think it is basically to ensure that the queries on standby should not be considering the page being recycled as valid and if there is any pending query to which this page is visible, cancel it. If this understanding is correct, then I think the fix you are proposing is correct. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Sep 21, 2016 at 5:52 PM, Heikki Linnakangas wrote: > I find this unification business really complicated. I can certainly understand why you would. As I said, it's the most complicated part of the patch, which overall is one of the most ambitious patches I've ever written. > I think it'd be simpler > to keep the BufFiles and LogicalTapeSets separate, and instead teach > tuplesort.c how to merge tapes that live on different > LogicalTapeSets/BufFiles. Or refactor LogicalTapeSet so that a single > LogicalTapeSet can contain tapes from different underlying BufFiles. > > What I have in mind is something like the attached patch. It refactors > LogicalTapeRead(), LogicalTapeWrite() etc. functions to take a LogicalTape > as argument, instead of LogicalTapeSet and tape number. LogicalTapeSet > doesn't have the concept of a tape number anymore, it can contain any number > of tapes, and you can create more on the fly. With that, it'd be fairly easy > to make tuplesort.c merge LogicalTapes that came from different tape sets, > backed by different BufFiles. I think that'd avoid much of the unification > code. I think that it won't be possible to make a LogicalTapeSet ever use more than one BufFile without regressing the ability to eagerly reuse space, which is almost the entire reason for logtape.c existing. The whole indirect block thing is an idea borrowed from the FS world, of course, and so logtape.c needs one block-device-like BufFile, with blocks that can be reclaimed eagerly, but consumed for recycling in *contiguous* order (which is why they're sorted using qsort() within ltsGetFreeBlock()). You're going to increase the amount of random I/O by using more than one BufFile for an entire tapeset, I think. This patch you posted ("0001-Refactor-LogicalTapeSet-LogicalTape-interface.patch") just keeps one BufFile, and only changes the interface to expose the tapes themselves to tuplesort.c, without actually making tuplesort.c do anything with that capability. I see what you're getting at, I think, but I don't see how that accomplishes all that much for parallel CREATE INDEX. I mean, the special case of having multiple tapesets from workers (not one "unified" tapeset created from worker temp files from their tapesets to begin with) now needs special treatment. Haven't you just moved the complexity around (once your patch is made to care about parallelism)? Having multiple entire tapesets explicitly from workers, with their own BufFiles, is not clearly less complicated than managing ranges from BufFile fd.c files with delineated ranges of "logical tapeset space". Seems almost equivalent, except that my way doesn't bother tuplesort.c with any of this. >> +* As a consequence of only being permitted to write to the leader >> +* controlled range, parallel sorts that require a final >> materialized tape >> +* will use approximately twice the disk space for temp files >> compared to >> +* a more or less equivalent serial sort. > I'm slightly worried about that. Maybe it's OK for a first version, but it'd > be annoying in a query where a sort is below a merge join, for example, so > that you can't do the final merge on the fly because mark/restore support is > needed. My intuition is that we'll *never* end up using this for merge joins. I think that I could do better here (why should workers really care at this point?), but just haven't bothered to. This parallel sort implementation is something written with CREATE INDEX and CLUSTER in mind only (maybe one or two other things, too). 
I believe that for query execution, partitioning is the future [1]. With merge joins, partitioning is desirable because it lets you push down *everything* to workers, not just sorting (e.g., by aligning partitioning boundaries on each side of each merge join sort in the worker, and having the worker also "synchronize" each side of the join, all independently and without a dependency on a final merge).

That's why I think it's okay that I use twice as much space for randomAccess tuplesort.c callers. No real world caller will ever end up needing to do this. It just seems like a good idea to support randomAccess when using this new infrastructure, on general principle. Forcing myself to support that case during initial development actually resulted in much cleaner, less invasive changes to tuplesort.c in general.

[1] https://www.postgresql.org/message-id/flat/CAM3SWZR+ATYAzyMT+hm-Bo=1l1smtjbndtibwbtktyqs0dy...@mail.gmail.com#CAM3SWZR+ATYAzyMT+hm-Bo=1l1smtjbndtibwbtktyqs0dy...@mail.gmail.com

-- Peter Geoghegan
Re: [HACKERS] raised checkpoint limit & manual checkpoint
On Sat, Sep 24, 2016 at 12:12 PM, Fabien COELHO wrote:
> Hello,
>
> The checkpoint time limit has just been raised to one day after a discussion started by Andres Freund:
>
> https://www.postgresql.org/message-id/20160202001320.GP8743%40awork2.anarazel.de
>
> I would have gone further up, say one week or even one month, but I think that this new limit is an improvement over the previous 1 hour maximum.
>
> Now ISTM that there is a possible use case which arises with this new setting and is not well addressed by postgresql:
>
> Let us say that an application has periods of high and low usage, say over a day, so that I want to avoid a checkpoint from 8 to 20, but I'm okay after that. I could raise the size and time limits so that they do not occur during these hours and plan to do a manual CHECKPOINT once a day when I see fit.
>
> Now the problem I see is that CHECKPOINT means "do a CHECKPOINT right now as fast as possible", i.e. there is no throttling whatsoever, which leads to heavy IO and may result in a very unresponsive database.
>
> I would suggest that a good complementary feature would be to allow a manual checkpoint to run over a period of time, say something like:
>
> CHECKPOINT OVER '10 hours';
>
> That would target to complete after this period (whether it succeeds or not is another issue) instead of going as fast as possible, thus avoiding some performance degradation.

Doesn't it somewhat overlap with the existing parameter checkpoint_completion_target? You can use checkpoint_completion_target to throttle the checkpoints. The option you are suggesting seems more straightforward, but how will the user decide how long they want the checkpoint to take?

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Covering + unique indexes.
On Wed, Sep 21, 2016 at 6:51 PM, Anastasia Lubennikova wrote: > 20.09.2016 08:21, Amit Kapila: > > On Tue, Sep 6, 2016 at 10:18 PM, Anastasia Lubennikova > wrote: > > 28.08.2016 09:13, Amit Kapila: > > > The problem seems really tricky, but the answer is simple. > We store included columns unordered. It was mentioned somewhere in > this thread. > Is there any fundamental problem in storing them in ordered way? I mean to say, you need to anyway store all the column values on leaf page, so why can't we find the exact location for the complete key. Basically use truncated key to reach to leaf level and then use the complete key to find the exact location to store the key. I might be missing some thing here, but if we can store them in ordered fashion, we can use them even for queries containing ORDER BY (where ORDER BY contains included columns). > Let me give you an example: > > create table t (i int, p point); > create index on (i) including (p); > "point" data type doesn't have any opclass for btree. > Should we insert (0, '(0,2)') before (0, '(1,1)') or after? > We have no idea what is the "correct order" for this attribute. > So the answer is "it doesn't matter". When searching in index, > we know that only key attrs are ordered, so only them can be used > in scankey. Other columns are filtered after retrieving data. > > explain select i,p from t where i =0 and p <@ circle '((0,0),2)'; > QUERY PLAN > --- > Index Only Scan using idx on t (cost=0.14..4.20 rows=1 width=20) >Index Cond: (i = 0) >Filter: (p <@ '<(0,0),2>'::circle) > I think here reason for using Filter is that because we don't keep included columns in scan keys, can't we think of having them in scan keys, but use only key columns in scan key to reach till leaf level and then use complete scan key at leaf level. > > The same approach is used for included columns of any type, even if > their data types have opclass. > > Is this truncation concept of high key needed for correctness of patch > or is it just to save space in index? If you need this, then I think > nbtree/Readme needs to be updated. > > > Now it's done only for space saving. We never check included attributes > in non-leaf pages, so why store them? Especially if we assume that included > attributes can be quite long. > There is already a note in documentation: > > +It's the same with other constraints (PRIMARY KEY and EXCLUDE). > This can > +also can be used for non-unique indexes as any columns which are > not required > +for the searching or ordering of records can be included in the > +INCLUDING clause, which can slightly reduce the size of > the index, > +due to storing included attributes only in leaf index pages. > Okay, thanks for clarification. > What should I add to README (or to documentation), > to make it more understandable? > May be add the data representation like only leaf pages contains all the columns and how the scan works. I think you can see if you can extend "Notes About Data Representation" and or "Other Things That Are Handy to Know" sections in existing README. > -- I am getting Assertion failure when I use this patch with database > created with a build before this patch. However, if I create a fresh > database it works fine. 
> Assertion failure details are as below:
>
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> TRAP: unrecognized TOAST vartag("((bool) 1)", File: "src/backend/access/common/heaptuple.c", Line: 532)
> LOG: server process (PID 1404) was terminated by exception 0x8003
> HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
> LOG: terminating any other active server processes
>
> That is expected behavior, because catalog versions are not compatible. But I wonder why there was no message about that? I suppose that's because CATALOG_VERSION_NO was outdated in my patch. As far as I know, the committer will change it before the commit. Try the new patch with the updated value. It should fail with a message about incompatible versions.

Yeah, that must be the reason, but let's not change it now, otherwise we will face conflicts while applying the patch.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Refactor StartupXLOG?
On Sat, Sep 24, 2016 at 11:01 AM, Thomas Munro wrote:
> What would the appetite be for that kind of refactoring work, considering the increased burden on committers who have to backpatch bug fixes? Is it a project goal to reduce the size of large complicated functions like StartupXLOG and heap_update? It seems like a good way for new players to learn how they work.

A lot of appetite. The size of xlog.c is out of control, so something that would be really cool to see is splitting the whole logic of xlog.c into more independent files; for example, low-level file operations could go into xlogfile.c, backup code paths into xlogbackup.c, etc. This would make it necessary to expose some of the shared-memory structures now at the top of xlog.c, like XLogCtl, but I think that would be really worth it in the end, and closer to things like xloginsert.c and xlogarchive.c that began such a move.

-- Michael
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On Tue, Sep 20, 2016 at 8:55 AM, Heikki Linnakangas wrote: > Thanks for working on this, and sorry for disappearing and dropping this on > the floor earlier. This doesn't apply anymore, thanks to 9c810a2e. Shouldn't > be hard to fix. Thanks for looking at it again. > I was in support of this earlier in general, even though I had some > questions on the details. But now that I look at the patch again, I'm not so > sure anymore. Honestly, I almost withdrew this patch myself, simply because it has dragged on so long. It has become ridiculous. > I don't think this actually makes things clearer. It adds new > cases that the code needs to deal with. Index AM writers now have to care > about the difference between a UNIQUE_CHECK_PARTIAL and > UNIQUE_CHECK_SPECULATIVE. You can have the exact same implementation for > both, but at the very least the index AM author needs to read the paragraph > you added to the docs to understand the difference. That adds some cognitive > load. I think it gives us the opportunity to discuss the differences, and in particular to explain why this "speculative insertion" thing exists at all. Besides, we can imagine an amcanunique implementation in which the difference might matter. Honestly, since it is highly unlikely that there ever will be another amcanunique access method, the cognitive burden to implementers of new amcanunique AMs is not a concern to me. Rather, the concern with that part of the patch is educating people about how the whole speculative insertion thing works in general, and drawing specific attention to it in a few key places in the code. Speculative insertion is subtle and complicated enough that I feel that that should be well described, as I've said many times. Remember how hard it was for us to come to agreement on the basic requirements in the first place! > Likewise, in ExecInsertIndexTuples(), this makes the deferred-index > case work slightly differently from speculative insertion. It's not a big > difference, but it again forces you to keep one more scenario in mind, when > reading the code This actually does have useful practical consequences, although I didn't point that out earlier (though I should have). To see what I mean, consider the complaint in this recent thread, which is based on an actual user application problem: https://www.postgresql.org/message-id/flat/1472134358.649524...@f146.i.mail.ru#1472134358.649524...@f146.i.mail.ru I go on to explain how this patch represents a partial solution to that [1]. That's what I mean by "useful practical consequences". As I say in [1], I think we could even get a full solution, if we applied this patch and *also* made the ordering in which the relcache returns a list of index OIDs more useful (it should still be based on something stable, to avoid deadlocks, but more than just OID order. Instead, relcache.c can sort indexes such that we insert into primary keys first, then unique indexes, then all other indexes. This also avoids bloat if there is a unique violation, by getting unique indexes out of the way first during ExecInsert()). > So overall, I think we should just drop this. Maybe a comment somewhere > would be in order, to point out that ExecInsertIndexTuples() and > index_insert might perform some unnecessary work, by inserting index tuples > for a doomed heap tuple, if a speculative insertion fails. But that's all. Perhaps. I'm curious about what your thoughts are on what I've said about "useful practical consequences" of finishing insertion earlier for speculative inserters. 
If you're still not convinced about this patch, having considered that as well, then I will drop the patch (or maybe we just add some comments, as you suggest).

[1] https://www.postgresql.org/message-id/CAM3SWZTFTbK_Y%3D7uWJaXYLHnYQ99pV4KFpmjTKbNJR5_%2BQThzA%40mail.gmail.com

-- Peter Geoghegan
Re: [HACKERS] ICU integration
On Fri, Sep 23, 2016 at 7:27 AM, Thomas Munro wrote: > A couple of thoughts about abbreviated keys: > > #ifndef TRUST_STRXFRM > if (!collate_c) > abbreviate = false; > #endif > > I think this macro should affect only strxfrm, and we should trust > ucol_getSortKey or disable it independently. ICU's manual says > reassuring things like "Sort keys are most useful in databases" and > "Sort keys are generally only useful in databases or other > circumstances where function calls are extremely expensive". +1. Abbreviated keys are essential to get competitive performance while sorting text, and the fact that ICU makes them safe to reintroduce is a major advantage of adopting ICU. Perhaps we should consider wrapping strxfrm() instead, though, so that other existing callers of strxfrm() (I'm thinking of convert_string_datum()) almost automatically do the right thing. In other words, maybe there should be a pg_strxfrm() or something, with TRUST_STRXFRM changed to be something that can dynamically resolve whether or not it's a collation managed by a trusted collation provider (this could only be resolved at runtime, which I think is fine). > It looks like varstr_abbrev_convert calls strxfrm unconditionally > (assuming TRUST_STRXFRM is defined). This needs to > use ucol_getSortKey instead when appropriate. It looks like it's a > bit more helpful than strxfrm about telling you the output buffer size > it wants, and it doesn't need nul termination, which is nice. > Unfortunately it is like strxfrm in that the output buffer's contents > is unspecified if it ran out of space. One can use the ucol_nextSortKeyPart() interface to just get the first 4/8 bytes of an abbreviated key, reducing the overhead somewhat, so the output buffer size limitation is probably irrelevant. The ICU documentation says something about this being useful for Radix sort, but I suspect it's more often used to generate abbreviated keys. Abbreviated keys were not my original idea. They're really just a standard technique. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
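A rough sketch of the kind of wrapper being suggested above (the "provider" field, the COLLPROVIDER_ICU tag and the icu_sort_key() helper are assumptions about the ICU patch, not existing PostgreSQL code):

/*
 * Hypothetical pg_strxfrm(): resolve the collation provider at run time
 * instead of relying on the compile-time TRUST_STRXFRM switch.
 */
static size_t
pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
{
	if (locale && locale->provider == COLLPROVIDER_ICU)
		return icu_sort_key(dest, src, destsize, locale);	/* ucol_getSortKey() under the hood */

	/* libc provider: only safe for abbreviation if strxfrm() is trusted */
	return strxfrm(dest, src, destsize);
}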
Re: [HACKERS] pgbench - minor fix for meta command only scripts
Hello Heikki, Yeah, it really is quite a mess. I tried to review your patch, and I think it's correct, but I couldn't totally convince myself, because of the existing messiness of the logic. So I bit the bullet and started refactoring. I came up with the attached. It refactors the logic in doCustom() into a state machine. I think this is much clearer, what do you think? The patch did not apply to master because of you committed the sleep fix in between. I updated the patch so that the fix is included as well. I think that this is really needed. The code is much clearer and simple to understand with the state machines & additional functions. This is a definite improvement to the code base. I've done quite some testing with various options (-r, --rate, --latency-limit, -C...) and got pretty reasonnable results. Although I cannot be absolutely sure that the refactoring does not introduce any new bug, I'm convinced that it will be much easier to find them:-) Attached are some small changes to your version: I have added the sleep_until fix. I have fixed a bug introduced in the patch by changing && by || in the (min_sec > 0 && maxsock != -1) condition which was inducing errors with multi-threads & clients... I have factored out several error messages in "commandFailed", in place of the "metaCommandFailed", and added the script number as well in the error messages. All messages are now specific to the failed command. I have added two states to the machine: - CSTATE_CHOOSE_SCRIPT which simplifies threadRun, there is now one call to chooseScript instead of two before. - CSTATE_END_COMMAND which manages is_latencies and proceeding to the next command, thus merging the three instances of updating the stats that were in the first version. The later state means that processing query results is included in the per statement latency, which is an improvement because before I was getting some transaction latency significantly larger that the apparent sum of the per-statement latencies, which did not make much sense... I have added & updated a few comments. There are some places where the break could be a pass through instead, not sure how desirable it is, I'm fine with break. Well, the comment right there says "note this is not included in the statement latency numbers", so apparently it's intentional. Whether it's a good idea or not, I don't know :-). It does seem a bit surprising. Indeed, it also results in apparently inconsistent numbers, and it creates a mess for recording the statement latency because it meant that in some case the latency was collected before the actual end of the command, see the discussion about CSTATE_END_COMMAND above. But what seems more bogus to me is that we do that after recording the *transaction* stats, if this was the last command. So the PQgetResult() of the last command in the transaction is not included in the transaction stats, even though the PQgetResult() calls for any previous commands are. (Perhaps that's what you meant too?) I changed that in my patch, it would've been inconvenient to keep that old behavior, and it doesn't make any sense to me anyway. Fine with me. -- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 8b24ad5..502e644 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -235,25 +235,97 @@ typedef struct StatsData } StatsData; /* - * Connection state + * Connection state machine states. + */ +typedef enum +{ + /* + * The client must first choose a script to execute. 
Once chosen, it can + * either be throttled (state CSTATE_START_THROTTLE under --rate) or start + * right away (state CSTATE_START_TX). + */ + CSTATE_CHOOSE_SCRIPT, + + /* + * In CSTATE_START_THROTTLE state, we calculate when to begin the next + * transaction, and advance to CSTATE_THROTTLE. CSTATE_THROTTLE state + * sleeps until that moment. (If throttling is not enabled, doCustom() + * falls directly through from CSTATE_START_THROTTLE to CSTATE_START_TX.) + */ + CSTATE_START_THROTTLE, + CSTATE_THROTTLE, + + /* + * CSTATE_START_TX performs start-of-transaction processing. Establishes + * a new connection for the transaction, in --connect mode, and records + * the transaction start time. + */ + CSTATE_START_TX, + + /* + * We loop through these states, to process each command in the + * script: + * + * CSTATE_START_COMMAND starts the execution of a command. On a SQL + * command, the command is sent to the server, and we move to + * CSTATE_WAIT_RESULT state. On a \sleep meta-command, the timer is + * set, and we enter the CSTATE_SLEEP state to wait for it to expire. + * Other meta-commands are executed immediately, and we proceed to next + * command. + * + * CSTATE_WAIT_RESULT waits until we get a result set back from the server + * for the current command, and proceeds to CSTATE_END_COMMAND. + * + * CSTATE_SLEEP waits until the end of \sleep
Re: [HACKERS] Refactor StartupXLOG?
On Sat, Sep 24, 2016 at 8:24 PM, Heikki Linnakangas wrote: > On 09/24/2016 05:01 AM, Thomas Munro wrote: >> >> What would the appetite be for that kind of refactoring work, >> considering the increased burden on committers who have to backpatch >> bug fixes? Is it a project goal to reduce the size of large >> complicated functions like StartupXLOG and heap_update? It seems like >> a good way for new players to learn how they work. > > > +1. Yes, it does increase the burden of backpatching, but I think it'd still > be very much worth it. Cool. > A couple of little details that caught my eye at a quick read: > >> /* Try to find a backup label. */ >> if (read_backup_label(&checkPointLoc, &backupEndRequired, >> &backupFromStandby)) >> { >> wasShutdown = ProcessBackupLabel(xlogreader, &checkPoint, >> checkPointLoc, >> >> &haveTblspcMap); >> >> /* set flag to delete it later */ >> haveBackupLabel = true; >> } >> else >> { >> /* Clean up any orphaned tablespace map files with no >> backup label. */ >> CleanUpTablespaceMap(); >> ... > > > This is a bit asymmetric: In the true-branch, ProcessBackupLabel() reads the > tablespace map, and sets InArchiveRecovery and StandbyMode, but in the > false-branch, StartupXLog() calls CleanupTablespaceMap() and sets those > variables directly. Right. I need to move all or some of the other branch out to its own function too. > For functions like BeginRedo, BeginHotStandby, ReplayRedo, etc., I think > it'd be better to have the "if (InRecovery)" checks in the caller, rather > than in the functions. Yeah. I was thinking that someone might value the preservation of indention level, since that might make small localised bug fixes easier to backport to the monolithic StartupXLOG. Plain old otherwise-redundant curly braces would achieve that. Or maybe it's better not to worry about preserving that. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Wed, Sep 21, 2016 at 11:22 PM, Robert Haas wrote:
> On Wed, Sep 21, 2016 at 5:54 AM, Petr Jelinek wrote:
>>>> Reading again the thread, it seems that my previous post [1] was a bit misunderstood. My position is to not introduce any new behavior changes in 9.6, so we could just make the FIRST NUM grammar equivalent to NUM.
>>>>
>>>> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4njgmmgiagvcs...@mail.gmail.com
>>>
>>> I misunderstood your intent, then. But I still stand by what I did understand, namely that 'k (...)' should mean 'any k (...)'. It's much more natural than having it mean 'first k (...)' and I also think it will be more frequent in practice.
>>
>> I think so as well.
>
> Well, I agree, but I think making behavior changes after rc1 is a non-starter. It's better to live with the incompatibility than to change the behavior so close to release. At least, that's my position. Getting the release out on time with a minimal bug count is more important to me than a minor incompatibility in the meaning of one GUC.

As the release team announced, it's better to postpone changing the syntax of the existing s_s_names. I still vote for changing the behaviour of the existing syntax 'k (n1, n2)' to quorum commit. That is:

1. 'FIRST k (n1, n2, n3)' means that the master server waits for ACKs from the k standby servers whose names appear earlier in the list.
2. 'ANY k (n1, n2, n3)' means that the master server waits for ACKs from any k listed standby servers.
3. 'n1, n2, n3' is the same as #1 with k=1.
4. '(n1, n2, n3)' is the same as #2 with k=1.

Attached updated patch (quorum_commit_v3.patch).

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
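To make the four cases concrete, the corresponding postgresql.conf entries could look like this under the proposed semantics (illustrative only; the syntax is still under discussion, not released behaviour):

# 1. Priority: wait for ACKs from the 2 highest-priority standbys in the list
synchronous_standby_names = 'FIRST 2 (n1, n2, n3)'

# 2. Quorum: wait for ACKs from any 2 of the listed standbys
synchronous_standby_names = 'ANY 2 (n1, n2, n3)'

# 3. Bare list: same as FIRST 1 (n1, n2, n3)
synchronous_standby_names = 'n1, n2, n3'

# 4. Parenthesized list: same as ANY 1 (n1, n2, n3)
synchronous_standby_names = '(n1, n2, n3)'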
Re: [HACKERS] Refactor StartupXLOG?
On 09/24/2016 05:01 AM, Thomas Munro wrote: What would the appetite be for that kind of refactoring work, considering the increased burden on committers who have to backpatch bug fixes? Is it a project goal to reduce the size of large complicated functions like StartupXLOG and heap_update? It seems like a good way for new players to learn how they work. +1. Yes, it does increase the burden of backpatching, but I think it'd still be very much worth it. A couple of little details that caught my eye at a quick read: /* Try to find a backup label. */ if (read_backup_label(&checkPointLoc, &backupEndRequired, &backupFromStandby)) { wasShutdown = ProcessBackupLabel(xlogreader, &checkPoint, checkPointLoc, &haveTblspcMap); /* set flag to delete it later */ haveBackupLabel = true; } else { /* Clean up any orphaned tablespace map files with no backup label. */ CleanUpTablespaceMap(); ... This is a bit asymmetric: In the true-branch, ProcessBackupLabel() reads the tablespace map, and sets InArchiveRecovery and StandbyMode, but in the false-branch, StartupXLog() calls CleanupTablespaceMap() and sets those variables directly. For functions like BeginRedo, BeginHotStandby, ReplayRedo, etc., I think it'd be better to have the "if (InRecovery)" checks in the caller, rather than in the functions. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.6 TAP tests and extensions
On 24 Sep. 2016 04:04, "Tom Lane" wrote:
>> It's thus sufficient to apply the patch to install the perl modules to 9.4, 9.5 and 9.6. Nothing else is needed. I've attached backports for 9.4 and 9.5.
>
> Pushed with cosmetic adjustments ---

Thanks.

> Looking back over the thread, I see that you also proposed installing isolationtester and pg_isolation_regress for the benefit of extensions. I'm very much less excited about that idea. It'd be substantially more dead weight in typical installations, and I'm not sure that it'd be useful to common extensions, and I'm not eager to treat isolationtester's API and behavior as something we need to hold stable for extension use.

Fine by me. I have no particular problem expecting those to be installed explicitly by ext devs who want to use them.