Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On Tue, Apr 30, 2019 at 11:42 AM John Naylor wrote:
>
> On Fri, Apr 26, 2019 at 11:52 AM Amit Kapila wrote:
> > As discussed above, we need to issue an
> > invalidation for following points: (a) when vacuum finds there is no
> > FSM and page has more space now, I think you can detect this in
> > RecordPageWithFreeSpace
>
> I took a brief look and we'd have to know how much space was there
> before. That doesn't seem possible without first implementing the idea
> to save free space locally in the same way the FSM does. Even if we
> have consensus on that, there's no code for it, and we're running out
> of time.
>
> > (b) invalidation to notify the existence of
> > FSM, this can be done both by vacuum and backend.
>
> I still don't claim to be anything but naive in this area, but does
> the attached get us any closer?
>
> @@ -776,7 +776,10 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>      if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
>           rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
>          !smgrexists(rel->rd_smgr, FSM_FORKNUM))
> +     {
>          smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
> +         fsm_clear_local_map(rel);
> +     }

I think this won't be correct, because when we call fsm_extend via vacuum the local map won't already exist, so it won't issue an invalidation call.  Isn't it better to directly call CacheInvalidateRelcache here to notify other backends that their local maps are invalid now?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: POC: Cleaning up orphaned files using undo logs
On Tue, Apr 30, 2019 at 11:45 AM Dilip Kumar wrote:

The attached patch provides a mechanism for masking the necessary bits in undo pages to support consistency checking of undo pages. Ideally we could merge this patch with the main interface patch, but for now I have kept it separate, mainly because a) this is still a WIP patch and b) reviewing the changes will be easier.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0005-undo-page-consistency-checker_WIP_v3.patch
Description: Binary data
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On Thu, May 2, 2019 at 7:36 AM John Naylor wrote:
>
> On Wed, May 1, 2019 at 11:24 PM Andres Freund wrote:
> >
> > Hi,
> >
> > On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > > to come up with a cleanup proposal, and then decide whether to 1) revert
> > > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > > apply the patch in 13. As we would already have a patch, 3) seems like
> > > it'd be more tenable than without.
> >
> > I think decision time has come. My tentative impression is that we're
> > not there yet,

You are right that the patch is not in committable shape, but the patch to move the map to relcache has been presented, and the main work left there is to review/test it and add the invalidation calls as per discussion.  It is just that I don't want to do that in haste, leading to some other problems.  So, that patch should not take too much time and will resolve the main complaint.  Basically, I was planning to re-post that patch once the discussion between me and Alvaro concludes, and then you can also look into it to see whether it addresses the main complaint.  There are a few other points for which John has prepared a patch, and that might need some work based on your inputs.

> > and should revert the improvements in v12, and apply the
> > improved version early in v13. As a second choice, we should live with
> > the current approach, if John and Amit "promise" further effort to clean
> > this up for v13.
>
> Yes, the revised approach is not currently as mature as the one in
> HEAD. It's not ready. Not wanting to attempt Promise Driven
> Development, I'd rather revert, and only try again if there's enough
> time and interest.

I can certainly help with moving the cleanup patch forward.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On Thu, May 2, 2019 at 3:42 AM Alvaro Herrera wrote:
>
> On 2019-May-01, Amit Kapila wrote:
> > On Tue, Apr 30, 2019 at 7:52 PM Alvaro Herrera wrote:
> > > Hmm ... so, if vacuum runs and frees up any space from any of the pages,
> > > then it should send out an invalidation -- it doesn't matter what the
> > > FSM had, just that there is more free space now. That means every other
> > > process will need to determine a fresh FSM,
> >
> > I think you intend to say the local space map because once FSM is
> > created we will send invalidation and we won't further build relcache
> > entry having local space map.
>
> Yeah, I mean the map that records free space.
>
> > > but that seems correct. Sounds better than keeping outdated entries
> > > indicating no-space-available.
> >
> > Agreed, but as mentioned in one of the above emails, I am also bit
> > scared that it should not lead to many invalidation messages for small
> > relations, so may be we should send the invalidation message only when
> > the entire page is empty.
>
> I don't think that's a concern, is it? You typically won't be running
> multiple vacuums per second, or even multiple vacuums per minute.

That's right.  So let's try adding an invalidation call whenever space is reduced.  Is there a good way to test whether the new invalidation calls added by this patch have any significant impact?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: using index or check in ALTER TABLE SET NOT NULL
On Thu, 2 May 2019 at 13:08, Tom Lane wrote: > > David Rowley writes: > > On Thu, 2 May 2019 at 06:18, Sergei Kornilov wrote: > >> PS: not think this is blocker for v12 > > > Probably not a blocker, but now does not seem like an unreasonable > > time to lower these INFO messages down to DEBUG. > > Not a blocker perhaps, but it's better if we can get new behavior to > be more or less right the first time. It's not really new behaviour though. The code in question is for ATTACH PARTITION and was added in c31e9d4bafd (back in 2017). bbb96c3704c is the commit for using constraints to speed up SET NOT NULL. The mention of using the constraint for proof was made DEBUG1 in the initial commit. What we need to decide is if we want to make ATTACH PARTITION's INFO message a DEBUG1 in pg12 or not. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Hi, On 2019-05-01 22:01:53 -0400, Tom Lane wrote: > Andres Freund writes: > > Well, as I said before, I think hiding the to-be-rebuilt index from the > > list of indexes is dangerous too - if somebody added an actual > > CatalogUpdate/Insert (rather than inplace_update) anywhere along the > > index_build() path, we'd not get an assertion failure anymore, but just > > an index without the new entry. And given the fragility with HOT hiding > > that a lot of the time, that seems dangerous to me. > > I think that argument is pretty pointless considering that "REINDEX TABLE > pg_class" does it this way, and that code is nearly old enough to > vote. IMO the reindex_relation() case isn't comparable. By my read the main purpose there is to prevent inserting into not-yet-rebuilt indexes. The relevant comment says: * If we are processing pg_class itself, we want to make sure * that the updates do not try to insert index entries into indexes we * have not processed yet. (When we are trying to recover from corrupted * indexes, that could easily cause a crash.) Note the *not processed yet* bit. That's *not* comparable logic to hiding the index that *already* has been rebuilt, in the middle of reindex_index(). Yes, the way reindex_relation() is currently coded, the RelationSetIndexList() *also* hides the already rebuilt index, but that's hard for reindex_relation() to avoid, because it's outside of reindex_index(). > + * If we are doing one index for reindex_relation, then we will find > that > + * the index is already not present in the index list. In that case we > + * don't have to do anything to the index list here, which we mark by > + * clearing is_pg_class. 
>*/ > - RelationSetNewRelfilenode(iRel, persistence); > + is_pg_class = (RelationGetRelid(heapRelation) == RelationRelationId); > + if (is_pg_class) > + { > + allIndexIds = RelationGetIndexList(heapRelation); > + if (list_member_oid(allIndexIds, indexId)) > + { > + otherIndexIds = list_delete_oid(list_copy(allIndexIds), > indexId); > + /* Ensure rd_indexattr is valid; see comments for > RelationSetIndexList */ > + (void) RelationGetIndexAttrBitmap(heapRelation, > INDEX_ATTR_BITMAP_ALL); > + } > + else > + is_pg_class = false; > + } That's not pretty either :( Greetings, Andres Freund
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On Wed, May 1, 2019 at 11:24 PM Andres Freund wrote:
>
> Hi,
>
> On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > to come up with a cleanup proposal, and then decide whether to 1) revert
> > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > apply the patch in 13. As we would already have a patch, 3) seems like
> > it'd be more tenable than without.
>
> I think decision time has come. My tentative impression is that we're
> not there yet, and should revert the improvements in v12, and apply the
> improved version early in v13. As a second choice, we should live with
> the current approach, if John and Amit "promise" further effort to clean
> this up for v13.

Yes, the revised approach is not currently as mature as the one in
HEAD. It's not ready. Not wanting to attempt Promise Driven
Development, I'd rather revert, and only try again if there's enough
time and interest.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Andres Freund writes: > On 2019-05-01 19:41:24 -0400, Tom Lane wrote: >> OK, so per the other thread, it seems like the error recovery problem >> isn't going to affect this directly. However, I still don't like this >> proposal much; the reason being that it's a rather fundamental change >> in the API of RelationSetNewRelfilenode. This will certainly break >> any external callers of that function --- and silently, too. > Couldn't we just address that by adding a new > RelationSetNewRelfilenodeInternal() that's then wrapped by > RelationSetNewRelfilenode() which just does > RelationSetNewRelfilenodeInternal();CCI();? That's just adding more ugliness ... >> The solution I'm thinking of should have much more localized effects, >> basically just in reindex_index and RelationSetNewRelfilenode, which is >> why I like it better. > Well, as I said before, I think hiding the to-be-rebuilt index from the > list of indexes is dangerous too - if somebody added an actual > CatalogUpdate/Insert (rather than inplace_update) anywhere along the > index_build() path, we'd not get an assertion failure anymore, but just > an index without the new entry. And given the fragility with HOT hiding > that a lot of the time, that seems dangerous to me. I think that argument is pretty pointless considering that "REINDEX TABLE pg_class" does it this way, and that code is nearly old enough to vote. Perhaps there'd be value in rewriting things so that we don't need RelationSetIndexList at all, but it's not real clear to me what we'd do instead, and in any case I don't agree with back-patching such a change. In the near term it seems better to me to make "REINDEX INDEX some-pg_class-index" handle this problem the same way "REINDEX TABLE pg_class" has been doing for many years. Attached is a draft patch for this. It passes check-world with xxx_FORCE_RELEASE, and gets through reindexing pg_class with CLOBBER_CACHE_ALWAYS, but I've not completed a full CCA run. 
regards, tom lane diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index ce02410..1af959c 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -3261,6 +3261,9 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, heapRelation; Oid heapId; IndexInfo *indexInfo; + bool is_pg_class; + List *allIndexIds = NIL; + List *otherIndexIds = NIL; volatile bool skipped_constraint = false; PGRUsage ru0; @@ -3331,19 +3334,52 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, } /* - * Build a new physical relation for the index. Need to do that before - * "officially" starting the reindexing with SetReindexProcessing - - * otherwise the necessary pg_class changes cannot be made with - * encountering assertions. + * RelationSetNewRelfilenode will need to update the index's pg_class row. + * If we are reindexing some index of pg_class, that creates a problem; + * once we call SetReindexProcessing, the index support will complain if + * we try to insert a new index entry. But we can't do that in the other + * order without creating other problems. We solve this by temporarily + * removing the target index from pg_class's index list. It won't get + * updated during RelationSetNewRelfilenode, but that's fine because the + * subsequent index build will include the new entry. (There are more + * comments about this in reindex_relation.) + * + * If we are doing one index for reindex_relation, then we will find that + * the index is already not present in the index list. In that case we + * don't have to do anything to the index list here, which we mark by + * clearing is_pg_class. 
*/ - RelationSetNewRelfilenode(iRel, persistence); + is_pg_class = (RelationGetRelid(heapRelation) == RelationRelationId); + if (is_pg_class) + { + allIndexIds = RelationGetIndexList(heapRelation); + if (list_member_oid(allIndexIds, indexId)) + { + otherIndexIds = list_delete_oid(list_copy(allIndexIds), indexId); + /* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */ + (void) RelationGetIndexAttrBitmap(heapRelation, INDEX_ATTR_BITMAP_ALL); + } + else + is_pg_class = false; + } - /* ensure SetReindexProcessing state isn't leaked */ + /* + * Ensure SetReindexProcessing state isn't leaked. (We don't have to + * clean up the RelationSetIndexList state on error, though; transaction + * abort knows about fixing that.) + */ PG_TRY(); { /* Suppress use of the target index while rebuilding it */ SetReindexProcessing(heapId, indexId); + /* ... and suppress updating it too, if necessary */ + if (is_pg_class) + RelationSetIndexList(heapRelation, otherIndexIds); + + /* Build a new physical relation for the index */ + RelationSetNewRelfilenode(iRel, persistence); + /* Initialize the index and rebuild */ /* Note: we do not need to re-establish pkey setting */ index_build(heapRelation, iRel, in
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Hi, On 2019-05-01 19:41:24 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2019-05-01 10:06:03 -0700, Andres Freund wrote: > >>> I'm not sure this is the right short-term answer. Why isn't it, for now, > >>> sufficient to do what I suggested with RelationSetNewRelfilenode() not > >>> doing the CommandCounterIncrement(), and reindex_index() then doing the > >>> SetReindexProcessing() before a CommandCounterIncrement()? That's like > >>> ~10 line code change, and a few more with comments. > > > That looks like a hack to me... > > > The main thing I'm worried about right now is that I realized that > > our recovery from errors in this area is completely hosed, cf > > https://www.postgresql.org/message-id/4541.1556736...@sss.pgh.pa.us > > OK, so per the other thread, it seems like the error recovery problem > isn't going to affect this directly. However, I still don't like this > proposal much; the reason being that it's a rather fundamental change > in the API of RelationSetNewRelfilenode. This will certainly break > any external callers of that function --- and silently, too. > > Admittedly, there might not be any outside callers, but I don't really > like that assumption for something we're going to have to back-patch. Couldn't we just address that by adding a new RelationSetNewRelfilenodeInternal() that's then wrapped by RelationSetNewRelfilenode() which just does RelationSetNewRelfilenodeInternal();CCI();? Doesn't have to be ...Internal(), could also be RelationBeginSetNewRelfilenode() or such. I'm not sure why you think using CCI() for this purpose is a hack? To me the ability to have catalog changes only take effect when they're all done, and the system is ready for them, is one of the core purposes of the infrastructure? > The solution I'm thinking of should have much more localized effects, > basically just in reindex_index and RelationSetNewRelfilenode, which is > why I like it better. 
Well, as I said before, I think hiding the to-be-rebuilt index from the list of indexes is dangerous too - if somebody added an actual CatalogUpdate/Insert (rather than inplace_update) anywhere along the index_build() path, we'd not get an assertion failure anymore, but just an index without the new entry. And given the fragility with HOT hiding that a lot of the time, that seems dangerous to me. Greetings, Andres Freund
Re: using index or check in ALTER TABLE SET NOT NULL
David Rowley writes: > On Thu, 2 May 2019 at 06:18, Sergei Kornilov wrote: >> PS: not think this is blocker for v12 > Probably not a blocker, but now does not seem like an unreasonable > time to lower these INFO messages down to DEBUG. Not a blocker perhaps, but it's better if we can get new behavior to be more or less right the first time. regards, tom lane
Re: PostgreSQL Asian language support for full text search using ICU (and also updating pg_trgm)
[redirected to hackers list since I think this topic is related to adding a new PostgreSQL feature.]

I think there's no doubt that it would be nice if PostgreSQL natively supported Asian languages.

As a first step, I briefly tested the ICU tokenizer (ubrk_open and other functions) with Japanese, the only Asian language I understand. The result was a little bit different from the most popular Japanese tokenizer "Mecab" [1], but it seems I can live with that as far as it's used for full text search purposes. Of course more tests would be needed though.

In addition to the accuracy of tokenizing, performance is of course important. This needs more work.

I think the same studies would be needed for other Asian languages. I hope someone who is familiar with other Asian languages volunteers to do the task.

[1] https://taku910.github.io/mecab/

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From: Chanon Sajjamanochai
Subject: PostgreSQL Asian language support for full text search using ICU (and also updating pg_trgm)
Date: Wed, 1 May 2019 08:55:50 +0700
Message-ID:

> Hello,
>
> Currently PostgreSQL doesn't support full text search natively for many
> Asian languages such as Chinese, Japanese and others. These languages are
> used by a large portion of the population of the world.
>
> The two key modules that could be modified to support Asian languages are
> the full text search module (including tsvector) and pg_trgm.
>
> I would like to propose that this support be added to PostgreSQL.
>
> For full text search, PostgreSQL could add a new parser
> (https://www.postgresql.org/docs/9.2/textsearch-parsers.html) that
> implements ICU word tokenization. This should be a lot easier than
> before, now that PostgreSQL itself already includes ICU dependencies for
> other things.
>
> Then allow the ICU parser to be chosen at run-time (via a run-time config
> or an option to to_tsvector). That is all that is needed to support full
> text search for many more Asian languages natively in PostgreSQL, such as
> Chinese, Japanese and Thai.
>
> For example Elastic Search implements this using its ICU Tokenizer plugin:
> https://www.elastic.co/guide/en/elasticsearch/guide/current/icu-tokenizer.html
>
> Some information about the related APIs in ICU for this are at:
> http://userguide.icu-project.org/boundaryanalysis
>
> Another simple improvement that would give another option for searching
> Asian languages is to add a run-time setting for pg_trgm that would tell it
> to not drop non-ascii characters, as currently it only indexes ascii
> characters and thus all Asian language characters are dropped.
>
> I emphasize 'run-time setting' because when using PostgreSQL via a
> Database-As-A-Service service provider, most of the time it is not possible
> to change the config files, recompile sources, or add any new extensions.
>
> PostgreSQL is an awesome project and probably the best RDBMS right now. I
> hope the maintainers consider this suggestion.
>
> Best Regards,
> Chanon
Re: using index or check in ALTER TABLE SET NOT NULL
On Thu, 2 May 2019 at 06:18, Sergei Kornilov wrote: > > > I proposed that we didn't need those messages at all, which Robert pushed > > back on, but I'd be willing to compromise by reducing them to NOTICE or > > DEBUG level. > > I posted patch with such change in a separate topic: > https://commitfest.postgresql.org/23/2076/ > > PS: not think this is blocker for v12 Probably not a blocker, but now does not seem like an unreasonable time to lower these INFO messages down to DEBUG. If not, then we can just remove the item from the open items list. I just put it there as I didn't want it getting forgotten about as it wasn't going to be anybody's priority to think hard about it on the final few days of the last commitfest of the release. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
I wrote: > Andres Freund writes: >> On 2019-05-01 10:06:03 -0700, Andres Freund wrote: >>> I'm not sure this is the right short-term answer. Why isn't it, for now, >>> sufficient to do what I suggested with RelationSetNewRelfilenode() not >>> doing the CommandCounterIncrement(), and reindex_index() then doing the >>> SetReindexProcessing() before a CommandCounterIncrement()? That's like >>> ~10 line code change, and a few more with comments. > That looks like a hack to me... > The main thing I'm worried about right now is that I realized that > our recovery from errors in this area is completely hosed, cf > https://www.postgresql.org/message-id/4541.1556736...@sss.pgh.pa.us OK, so per the other thread, it seems like the error recovery problem isn't going to affect this directly. However, I still don't like this proposal much; the reason being that it's a rather fundamental change in the API of RelationSetNewRelfilenode. This will certainly break any external callers of that function --- and silently, too. Admittedly, there might not be any outside callers, but I don't really like that assumption for something we're going to have to back-patch. The solution I'm thinking of should have much more localized effects, basically just in reindex_index and RelationSetNewRelfilenode, which is why I like it better. regards, tom lane
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Andres Freund writes: > On 2019-05-01 14:44:12 -0400, Tom Lane wrote: >> I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found >> while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've >> gotten a failure, your session is hosed: >> >> regression=# select 1; >> ?column? >> -- >> 1 >> (1 row) >> >> regression=# reindex index pg_class_oid_index; >> psql: ERROR: could not read block 0 in file "base/16384/35344": read only 0 >> of 8192 bytes >> regression=# select 1; >> psql: ERROR: could not open file "base/16384/35344": No such file or >> directory > Yea, I noticed that too. Lead me astray for a while, because it > triggered apparent REINDEX failures for pg_index too, even though not > actually related to the reindex. It seems that the reason we fail to recover is that we detect the error during CommandEndInvalidationMessages, after we've updated the relcache entry for pg_class_oid_index; of course at that point, any additional pg_class access is going to fail because it'll try to use the not-yet-existent index. However, when we then abort the transaction, we fail to revert pg_class_oid_index's relcache entry because *there is not yet an invalidation queue entry telling us to do so*. This is, therefore, an order-of-operations bug in CommandEndInvalidationMessages. It should attach the "current command" queue entries to PriorCmdInvalidMsgs *before* it processes them, not after. In this way they'll be available to be reprocessed by AtEOXact_Inval or AtEOSubXact_Inval if we fail during CCI. (A different way we could attack this is to make AtEOXact_Inval and AtEOSubXact_Inval process "current" messages, as they do not today. But I think that's a relatively inefficient solution, as it would force those messages to be reprocessed even in the much-more-common case where we abort before reaching CommandEndInvalidationMessages.) 
The attached quick-hack patch changes that around, and seems to survive testing (though I've not completed a CCA run with it). The basic change I had to make to make this work is to make AppendInvalidationMessageList actually append the current-commands list to the prior-cmds list, not prepend as it's really always done. (In that way, we can still scan the current-commands list just by starting at its head.) The comments for struct InvalidationChunk explicitly disclaim any significance to the order of the entries, so this should be okay, and I'm not seeing any problem in testing. It's been a long time since I wrote that code, but I think the reason I did it like that was the thought that in a long transaction, the prior-cmds list might be much longer than the current-commands list, so iterating down the prior-cmds list could add an O(N^2) cost. The easy answer to that, of course, is to make the lists doubly-linked, and I'd do that before committing; this version is just for testing. (It's short on comments, too.) The thing I was worried about in RelationCacheInvalidate does seem to be a red herring, at least fixing it is not necessary to make the broken-session-state problem go away. Comments? regards, tom lane diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index f09e3a9..8894bdf 100644 *** a/src/backend/utils/cache/inval.c --- b/src/backend/utils/cache/inval.c *** AddInvalidationMessage(InvalidationChunk *** 264,287 } /* ! * Append one list of invalidation message chunks to another, resetting ! * the source chunk-list pointer to NULL. */ static void AppendInvalidationMessageList(InvalidationChunk **destHdr, InvalidationChunk **srcHdr) { ! InvalidationChunk *chunk = *srcHdr; ! if (chunk == NULL) return; /* nothing to do */ ! while (chunk->next != NULL) ! chunk = chunk->next; ! chunk->next = *destHdr; ! *destHdr = *srcHdr; *srcHdr = NULL; } --- 264,294 } /* ! * Append the "source" list of invalidation message chunks to the "dest" ! 
* list, resetting the source chunk-list pointer to NULL. ! * Note that if the caller hangs onto the previous source pointer, ! * the source list is still separately traversable afterwards. */ static void AppendInvalidationMessageList(InvalidationChunk **destHdr, InvalidationChunk **srcHdr) { ! InvalidationChunk *chunk; ! if (*srcHdr == NULL) return; /* nothing to do */ ! chunk = *destHdr; ! if (chunk == NULL) ! *destHdr = *srcHdr; ! else ! { ! while (chunk->next != NULL) ! chunk = chunk->next; ! chunk->next = *srcHdr; ! } *srcHdr = NULL; } *** xactGetCommittedInvalidationMessages(Sha *** 858,867 */ oldcontext = MemoryContextSwitchTo(CurTransactionContext); - ProcessInvalidationMessagesMulti(&transInvalInfo->CurrentCmdInvalidMsgs, - MakeSharedInvalidMessagesArray); ProcessInvalidationMessagesMulti(&transInvalInfo->PriorCmdInvalidMsgs, M
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On 2019-May-01, Amit Kapila wrote:

> On Tue, Apr 30, 2019 at 7:52 PM Alvaro Herrera wrote:
> > Hmm ... so, if vacuum runs and frees up any space from any of the pages,
> > then it should send out an invalidation -- it doesn't matter what the
> > FSM had, just that there is more free space now. That means every other
> > process will need to determine a fresh FSM,
>
> I think you intend to say the local space map because once FSM is
> created we will send invalidation and we won't further build relcache
> entry having local space map.

Yeah, I mean the map that records free space.

> > but that seems correct. Sounds better than keeping outdated entries
> > indicating no-space-available.
>
> Agreed, but as mentioned in one of the above emails, I am also bit
> scared that it should not lead to many invalidation messages for small
> relations, so may be we should send the invalidation message only when
> the entire page is empty.

I don't think that's a concern, is it?  You typically won't be running
multiple vacuums per second, or even multiple vacuums per minute.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: to_timestamp docs
On Thu, May 2, 2019 at 12:49 AM Alexander Korotkov wrote:
> > It works globally (but only for remaining string if you don't put it
> > at the beginning) and you can set it only once. For example:
> >
> > =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON');
> > ERROR:  invalid value " J" for "MON"
> > DETAIL:  The given value did not match any of the allowed values for this field.
>
> Actually, FX takes effect on subsequent format patterns. This is not
> documented, but it copycats Oracle behavior. Sure, normally FX should
> be specified as the first item. We could document current behavior or
> restrict specifying FX not as first item. This is also not new in 12,
> so documenting current behavior is better for compatibility.

I went to Oracle's documentation.  It seems that the behavior is slightly different.  Their documentation says:

"A modifier can appear in a format model more than once. In such a case, each subsequent occurrence toggles the effects of the modifier. Its effects are enabled for the portion of the model following its first occurrence, and then disabled for the portion following its second, and then reenabled for the portion following its third, and so on."

In PostgreSQL one cannot disable exact mode using a second FX.  I think we shouldn't add some restriction for FX.  Instead, PostgreSQL's documentation can be fixed.  And the current explanation in the documentation might be wrong, as Bruce pointed out.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: to_timestamp docs
On Thu, May 2, 2019 at 12:49:23AM +0300, Alexander Korotkov wrote:
> On Wed, May 1, 2019 at 11:20 PM Arthur Zakirov wrote:
> > Hello,
> >
> > Not sure if we need some additional checks here if FX is set.
>
> I'd like to add that this behavior is not new in 12. It was the same before.

Agreed, but since we are looking at it, let's document it.

> > > It seems DD and (as numerics?) in FX mode eat trailing whitespace,
> > > while MON does not? Also, I used these queries to determine it is
> > > "trailing" whitespace that "FXMON" controls:
> > >
> > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON');
> > >          to_timestamp
> > > -
> > > 0001-07-01 00:00:00-04:56:02 BC
> > >
> > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON');
> > > ERROR:  invalid value " J" for "MON"
> > > DETAIL:  The given value did not match any of the allowed values
> > > for this field.
> >
> > The problem here is that you need to specify FX only once and at beginning of
> > the format string. It is stated in the documentation:
> >
> > "FX must be specified as the first item in the template."
> >
> > It works globally (but only for remaining string if you don't put it
> > at the beginning) and you can set it only once. For example:
> >
> > =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON');
> > ERROR:  invalid value " J" for "MON"
> > DETAIL:  The given value did not match any of the allowed values for this field.
>
> Actually, FX takes effect on subsequent format patterns. This is not
> documented, but it copycats Oracle behavior. Sure, normally FX should
> be specified as the first item. We could document current behavior or
> restrict specifying FX not as first item. This is also not new in 12,
> so documenting current behavior is better for compatibility.

Agreed.  Since it is pre-12 behavior, I suggest we just document it and not change it.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+          Ancient Roman grave inscription             +
Re: to_timestamp docs
On Wed, May 1, 2019 at 11:20 PM Arthur Zakirov wrote: > Hello, > > On Wed, May 1, 2019 at 6:05 PM Bruce Momjian wrote: > > Thanks. I think I see the sentence you are thinking of: > > > >to_timestamp and to_date > >skip multiple blank spaces at the beginning of the input string > >and around date and time values unless the FX > >option is used. > > > > However, first, it is unclear what 'skip' means here, i.e., does it mean > > multiple blank spaces become a single space, or they are ignored. > > I worked at to_timestamp some time ago. In this case multiple bank spaces at > the beginning should be ignored. > > > Second, I see inconsistent behaviour around the use of FX for various > > patterns, e.g.: > > > > SELECT to_timestamp('5 1976','FXDD_FX'); > > to_timestamp > > > > 1976-01-05 00:00:00-05 > > Hm, I think strspace_len() is partly to blame here, which is called by > from_char_parse_int_len(): > > /* > * Skip any whitespace before parsing the integer. > */ > *src += strspace_len(*src); > > But even if you remove this line of code then strtol() will eat > survived whitespaces: > > result = strtol(init, src, 10); > > Not sure if we need some additional checks here if FX is set. I'd like to add that this behavior is not new in 12. It was the same before. > > It seems DD and (as numerics?) in FX mode eat trailing whitespace, > > while MON does not? Also, I used these queries to determine it is > > "trailing" whitespace that "FXMON" controls: > > > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > > to_timestamp > > - > > 0001-07-01 00:00:00-04:56:02 BC > > > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > > ERROR: invalid value " J" for "MON" > > DETAIL: The given value did not match any of the allowed values > > for this field. > > The problem here is that you need to specify FX only once and at beginning of > the format string. It is stated in the documentation: > > "FX must be specified as the first item in the template." 
> > It works globally (but only for remaining string if you don't put it > at the beginning) > and you can set it only once. For example: > > =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON'); > ERROR: invalid value " J" for "MON" > DETAIL: The given value did not match any of the allowed values for this > field. Actually, FX takes effect on subsequent format patterns. This is not documented, but it copycats Oracle behavior. Sure, normally FX should be specified as the first item. We could document current behavior or restrict specifying FX not as first item. This is also not new in 12, so documenting current behavior is better for compatibility. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Andres Freund writes: > On 2019-05-01 14:44:12 -0400, Tom Lane wrote: >> There might be an argument for treating rd_createSubid the same way, >> not sure. That field wouldn't ever be set on a system catalog, so >> it's less of a hazard, but there's something to be said for treating >> the two fields alike. > Seems sensible to treat it the same way. Not sure if there's an actual > problem where the current treatment could cause a problem, but seems > worth improving. So if I try to treat rd_createSubid the same way, it passes regression in normal mode, but CLOBBER_CACHE_ALWAYS is a total disaster: all table creations fail, eg regression=# CREATE TABLE BOOLTBL1 (f1 bool); psql: ERROR: relation 92773 deleted while still in use What is happening is that once heap_create has created a relcache entry for the new relation, any RelationCacheInvalidate call causes us to try to reload that relcache entry, and that fails because there's no pg_class or pg_attribute entries for the new relation yet. This gets back to the thing we were poking at in the other thread, which is whether or not a relcache entry is *always* reconstructible from on-disk state. I had in the back of my mind "um, what about initial table creation?" but hadn't looked closely. It seems that the only reason that it works is that RelationCacheInvalidate is unwilling to touch new-in-transaction relcache entries. The ideal fix for this, perhaps, would be to rejigger things far enough that we would not try to create a relcache entry for a new rel until it was supported by some minimal set of catalog entries. But that's more change than I really want to undertake right now, and for sure I wouldn't want to back-patch it. Anyway, I believe the original justification for skipping new-in-transaction entries here still holds: they couldn't possibly be subjects of any cross-backend notification messages. I think the argument that that applies to rd_newRelfilenodeSubid is a whole lot weaker, though. 
But I'm going to let this go for the moment, because it's not clear whether a patch like this is actually relevant to our immediate problem. There seem to be some other order-of-operations issues in invalidation processing, and maybe fixing those would suffice. More later. regards, tom lane
Re: to_timestamp docs
On Wed, May 1, 2019 at 11:20:05PM +0300, Arthur Zakirov wrote: > Hello, > > On Wed, May 1, 2019 at 6:05 PM Bruce Momjian wrote: > > Thanks. I think I see the sentence you are thinking of: > > > >to_timestamp and to_date > >skip multiple blank spaces at the beginning of the input string > >and around date and time values unless the FX > >option is used. > > > > However, first, it is unclear what 'skip' means here, i.e., does it mean > > multiple blank spaces become a single space, or they are ignored. > > I worked at to_timestamp some time ago. In this case multiple bank spaces at > the beginning should be ignored. OK. > > Second, I see inconsistent behaviour around the use of FX for various > > patterns, e.g.: > > > > SELECT to_timestamp('5 1976','FXDD_FX'); > > to_timestamp > > > > 1976-01-05 00:00:00-05 > > Hm, I think strspace_len() is partly to blame here, which is called by > from_char_parse_int_len(): > > /* > * Skip any whitespace before parsing the integer. > */ > *src += strspace_len(*src); > > But even if you remove this line of code then strtol() will eat > survived whitespaces: > > result = strtol(init, src, 10); > > Not sure if we need some additional checks here if FX is set. Yes, I suspected it was part of the input function, but it seems it is done in two places. It seems we need the opposite of strspace_len() in that place to throw an error if we are in FX mode. > The problem here is that you need to specify FX only once and at beginning of > the format string. It is stated in the documentation: > > "FX must be specified as the first item in the template." Uh, FX certainly changes behavior if it isn't the first thing in the format string. > It works globally (but only for remaining string if you don't put it > at the beginning) Uh, then the documentation is wrong? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: to_timestamp docs
Hello, On Wed, May 1, 2019 at 6:05 PM Bruce Momjian wrote: > Thanks. I think I see the sentence you are thinking of: > >to_timestamp and to_date >skip multiple blank spaces at the beginning of the input string >and around date and time values unless the FX >option is used. > > However, first, it is unclear what 'skip' means here, i.e., does it mean > multiple blank spaces become a single space, or they are ignored. I worked on to_timestamp some time ago. In this case multiple blank spaces at the beginning should be ignored. > Second, I see inconsistent behaviour around the use of FX for various > patterns, e.g.: > > SELECT to_timestamp('5 1976','FXDD_FX'); > to_timestamp > > 1976-01-05 00:00:00-05 Hm, I think strspace_len() is partly to blame here, which is called by from_char_parse_int_len(): /* * Skip any whitespace before parsing the integer. */ *src += strspace_len(*src); But even if you remove this line of code then strtol() will eat the surviving whitespace: result = strtol(init, src, 10); Not sure if we need some additional checks here if FX is set. > It seems DD and (as numerics?) in FX mode eat trailing whitespace, > while MON does not? Also, I used these queries to determine it is > "trailing" whitespace that "FXMON" controls: > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > to_timestamp > - > 0001-07-01 00:00:00-04:56:02 BC > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > ERROR: invalid value " J" for "MON" > DETAIL: The given value did not match any of the allowed values for > this field. The problem here is that you need to specify FX only once and at the beginning of the format string. It is stated in the documentation: "FX must be specified as the first item in the template." It works globally (but only for the remaining string if you don't put it at the beginning) and you can set it only once. 
For example: =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON'); ERROR: invalid value " J" for "MON" DETAIL: The given value did not match any of the allowed values for this field. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
pg_upgrade --clone error checking
With the new pg_upgrade --clone, if we are going to end up throwing the error "file cloning not supported on this platform" (which seems to depend only on ifdefs) I think we should throw it first thing, before any other checks are done and certainly before pg_dump gets run. This might result in some small amount of code duplication, but I think it would be worth the cost. For cases where we might throw "could not clone file between old and new data directories", I wonder if we shouldn't do some kind of dummy copy to catch that error earlier, as well. Maybe that one is not worth it. Cheers, Jeff
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Hi, On 2019-05-01 10:21:15 -0700, Andres Freund wrote: > FWIW, the dirty-hack version (attached) of the CommandCounterIncrement() > approach fixes the issue for a REINDEX pg_class_oid_index; in solation > even when using CCA. Started a whole CCA testrun with it, but the > results of that will obviously not be in quick. Not finished yet, but it got pretty far:

parallel group (5 tests): create_index_spgist index_including_gist index_including create_view create_index
     create_index             ... ok   500586 ms
     create_index_spgist      ... ok    86890 ms
     create_view              ... ok   466512 ms
     index_including          ... ok   150279 ms
     index_including_gist     ... ok   109087 ms
test reindex_catalog         ... ok     2285 ms
parallel group (16 tests): create_cast roleattributes drop_if_exists create_aggregate vacuum create_am hash_func select create_function_3 constraints typed_table rolenames errors updatable_views triggers inherit

that's where it's at right now:

parallel group (20 tests): init_privs security_label gin password drop_operator lock gist tablesample spgist

Greetings, Andres Freund
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Andres Freund writes: > On 2019-05-01 14:44:12 -0400, Tom Lane wrote: >> This seems quite wrong, because it prevents us from rebuilding the >> entry with corrected values. In particular notice that the change >> causes us to skip the RelationInitPhysicalAddr call that would >> normally be applied to a nailed mapped index in that loop. That's >> completely fatal in this case --- it keeps us from restoring the >> correct relfilenode that the mapper would now tell us, if we only >> bothered to ask. > Indeed. I'm a bit surprised that doesn't lead to more problems. > Not sure I understand where the RelationCacheInvalidate() call is coming > from in this case though. Shouldn't the entry have been invalidated > individually through ATEOXact_Inval(false)? In CLOBBER_CACHE_ALWAYS mode it's likely that we get a RelationCacheInvalidate call first. Note that the change I'm talking about here is not sufficient to fix the failure; there are more problems behind it. I just wanted to know if there was something I was missing about that old patch. regards, tom lane
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Andres Freund writes: > On 2019-05-01 10:06:03 -0700, Andres Freund wrote: >> I'm not sure this is the right short-term answer. Why isn't it, for now, >> sufficient to do what I suggested with RelationSetNewRelfilenode() not >> doing the CommandCounterIncrement(), and reindex_index() then doing the >> SetReindexProcessing() before a CommandCounterIncrement()? That's like >> ~10 line code change, and a few more with comments. That looks like a hack to me... The main thing I'm worried about right now is that I realized that our recovery from errors in this area is completely hosed, cf https://www.postgresql.org/message-id/4541.1556736...@sss.pgh.pa.us The problem with CCA is actually kind of convenient for testing that, since it means you don't have to inject any new fault to get an error to be thrown while the index relcache entry is in the needing-to-be- reverted state. So I'm going to work on fixing the recovery first. But I suspect that doing this right will require the more complicated approach anyway. regards, tom lane
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Hi, On 2019-05-01 14:44:12 -0400, Tom Lane wrote: > I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found > while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've > gotten a failure, your session is hosed: > > regression=# select 1; > ?column? > -- > 1 > (1 row) > > regression=# reindex index pg_class_oid_index; > psql: ERROR: could not read block 0 in file "base/16384/35344": read only 0 > of 8192 bytes > regression=# select 1; > psql: ERROR: could not open file "base/16384/35344": No such file or > directory Yea, I noticed that too. It led me astray for a while, because it triggered apparent REINDEX failures for pg_index too, even though not actually related to the reindex. > The problem is that the nailed relcache entry for pg_class_oid_index got > updated to point to the new relfilenode, and that change did not get undone > by transaction abort. There seem to be several bugs contributing > to that, but the one I'm asking about right now is that commit ae9aba69a > caused RelationCacheInvalidate to skip over relations that have > rd_newRelfilenodeSubid set, as indeed pg_class_oid_index will have at the > instant of the failure. That commit message is quite unhelpful about what exactly this was intended to fix. I assume it was intended to make COPY FREEZE usage predictable, but... > This seems quite wrong, because it prevents us from rebuilding the > entry with corrected values. In particular notice that the change > causes us to skip the RelationInitPhysicalAddr call that would > normally be applied to a nailed mapped index in that loop. That's > completely fatal in this case --- it keeps us from restoring the > correct relfilenode that the mapper would now tell us, if we only > bothered to ask. Indeed. I'm a bit surprised that doesn't lead to more problems. Not sure I understand where the RelationCacheInvalidate() call is coming from in this case though. 
Shouldn't the entry have been invalidated individually through ATEOXact_Inval(false)? > I think perhaps what we should do is revert ae9aba69a and instead do > > relcacheInvalsReceived++; > > - if (RelationHasReferenceCountZero(relation)) > + if (RelationHasReferenceCountZero(relation) && > + relation->rd_newRelfilenodeSubid == InvalidSubTransactionId) > { > /* Delete this entry immediately */ > Assert(!relation->rd_isnailed); > > so that entries with rd_newRelfilenodeSubid nonzero are preserved but are > rebuilt. Seems like a reasonable approach. > There might be an argument for treating rd_createSubid the same way, > not sure. That field wouldn't ever be set on a system catalog, so > it's less of a hazard, but there's something to be said for treating > the two fields alike. Seems sensible to treat it the same way. Not sure if there's an actual problem where the current treatment could cause a problem, but seems worth improving. Greetings, Andres Freund
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
[ blast-from-the-past department ] Simon Riggs writes: > On 11 December 2012 03:01, Noah Misch wrote: >> Until these threads, I did not know that a relcache invalidation could trip >> up >> the WAL avoidance optimization, and now this. I poked at the relevant >> relcache.c code, and it already takes pains to preserve the needed facts. >> The >> header comment of RelationCacheInvalidate() indicates that entries bearing an >> rd_newRelfilenodeSubid can safely survive the invalidation, but the code does >> not implement that. I think the comment is right, and this is just an >> oversight in the code going back to its beginning (fba8113c). > I think the comment is right also and so the patch is good. I will > apply, barring objections. I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've gotten a failure, your session is hosed: regression=# select 1; ?column? -- 1 (1 row) regression=# reindex index pg_class_oid_index; psql: ERROR: could not read block 0 in file "base/16384/35344": read only 0 of 8192 bytes regression=# select 1; psql: ERROR: could not open file "base/16384/35344": No such file or directory The problem is that the nailed relcache entry for pg_class_oid_index got updated to point to the new relfilenode, and that change did not get undone by transaction abort. There seem to be several bugs contributing to that, but the one I'm asking about right now is that commit ae9aba69a caused RelationCacheInvalidate to skip over relations that have rd_newRelfilenodeSubid set, as indeed pg_class_oid_index will have at the instant of the failure. This seems quite wrong, because it prevents us from rebuilding the entry with corrected values. In particular notice that the change causes us to skip the RelationInitPhysicalAddr call that would normally be applied to a nailed mapped index in that loop. 
That's completely fatal in this case --- it keeps us from restoring the correct relfilenode that the mapper would now tell us, if we only bothered to ask. I think perhaps what we should do is revert ae9aba69a and instead do relcacheInvalsReceived++; - if (RelationHasReferenceCountZero(relation)) + if (RelationHasReferenceCountZero(relation) && + relation->rd_newRelfilenodeSubid == InvalidSubTransactionId) { /* Delete this entry immediately */ Assert(!relation->rd_isnailed); so that entries with rd_newRelfilenodeSubid nonzero are preserved but are rebuilt. There might be an argument for treating rd_createSubid the same way, not sure. That field wouldn't ever be set on a system catalog, so it's less of a hazard, but there's something to be said for treating the two fields alike. Thoughts? regards, tom lane
Re: Adding a test for speculative insert abort case
Hi, On 2019-04-30 18:34:42 -0700, Melanie Plageman wrote: > On Tue, Apr 30, 2019 at 5:22 PM Andres Freund wrote: > > > > > Not easily so - that's why the ON CONFLICT patch didn't add code > > coverage for it :(. I wonder if you could whip something up by having > > another non-unique expression index, where the expression acquires a > > advisory lock? If that advisory lock where previously acquired by > > another session, that should allow to write a reliable isolation test? > > > > > So, I took a look at one of the existing tests that does something like what > you mentioned and tried the following: > -- > create table t1(key int, val text); > create unique index t1_uniq_idx on t1(key); > create or replace function t1_lock_func(int) returns int immutable language > sql AS > 'select pg_advisory_xact_lock_shared(1); select $1'; > create index t1_lock_idx ON t1(t1_lock_func(key)); > -- > s1: > begin isolation level read committed; > insert into t1 values(1, 'someval'); > s2: > set default_transaction_isolation = 'read committed'; > insert into t1 values(1, 'anyval') on conflict(key) do update set val = > 'updatedval'; > -- > > So, the above doesn't work because s2 waits to acquire the lock in the first > phase of the speculative insert -- when it is just checking the index, > before > inserting to the table and before inserting to the index. Couldn't that be addressed by having t1_lock_func() acquire two locks? One for blocking during the initial index probe, and one for the speculative insertion? I'm imagining something like if (pg_try_advisory_xact_lock(1)) pg_advisory_xact_lock(2); else pg_advisory_xact_lock(1); in t1_lock_func. If you then make the session something roughly like s1: pg_advisory_xact_lock(1); s1: pg_advisory_xact_lock(2); s2: upsert t1 s1: pg_advisory_xact_unlock(1); s2: s2: s1: insert into t1 values(1, 'someval'); s1: pg_advisory_xact_unlock(2); s2: s2: spec-conflict Greetings, Andres Freund
Re: Adding a test for speculative insert abort case
On Tue, Apr 30, 2019 at 7:14 PM Thomas Munro wrote: > I think it'd be nice to have a set of macros that can create wait > points in the C code that isolation tests can control, in a special > build. Perhaps there could be shm hash table of named wait points in > shared memory; if DEBUG_WAIT_POINT("foo") finds that "foo" is not > present, it continues, but if it finds an entry it waits for it to go > away. Then isolation tests could add/remove names and signal a > condition variable to release waiters. > > I contemplated that while working on SKIP LOCKED, which had a bunch of > weird edge cases that I tested by inserting throw-away wait-point code > like this: > > > https://www.postgresql.org/message-id/CADLWmXXss83oiYD0pn_SfQfg%2ByNEpPbPvgDb8w6Fh--jScSybA%40mail.gmail.com > > Yes, I agree it would be nice to have a framework like this. Greenplum actually has a fault injection framework that, I believe, works similarly to what you are describing -- i.e. sets a variable in shared memory. There is an extension, gp_inject_fault, which allows you to set the faults. Andreas Scherbaum wrote a blog post about how to use it [1]. The Greenplum implementation is not documented particularly well in the code, but, it is something that folks working on Greenplum have talked about modifying and proposing to Postgres. [1] http://engineering.pivotal.io/post/testing_greenplum_database_using_fault_injection/ -- Melanie Plageman
Re: using index or check in ALTER TABLE SET NOT NULL
Hi > I proposed that we didn't need those messages at all, which Robert pushed > back on, but I'd be willing to compromise by reducing them to NOTICE or > DEBUG level. I posted a patch with that change in a separate topic: https://commitfest.postgresql.org/23/2076/ PS: I don't think this is a blocker for v12 regards, Sergei
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
On 2019-05-01 10:06:03 -0700, Andres Freund wrote: > I'm not sure this is the right short-term answer. Why isn't it, for now, > sufficient to do what I suggested with RelationSetNewRelfilenode() not > doing the CommandCounterIncrement(), and reindex_index() then doing the > SetReindexProcessing() before a CommandCounterIncrement()? That's like > ~10 line code change, and a few more with comments. > > There is the danger that the current and above approach basically relies > on there not to be any non-inplace updates during reindex. But at the > moment code does take care to use inplace updates > (cf. index_update_stats()). > > It's not clear to me whether the approach of using > RelationSetIndexList() in reindex_index() would be meaningfully more > robust against non-inplace updates during reindex either - ISTM we'd > just as well skip the necessary index insertions if we hid the index > being rebuilt. Skipping to-be-rebuilt indexes works for > reindex_relation() because they're going to be rebuilt subsequently (and > thus the missing index rows don't matter) - but it'd not work for > reindexing a single index, because it'll not get the result at a later > stage. FWIW, the dirty-hack version (attached) of the CommandCounterIncrement() approach fixes the issue for a REINDEX pg_class_oid_index in isolation, even when using CCA. Started a whole CCA testrun with it, but the results of that will obviously not be in quickly. 
Greetings, Andres Freund

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a05b6a07ad0..6698262e202 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6108,6 +6108,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	/* Process xmin */
 	xid = HeapTupleHeaderGetXmin(tuple);
+	if (xid == FrozenTransactionId)
 	xmin_frozen = ((xid == FrozenTransactionId) ||
 				   HeapTupleHeaderXminFrozen(tuple));
 
 	if (TransactionIdIsNormal(xid))
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ce024101fc9..98f26e13627 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3344,6 +3344,8 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
+	CommandCounterIncrement();
+
 	/* Initialize the index and rebuild */
 	/* Note: we do not need to re-establish pkey setting */
 	index_build(heapRelation, iRel, indexInfo, true, true);
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index e9add1b9873..f0a4bac75f6 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -315,6 +315,7 @@ ResetSequence(Oid seq_relid)
 	 * Create a new storage file for the sequence.
 	 */
 	RelationSetNewRelfilenode(seq_rel, seq_rel->rd_rel->relpersistence);
+	CommandCounterIncrement();
 
 	/*
 	 * Ensure sequence's relfrozenxid is at 0, since it won't contain any
@@ -490,6 +491,7 @@ AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)
 	 * changes transactional.
 	 */
 	RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence);
+	CommandCounterIncrement();
 
 	/*
 	 * Ensure sequence's relfrozenxid is at 0, since it won't contain any
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f6c9cf3b0ec..eef922d8718 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1791,6 +1791,8 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 		table_close(toastrel, NoLock);
 	}
 
+	CommandCounterIncrement();
+
 	/*
 	 * Reconstruct the indexes to match, and we're done.
 	 */
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 90ff8ccf54f..2bdfb2d5a9c 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3541,7 +3541,7 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
 	 * Make the pg_class row change visible, as well as the relation map
 	 * change if any. This will cause the relcache entry to get updated, too.
 	 */
-	CommandCounterIncrement();
+	//CommandCounterIncrement();
 
 	/*
 	 * Mark the rel as having been given a new relfilenode in the current
Re: walsender vs. XLogBackgroundFlush during shutdown
Hi, On Wed, 1 May 2019 at 17:02, Tomas Vondra wrote: > OK, so that seems like a separate issue, somewhat unrelated to the issue I > reported. And I'm not sure it's a walsender issue - it seems it might be a > psycopg2 issue in not reporting the flush properly, no? Agree, it is a different issue, but I am unsure what to blame, postgres or psycopg2. Right now in psycopg2 we confirm more or less every XLogData message, but at the same time the LSN on the primary is moving forward and we get updates with keepalive messages. I perfectly understand the need to periodically send updates of flush = walEnd (which comes from keepalive). It might happen that there is no transaction activity but WAL is still generated, and as a result the replication slot will prevent WAL from being cleaned up. From the client side perspective, it confirmed everything that it should, but from the postgres side, this is not enough to shut down cleanly. Maybe it is possible to change the check (sentPtr == replicatedPtr) to something like (lastMsgSentPtr <= replicatedPtr), or would that be unsafe? > >Actually, the same problem will happen even in the case when the > >consumer script receives some message, but not very intensively, but > >it is just a bit harder to reproduce it. 
> > If you attach to the walsender with gdb, you'll see the following picture:
> > (gdb) bt
> > #0  0x7fd6623d296a in __libc_send (fd=8, buf=0x55cb958dca08, len=94, flags=0) at ../sysdeps/unix/sysv/linux/send.c:28
> > #1  0x55cb93aa7ce9 in secure_raw_write (port=0x55cb958d71e0, ptr=0x55cb958dca08, len=94) at be-secure.c:318
> > #2  0x55cb93aa7b87 in secure_write (port=0x55cb958d71e0, ptr=0x55cb958dca08, len=94) at be-secure.c:265
> > #3  0x55cb93ab6bf9 in internal_flush () at pqcomm.c:1433
> > #4  0x55cb93ab6b89 in socket_flush () at pqcomm.c:1409
> > #5  0x55cb93dac30b in send_message_to_frontend (edata=0x55cb942b4380 ) at elog.c:3317
> > #6  0x55cb93da8973 in EmitErrorReport () at elog.c:1481
> > #7  0x55cb93da5abf in errfinish (dummy=0) at elog.c:481
> > #8  0x55cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3 "sending replication keepalive") at elog.c:1376
> > #9  0x55cb93bcae71 in WalSndKeepalive (requestReply=true) at walsender.c:3358
> Is it stuck in the send() call forever, or did you happen to grab this backtrace?

No, it didn't get stuck there. During the shutdown postgres starts sending a few thousand keepalive messages per second and receives back just as many feedback messages, so the chances of interrupting somewhere in the send are quite high. The loop never breaks because psycopg2 is always replying with the same flush position as last time, which was set while processing an XLogData message.

> I think having a report of an issue, with a way to reproduce it is a first > (and quite important) step. So thanks for doing that. > > That being said, I think those are two separate issues, with different > causes and likely different fixes. I don't think fixing the xlog flush > will resolve your issue, and vice versa.

Agree, these are different issues. Regards, -- Alexander Kukushkin
Re: pgsql: Compute XID horizon for page level index vacuum on primary.
On Wed, May 1, 2019 at 12:50 PM Tom Lane wrote: > > Not strongly enough to argue about it very hard. The current behavior > > is a little weird, but it's a long way from being the weirdest thing > > we ship, and it appears that we have no tangible evidence that it > > causes a problem in practice. > > I think there's nothing that fails to suck about a hardwired "+ 10". It avoids a performance regression without adding another GUC. That may not be enough reason to keep it like that, but it is one thing that does fail to suck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: hyrax vs. RelationBuildPartitionDesc
On Wed, May 1, 2019 at 11:59 AM Andres Freund wrote: > The message I'm replying to is marked as an open item. Robert, what do > you think needs to be done here before release? Could you summarize, > so we then can see what others think? The original issue on this thread was that hyrax started running out of memory when it hadn't been doing so previously. That happened because, for complicated reasons, commit 898e5e3290a72d288923260143930fb32036c00c resulted in the leak being hit lots of times in CLOBBER_CACHE_ALWAYS builds instead of just once. Commits 2455ab48844c90419714e27eafd235a85de23232 and d3f48dfae42f9655425d1f58f396e495c7fb7812 fixed that problem. In the email at issue, Tom complains about two things. First, he complains about the fact that I try to arrange things so that relcache data remains valid for as long as required instead of just copying it. Second, he complains about the fact repeated ATTACH and DETACH PARTITION operations can produce a slow session-lifespan memory leak. I think it's reasonable to fix the second issue for v12. I am not sure how important it is, because (1) the leak is small, (2) it seems unlikely that anyone would execute enough ATTACH/DETACH PARTITION operations in one backend for the leak to amount to anything significant, and (3) if a relcache flush ever happens when you don't have the relation open, all of the "leaked" memory will be un-leaked. However, I believe that we could fix it as follows. First, invent rd_pdoldcxt and put the first old context there; if that pointer is already in use, then parent the new context under the old one. Second, in RelationDecrementReferenceCount, if the refcount hits 0, nuke rd_pdoldcxt and set the pointer back to NULL. With that change, you would only keep the old PartitionDesc around until the ref count hits 0, whereas at present, you keep the old PartitionDesc around until an invalidation happens while the ref count is 0. I think the first issue is not v12 material. 
Tom proposed to fix it by copying all the relevant data out of the relcache, but his own performance results show a pretty significant hit, and AFAICS he hasn't pointed to anything that's actually broken by the current approach. What I think should be done is actually generalize the approach I took in this patch, so that instead of the partition directory holding a reference count, the planner or executor holds one. Then not only could people who want information about the PartitionDesc avoid copying stuff from the relcache, but maybe other things as well. I think that would be significantly better than continuing to double-down on the current copy-everything approach, which really only works well in a world where a table can't have all that much metadata, which is clearly not true for PostgreSQL any more. I'm not sure that Tom is actually opposed to this approach -- although I may have misunderstood his position -- but where we disagree, I think, is that I see what I did in this commit as a stepping-stone toward a better world, and he sees it as something that should be killed with fire until that better world has fully arrived. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Hi,

On 2019-05-01 12:20:22 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I see the bug. Turns out we need to figure out another way to solve the
> > assertion triggered by doing catalog updates within
> > RelationSetNewRelfilenode() - we can't just move the
> > SetReindexProcessing() before it. When CCA is enabled, the
> > CommandCounterIncrement() near the tail of RelationSetNewRelfilenode()
> > triggers a rebuild of the catalog entries - but without the
> > SetReindexProcessing() those scans will try to use the index currently
> > being rebuilt.
>
> Yeah. I think what this demonstrates is that REINDEX INDEX has to have
> RelationSetIndexList logic similar to what REINDEX TABLE has, to control
> which indexes get updated when while we're rebuilding an index of
> pg_class. In hindsight that seems glaringly obvious ... I wonder how we
> missed that when we built that infrastructure for REINDEX TABLE?

I'm not sure this is the right short-term answer. Why isn't it, for now, sufficient to do what I suggested with RelationSetNewRelfilenode() not doing the CommandCounterIncrement(), and reindex_index() then doing the SetReindexProcessing() before a CommandCounterIncrement()? That's like a ~10 line code change, and a few more with comments.

There is the danger that the current and above approach basically relies on there not being any non-inplace updates during reindex. But at the moment the code does take care to use inplace updates (cf. index_update_stats()). It's not clear to me whether the approach of using RelationSetIndexList() in reindex_index() would be meaningfully more robust against non-inplace updates during reindex either - ISTM we'd just as well skip the necessary index insertions if we hid the index being rebuilt.
Skipping to-be-rebuilt indexes works for reindex_relation() because they're going to be rebuilt subsequently (and thus the missing index rows don't matter) - but it'd not work for reindexing a single index, because it'll not get rebuilt at a later stage.

> I'm pretty sure that infrastructure is my fault, so I'll take a
> whack at fixing this.
>
> Did you figure out why this doesn't also happen in HEAD?

It does for me now, at least when just doing a reindex in isolation (CCA tests would have taken too long last night). I'm not sure why I wasn't previously able to trigger it and markhor hasn't run yet on master.

Greetings,

Andres Freund
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
I wrote: > Did you figure out why this doesn't also happen in HEAD? ... actually, HEAD *is* broken with CCA, just differently. I'm on it. regards, tom lane
Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Robert Haas writes: > On Wed, May 1, 2019 at 12:15 PM Andres Freund wrote: >> My current inclination is to not do anything for v12. Robert, do you >> disagree? > Not strongly enough to argue about it very hard. The current behavior > is a little weird, but it's a long way from being the weirdest thing > we ship, and it appears that we have no tangible evidence that it > causes a problem in practice. I think there's nothing that fails to suck about a hardwired "+ 10". We should either remove that and use effective_io_concurrency as-is, or decide that it's worth having a separate GUC for bulk operations. At this stage of the cycle I'd incline to the former, but if somebody is excited enough to prepare a patch for a new GUC, I wouldn't push back on that solution. regards, tom lane
Re: walsender vs. XLogBackgroundFlush during shutdown
On Wed, May 01, 2019 at 08:53:15AM -0700, Andres Freund wrote:

Hi,

On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote:

The reason for that is pretty simple - the walsenders are doing logical decoding, and during shutdown they're waiting for WalSndCaughtUp=true. But per XLogSendLogical() this only happens if

    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
    {
        WalSndCaughtUp = true;
        ...
    }

That is, we need to get beyond GetFlushRecPtr(). But that may never happen, because there may be an incomplete (and unflushed) page in WAL buffers, not forced out by any transaction. So if there's a WAL record overflowing to the unflushed page, the walsender will never catch up.

Now, this situation is apparently expected, because WalSndWaitForWal() does this:

    /*
     * If we're shutting down, trigger pending WAL to be written out,
     * otherwise we'd possibly end up waiting for WAL that never gets
     * written, because walwriter has shut down already.
     */
    if (got_STOPPING)
        XLogBackgroundFlush();

but unfortunately that does not actually do anything, because right at the very beginning XLogBackgroundFlush() does this:

    /* back off to last completed page boundary */
    WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;

That is, it intentionally ignores the incomplete page, which means the walsender can't read the record and reach GetFlushRecPtr().

XLogBackgroundFlush() does this since (at least) 2007, so how come we never had issues with this before?

I assume that's because of the following logic:

    /* if we have already flushed that far, consider async commit records */
    if (WriteRqst.Write <= LogwrtResult.Flush)
    {
        SpinLockAcquire(&XLogCtl->info_lck);
        WriteRqst.Write = XLogCtl->asyncXactLSN;
        SpinLockRelease(&XLogCtl->info_lck);
        flexible = false;    /* ensure it all gets written */
    }

and various pieces of the code doing XLogSetAsyncXactLSN() to force flushing.
I wonder if the issue is that we're better at avoiding unnecessary WAL to be written due to 6ef2eba3f57f17960b7cd4958e18aa79e357de2f

I don't think so, because (a) there are no async commits involved, and (b) we originally ran into the issue on 9.6 and 6ef2eba3f57f1 is only in 10+.

But I don't think we're safe without the failover slots patch, because any output plugin can do something similar - say, LogLogicalMessage() or something like that. I'm not aware of a plugin doing such things, but I don't think it's illegal / prohibited either. (Of course, plugins that generate WAL won't be useful for decoding on standby in the future.)

FWIW, I'd consider such an output plugin outright broken.

Why? Is that prohibited somewhere, either explicitly or implicitly? I do see obvious issues with generating WAL from a plugin (infinite cycle and so on), but I suppose that's more a regular coding issue than something that would make all plugins doing that broken. FWIW I don't see any particular need to generate WAL from output plugins, I mentioned it mostly just as a convenient way to trigger the issue. I suppose there are other ways to generate a bit of WAL during shutdown.

So what I think we should do is to tweak XLogBackgroundFlush() so that during shutdown it skips the back-off to page boundary, and flushes even the last piece of WAL. There are only two callers anyway, so something like XLogBackgroundFlush(bool) would be simple enough.

I think it then just ought to be a normal XLogFlush(). I.e. something along the lines of:

    /*
     * If we're shutting down, trigger pending WAL to be written out,
     * otherwise we'd possibly end up waiting for WAL that never gets
     * written, because walwriter has shut down already.
     */
    if (got_STOPPING && !RecoveryInProgress())
        XLogFlush(GetXLogInsertRecPtr());

    /* Update our idea of the currently flushed position. */
    if (!RecoveryInProgress())
        RecentFlushPtr = GetFlushRecPtr();
    else
        RecentFlushPtr = GetXLogReplayRecPtr(NULL);

Perhaps.
That would work too, I guess. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: New vacuum option to do only freezing
On Wed, May 1, 2019 at 12:14 PM Andres Freund wrote: > On 2019-04-06 16:13:53 +0900, Michael Paquier wrote: > > On Sat, Apr 06, 2019 at 11:31:31AM +0900, Masahiko Sawada wrote: > > > Yes, but Fujii-san pointed out that this option doesn't support toast > > > tables and I think there is not specific reason why not supporting > > > them. So it might be good to add toast.vacuum_index_cleanup. Attached > > > patch. > > > > Being able to control that option at toast level sounds sensible. I > > have added an open item about that. > > Robert, what is your stance on this open item? It's been an open item > for about three weeks, without any progress. The actual bug in this patch needs to be fixed, but I see we have another open item for that. This open item, as I understand it, is all about whether we should add another reloption so that you can control this behavior separately for TOAST tables. In my opinion, that's not a critical change and the open item should be dropped, but others might see it differently. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: using index or check in ALTER TABLE SET NOT NULL
Andres Freund writes: > There's still an open item "Debate INFO messages in ATTACH PARTITION and > SET NOT NULL" for this thread. Is there anything further to be done? Or > are we content with this for v12? IIRC, that's based on my complaint that these messages have no business being INFO level. I'm definitely not content with the current state. I proposed that we didn't need those messages at all, which Robert pushed back on, but I'd be willing to compromise by reducing them to NOTICE or DEBUG level. The actually intended use of INFO level is to mark messages that the user actively asked for (via VERBOSE or some morally-equivalent option), which is why that level has such high priority. If we had something like a VERBOSE option for ALTER TABLE, it'd make plenty of sense to follow VACUUM's precedent: if (params->options & VACOPT_VERBOSE) elevel = INFO; else elevel = DEBUG2; It seems a bit late to be inventing such a thing for v12 though. In the meantime, I'm not very content with random subsets of ALTER TABLE acting as if they have been granted permission to be VERBOSE. It's unfriendly, and it's inconsistent because there are so many other expensive ALTER TABLE actions that don't do this. regards, tom lane
Re: pgsql: Compute XID horizon for page level index vacuum on primary.
On Wed, May 1, 2019 at 12:15 PM Andres Freund wrote: > On 2019-04-01 18:26:59 -0700, Andres Freund wrote: > > I'm not yet convinced it's necessary to create a new GUC, but also not > > strongly opposed. I've created an open items issue for it, so we don't > > forget. > > My current inclination is to not do anything for v12. Robert, do you > disagree? Not strongly enough to argue about it very hard. The current behavior is a little weird, but it's a long way from being the weirdest thing we ship, and it appears that we have no tangible evidence that it causes a problem in practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Andres Freund writes: > I see the bug. Turns out we need to figure out another way to solve the > assertion triggered by doing catalog updates within > RelationSetNewRelfilenode() - we can't just move the > SetReindexProcessing() before it. When CCA is enabled, the > CommandCounterIncrement() near the tail of RelationSetNewRelfilenode() > triggers a rebuild of the catalog entries - but without the > SetReindexProcessing() those scans will try to use the index currently > being rebuilt. Yeah. I think what this demonstrates is that REINDEX INDEX has to have RelationSetIndexList logic similar to what REINDEX TABLE has, to control which indexes get updated when while we're rebuilding an index of pg_class. In hindsight that seems glaringly obvious ... I wonder how we missed that when we built that infrastructure for REINDEX TABLE? I'm pretty sure that infrastructure is my fault, so I'll take a whack at fixing this. Did you figure out why this doesn't also happen in HEAD? regards, tom lane
Re: using index or check in ALTER TABLE SET NOT NULL
On 2019-03-25 11:09:47 -0400, Robert Haas wrote: > On Mon, Mar 25, 2019 at 3:51 AM David Steele wrote: > > On 3/17/19 3:40 PM, David Rowley wrote: > > > On Thu, 14 Mar 2019 at 06:24, Robert Haas wrote: > > > > > > I think we can mark this patch as committed now as I don't think the > > > lack of a way to test it is likely to cause it to be reverted. > > > > Should we mark this as committed now or is there more tweaking to be done? > > I have now marked it as Committed. There's still an open item "Debate INFO messages in ATTACH PARTITION and SET NOT NULL" for this thread. Is there anything further to be done? Or are we content with this for v12? - Andres
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On Wed, May 1, 2019 at 09:08:54AM -0700, Andres Freund wrote: > So sure, there's a few typo fixes, one bugfix, and one buildfarm test > stability issue. Doesn't seem crazy for a nontrivial improvement. OK, my ignorant opinion was just based on the length of discussion threads. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: compiler warning in pgcrypto imath.c
Hi Noah,

On 2019-03-23 00:02:36 -0700, Noah Misch wrote:
> On Sat, Mar 23, 2019 at 10:20:16AM +0900, Michael Paquier wrote:
> > On Fri, Mar 22, 2019 at 08:20:53PM -0400, Jeff Janes wrote:
> > > PostgreSQL 12devel on aarch64-unknown-linux-gnu, compiled by gcc
> > > (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit
> >
> > Adding Noah in CC as he has done the update of imath lately.
> >
> > > The attached patch adds PG_USED_FOR_ASSERTS_ONLY to silence it. Perhaps
> > > there is a better way, given that we want to change imath.c as little as
> > > possible from its upstream?
> >
> > Maybe others have better ideas, but marking the variable with
> > PG_USED_FOR_ASSERTS_ONLY as you propose seems like the least invasive
> > method of all.
>
> That patch looks good. Thanks. The main alternative would be to pass
> -Wno-unused for this file. Since you're proposing only one instance of
> PG_USED_FOR_ASSERTS_ONLY, I favor PG_USED_FOR_ASSERTS_ONLY over -Wno-unused.

This is marked as an open item, owned by you. Could you commit the patch or otherwise resolve the issue?

Greetings,

Andres Freund
Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Hi, On 2019-04-01 18:26:59 -0700, Andres Freund wrote: > I'm not yet convinced it's necessary to create a new GUC, but also not > strongly opposed. I've created an open items issue for it, so we don't > forget. My current inclination is to not do anything for v12. Robert, do you disagree? Greetings, Andres Freund
Re: New vacuum option to do only freezing
Hi, On 2019-04-06 16:13:53 +0900, Michael Paquier wrote: > On Sat, Apr 06, 2019 at 11:31:31AM +0900, Masahiko Sawada wrote: > > Yes, but Fujii-san pointed out that this option doesn't support toast > > tables and I think there is not specific reason why not supporting > > them. So it might be good to add toast.vacuum_index_cleanup. Attached > > patch. > > Being able to control that option at toast level sounds sensible. I > have added an open item about that. Robert, what is your stance on this open item? It's been an open item for about three weeks, without any progress. Greetings, Andres Freund
Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
Hi, On 2019-04-08 19:22:04 +0900, Fujii Masao wrote: > On Mon, Apr 8, 2019 at 5:30 PM Masahiko Sawada wrote: > > "TRUNCATE" option for vacuum command should be added to the open items? > > Yes, I think. > Attached is the patch which adds TRUNCATE option into VACUUM. This has been an open item for about three weeks. Please work on resolving this soon. - Andres
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Hi,

On 2019-05-01 11:28:11 -0400, Bruce Momjian wrote:
> On Wed, May 1, 2019 at 08:24:25AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > > to come up with a cleanup proposal, and then decide whether to 1) revert
> > > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > > apply the patch in 13. As we would already have a patch, 3) seems like
> > > it'd be more tenable than without.
> >
> > I think decision time has come. My tentative impression is that we're
> > not there yet, and should revert the improvements in v12, and apply the
> > improved version early in v13. As a second choice, we should live with
> > the current approach, if John and Amit "promise" further effort to clean
> > this up for v13.
>
> My ignorant opinion is that I have been surprised by the churn caused by
> this change, and have therefore questioned the value of it.

Hm, I don't think there has been that much churn? Sure, there was a revert to figure out a regression test instability, but that doesn't seem that bad. Relevant commits in date order are:

andres-classification: cleanup
commit 06c8a5090ed9ec188557a86d4de11384f5128ec0
Author: Amit Kapila
Date: 2019-03-16 06:55:56 +0530
    Improve code comments in b0eaa4c51b.
    Author: John Naylor
    Discussion: https://postgr.es/m/CACPNZCswjyGJxTT=mxHgK=Z=mJ9uJ4WEx_UO=bnwpr_i0ea...@mail.gmail.com

andres-classification: incremental improvement
commit 13e8643bfc29d3c1455c0946281cdfc24758ffb6
Author: Amit Kapila
Date: 2019-03-15 08:25:57 +0530
    During pg_upgrade, conditionally skip transfer of FSMs.

andres-classification: additional tests
commit 6f918159a97acf76ee2512e44f5ed5dcaaa0d923
Author: Amit Kapila
Date: 2019-03-12 08:14:28 +0530
    Add more tests for FSM.

andres-classification: cleanup
commit a6e48da08844eeb5a72c8b59dad3aaab6e891fac
Author: Amit Kapila
Date: 2019-03-11 08:16:14 +0530
    Fix typos in commit 8586bf7ed8.

andres-classification: bugfix
commit 9c32e4c35026bd52aaf340bfe7594abc653e42f0
Author: Amit Kapila
Date: 2019-03-01 07:38:47 +0530
    Clear the local map when not used.

andres-classification: docs addition
commit 29d108cdecbe918452e70041d802cc515b2d56b8
Author: Amit Kapila
Date: 2019-02-20 17:37:39 +0530
    Doc: Update the documentation for FSM behavior for small tables.

andres-classification: regression test stability
commit 08ecdfe7e5e0a31efbe1d58fefbe085b53bc79ca
Author: Amit Kapila
Date: 2019-02-04 10:08:29 +0530
    Make FSM test portable.

andres-classification: feature
commit b0eaa4c51bbff3e3c600b11e5d104d6feb9ca77f
Author: Amit Kapila
Date: 2019-02-04 07:49:15 +0530
    Avoid creation of the free space map for small heap relations, take 2.

So sure, there's a few typo fixes, one bugfix, and one buildfarm test stability issue. Doesn't seem crazy for a nontrivial improvement.

> Frankly, there has been so much churn I am unclear if it can be easily
> reverted.

Doesn't seem that hard? There's some minor conflicts, but nothing bad?

Greetings,

Andres Freund
Re: hyrax vs. RelationBuildPartitionDesc
Hi The message I'm replying to is marked as an open item. Robert, what do you think needs to be done here before release? Could you summarize, so we then can see what others think? Greetings, Andres Freund
Re: walsender vs. XLogBackgroundFlush during shutdown
Hi,

On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote:
> The reason for that is pretty simple - the walsenders are doing logical
> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
> But per XLogSendLogical() this only happens if
>
>    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>    {
>        WalSndCaughtUp = true;
>        ...
>    }
>
> That is, we need to get beyond GetFlushRecPtr(). But that may never
> happen, because there may be an incomplete (and unflushed) page in WAL
> buffers, not forced out by any transaction. So if there's a WAL record
> overflowing to the unflushed page, the walsender will never catch up.
>
> Now, this situation is apparently expected, because WalSndWaitForWal()
> does this:
>
>    /*
>     * If we're shutting down, trigger pending WAL to be written out,
>     * otherwise we'd possibly end up waiting for WAL that never gets
>     * written, because walwriter has shut down already.
>     */
>    if (got_STOPPING)
>        XLogBackgroundFlush();
>
> but unfortunately that does not actually do anything, because right at
> the very beginning XLogBackgroundFlush() does this:
>
>    /* back off to last completed page boundary */
>    WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
>
> That is, it intentionally ignores the incomplete page, which means the
> walsender can't read the record and reach GetFlushRecPtr().
>
> XLogBackgroundFlush() does this since (at least) 2007, so how come we
> never had issues with this before?

I assume that's because of the following logic:

    /* if we have already flushed that far, consider async commit records */
    if (WriteRqst.Write <= LogwrtResult.Flush)
    {
        SpinLockAcquire(&XLogCtl->info_lck);
        WriteRqst.Write = XLogCtl->asyncXactLSN;
        SpinLockRelease(&XLogCtl->info_lck);
        flexible = false;    /* ensure it all gets written */
    }

and various pieces of the code doing XLogSetAsyncXactLSN() to force flushing.
I wonder if the issue is that we're better at avoiding unnecessary WAL to be written due to 6ef2eba3f57f17960b7cd4958e18aa79e357de2f

> But I don't think we're safe without the failover slots patch, because
> any output plugin can do something similar - say, LogLogicalMessage() or
> something like that. I'm not aware of a plugin doing such things, but I
> don't think it's illegal / prohibited either. (Of course, plugins that
> generate WAL won't be useful for decoding on standby in the future.)

FWIW, I'd consider such an output plugin outright broken.

> So what I think we should do is to tweak XLogBackgroundFlush() so that
> during shutdown it skips the back-off to page boundary, and flushes even
> the last piece of WAL. There are only two callers anyway, so something
> like XLogBackgroundFlush(bool) would be simple enough.

I think it then just ought to be a normal XLogFlush(). I.e. something along the lines of:

    /*
     * If we're shutting down, trigger pending WAL to be written out,
     * otherwise we'd possibly end up waiting for WAL that never gets
     * written, because walwriter has shut down already.
     */
    if (got_STOPPING && !RecoveryInProgress())
        XLogFlush(GetXLogInsertRecPtr());

    /* Update our idea of the currently flushed position. */
    if (!RecoveryInProgress())
        RecentFlushPtr = GetFlushRecPtr();
    else
        RecentFlushPtr = GetXLogReplayRecPtr(NULL);

Greetings,

Andres Freund
Re: New vacuum option to do only freezing
Hi,

This thread is referenced by an open item, and we ought to make some progress on it. On a more cosmetic note:

On 2019-04-16 09:10:19 -0700, Andres Freund wrote:
> On 2019-04-16 12:01:36 -0400, Tom Lane wrote:
> > (BTW, I don't understand why that code will throw "found xmin %u from
> > before relfrozenxid %u" if HeapTupleHeaderXminFrozen is true? Shouldn't
> > the whole if-branch at lines 6113ff be skipped if xmin_frozen?)
>
> I *think* that just looks odd, but isn't actively wrong. That's because
> TransactionIdIsNormal() won't trigger, as:
>
> #define HeapTupleHeaderGetXmin(tup) \
> ( \
>     HeapTupleHeaderXminFrozen(tup) ? \
>         FrozenTransactionId : HeapTupleHeaderGetRawXmin(tup) \
> )
>
> which afaict makes
>
>     xmin_frozen = ((xid == FrozenTransactionId) ||
>                    HeapTupleHeaderXminFrozen(tuple));
>
> redundant.
>
> Looks like that was introduced relatively recently, in
>
> commit d2599ecfcc74fea9fad1720a70210a740c716730
> Author: Alvaro Herrera
> Date: 2018-05-04 15:24:44 -0300
>
>     Don't mark pages all-visible spuriously
>
> @@ -6814,6 +6815,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
>
>      /* Process xmin */
>      xid = HeapTupleHeaderGetXmin(tuple);
> +    xmin_frozen = ((xid == FrozenTransactionId) ||
> +                   HeapTupleHeaderXminFrozen(tuple));
>      if (TransactionIdIsNormal(xid))
>      {
>          if (TransactionIdPrecedes(xid, relfrozenxid))
> @@ -6832,9 +6835,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
>
>              frz->t_infomask |= HEAP_XMIN_FROZEN;
>              changed = true;
> +            xmin_frozen = true;
>          }
> -        else
> -            totally_frozen = false;
>      }

Alvaro, could we perhaps clean this up a bit? This is pretty confusing looking. I think this probably could just be changed to

    bool        xmin_frozen = false;

    xid = HeapTupleHeaderGetXmin(tuple);
    if (xid == FrozenTransactionId)
        xmin_frozen = true;
    else if (TransactionIdIsNormal(xid))

or somesuch.
There's no need to check for HeapTupleHeaderXminFrozen(tuple) etc, because HeapTupleHeaderGetXmin() already does so - and if it didn't, the issue Tom points out would be problematic. Greetings, Andres Freund
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
On Wed, May 1, 2019 at 08:24:25AM -0700, Andres Freund wrote: > Hi, > > On 2019-04-18 14:10:29 -0700, Andres Freund wrote: > > My compromise suggestion would be to try to give John and Amit ~2 weeks > > to come up with a cleanup proposal, and then decide whether to 1) revert > > 2) apply the new patch, 3) decide to live with the warts for 12, and > > apply the patch in 13. As we would already have a patch, 3) seems like > > it'd be more tenable than without. > > I think decision time has come. My tentative impression is that we're > not there yet, and should revert the improvements in v12, and apply the > improved version early in v13. As a second choice, we should live with > the current approach, if John and Amit "promise" further effort to clean > this up for v13. My ignorant opinion is that I have been surprised by the churn caused by this change, and have therefore questioned the value of it. Frankly, there has been so much churn I am unclear if it can be easily reverted. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Hi, On 2019-04-18 14:10:29 -0700, Andres Freund wrote: > My compromise suggestion would be to try to give John and Amit ~2 weeks > to come up with a cleanup proposal, and then decide whether to 1) revert > 2) apply the new patch, 3) decide to live with the warts for 12, and > apply the patch in 13. As we would already have a patch, 3) seems like > it'd be more tenable than without. I think decision time has come. My tentative impression is that we're not there yet, and should revert the improvements in v12, and apply the improved version early in v13. As a second choice, we should live with the current approach, if John and Amit "promise" further effort to clean this up for v13. Greetings, Andres Freund
Re: to_timestamp docs
On Wed, May 1, 2019 at 10:01:50AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > I don't think the changes made in PG 12 are documented accurately. > > That code is swapped out of my head at the moment, but it looks > to me like the para before the one you changed is where we discuss > the behavior for whitespace. I'm not sure that this change is > right, or an improvement, in the context of both paras. Thanks. I think I see the sentence you are thinking of: to_timestamp and to_date skip multiple blank spaces at the beginning of the input string and around date and time values unless the FX option is used. However, first, it is unclear what 'skip' means here, i.e., does it mean multiple blank spaces become a single space, or they are ignored. That should be clarified, though I am unclear if that matters based on how separators are handled. Also, I think "blank spaces" should be "whitespace". Second, I see inconsistent behaviour around the use of FX for various patterns, e.g.: SELECT to_timestamp('5 1976','FXDD_FX'); to_timestamp 1976-01-05 00:00:00-05 SELECT to_timestamp('JUL JUL','FXMON_FXMON'); to_timestamp - 0001-07-01 00:00:00-04:56:02 BC SELECT to_timestamp('JULJUL','FXMON_FXMON'); ERROR: invalid value " " for "MON" DETAIL: The given value did not match any of the allowed values for this field. It seems DD and (as numerics?) in FX mode eat trailing whitespace, while MON does not? Also, I used these queries to determine it is "trailing" whitespace that "FXMON" controls: SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); to_timestamp - 0001-07-01 00:00:00-04:56:02 BC SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); ERROR: invalid value " J" for "MON" DETAIL: The given value did not match any of the allowed values for this field. Once we figure out how it is behaving I think we can pull together the FX text above to reference the separator text below. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. 
As I am, so you will be. + + Ancient Roman grave inscription +
Re: walsender vs. XLogBackgroundFlush during shutdown
On Wed, May 01, 2019 at 02:13:10PM +0200, Alexander Kukushkin wrote:

Hi Tomas,

On Wed, 1 May 2019 at 02:28, Tomas Vondra wrote:

I see there's an ongoing discussion about race conditions in walreceiver blocking shutdown, so let me start a new thread about a race condition in walsender during shutdown. The symptoms are rather simple - 'pg_ctl -m fast shutdown' gets stuck, waiting for walsender processes to catch up and terminate indefinitely.

I can confirm that during the past couple of years we observed such a problem a few times, and this is really annoying.

The reason for that is pretty simple - the walsenders are doing logical decoding, and during shutdown they're waiting for WalSndCaughtUp=true. But per XLogSendLogical() this only happens if

    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
    {
        WalSndCaughtUp = true;
        ...
    }

After a couple of days investigating and debugging I came to a slightly different conclusion: WalSndCaughtUp is set to true in my case. Since I am mostly a python person, I decided to use psycopg2 for my investigation. I took an example from http://initd.org/psycopg/docs/advanced.html#logical-replication-quick-start as a starting point, created a slot and started the script. I wasn't running any transactions, therefore the DemoConsumer.__call__ was never executed and cursor.send_feedback(flush_lsn=msg.data_start) was never called either. Basically, the only thing the psycopg2 internals were doing was periodically sending keepalive messages or replying to keepalives sent by postgres.

OK, so that seems like a separate issue, somewhat unrelated to the issue I reported. And I'm not sure it's a walsender issue - it seems it might be a psycopg2 issue in not reporting the flush properly, no?

Actually, the same problem will happen even in the case when the consumer script receives some message, but not very intensively; it is just a bit harder to reproduce it.
If you attach to the walsender with gdb, you'll see the following picture:

    (gdb) bt
    #0 0x7fd6623d296a in __libc_send (fd=8, buf=0x55cb958dca08, len=94, flags=0) at ../sysdeps/unix/sysv/linux/send.c:28
    #1 0x55cb93aa7ce9 in secure_raw_write (port=0x55cb958d71e0, ptr=0x55cb958dca08, len=94) at be-secure.c:318
    #2 0x55cb93aa7b87 in secure_write (port=0x55cb958d71e0, ptr=0x55cb958dca08, len=94) at be-secure.c:265
    #3 0x55cb93ab6bf9 in internal_flush () at pqcomm.c:1433
    #4 0x55cb93ab6b89 in socket_flush () at pqcomm.c:1409
    #5 0x55cb93dac30b in send_message_to_frontend (edata=0x55cb942b4380 ) at elog.c:3317
    #6 0x55cb93da8973 in EmitErrorReport () at elog.c:1481
    #7 0x55cb93da5abf in errfinish (dummy=0) at elog.c:481
    #8 0x55cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3 "sending replication keepalive") at elog.c:1376
    #9 0x55cb93bcae71 in WalSndKeepalive (requestReply=true) at walsender.c:3358

Is it stuck in the send() call forever, or did you happen to grab this backtrace?

All above text probably looks like a brain dump, but I don't think that it conflicts with Tomas's findings; rather, it complements them. I am very glad that now I know how to mitigate the problem on the client side, but IMHO it is also very important to fix the server behavior if it is ever possible.

I think having a report of an issue, with a way to reproduce it, is a first (and quite important) step. So thanks for doing that. That being said, I think those are two separate issues, with different causes and likely different fixes. I don't think fixing the xlog flush will resolve your issue, and vice versa.

FWIW attached is a patch that I used to reliably trigger the xlog flush issue - it simply adds LogLogicalMessage() to the commit handler in the built-in output plugin. So all you need to do is create a subscription, start generating commits, and trigger a restart. The message is 8kB, so it's definitely long enough to overflow to the next xlog page.
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index bf64c8e4a4..a18864666e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -16,6 +16,7 @@
 #include "replication/logical.h"
 #include "replication/logicalproto.h"
+#include "replication/message.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
 
@@ -247,11 +248,19 @@ static void
 pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					XLogRecPtr commit_lsn)
 {
+	char		message[8192];
+
 	OutputPluginUpdateProgress(ctx);
 
 	OutputPluginPrepareWrite(ctx, true);
 	logicalrep_write_commit(ctx->out, txn, commit_lsn);
 	OutputPluginWrite(ctx, true);
+
+	/* a simple string */
+	memset(message, 'a', 8
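For readers skimming the thread, the caught-up test quoted above boils down to a single pointer comparison. Below is a tiny Python model of one plausible reading of the failure the patch triggers (an illustration only, not PostgreSQL code; the page size and positions are assumed values): an ~8 kB record spills onto the next WAL page, only the first page gets flushed, and the walsender can never observe `EndRecPtr >= GetFlushRecPtr()`.

```python
XLOG_PAGE = 8192  # assumed WAL page size for this sketch


def wal_snd_caught_up(end_rec_ptr: int, flush_rec_ptr: int) -> bool:
    # Paraphrase of the XLogSendLogical() check quoted above:
    # WalSndCaughtUp = (EndRecPtr >= GetFlushRecPtr())
    return end_rec_ptr >= flush_rec_ptr


# An ~8 kB record starts shortly before a page boundary and continues
# on the next page.  Only the first page has been flushed, so the
# reader cannot finish decoding the record and EndRecPtr stays put:
end_rec_ptr = XLOG_PAGE - 100   # last fully decoded record ends here
flush_rec_ptr = XLOG_PAGE       # flushed only up to the page boundary

# Shutdown waits for WalSndCaughtUp, which never becomes true:
assert not wal_snd_caught_up(end_rec_ptr, flush_rec_ptr)
```

If the background flush ever wrote out the partial page, the flush pointer would cover the whole record, decoding would advance, and the comparison would flip to true.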
Re: to_timestamp docs
Bruce Momjian writes: > I don't think the changes made in PG 12 are documented accurately. That code is swapped out of my head at the moment, but it looks to me like the para before the one you changed is where we discuss the behavior for whitespace. I'm not sure that this change is right, or an improvement, in the context of both paras. regards, tom lane
to_timestamp docs
I don't think the changes made in PG 12 are documented accurately. It
currently says:

	to_timestamp and to_date matches any single separator in the input
	string or is skipped

However, I think it is more accurate to say _multiple_ whitespace
characters can also be matched by a single separator:

	SELECT to_timestamp('%1976','_');
	      to_timestamp
	------------------------
	 1976-01-01 00:00:00-05

	SELECT to_timestamp('%%1976','_');
	ERROR:  invalid value "%197" for ""
	DETAIL:  Value must be an integer.

	-- two spaces
-->	SELECT to_timestamp('  1976','_');
	      to_timestamp
	------------------------
	 1976-01-01 00:00:00-05

-->	SELECT to_timestamp(E'\t\t\t1976','_');
	      to_timestamp
	------------------------
	 1976-01-01 00:00:00-05

Proposed patch attached.

-- 
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index d751766..b27f4d4
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT regexp_match('abc01234xyz', '(?:(
*** 6390,6396 ****
       A separator (a space or non-letter/non-digit character) in the
       template string of to_timestamp and to_date
!      matches any single separator in the input string or is skipped,
       unless the FX option is used.
       For example, to_timestamp('2000JUN', '///MON') and
       to_timestamp('2000/JUN', ' MON') work, but
--- 6390,6396 ----
       A separator (a space or non-letter/non-digit character) in the
       template string of to_timestamp and to_date
!      matches multiple whitespace characters in the input string, a
!      single separator, or nothing, unless the FX
       option is used.
       For example, to_timestamp('2000JUN', '///MON') and
       to_timestamp('2000/JUN', ' MON') work, but
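The behavior the proposed wording describes, one template separator consuming a run of whitespace, a single separator character, or nothing at all, can be modeled in a few lines. This is an illustrative Python sketch only (the function name and regex are mine, not PostgreSQL's formatting.c logic):

```python
import re


def consume_separator(inp: str) -> str:
    """Model of non-FX separator handling in to_timestamp/to_date:
    one template separator eats a run of whitespace, or a single
    non-letter/non-digit character, or nothing at all.  Returns the
    remaining input after the separator is consumed."""
    m = re.match(r'\s+|[^0-9A-Za-z\s]', inp)
    return inp[m.end():] if m else inp


# multiple whitespace characters matched by one template separator:
assert consume_separator('   1976') == '1976'
assert consume_separator('\t\t\t1976') == '1976'
# a single separator character:
assert consume_separator('%1976') == '1976'
# nothing to consume -- the separator is simply skipped:
assert consume_separator('1976') == '1976'
```

This matches the SQL examples above: three tabs or a single `%` are both absorbed by one separator, while two `%` characters leave the second one in the input and break the numeric field.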
Re: walsender vs. XLogBackgroundFlush during shutdown
Hi Tomas,

On Wed, 1 May 2019 at 02:28, Tomas Vondra wrote:

> I see there's an ongoing discussion about race conditions in walreceiver
> blocking shutdown, so let me start a new thread about a race condition in
> walsender during shutdown.
>
> The symptoms are rather simple - 'pg_ctl -m fast shutdown' gets stuck,
> waiting for walsender processes to catch-up and terminate indefinitely.

I can confirm, during the past couple of years we observed such a
problem a few times, and this is really annoying.

> The reason for that is pretty simple - the walsenders are doing logical
> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
> But per XLogSendLogical() this only happens if
>
>     if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>     {
>         WalSndCaughtUp = true;
>         ...
>     }

After a couple of days investigating and debugging I came to a slightly
different conclusion: WalSndCaughtUp is set to true in my case.

Since I am mostly a python person, I decided to use psycopg2 for my
investigation. I took an example from
http://initd.org/psycopg/docs/advanced.html#logical-replication-quick-start
as a starting point, created a slot and started the script. I wasn't
running any transactions, therefore DemoConsumer.__call__ was never
executed and cursor.send_feedback(flush_lsn=msg.data_start) was never
called either. Basically, the only thing the psycopg2 internals were
doing was periodically sending keepalive messages or replying to
keepalives sent by postgres.
In the postgres debug log they are visible as:

2019-05-01 12:58:32.785 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0

If you try to do a fast shutdown of postgres while the script is
running, it will never finish, and in the postgres log you will get an
indefinite stream of messages:

2019-05-01 13:00:02.880 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0
2019-05-01 13:00:02.880 CEST [13939] DEBUG:  sending replication keepalive
2019-05-01 13:00:02.880 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0
2019-05-01 13:00:02.880 CEST [13939] DEBUG:  sending replication keepalive
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  sending replication keepalive
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  sending replication keepalive
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  sending replication keepalive
2019-05-01 13:00:02.881 CEST [13939] DEBUG:  write 0/0 flush 0/0 apply 0/0

Actually, the same problem will happen even when the consumer script
receives some messages, just not very intensively; it is only a bit
harder to reproduce.
If you attach to the walsender with gdb, you'll see the following picture:

(gdb) bt
#0  0x7fd6623d296a in __libc_send (fd=8, buf=0x55cb958dca08, len=94, flags=0) at ../sysdeps/unix/sysv/linux/send.c:28
#1  0x55cb93aa7ce9 in secure_raw_write (port=0x55cb958d71e0, ptr=0x55cb958dca08, len=94) at be-secure.c:318
#2  0x55cb93aa7b87 in secure_write (port=0x55cb958d71e0, ptr=0x55cb958dca08, len=94) at be-secure.c:265
#3  0x55cb93ab6bf9 in internal_flush () at pqcomm.c:1433
#4  0x55cb93ab6b89 in socket_flush () at pqcomm.c:1409
#5  0x55cb93dac30b in send_message_to_frontend (edata=0x55cb942b4380) at elog.c:3317
#6  0x55cb93da8973 in EmitErrorReport () at elog.c:1481
#7  0x55cb93da5abf in errfinish (dummy=0) at elog.c:481
#8  0x55cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3 "sending replication keepalive") at elog.c:1376
#9  0x55cb93bcae71 in WalSndKeepalive (requestReply=true) at walsender.c:3358
#10 0x55cb93bca062 in WalSndDone (send_data=0x55cb93bc9e29) at walsender.c:2872
#11 0x55cb93bc9155 in WalSndLoop (send_data=0x55cb93bc9e29) at walsender.c:2194
#12 0x55cb93bc7b11 in StartLogicalReplication (cmd=0x55cb95931cc0) at walsender.c:1109
#13 0x55cb93bc83d6 in exec_replication_command (cmd_string=0x55cb958b2360 "START_REPLICATION SLOT \"test\" LOGICAL 0/") at walsender.c:1541
#14 0x55cb93c31653 in PostgresMain (argc=1, argv=0x55cb958deb68, dbname=0x55cb958deb48 "postgres", username=0x55cb958deb28 "akukushkin") at postgres.c:4178
#15 0x55cb93b95185 in BackendRun (port=0x55cb958d71e0) at postmaster.c:4361
#16 0x55cb93b94824 in BackendStartup (port=0x55cb958d71e0) at postmaster.c:4033
#17 0x55cb93b90ccd in ServerLoop () at postmaster.c:1706
#18 0x55cb93b90463 in PostmasterMain (argc=3, argv=0x55cb958ac710) at postmaster.c:1379
#19 0x55cb93abb08e in main (argc=3, argv=0x55cb958ac710) at main.c:228
(gdb) f 10
#10 0x55cb93bca062 in WalSndDone (send_data=0x55cb93bc9e29) at walsender.c:2872
2872		WalSndKeepalive(true);
(gdb) p WalSndCaughtUp
$1 = true
(gdb) p *MyWalSnd
$2 =
{pid = 21845, state = WALSNDSTATE_STREAMING, sentPtr = 23586
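What keeps this loop spinning is the exit condition in WalSndDone: the walsender may only finish once the client has acknowledged everything that was sent. A minimal Python model follows (illustration only; the function is a paraphrase of the walsender.c logic, and 23586 simply echoes the sentPtr value in the gdb output above):

```python
def wal_snd_done(sent_ptr: int, confirmed_ptr: int) -> bool:
    # Paraphrase of WalSndDone(): a fast shutdown completes only once
    # the client has confirmed everything that was sent; until then the
    # walsender keeps sending keepalives that request a reply.
    return confirmed_ptr >= sent_ptr


# The client replies "write 0/0 flush 0/0 apply 0/0" forever, so the
# fast shutdown never finishes:
assert not wal_snd_done(sent_ptr=23586, confirmed_ptr=0)

# Calling cursor.send_feedback(flush_lsn=msg.data_start) in the consumer
# advances the confirmed position and lets the walsender exit:
assert wal_snd_done(sent_ptr=23586, confirmed_ptr=23586)
```

This is why the client-side mitigation works: a consumer that periodically reports its flush position (even when it has nothing to consume) lets the confirmed pointer catch up with sentPtr, and the shutdown check succeeds.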
Re: [PATCH v4] Add \warn to psql
> > > I guess that you have a verbose ~/.psqlrc.
> > >
> > > Can you try with adding -X to psql option when calling psql from the
> > > tap test?
> >
> > Ah, true. This patch works for me:
> >
> > diff --git a/src/bin/psql/t/001_psql.pl b/src/bin/psql/t/001_psql.pl
> > index 32dd43279b..637baa94c9 100644
> > --- a/src/bin/psql/t/001_psql.pl
> > +++ b/src/bin/psql/t/001_psql.pl
> > @@ -20,7 +20,7 @@ sub psql
> >  {
> >  	local $Test::Builder::Level = $Test::Builder::Level + 1;
> >  	my ($opts, $stat, $in, $out, $err, $name) = @_;
> > -	my @cmd = ('psql', split /\s+/, $opts);
> > +	my @cmd = ('psql', '-X', split /\s+/, $opts);
> >  	$node->command_checks_all(\@cmd, $stat, $out, $err, $name, $in);
> >  	return;
> >  }
>
> Please find attached :)

Good. Works for me, even with a verbose .psqlrc. Switched back to ready.

-- 
Fabien.
Re: [PATCH v4] Add \warn to psql
On Wed, May 01, 2019 at 10:05:44AM +0300, Arthur Zakirov wrote:
> (Unfortunately I accidentally sent my previous two messages using my
> personal email address because of my email client configuration. This
> address is not verified by PostgreSQL.org services and the messages
> didn't reach the hackers mailing lists, so I resent the latest message.)
>
> On Tue, Apr 30, 2019 at 4:46 PM Fabien COELHO wrote:
> > > Unfortunately the new TAP test doesn't pass on my machine. I'm not
> > > good at Perl and didn't get the reason of the failure quickly.
> >
> > I guess that you have a verbose ~/.psqlrc.
> >
> > Can you try with adding -X to psql option when calling psql from the
> > tap test?
>
> Ah, true. This patch works for me:
>
> diff --git a/src/bin/psql/t/001_psql.pl b/src/bin/psql/t/001_psql.pl
> index 32dd43279b..637baa94c9 100644
> --- a/src/bin/psql/t/001_psql.pl
> +++ b/src/bin/psql/t/001_psql.pl
> @@ -20,7 +20,7 @@ sub psql
>  {
>  	local $Test::Builder::Level = $Test::Builder::Level + 1;
>  	my ($opts, $stat, $in, $out, $err, $name) = @_;
> -	my @cmd = ('psql', split /\s+/, $opts);
> +	my @cmd = ('psql', '-X', split /\s+/, $opts);
>  	$node->command_checks_all(\@cmd, $stat, $out, $err, $name, $in);
>  	return;
>  }

Please find attached :)

Best,
David.

-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

>From 66daae6f73e704ba05dec83b70eebabf9470d768 Mon Sep 17 00:00:00 2001
From: David Fetter
Date: Sun, 21 Apr 2019 11:08:40 -0700
Subject: [PATCH v7] Add \warn to psql
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.20.1"

This is a multi-part message in MIME format.
--2.20.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

- Does what it says on the label
- Adds TAP tests for same
- In passing, expose the -n option for \echo, \qecho, and \warn in \?
  output

 create mode 100755 src/bin/psql/t/001_psql.pl

--2.20.1
Content-Type: text/x-patch; name="v7-0001-Add-warn-to-psql.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v7-0001-Add-warn-to-psql.patch"

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index b86764003d..e474efb521 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3220,6 +3220,25 @@ testdb=> \setenv LESS -imx4F
 
+        \warn text [ ... ]
+
+        Prints the arguments to the standard error, separated by one
+        space and followed by a newline. This can be useful to
+        intersperse information in the output of scripts when not
+        polluting standard output is desired. For example:
+
+        => \warn `date`
+        Sun Apr 28 21:00:10 PDT 2019
+
+        If the first argument is an unquoted -n the trailing
+        newline is not written.
+
 
 \watch [ seconds ]

diff --git a/src/bin/psql/.gitignore b/src/bin/psql/.gitignore
index c2862b12d6..d324c1c1fa 100644
--- a/src/bin/psql/.gitignore
+++ b/src/bin/psql/.gitignore
@@ -3,3 +3,4 @@
 /sql_help.c
 /psql
+/tmp_check

diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index 69bb297fe7..9473ab01cb 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -60,8 +60,15 @@ uninstall:
 
 clean distclean:
 	rm -f psql$(X) $(OBJS) lex.backup
+	rm -rf tmp_check
 
 # files removed here are supposed to be in the distribution tarball,
 # so do not clean them in the clean/distclean rules
 maintainer-clean: distclean
 	rm -f sql_help.h sql_help.c psqlscanslash.c
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 8254d61099..f6fedb45b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -319,7 +319,7 @@ exec_command(const char *cmd,
 		status = exec_command_ef_ev(scan_state, active_branch, query_buf, true);
 	else if (strcmp(cmd, "ev") == 0)
 		status = exec_command_ef_ev(scan_state, active_branch, query_buf, false);
-	else if (strcmp(cmd, "echo") == 0 || strcmp(cmd, "qecho") == 0)
+	else if (strcmp(cmd, "echo") == 0 || strcmp(cmd, "qecho") == 0 || strcmp(cmd, "warn") == 0)
 		status = exec_command_echo(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "elif") == 0)
 		status = exec_command_elif(scan_state, cstack, query_buf);
@@ -1114,7 +1114,7 @@ exec_command_ef_ev(PsqlScanState scan_state, bool active_branch,
 }
 
 /*
- * \echo and \qecho -- echo arguments to stdout or query output
+ * \echo, \qecho, and \warn -- echo arguments to stdout, query output, or stderr
  */
 static backslashResult
 exec_command_echo(PsqlScanState scan_state, bool active_branch, const char *cmd)
@@ -1129,6 +1129,8 @@ exec_command_echo(PsqlScanState scan_state, bool active_branch, const char *cmd)
 
 	if (strcmp(cmd, "qecho") == 0)
 		fout = pset.queryFout;
+
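To summarize what the patch adds: \warn behaves exactly like \echo, but writes to stderr, so scripts can report progress without contaminating the query output stream. A rough Python analogue of the behavior described in the documentation hunk above (for illustration only, not the psql C implementation):

```python
import sys


def psql_warn(*args: str, newline: bool = True) -> None:
    """Rough analogue of psql's \\warn: join the arguments with single
    spaces and write them to standard error.  The unquoted -n option is
    modeled here as newline=False, which suppresses the trailing
    newline, mirroring \\echo."""
    sys.stderr.write(" ".join(args) + ("\n" if newline else ""))


psql_warn("processing", "step", "1")   # goes to stderr, not stdout
psql_warn("no trailing newline", newline=False)
```

Because the message bypasses stdout, redirecting a script's query output to a file still shows these notices on the terminal.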
Re: [PATCH v4] Add \warn to psql
(Unfortunately I accidentally sent my previous two messages using my
personal email address because of my email client configuration. This
address is not verified by PostgreSQL.org services and the messages
didn't reach the hackers mailing lists, so I resent the latest message.)

On Tue, Apr 30, 2019 at 4:46 PM Fabien COELHO wrote:
> > Unfortunately the new TAP test doesn't pass on my machine. I'm not good
> > at Perl and didn't get the reason of the failure quickly.
>
> I guess that you have a verbose ~/.psqlrc.
>
> Can you try with adding -X to psql option when calling psql from the tap
> test?

Ah, true. This patch works for me:

diff --git a/src/bin/psql/t/001_psql.pl b/src/bin/psql/t/001_psql.pl
index 32dd43279b..637baa94c9 100644
--- a/src/bin/psql/t/001_psql.pl
+++ b/src/bin/psql/t/001_psql.pl
@@ -20,7 +20,7 @@ sub psql
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 	my ($opts, $stat, $in, $out, $err, $name) = @_;
-	my @cmd = ('psql', split /\s+/, $opts);
+	my @cmd = ('psql', '-X', split /\s+/, $opts);
 	$node->command_checks_all(\@cmd, $stat, $out, $err, $name, $in);
 	return;
 }

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company