Re: [PATCH] initdb: do not exit after warn_on_mount_point
Julien Rouhaud writes:
> On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
>> Everyone using containerized postgresql image cannot use /var/lib/postgresql
>> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
>> issue [4] due to [5].

> initdb had this behavior for almost 10 years, and for good reasons: it
> prevents actual problems that were seen in the field.

The long and the short of this is that one person losing their data
outweighs thousands of people being very mildly inconvenienced by having
to create an extra level of directory.  I understand that you don't think
so, but you'd change your mind PDQ if it was *your* data that got
irretrievably corrupted.  We are not going to remove this check.

If anything, the fault I'd find with the existing code is that it's not
sufficiently thorough about detecting what is a mount point.  AFAICS,
neither the lost+found check nor the extra-dot-files check will trigger
on modern filesystems such as XFS.  I wonder if we could do something
like comparing the stat(2) st_dev numbers for the proposed data directory
vs. its parent directory.

			regards, tom lane
Re: broken table formatting in psql
On Mon, Sep 12, 2022 at 7:37, John Naylor wrote:
> On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule wrote:
>>
>>> Does anyone want to advocate for backpatching these missing ranges to
>>> v12 and up? v12 still has a table in-line so trivial to remedy, but
>>> v13 and up use a script, so these exceptions would likely have to use
>>> hard-coded branches to keep from bringing in new changes.
>>>
>>> If so, does anyone want to advocate for including this patch in v15?
>>> It claims Unicode 14.0.0, and this would make that claim more
>>> technically correct as well as avoiding additional branches.
>>
>> I think it can be fixed just in v15 and master. This issue has no
>> impact on SQL.
>
> Well, if the regressions from v11 are not important enough to
> backpatch, there is not as much of a case to backpatch the full fix to
> v15 either.

This is not a critical issue, really. On second thought, though, I don't
see the point in releasing a fresh PostgreSQL with this bug when there is
a known bugfix -- and this bugfix should (at this moment) be compatible
with 16. PostgreSQL 15 has not been released yet.

Regards

Pavel

> John Naylor
> EDB: http://www.enterprisedb.com
Re: broken table formatting in psql
On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule wrote:
>
>> Does anyone want to advocate for backpatching these missing ranges to
>> v12 and up? v12 still has a table in-line so trivial to remedy, but
>> v13 and up use a script, so these exceptions would likely have to use
>> hard-coded branches to keep from bringing in new changes.
>>
>> If so, does anyone want to advocate for including this patch in v15?
>> It claims Unicode 14.0.0, and this would make that claim more
>> technically correct as well as avoiding additional branches.
>
> I think it can be fixed just in v15 and master. This issue has no impact
> on SQL.

Well, if the regressions from v11 are not important enough to
backpatch, there is not as much of a case to backpatch the full fix to
v15 either.

-- 
John Naylor
EDB: http://www.enterprisedb.com
Re: Handle infinite recursion in logical replication setup
On Fri, 9 Sept 2022 at 11:12, Amit Kapila wrote:
>
> On Thu, Sep 8, 2022 at 9:32 AM vignesh C wrote:
> >
> > The attached patch has the changes to handle the same.
>
> Pushed. I am not completely sure whether we want the remaining
> documentation patch in this thread in its current form or by modifying
> it. Johnathan has shown some interest in it. I feel you can start a
> separate thread for it to see if there is any interest in the same and
> close the CF entry for this work.

Thanks for pushing the patch. I have closed this entry in the commitfest.
I will wait for some more time to see the response regarding the
documentation patch and then start a new thread if required.

Regards,
Vignesh
Re: [PATCH] initdb: do not exit after warn_on_mount_point
On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
> September 11, 2022 12:18 PM, "Julien Rouhaud" wrote:
>
> > Hi,
> >
> > On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
> >
> >> Have slightly improved the logic so that it does not report an error
> >> "directory \"%s\" exists but is not empty"
> >> when it is only supposed to warn the user about the mountpoint, without
> >> exiting.
> >>
> >> To me, my patch looks like a typo fix where exit(1) should not be called on
> >> the warn_on_mount_point(), but only warn and continue, as more people are
> >> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> >> containerized world (K8s deployments, especially now in the Akash Network I
> >> am working for) for making sure their data persist.
> >
> > This definitely isn't a typo, as it's really strongly discouraged to use a
> > mount point for the data directory. You can refer to this thread [1] for
> > more explanations.
> >
> > [1]
> > https://www.postgresql.org/message-id/flat/CAKoxK+6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w@mail.
> > mail.com
>
> I've read the "why not using a mountpoint as PGDATA?" thread [1] as well as
> the Bugzilla "postgresql-setup fails when /var/lib/pgsql/data is mount point"
> thread [2] but could not find any good reason not to mount the PGDATA
> directly, except probably for the NFS mount point, but who does that anyway?

What about this part in Tom's original answer:

3. If, some day, the filesystem is accidentally unmounted while the
database is up, it will continue to write into files that are now getting
placed in the mount-point directory on the parent volume.  This usually
results in an unrecoverably messed-up database by the time you realize
what's going wrong.  (There are horror stories about such cases in the PG
community mailing list archives, dating from before we installed the
don't-use-a-mount-point defenses in initdb.)
> Everyone using containerized postgresql image cannot use /var/lib/postgresql
> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
> issue [4] due to [5]. Hence, everyone using containerized version of
> postgresql with the device (say Ceph's RBD) mounted over
> /var/lib/postgresql/data directory has to either specify:
>
> - PGDATA=/var/lib/postgresql/data/
>
> OR
>
> make sure to remove $PGDATA/lost+found directory.

initdb had this behavior for almost 10 years, and for good reasons: it
prevents actual problems that were seen in the field.  It's unfortunate
that the docker postgres images didn't take that behavior into account,
which was introduced more than a year before any work started on those
images, but if you're not happy with those workarounds it's something
that should be changed in the docker images.

> Both of these hacks are only for the initdb to fail detect the mountpoint
> which, on its own, is supposed to be only a warning which is seen from the
> function name warn_on_mount_point() and its messages [6] look to be of only
> advisory nature to me.

As Tom confirmed at [1], you shouldn't assume anything about the
criticality based on the function name.  If anything, the "warn" part is
only saying that the function itself won't exit(1).  And yes, this
function is only adding information on the reason why the given directory
can't be used, but it doesn't change the fact that the given directory
can't be used.

[1] https://www.postgresql.org/message-id/813162.1662908...@sss.pgh.pa.us
Re: [PATCH] Clarify the comments about varlena header encoding
On Sun, Sep 11, 2022 at 5:06 PM Aleksander Alekseev wrote:
>
> Hi John,
>
> Many thanks for the feedback!
>
> > Or put the now-standard 0b in front.
>
> Good idea.

Now that I look at the results, though, it's distracting and not good for
readability. I'm not actually sure we need to do anything here, but I am
somewhat in favor of putting [un]aligned in parentheses, as already
discussed. Even there, in the first email you said:

> Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

I'm not sure it rises to the level of "wrong", because a blob of bytes
immediately after an aligned uint32 is in fact aligned. The important
thing is: a zero byte is always either a padding byte or part of a 4-byte
header, so it's the alignment of the header we really care about.

> > I think the problem is ambiguity about what a "toast pointer" is. This
> > comment:
> >
> > * struct varatt_external is a traditional "TOAST pointer", that is, the
>
> Right. The comment for varatt_external says that it IS a TOAST
> pointer.

Well, the word "traditional" is not very informative, but it is there.
And afterwards there is also varatt_indirect, varatt_expanded, and
varattrib_1b_e, which all mention "TOAST pointer".

> However the comments for varlena headers bit layout
> implicitly include it into a TOAST pointer, which contradicts the
> previous comments. I suggest we fix this ambiguity by explicitly
> enumerating the type tag in the comments for varlena headers.

- * 1000 1-byte length word, unaligned, TOAST pointer
+ * 0b1000 1-byte length word (unaligned), type tag, TOAST pointer

This is distracting from the point of this whole comment, which, I will
say again, is: how to look at the first byte to determine what kind of
varlena we're looking at. There is no reason to mention the type tag
here at all.

- * In TOAST pointers the va_tag field (see varattrib_1b_e) is used to discern
- * the specific type and length of the pointer datum.
+ * For the TOAST pointers the type tag (see varattrib_1b_e.va_tag field) is
+ * used to discern the specific type and length of the pointer datum.

I don't think this clarifies anything; it's just a rephrasing.

More broadly, I think the description of varlenas in this header is at a
kind of "local maximum" -- minor adjustments are more likely to make it
worse. To significantly improve clarity might require a larger rewriting,
but I'm not personally interested in taking part in that.

-- 
John Naylor
EDB: http://www.enterprisedb.com
RE: why can't a table be part of the same publication as its schema
On Monday, September 12, 2022 1:08 AM Mark Dilger wrote:
>
> > On Sep 10, 2022, at 4:17 PM, Robert Haas wrote:
> >
> >>> I don't understand why we
> >>> used this ALL TABLES IN SCHEMA language.
> >>
> >> The conversation, as I recall, was that "ADD SCHEMA foo" would only mean
> >> all tables in foo, until publication of other object types became
> >> supported, at which point "ADD SCHEMA foo" would suddenly mean more than
> >> it did before. People might find that surprising, so the "ALL TABLES IN"
> >> was intended to future-proof against surprising behavioral changes.
> >
> > If I encountered this syntax in a vacuum, that's not what I would
> > think. I would think that ADD ALL TABLES IN SCHEMA meant add all the
> > tables in the schema to the publication one by one as individual
> > objects
>
> Yes, it appears the syntax was chosen to avoid one kind of confusion, but
> created another kind. Per the docs on this feature:
>
> FOR ALL TABLES IN SCHEMA
> Marks the publication as one that replicates changes for all tables in the
> specified list of schemas, including tables created in the future.
>
> Like you, I wouldn't expect that definition, given the behavior of GRANT
> with respect to the same grammatical construction.

I'm a bit unsure whether it should be compared to GRANT, because even if
we chose "ALTER PUBLICATION p1 { ADD | DROP } SCHEMA name", that would
also not be consistent with the meaning of GRANT ON SCHEMA: GRANT ON
SCHEMA doesn't grant rights on the tables within the schema, if I
understand correctly.

I feel we'd better compare the syntax with the existing publication
command: FOR ALL TABLES. If you create a publication FOR ALL TABLES, it
means publishing all the tables in the database *including* tables
created in the future. I think both the syntax and meaning of ALL TABLES
IN SCHEMA are consistent with the existing FOR ALL TABLES. And the
behavior is clearly documented, so personally I think it's fine.

https://www.postgresql.org/docs/devel/sql-createpublication.html
--
FOR ALL TABLES
Marks the publication as one that replicates changes for all tables in
the database, including tables created in the future.

FOR ALL TABLES IN SCHEMA
Marks the publication as one that replicates changes for all tables in
the specified list of schemas, including tables created in the future.
--

Besides, as mentioned (and suggested by Tom [1]), we might support
publishing SEQUENCEs (or other objects) in the future. It would give more
flexibility to users if we had another FOR ALL SEQUENCES (or other
objects) IN SCHEMA.

[1] https://www.postgresql.org/message-id/155565.1628954580%40sss.pgh.pa.us

Best regards,
Hou zj
Re: list of acknowledgments for PG15
On Thu, Sep 8, 2022 at 9:13 PM Peter Eisentraut wrote:
> Attached is the plain-text list of acknowledgments for the PG15 release
> notes, current through REL_15_BETA4.  Please check for problems such as
> wrong sorting, duplicate names in different variants, or names in the
> wrong order etc.  (Note that the current standard is given name followed
> by surname, independent of cultural origin.)

Thanks as usual!

I think these are Japanese names that are in the
surname-followed-by-given-name order:

Kamigishi Rei
Kawamoto Masaya
Okano Naoki

Best regards,
Etsuro Fujita
Re: Switching XLog source from archive to streaming when primary available
On Sat, Sep 10, 2022 at 3:35 AM Nathan Bossart wrote:
>
> Well, for wal_keep_size, using bytes makes sense. Given you know how much
> disk space you have, you can set this parameter accordingly to avoid
> retaining too much of it for standby servers. For your proposed parameter,
> it's not so simple. The same setting could have wildly different timing
> behavior depending on the server. I still think that a timeout is the most
> intuitive.

Hm. In the v3 patch, I've used the timeout approach, tracking the
duration the server spends in XLOG_FROM_ARCHIVE as opposed to tracking
the last failed time in streaming from the primary.

> So I think it
> would be difficult for the end user to decide on a value. However, even
> the timeout approach has this sort of problem. If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute. That would likely
> need to be documented.

Added a note in the docs.

On Fri, Sep 9, 2022 at 10:46 AM Kyotaro Horiguchi wrote:
>
> Being late for the party.

Thanks for reviewing this.

> It seems to me that the function is getting too long. I think we
> might want to move the core part of the patch into another function.

Yeah, WaitForWALToBecomeAvailable() (without this patch) is around 460
LOC, out of which fetching WAL from the chosen source is about 240 LOC;
IMO, that code is a candidate for a new function. I think that part can
be discussed separately. Having said that, I moved the new code to a new
function.

> I think it might be better if intentionalSourceSwitch doesn't need
> lastSourceFailed set. It would look like this:
>
> > if (lastSourceFailed || switchSource)
> > {
> >     if (nonblocking && lastSourceFailed)
> >         return XLREAD_WOULDBLOCK;

I think the above looks good; done that way in the latest patch.

> I don't think the flag first_time is needed.

Addressed this in the v4 patch.

Please review the attached v4 patch addressing the above review comments.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v4-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch
Description: Binary data
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby wrote:
>> Yeah ... I just realized that I've already forgotten the relevant
>> chronology.
>>
>> The io_concurrency bugfix wasn't included in 15b4, so (if I understood
>> you correctly), that might explain these symptoms - right ?
>
> Yeah.

Could you double-check if the issues you are seeing persist after syncing
up with the latest point of REL_15_STABLE?  For now, I have added an open
item just to be on the safe side.
--
Michael

signature.asc
Description: PGP signature
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby wrote:
> On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby wrote:
> > > But yesterday I started from initdb and restored this cluster from
> > > backup, and started up sqlsmith, and sent some kill -9, and now got
> > > more corruption. Looks like it took ~10 induced crashes before this
> > > happened.
> >
> > $SUBJECT says 15b4, which doesn't have the fix. Are you still using
> > maintainance_io_concurrent=0?
>
> Yeah ... I just realized that I've already forgotten the relevant
> chronology.
>
> The io_concurrency bugfix wasn't included in 15b4, so (if I understood
> you correctly), that might explain these symptoms - right ?

Yeah.
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby wrote:
> > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote:
> > > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId
> > > > 133553154 has not been created yet -- apparent wraparound
> > >
> > > I guess what happened here is that after one of your (apparently
> > > several?) OOM crashes, crash recovery didn't run all the way to the
> > > true end of the WAL due to the maintenance_io_concurrency=0 bug. In
> > > the case you reported, it couldn't complete an end-of-recovery
> > > checkpoint until you disabled recovery_prefetch, but that's only
> > > because of the somewhat unusual way that vismap pages work. In
> > > another case it might have been able to (bogusly) complete a
> > > checkpoint, leaving things in an inconsistent state.
> >
> > I think you're saying is that this can be explained by the
> > io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> Well I don't know, but it's one way I could think of that you could
> have a data page referring to a multixact that isn't on disk after
> recovery (because the data page happens to have been flushed, but we
> didn't replay the WAL that would create the multixact).
>
> > But yesterday I started from initdb and restored this cluster from
> > backup, and started up sqlsmith, and sent some kill -9, and now got
> > more corruption. Looks like it took ~10 induced crashes before this
> > happened.
>
> $SUBJECT says 15b4, which doesn't have the fix. Are you still using
> maintainance_io_concurrent=0?

Yeah ... I just realized that I've already forgotten the relevant
chronology.

The io_concurrency bugfix wasn't included in 15b4, so (if I understood
you correctly), that might explain these symptoms - right ?

-- 
Justin
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby wrote:
> On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote:
> > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId
> > > 133553154 has not been created yet -- apparent wraparound
> >
> > I guess what happened here is that after one of your (apparently
> > several?) OOM crashes, crash recovery didn't run all the way to the
> > true end of the WAL due to the maintenance_io_concurrency=0 bug. In
> > the case you reported, it couldn't complete an end-of-recovery
> > checkpoint until you disabled recovery_prefetch, but that's only
> > because of the somewhat unusual way that vismap pages work. In
> > another case it might have been able to (bogusly) complete a
> > checkpoint, leaving things in an inconsistent state.
>
> I think you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.

Well I don't know, but it's one way I could think of that you could
have a data page referring to a multixact that isn't on disk after
recovery (because the data page happens to have been flushed, but we
didn't replay the WAL that would create the multixact).

> But yesterday I started from initdb and restored this cluster from
> backup, and started up sqlsmith, and sent some kill -9, and now got more
> corruption. Looks like it took ~10 induced crashes before this happened.

$SUBJECT says 15b4, which doesn't have the fix. Are you still using
maintainance_io_concurrent=0?
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Sun, Sep 11, 2022 at 6:42 PM Justin Pryzby wrote:
> I think you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> But yesterday I started from initdb and restored this cluster from
> backup, and started up sqlsmith, and sent some kill -9, and now got more
> corruption. Looks like it took ~10 induced crashes before this happened.

Have you tested fsync on the system? The symptoms here are all over the
place. This assertion failure seems like a pretty good sign that the
problems happen during recovery, or because basic guarantees needed for
crash safety aren't met:

> #2 0x00962c5c in ExceptionalCondition
> (conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) &&
> !P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97
> "FailedAssertion", fileName=fileName@entry=0x9cdcd1 "nbtpage.c",
> lineNumber=lineNumber@entry=1778) at assert.c:69
> #3 0x00507e34 in _bt_rightsib_halfdeadflag
> (rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at
> nbtpage.c:1778
> #4 0x00507fba in _bt_mark_page_halfdead
> (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637,
> stack=stack@entry=0x144ca20) at nbtpage.c:2121

This shows that the basic rules for page deletion have somehow seemingly
been violated. It's as if a page deletion went ahead but didn't work as
an atomic operation -- there were some lost writes for some but not all
pages. Actually, it looks like a mix of states from before and after both
the first and the second phases of page deletion -- so not just one
atomic operation.

-- 
Peter Geoghegan
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote:
> > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154
> > has not been created yet -- apparent wraparound
>
> I guess what happened here is that after one of your (apparently
> several?) OOM crashes, crash recovery didn't run all the way to the
> true end of the WAL due to the maintenance_io_concurrency=0 bug. In
> the case you reported, it couldn't complete an end-of-recovery
> checkpoint until you disabled recovery_prefetch, but that's only
> because of the somewhat unusual way that vismap pages work. In
> another case it might have been able to (bogusly) complete a
> checkpoint, leaving things in an inconsistent state.

I think what you're saying is that this can be explained by the
io_concurrency bug in recovery_prefetch, if run under 15b3.

But yesterday I started from initdb and restored this cluster from
backup, and started up sqlsmith, and sent some kill -9, and now got more
corruption. Looks like it took ~10 induced crashes before this happened.

At the moment, I have no reason to believe this issue is related to
prefetch_recovery; I am wondering about changes to vacuum.

< 2022-09-11 20:19:03.071 CDT telsasoft >ERROR: MultiXactId 732646 has not been created yet -- apparent wraparound
< 2022-09-11 20:24:00.530 CDT telsasoft >ERROR: MultiXactId 732646 has not been created yet -- apparent wraparound

Program terminated with signal 6, Aborted.
#0 0x7f413716b1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 libgcc-4.8.5-44.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 lz4-1.8.3-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x7f413716b1f7 in raise () from /lib64/libc.so.6
#1 0x7f413716c8e8 in abort () from /lib64/libc.so.6
#2 0x00962c5c in ExceptionalCondition (conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && !P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion", fileName=fileName@entry=0x9cdcd1 "nbtpage.c", lineNumber=lineNumber@entry=1778) at assert.c:69
#3 0x00507e34 in _bt_rightsib_halfdeadflag (rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at nbtpage.c:1778
#4 0x00507fba in _bt_mark_page_halfdead (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, stack=stack@entry=0x144ca20) at nbtpage.c:2121
#5 0x0050af1d in _bt_pagedel (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, vstate=vstate@entry=0x7ffef18de8b0) at nbtpage.c:2004
#6 0x0050c996 in btvacuumpage (vstate=vstate@entry=0x7ffef18de8b0, scanblkno=scanblkno@entry=36) at nbtree.c:1342
#7 0x0050caf8 in btvacuumscan (info=info@entry=0x7ffef18deac0, stats=stats@entry=0x142fb70, callback=callback@entry=0x67e89b , callback_state=callback_state@entry=0x1461220, cycleid=) at nbtree.c:997
#8 0x0050cc2f in btbulkdelete (info=0x7ffef18deac0, stats=0x142fb70, callback=0x67e89b , callback_state=0x1461220) at nbtree.c:801
#9 0x004fc64b in index_bulk_delete (info=info@entry=0x7ffef18deac0, istat=istat@entry=0x0, callback=callback@entry=0x67e89b , callback_state=callback_state@entry=0x1461220) at indexam.c:701
#10 0x0068108c in vac_bulkdel_one_index (ivinfo=ivinfo@entry=0x7ffef18deac0, istat=istat@entry=0x0, dead_items=0x1461220) at vacuum.c:2324
#11 0x004f72ae in lazy_vacuum_one_index (indrel=, istat=0x0, reltuples=, vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2726
#12 0x004f738b in lazy_vacuum_all_indexes (vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2328
#13 0x004f75df in lazy_vacuum (vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2261
#14 0x004f7f14 in lazy_scan_heap (vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:1264
#15 0x004f895f in heap_vacuum_rel (rel=0x7f4138a67c00, params=0x143cbec, bstrategy=0x143ea20) at vacuumlazy.c:534
#16 0x0067f62b in table_relation_vacuum (bstrategy=, params=0x143cbec, rel=0x7f4138a67c00) at ../../../src/include/access/tableam.h:1680
#17 vacuum_rel (relid=1249, relation=, params=params@entry=0x143cbec) at vacuum.c:2086
#18 0x0068065c in vacuum (relations=0x144a118, params=params@entry=0x143cbec, bstrategy=, bstrategy@entry=0x143ea20, isTopLevel=isTopLevel@entry=true) at vacuum.c:475
#19 0x00796a0e in autovacuum_do_vac_analyze (tab=tab@entry=0x143cbe8, bstrategy=bstrategy@entry=0x143ea20) at autovacuum.c:3149
#20 0x007987bf in do_autovacuum () at autovacuum.c:2472
#21 0x00798e72 in AutoVacWorkerMain (argc=argc@entry=0, argv=argv@entry=0x0) at autovacuum.c:1715
#22 0x00798eed in StartAutoVacWorker () at autovacuum.c:1493
#23 0x0079fe49 in StartAutovacuumWorker () at postmaster.c:5534
#24 0x007a0c44 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:5239
#25
#26
Re: Background writer and checkpointer in crash recovery
On Tue, Aug 03, 2021 at 02:19:22PM +1200, Thomas Munro wrote:
> On Tue, Aug 3, 2021 at 9:52 AM Thomas Munro wrote:
> > On Tue, Aug 3, 2021 at 1:17 AM Robert Haas wrote:
> > > That's great. I just realized that this leaves us with identical
> > > RequestCheckpoint() calls in two nearby places. Is there any reason
> > > not to further simplify as in the attached?
> >
> > LGTM.
>
> And pushed.

Gripe: this made the "ps" display worse than before.

7ff23c6d2 Run checkpointer and bgwriter in crash recovery.

A couple of years ago, I complained that during the end-of-recovery
checkpoint, the "ps" display still said "recovering N", which made it
look like it was stuck on a particular WAL file. That led to commit
df9274adf, which updated the startup process's "ps" to say
"end-of-recovery checkpoint". But since the startup process no longer
performs that checkpoint, it still says:

postgres 19738 11433  5 19:33 ?  00:00:01 postgres: startup recovering 0001000C00FB
postgres 19739 11433  3 19:33 ?  00:00:00 postgres: checkpointer performing end-of-recovery checkpoint

That looks inconsistent. It'd be better if the startup process's "ps"
were cleared.

-- 
Justin
Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~
On Mon, Sep 12, 2022 at 09:42:25AM +1200, Thomas Munro wrote:
> When looking the function up it made sense to use the name ending in
> '...A', but when calling directly I think we shouldn't use the A
> suffix, we should let the macros do that for us[1]. (I
> wondered for a moment if that would even make Windows and Unix code
> the same, but sadly not due to the extra NULL arguments.)

Good idea, I had not noticed this part. This should work equally, so
done this way and applied. I am keeping an eye on the buildfarm.
--
Michael

signature.asc
Description: PGP signature
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote:
> < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154
> has not been created yet -- apparent wraparound

I guess what happened here is that after one of your (apparently
several?) OOM crashes, crash recovery didn't run all the way to the
true end of the WAL due to the maintenance_io_concurrency=0 bug. In
the case you reported, it couldn't complete an end-of-recovery
checkpoint until you disabled recovery_prefetch, but that's only
because of the somewhat unusual way that vismap pages work. In
another case it might have been able to (bogusly) complete a
checkpoint, leaving things in an inconsistent state.
Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
On Sat, Sep 10, 2022 at 5:01 PM Justin Pryzby wrote:
> BTW, after a number of sigabrt's, I started seeing these during
> recovery:
>
> < 2022-09-09 19:44:04.180 CDT >LOG: unexpected pageaddr 1214/AF0FE000 in
> log segment 0001121400B4, offset 1040384
> < 2022-09-09 23:20:50.830 CDT >LOG: unexpected pageaddr 1214/CF65C000 in
> log segment 0001121400D8, offset 6668288

That's just what it looks like when we discover the end of the WAL by
hitting a page that hasn't been overwritten yet in a recycled WAL
segment, so the pageaddr is off by a multiple of 16MB. Depending on
timing and chance, you might be more used to seeing the error where we
hit zeroes in a partially filled page, the famous 'wanted 24, got 0',
and you can also hit a fully zero-initialised page ('invalid magic
number '). All of these are expected, and more exotic errors are
possible with power-loss torn writes, or on crash of a streaming standby
where we currently fail to zero the rest of overwritten pages.
Re: Splitting up guc.c
Here's a v3 that gets rid of guc_hooks.c in favor of moving the hook
functions to related modules (though some did end up in variables.c for
lack of a better idea).  I also pushed all the hook function declarations
to guc_hooks.h.

Unsurprisingly, removal of guc.h #includes from header files led to
discovery of some surprising indirect dependencies; notably, a lot of
places were evidently depending on indirect inclusions of array.h.

I think this is code-complete at this point.  I'd like to not sit on it
too long, because it'll inevitably get side-swiped by additions of new
GUCs.  On the other hand, pushing it in the middle of a CF would
presumably break other people's patches.  Maybe push it at the end of
this CF, to give people a month to rebase anything that's affected?

			regards, tom lane

split-up-guc-code-3.patch.gz
Description: split-up-guc-code-3.patch.gz
Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~
On Sun, Sep 11, 2022 at 12:29 PM Michael Paquier wrote:
> On Fri, Sep 09, 2022 at 08:11:09PM +0900, Michael Paquier wrote:
> > Based on what I can see, the Windows animals seem to have digested
> > 47bd0b3 (cygwin, MinGW and MSVC), so I think that we are good.

Great, that's a lot of nice cleanup.

> The last part that's worth adjusting is ldap_start_tls_sA(), which
> would lead to the attached simplification.

-    if ((r = _ldap_start_tls_sA(*ldap, NULL, NULL, NULL, NULL)) != LDAP_SUCCESS)
+    if ((r = ldap_start_tls_sA(*ldap, NULL, NULL, NULL, NULL)) != LDAP_SUCCESS)

When looking the function up it made sense to use the name ending in
'...A', but when calling directly I think we shouldn't use the A suffix;
we should let the macros do that for us [1]. (I wondered for a moment
whether that would even make the Windows and Unix code the same, but
sadly not, due to the extra NULL arguments.)

[1] https://docs.microsoft.com/en-us/windows/win32/api/winldap/nf-winldap-ldap_start_tls_sa
Re: New docs chapter on Transaction Management and related changes
On Wed, Sep 7, 2022 at 8:05 AM Simon Riggs wrote:
> On Tue, 6 Sept 2022 at 21:33, Justin Pryzby wrote:
>> On Tue, Sep 06, 2022 at 04:16:02PM +0100, Simon Riggs wrote:
>>> New chapter on transaction management, plus a few related changes.
>>>
>>> Markup and links are not polished yet, so please comment initially on
>>> the topics, descriptions and wording.
>
> I've also added further notes about prepared transactions.
> I attach a diff against the original patch, plus a new clean copy.

Some feedback on the v4 patch, hopefully useful.

+
+Transactions may be started explicitly using BEGIN and COMMIT commands, known as
+a transaction block. A transaction will also be started and ended implicitly for
+each request when outside of a transaction block.
+

Transactions may be managed explicitly using BEGIN and COMMIT commands,
known as a transaction block.

+Committed subtransactions are recorded as committed if the main transaction
+commits. The word subtransaction is often abbreviated to "subxact".
+

Committed subtransactions are only recorded as committed if the main
transaction commits, otherwise any work done in a subtransaction will be
rolled back or aborted. The word subtransaction is often abbreviated as
"subxact".

+
+Subtransactions may be started explicitly by using the SAVEPOINT command, but may
+also be started in other ways, such as PL/pgSQL's EXCEPTION clause. PL/Python and
+PL/TCL also support explicit subtransactions. Working with the C API, users may
+also call BeginInternalSubTransaction().
+

I think this paragraph (or something similar) should be the opening
paragraph for this section, so that readers are immediately given context
for what PostgreSQL considers to be a subtransaction.

+prepared transactions that were prepared before the last checkpoint. In the typical
+case, shorter-lived prepared transactions are stored only in shared memory and WAL.
+Currently-prepared transactions can be inspected using the pg_prepared_xacts view.

+ Transactions that are currently prepared can be inspected using the
pg_prepared_xacts view.

* I thought the hyphenated wording looked odd, though I understand why you
used it. We don't use it elsewhere though (just "currently prepared", sans
hyphen) so re-worded to match the other wording.

Robert Treat
https://xzilla.net
Re: Splitting up guc.c
I wrote:
> Michael Paquier writes:
>> One part that I have found a bit strange lately about guc.c is that we
>> have mix the core machinery with the SQL-callable parts. What do you
>> think about the addition of a gucfuncs.c in src/backend/utils/adt/ to
>> split things a bit more?

> I might be wrong, but I think the SQL-callable stuff makes use
> of some APIs that are currently private in guc.c.  So we'd have
> to expose more API to make that possible.  Maybe that wouldn't
> be a bad thing, but it seems to be getting beyond the original
> idea here.

I tried this just to see, and it worked out better than I thought.  The
key extra idea is to also pull out the functions implementing the SET and
SHOW commands, because (unsurprisingly) those are just about in the same
place dependency-wise as the SQL functions, and they have some common
subroutines.  I had to export get_config_unit_name(),
config_enum_get_options(), and _ShowOption() (here renamed to
ShowGUCOption()) to make this work.  That doesn't seem too awful.

v2 attached does this, without any further relocation of hook functions
as yet.  I now see these file sizes:

$ wc guc*c
   2629   9372  69467 guc-file.c
   6425  22282 176816 guc.c
   1048   3005  26962 guc_funcs.c
    939   2693  22915 guc_hooks.c
   4877  13163 126769 guc_tables.c
  15918  50515 422929 total
$ size guc*o
   text    data     bss     dec     hex filename
  13653       4     112   13769    35c9 guc-file.o
  46589       0     564   47153    b831 guc.o
   8509       0       0    8509    213d guc_funcs.o
   6951       0     112    7063    1b97 guc_hooks.o
  43570   62998     216  106784   1a120 guc_tables.o

So this removes just about a thousand more lines from guc.c, which seems
worth doing.

regards, tom lane

split-up-guc-code-2.patch.gz
Description: split-up-guc-code-2.patch.gz
Re: why can't a table be part of the same publication as its schema
> On Sep 10, 2022, at 4:17 PM, Robert Haas wrote:
>
>>> I don't understand why we
>>> used this ALL TABLES IN SCHEMA language.
>>
>> The conversation, as I recall, was that "ADD SCHEMA foo" would only mean all
>> tables in foo, until publication of other object types became supported, at
>> which point "ADD SCHEMA foo" would suddenly mean more than it did before.
>> People might find that surprising, so the "ALL TABLES IN" was intended to
>> future-proof against surprising behavioral changes.
>
> If I encountered this syntax in a vacuum, that's not what I would
> think. I would think that ADD ALL TABLES IN SCHEMA meant add all the
> tables in the schema to the publication one by one as individual
> objects

Yes, it appears the syntax was chosen to avoid one kind of confusion, but
created another kind.  Per the docs on this feature:

    FOR ALL TABLES IN SCHEMA
    Marks the publication as one that replicates changes for all tables in
    the specified list of schemas, including tables created in the future.

Like you, I wouldn't expect that definition, given the behavior of GRANT
with respect to the same grammatical construction.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] initdb: do not exit after warn_on_mount_point
Julien Rouhaud writes:
> On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>> Have slightly improved the logic so that it does not report an error
>> "directory \"%s\" exists but is not empty"
>> when it is only supposed to warn the user about the mountpoint, without
>> exiting.

> This definitely isn't a typo, as it's really strongly discouraged to use a
> mount point for the data directory.

Absolutely.  I think maybe the problem here is that warn_on_mount_point()
is a pretty bad name for that helper function, as this is not "just a
warning".

regards, tom lane
Check SubPlan clause for nonnullable rels/Vars
Hi hackers,

While wandering around the code that reduces outer joins, I noticed that
when determining which base rels/Vars are forced nonnullable by a given
clause, we don't take SubPlan into consideration. Does anyone happen to
know what the concern behind that is?

IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
additional nonnullable rels/Vars by descending through their testexpr. As
we know, ALL_SUBLINK/ANY_SUBLINK combine results across tuples produced by
the subplan using AND/OR semantics. ROWCOMPARE_SUBLINK doesn't allow
multiple tuples from the subplan. So we can tell whether the subplan is
strict or not by checking its testexpr, leveraging the existing code in
find_nonnullable_rels/vars_walker.

Below is an example:

# explain (costs off)
  select * from a left join b on a.i = b.i
  where b.i = ANY (select i from c where c.j = b.j);
            QUERY PLAN
-----------------------------------
 Hash Join
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on b
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Seq Scan on c
                 Filter: (j = b.j)
   ->  Hash
         ->  Seq Scan on a
(9 rows)

BTW, this change would also have an impact on SpecialJoinInfo, especially
for the case of identity 3, because find_nonnullable_rels() is also used
to determine strict_relids from the clause. As an example, consider

  select * from a left join b on a.i = b.i
    left join c on b.j = ANY (select j from c);

Now we can know the SubPlan is strict for 'b'. Thus the b/c join would be
considered to be legal.

Thanks
Richard

v1-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch
Description: Binary data
Re: [PATCH] initdb: do not exit after warn_on_mount_point
Hi,

On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>
> Have slightly improved the logic so that it does not report an error
> "directory \"%s\" exists but is not empty"
> when it is only supposed to warn the user about the mountpoint, without
> exiting.
>
> To me, my patch looks like a typo fix where exit(1) should not be called on
> the warn_on_mount_point(), but only warn and continue as more people are
> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> containerized world (K8s deployments, especially now in the Akash Network I
> am working for) for making sure their data persist.

This definitely isn't a typo, as it's really strongly discouraged to use a
mount point for the data directory.  You can refer to this thread [1] for
more explanations.

[1] https://www.postgresql.org/message-id/flat/CAKoxK%2B6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w%40mail.gmail.com
Re: [PATCH] Clarify the comments about varlena header encoding
Hi John,

Many thanks for the feedback!

> Or put the now-standard 0b in front.

Good idea.

> I think the problem is ambiguity about what a "toast pointer" is. This
> comment:
>
> * struct varatt_external is a traditional "TOAST pointer", that is, the

Right. The comment for varatt_external says that it IS a TOAST pointer.
However, the comments for the varlena header bit layout implicitly include
it in a TOAST pointer, which contradicts the previous comments. I suggest
we fix this ambiguity by explicitly enumerating the type tag in the
comments for the varlena headers.

> The patch added xxx's for the type tag in a comment about
> the header. This is more misleading than what is there now.

OK, here is another attempt. Changes compared to v1:

* "xxx xxx xxx" were removed, according to the feedback
* The 0b prefix was added in order to make sure the reader will not
  misread the values as hex
* A clarification about the type tag was added
* The references to "first case", "second case", etc. were removed

Hopefully it's better now.

-- 
Best regards,
Aleksander Alekseev

v2-0001-Clarify-the-comments-about-varlena-header-encodin.patch
Description: Binary data
Re: [PATCH] initdb: do not exit after warn_on_mount_point
Hi Tom,

I've updated the patch by adding the explanation and more comments
(please see the attachment).

I have slightly improved the logic so that it does not report the error
"directory \"%s\" exists but is not empty"
when it is only supposed to warn the user about the mountpoint, without
exiting.

To me, my patch looks like a typo fix: exit(1) should not be called in
warn_on_mount_point(); it should only warn and continue, as more people
are mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
containerized world (K8s deployments, especially now in the Akash Network
I am working for) to make sure their data persists.

As a workaround, we either have to `rmdir /var/lib/postgresql/data/lost+found`
before running `docker-entrypoint.sh postgres`, which in turn calls
`initdb`, or alternatively we have to pass `PGDATA=/var/lib/postgresql/data/`
while mounting persistent storage over the `/var/lib/postgresql/data` path
so that it won't exit on the very first run.

To me, this in itself is odd behavior, which led me to finding this typo
(from my point of view), for which I've made this patch.  Please let me
know if it makes sense or requires more information / explanation.

Kind regards,
Andrey Arapov

September 10, 2022 5:10 PM, "Tom Lane" wrote:
> andrey.ara...@nixaid.com writes:
>> please find my first patch for PostgreSQL is attached.
>
> You haven't explained why you think this would be a good
> change, or even a safe one.
>
> regards, tom lane

0001-initdb-do-not-exit-when-PGDATA-PGDATA-pg_wal-is-a-mo.patch
Description: Binary data