Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On Sun, Feb 13, 2022 at 02:55:26PM -0800, Andres Freund wrote: > On 2022-02-14 11:23:18 +1300, Thomas Munro wrote: >> Ahh, right. I assume it still needs perl2host() treatment for MSYS2 >> systems, because jacana's log shows TESTDIR is set to a Unixoid path >> that I assume pg_regress's runtime can't use. That leads to the >> attached. > > Looks sane to me. This looks like a nice cleanup, indeed. Nice catch. -- Michael signature.asc Description: PGP signature
Re: pg_receivewal.exe unhandled exception in zlib1.dll
On Fri, Feb 11, 2022 at 07:50:44AM -0800, Andres Freund wrote: > On 2022-02-11 16:33:11 +0100, Juan José Santamaría Flecha wrote: >> The problem comes from the file descriptor passed to gzdopen() in >> 'src/bin/pg_basebackup/walmethods.c'. Using gzopen() instead solves the >> issue without ifdefing for WIN32. Please find attached a patch doing so. This looks wrong to me, as gzclose() would close() the file descriptor we pass in via gzdopen(), and some code paths of walmethods.c rely on this assumption, so this would leak fds on repeated errors. > I hit this as well. The problem is really caused by using a debug build of > postgres vs a production build of zlib. They use different C runtimes, with > different file descriptors. A lot of resources in the windows world are > unfortunately tied to the C runtime, and there can be multiple C runtimes in > a single process. It may be worth warning about that at build time, or just documenting the matter, perhaps? I don't recall seeing any MSIs related to zlib that are compiled with the debug options. -- Michael
Re: How did queensnake corrupt zic.o?
Thomas Munro writes: > We see a successful compile and then a failure to read the file while > linking. We see that the animal got into that state recently and then > fixed itself, and now it's back in that state. I don't know if it's > significant, but it happened to fix itself when a configure change > came along, which might be explained by ccache invalidation; that is, > the failure mode doesn't depend on the input files, but once it's > borked you need a change to kick ccache. My memory may be playing > tricks on me but I vaguely recall seeing another animal do this, a > while back. queensnake's seen repeated cycles of unexplainable build failures. I wonder if it's using a bogus ccache version, or if the machine itself is flaky. regards, tom lane
sockaddr_un.sun_len vs. reality
For a very long time, the AIX 7.1 buildfarm members (sungazer and tern) have complained about ip.c:236:17: warning: conversion from 'long unsigned int' to 'uchar_t' {aka 'unsigned char'} changes value from '1025' to '1' [-Woverflow] I'd ignored this as not being very interesting, but I got motivated to recheck it today. The warning is coming from #ifdef HAVE_STRUCT_SOCKADDR_STORAGE_SS_LEN unp->sun_len = sizeof(struct sockaddr_un); #endif so we can infer that the sun_len field is unsigned char and the size of struct sockaddr_un doesn't fit. Poking around a bit, I find that sun_len exists on most BSD-ish platforms and it seems to be unsigned char (almost?) everywhere. But on other platforms sizeof(struct sockaddr_un) is only a bit over 100, so there's not an overflow problem. It's not real clear that there's any problem on AIX either; given that the regression tests pass, I strongly suspect that nothing is paying attention to the sun_len field. But if we're going to fill it in at all, we should probably try to fill it in with something less misleading than "1". Googling finds that this seems to be a sore spot for various people, eg [1] mentions the existence of the SUN_LEN() macro on some platforms and then says why you shouldn't use it. I'm leaning to adding a compile-time clamp on the value, that is unp->sun_len = Min(sizeof(struct sockaddr_un), (1 << (sizeof(unp->sun_len) * 8)) - 1); (This might need a little bit of refinement if sun_len could be as wide as int, but in any case it should still reduce to a compile-time constant.) Or we could use something involving strlen(unp->sun_path), perhaps trying to use SUN_LEN() if it exists. But that implies expending extra run-time cycles for strlen(), and I've seen no indication that there's any value here except suppressing a compiler warning. Thoughts? regards, tom lane [1] http://mail-index.netbsd.org/tech-net/2006/10/11/0008.html
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Mon, 14 Feb 2022 14:42:18 +0900 (JST), Kyotaro Horiguchi wrote in > I'll resend new version soon to avoid the confusion.. In this version, 0001 gets one fix and two comment updates. -* Archive recovery have ended. Crash recovery ever after should always +* Archive recovery has ended. Crash recovery ever after should always - /* the second term is just in case */ - if (PriorRedoPtr != InvalidXLogRecPtr || RedoRecPtr > PriorRedoPtr) + /* +* Update the average distance between checkpoints/restartpoints if the +* prior checkpoint exists. The second term is just in case. +*/ + if (PriorRedoPtr != InvalidXLogRecPtr && RedoRecPtr > PriorRedoPtr) UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr); regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 4f96baaece1cff35f89893d8e7aa1fdf57435f53 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 8 Feb 2022 16:42:53 +0900 Subject: [PATCH v10 1/5] Get rid of unused path to handle concurrent checkpoints CreateRestartPoint considered the case where a concurrent checkpoint is running. But 7ff23c6d27 eliminated the possibility that multiple checkpoints run simultaneously. That code path, if it were taken, might leave the database unrecoverable by removing WAL segments that are required by the last established restartpoint. In passing, the code in the function is tightened up and tidied up in some places so that it is easier to read. 
--- src/backend/access/transam/xlog.c | 67 --- src/backend/postmaster/checkpointer.c | 1 - 2 files changed, 41 insertions(+), 27 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 958220c495..8c2882b49f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9659,6 +9659,9 @@ CreateRestartPoint(int flags) XLogSegNo _logSegNo; TimestampTz xtime; + /* we don't assume concurrent checkpoint/restartpoint to run */ + Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER); + /* Get a local copy of the last safe checkpoint record. */ SpinLockAcquire(&XLogCtl->info_lck); lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr; @@ -9724,7 +9727,7 @@ CreateRestartPoint(int flags) /* Also update the info_lck-protected copy */ SpinLockAcquire(&XLogCtl->info_lck); - XLogCtl->RedoRecPtr = lastCheckPoint.redo; + XLogCtl->RedoRecPtr = RedoRecPtr; SpinLockRelease(&XLogCtl->info_lck); /* @@ -9743,7 +9746,12 @@ CreateRestartPoint(int flags) /* Update the process title */ update_checkpoint_display(flags, true, false); - CheckPointGuts(lastCheckPoint.redo, flags); + CheckPointGuts(RedoRecPtr, flags); + + /* + * Update pg_control, using current time. + */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); /* * Remember the prior checkpoint's redo ptr for @@ -9751,30 +9759,23 @@ */ PriorRedoPtr = ControlFile->checkPointCopy.redo; + ControlFile->checkPoint = lastCheckPointRecPtr; + ControlFile->checkPointCopy = lastCheckPoint; + /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. + * Ensure minRecoveryPoint is past the checkpoint record while archive + * recovery is still ongoing. 
Normally, this will have happened already + * while writing out dirty buffers, but not necessarily - e.g. because no + * buffers were dirtied. We do this because a non-exclusive base backup + * uses minRecoveryPoint to determine which WAL files must be included in + * the backup, and the file (or files) containing the checkpoint record + * must be included, at a minimum. Note that for an ordinary restart of + * recovery there's no value in having the minimum recovery point any + * earlier than this anyway, because redo will begin just after the + * checkpoint record. */ - LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); - if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && - ControlFile->checkPointCopy.redo < lastCheckPoint.redo) + if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY) { - ControlFile->checkPoint = lastCheckPointRecPtr; - ControlFile->checkPointCopy = lastCheckPoint; - - /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a non-exclusive base backup uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Mon, 14 Feb 2022 14:40:22 +0900 (JST), Kyotaro Horiguchi wrote in > It doesn't apply even on pg13 (due to LSN_FORMAT_ARGS). I will make > the per-version patches if you are fine with this. Oops! I forgot to rename the patch to avoid confusion on CF-bots. I'll resend a new version soon to avoid the confusion. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 11 Feb 2022 01:00:03 +0900, Fujii Masao wrote in > > > On 2022/02/09 11:52, Kyotaro Horiguchi wrote: > > 0001: The fix of CreateRestartPoint > > This patch might be ok for the master branch. But since concurrent > checkpoint and restartpoint can happen in v14 or before, we would need > another patch based on that assumption, for the backport. How about > fixing the bug in all the branches first, then applying this patch in > the master to improve the code? For backbranches, the attached for pg14 does part of the full patch. Of the following, I think we should do (a) and (b) to make future backpatchings easier. a) Use RedoRecPtr and PriorRedoPtr after they are assigned. b) Move assignment to PriorRedoPtr into the ControlFileLock section. c) Skip update of minRecoveryPoint only when the checkpoint gets old. d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <= PriorRedoPtr. # Mmm. The v9-0001 contains a silly mistake here.. Still I'm not sure whether that case really happens and how checkpoint behaves *after* that happens, but at least it protects the database from the possible unrecoverable state due to the known issue here.. It doesn't apply even on pg13 (due to LSN_FORMAT_ARGS). I will make the per-version patches if you are fine with this. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From df516b487617c570332ea076b91773b810629a11 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Mon, 14 Feb 2022 13:04:33 +0900 Subject: [PATCH v2] Correctly update control file at the end of archive recovery CreateRestartPoint runs WAL file cleanup based on the checkpoint that has just finished in the function. If the database has exited DB_IN_ARCHIVE_RECOVERY state by the time the function is going to update the control file, the function refrains from updating the file at all, then proceeds to WAL cleanup using the latest REDO LSN, which is now inconsistent with the control file. 
As a result, the subsequent cleanup procedure removes more WAL files than the control file allows and leaves the database unrecoverable until the next checkpoint finishes. Since all buffers have been flushed out, we may safely regard the restart point as established regardless of the recovery state. Thus we may, and should, update the checkpoint LSNs in the control file in that case. We still update minRecoveryPoint only during archive recovery, but explicitly clear it if we have exited recovery. In addition to that fix, this commit makes some cosmetic changes consistent with the changes we are going to make on the master branch. --- src/backend/access/transam/xlog.c | 81 --- 1 file changed, 53 insertions(+), 28 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6208e123e5..28c3c4b7cf 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9653,7 +9653,7 @@ CreateRestartPoint(int flags) /* Also update the info_lck-protected copy */ SpinLockAcquire(&XLogCtl->info_lck); - XLogCtl->RedoRecPtr = lastCheckPoint.redo; + XLogCtl->RedoRecPtr = RedoRecPtr; SpinLockRelease(&XLogCtl->info_lck); /* @@ -9672,7 +9672,10 @@ CreateRestartPoint(int flags) /* Update the process title */ update_checkpoint_display(flags, true, false); - CheckPointGuts(lastCheckPoint.redo, flags); + CheckPointGuts(RedoRecPtr, flags); + + /* Update pg_control */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); /* * Remember the prior checkpoint's redo ptr for @@ -9681,52 +9684,74 @@ PriorRedoPtr = ControlFile->checkPointCopy.redo; /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. + * Update pg_control, using current time if no later checkpoints have been + * performed. 
*/ - LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); - if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && - ControlFile->checkPointCopy.redo < lastCheckPoint.redo) + if (PriorRedoPtr < RedoRecPtr) { ControlFile->checkPoint = lastCheckPointRecPtr; ControlFile->checkPointCopy = lastCheckPoint; ControlFile->time = (pg_time_t) time(NULL); /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a non-exclusive base backup uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any earlier than this + * Ensure minRecoveryPoint is past the checkpoint record while
How did queensnake corrupt zic.o?
Hi, We see a successful compile and then a failure to read the file while linking. We see that the animal got into that state recently and then fixed itself, and now it's back in that state. I don't know if it's significant, but it happened to fix itself when a configure change came along, which might be explained by ccache invalidation; that is, the failure mode doesn't depend on the input files, but once it's borked you need a change to kick ccache. My memory may be playing tricks on me but I vaguely recall seeing another animal do this, a while back.
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Sun, Feb 13, 2022 at 12:04 PM Dilip Kumar wrote: > > On Sun, Feb 13, 2022 at 10:12 AM Dilip Kumar wrote: > Next, I am planning to do some more tests, where we are having pgbench > running and concurrently we do CREATEDB maybe every 1 minute and see > what is the CREATEDB time as well as what is the impact on pgbench > performance. Because currently I have only measured CREATEDB time but > we must be knowing the impact of createdb on the other system as well. I have done tests with pgbench as well. Basically I did not notice any significant difference in TPS; I was expecting some difference due to the checkpoint on head, so maybe I need to test with more backends. In createdb time, though, there is a huge difference. I think this is because the template1 db is very small, so the patch completes in no time, whereas head takes a long time because of the large number of dirty shared buffers (due to concurrent pgbench). config: echo "logging_collector=on" >> data/postgresql.conf echo "port = 5432" >> data/postgresql.conf echo "max_wal_size=64GB" >> data/postgresql.conf echo "checkpoint_timeout=15min" >> data/postgresql.conf echo "shared_buffers=32GB" >> data/postgresql.conf Test: ./pgbench -i -s 1000 postgres ./pgbench -c 32 -j 32 -T 1200 -M prepared postgres >> result.txt -- Concurrently run the script below every 1 min CREATE DATABASE mydb log_copied_blocks=true/false; Results: - Pgbench TPS: Did not observe any difference head vs patch - Create db time (very small template): head: 21000 ms to 42000 ms (at different times) patch: 80 ms -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Sat, Feb 12, 2022 at 12:10:46PM -0800, Andres Freund wrote: > Tap tests currently are always executed in the source directory above t/, as > far as I can tell. So I don't think VPATH/non-VPATH or windows / non-windows > would break using a relative reference from the test dir. That would mean relying on $ENV{PWD} for this job, with something like this: -# Find the location of postgresql.conf.sample, based on the information -# provided by pg_config. -my $sample_file = - $node->config_data('--sharedir') . '/postgresql.conf.sample'; +my $rootdir = $ENV{PWD} . "/../../../.."; +my $sample_file = "$rootdir/src/backend/utils/misc/postgresql.conf.sample"; A run through the CI shows that this is working, and it also works with Debian's patch for the config data. I still tend to prefer the readability of a TAP test over the main regression test suite for this facility. Even if we could use the same trick with PG_ABS_SRCDIR, I'd rather not add this kind of dependency on a file in the source tree in src/test/regress/. Do others have any opinions on the matter? -- Michael
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Sun, Feb 13, 2022 at 9:56 PM Robert Haas wrote: > > On Sun, Feb 13, 2022 at 1:34 AM Dilip Kumar wrote: > > test4: > > 32 GB shared buffers, template DB size = 10GB, dirty shared buffer=70% > > Head: 47656 ms > > Patch: 79767 ms > > This seems like the most surprising result of the bunch. Here, the > template DB is both small enough to fit in shared_buffers and small > enough not to trigger a checkpoint all by itself, and yet the patch > loses. Well, this is not really surprising to me, because what I have noticed is that with the new approach the createdb time is completely dependent upon the template db size. So if the source db size is 10GB it is taking around 80 sec, and the shared buffers size does not have a major impact. Maybe a very small shared buffer can have more impact, so I will test that as well. > > Did you checkpoint between one test and the next, or might this test > have been done after a bunch of WAL had already been written since the > last checkpoint so that the 10GB pushed it over the edge? Not really; I am running each test with a new initdb, so that could not be an issue. > BTW, you have test4 twice in your list of results. My bad, those are different tests. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Add client connection check during the execution of the query
On Fri, Jan 14, 2022 at 7:30 PM Thomas Munro wrote: > On Fri, Jan 14, 2022 at 4:35 PM Andres Freund wrote: > > The more I think about it, the less I see why we *ever* need to re-arm the > > latch in pq_check_connection() in this approach. pq_check_connection() is > > only > > used from from ProcessInterrupts(), and there's plenty things inside > > ProcessInterrupts() that can cause latches to be reset (e.g. parallel > > message > > processing causing log messages to be sent to the client, causing network > > IO, > > which obviously can do a latch reset). > > Thanks for the detailed explanation. I guess I was being overly > cautious and a little myopic, "leave things exactly the way you found > them", so I didn't have to think about any of that. I see now that > the scenario I was worrying about would be a bug in whatever > latch-wait loop happens to reach this code. Alright then, here is > just... one... more... patch, this time consuming any latch that gets > in the way and retrying, with no restore. And pushed. My excuse for taking so long to get this into the tree is that it was tedious to retest this thing across so many OSes and determine that it really does behave reliably for killed processes AND lost processes/yanked cables/keepalive timeout, even with buffered data. In the process I learned a bit more about TCP and got POLLRDHUP added to FreeBSD (not that it matters for PostgreSQL 15, now that we can use EV_EOF). As for the FD_CLOSE behaviour I thought I saw on Windows upthread: it was a mirage, caused by the RST thing. There may be some other way to implement this feature on that TCP implementation, but I don't know what it is.
Re: Consistently use "startup process"/"WAL sender" instead of "Startup process"/"WAL Sender" in comments and docs.
On Sun, Feb 13, 2022 at 5:49 PM John Naylor wrote: > > On Sun, Feb 13, 2022 at 3:13 PM Bharath Rupireddy > wrote: > > > > Hi, > > > > In the code comments, it is being used as "Startup process" in the > > middle of the sentences whereas in most of the places "startup > > process" is used. Also, "WAL Sender" is being used instead of "WAL > > sender". Let's be consistent across the docs and code comments. > > FWIW, docs need to hold to a higher standard than code comments. Thanks. In general, I agree that the docs and error/log messages (user-facing) need to be held to a higher standard, but at the same time, so do code comments, IMHO. Because many hackers and developers treat code comments as a great place to learn the internals, being consistent there does no harm. Regards, Bharath Rupireddy.
Re: buildfarm warnings
On Mon, Feb 14, 2022 at 6:10 AM Tom Lane wrote: > Another thing I've been ignoring on the grounds that I-don't-feel- > like-fixing-this is that the Solaris and OpenIndiana machines whine > about every single reference from loadable modules to the core code, > eg this from haddock while building contrib/fuzzystrmatch: > > Undefined first referenced > symbol in file > cstring_to_text dmetaphone.o > varstr_levenshtein fuzzystrmatch.o > text_to_cstring dmetaphone.o > errstart_cold fuzzystrmatch.o > errmsg fuzzystrmatch.o > palloc dmetaphone.o > ExceptionalConditionfuzzystrmatch.o > errfinish fuzzystrmatch.o > varstr_levenshtein_less_equal fuzzystrmatch.o > repallocdmetaphone.o > errcode fuzzystrmatch.o > errmsg_internal fuzzystrmatch.o > pg_detoast_datum_packed dmetaphone.o > ld: warning: symbol referencing errors > > Presumably, this means that the Solaris linker would really like > to see the executable the module will load against, analogously to > the BE_DLLLIBS settings we use on some other platforms such as macOS. > It evidently still works without that, but we might be losing > performance from an extra indirection or the like. And in any case, > this amount of noise would be very distracting if you wanted to do > actual development on that platform. I'm not especially interested > in tracking down the correct incantation, but I'd gladly push a patch > if someone submits one. I took a quick look at this on an OpenIndiana vagrant image, using the GNU compiler and [Open]Solaris's linker. It seems you can't use -l for this (it only likes files called libX.Y) but the manual[1] says that "shared objects that reference symbols from an application can use the -z defs option, together with defining the symbols by using an extern mapfile directive", so someone might need to figure out how to build and feed in such a file... [1] https://docs.oracle.com/cd/E26502_01/html/E26507/chapter2-90421.html
Re: Possible uninitialized use of the variables (src/backend/access/transam/twophase.c)
On Thu, Feb 10, 2022 at 11:38:44AM +0900, Michael Paquier wrote: > The proposed change is incomplete anyway once you consider this > argument. First, there is no need to set up those fields in > EndPrepare() anymore if there is no origin data, no? It would be good > to comment that these are filled in EndPrepare(), I guess, once you do > the initialization in StartPrepare(). That still looked better on a fresh look in terms of consistency, so applied this way. -- Michael
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Sat, Feb 12, 2022 at 05:31:55PM +0100, Christoph Berg wrote: > To support multiple major versions in parallel, we need > /usr/lib/postgresql/NN/lib and /usr/share/postgresql/NN, so it's not a > single $PREFIX. But you are correct, and ./configure supports that. Yep, I don't quite see the problem, but I don't have the years of experience in terms of packaging you have, either. (NB: I use various --libdir and --bindir combinations for some PG packaging I maintain for pg_upgrade, but that's not as complex as Debian, I guess.) > I was confusing that with this: The problem that led to the pg_config > patch years ago was that we have a /usr/bin/pg_config in > (non-major-version-dependant) libpq-dev, and > /usr/lib/postgresql/NN/bin/pg_config in the individual > postgresql-server-dev-NN packages, and iirc the /usr/bin version > didn't particularly like the other binaries being in > /usr/lib/postgresql/NN/bin/. > > I guess it's time to revisit that problem now and see if it can be > solved more pretty today on our side. If you can solve that, my take is that it could make things nicer in the long run for some of the TAP test facilities. Still, I'll try to look at a solution for this thread that does not interact badly with what you are doing, and I am going to grab your patch when testing things. -- Michael
Re: pgsql: Add suport for server-side LZ4 base backup compression.
On Sun, Feb 13, 2022 at 11:13:51AM -0500, Robert Haas wrote: > Well, that's because I didn't realize that LZ4 might be set to > something that doesn't work at all. So Michael's thing worked, but > because it (in my view) fixed the problem in a more surprising place > than I would have preferred, I made a commit later that turned out to > break the buildfarm. So you can blame either one of us that you like. > > Thanks, Michael, for preparing a patch. That patch was slightly wrong, as the tests of pg_verifybackup still failed once I moved the lz4 command out of the way in my environment, because LZ4 gets set to an empty value. I have checked this case locally, and applied the patch to add the ./configure check, so copperhead should get back to green. A last thing, mentioned by Andres upthread, is that we should be able to remove the extra commands run with --version in the tests of pg_basebackup, as in the attached. I have not done that yet, as it seems better to wait for copperhead first, and the tests of pg_basebackup run before pg_verifybackup, so I don't want to mask one error for another in case the buildfarm says something. 
-- Michael diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index a9b2e090b1..7c3074329d 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -768,8 +768,7 @@ SKIP: my $gzip = $ENV{GZIP_PROGRAM}; skip "program gzip is not found in your system", 1 if ( !defined $gzip - || $gzip eq '' - || system_log($gzip, '--version') != 0); + || $gzip eq ''); my $gzip_is_valid = system_log($gzip, '--test', @zlib_files, @zlib_files2, @zlib_files3); diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl index 0e6e685aa6..1717e22a9c 100644 --- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl +++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl @@ -130,8 +130,7 @@ SKIP: my $gzip = $ENV{GZIP_PROGRAM}; skip "program gzip is not found in your system", 1 if ( !defined $gzip - || $gzip eq '' - || system_log($gzip, '--version') != 0); + || $gzip eq ''); my $gzip_is_valid = system_log($gzip, '--test', @zlib_wals); is($gzip_is_valid, 0, @@ -186,8 +185,7 @@ SKIP: my $lz4 = $ENV{LZ4}; skip "program lz4 is not found in your system", 1 if ( !defined $lz4 - || $lz4 eq '' - || system_log($lz4, '--version') != 0); + || $lz4 eq ''); my $lz4_is_valid = system_log($lz4, '-t', @lz4_wals); is($lz4_is_valid, 0, signature.asc Description: PGP signature
Re: [PATCH] nodeindexscan with reorder memory leak
On Thu, Feb 10, 2022 at 2:35 AM Tom Lane wrote: > Alexander Korotkov writes: > > I've rechecked that the now patched code branch is covered by > > regression tests. I think the memory leak issue is independent of the > > computational errors we've observed. > > So, I'm going to push and backpatch this if no objections. > > +1. We should work on the roundoff-error issue as well, but > as you say, it's an independent problem. Pushed, thank you. -- Regards, Alexander Korotkov
Re: pgsql: Add suport for server-side LZ4 base backup compression.
On Sat, Feb 12, 2022 at 11:12:50PM -0500, Tom Lane wrote: > Michael Paquier writes: >> Well, this somebody is I, and the buildfarm did not blow up on any of >> that so that looked rather fine. > > Eh? copperhead for one is failing for exactly this reason: > > Bailout called. Further testing stopped: failed to execute command > "lz4 -d -m > /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup/base.tar.lz4": > No such file or directory Well, I have put in place a check in the TAP tests that Robert has managed to break. It looks like it was wrong of me to assume that it would be fine as-is. Sorry about that. >> Adding a few cycles for this check >> is fine by me. What do you think of the attached? > > Looks OK as far as it goes ... but do we need anything on the > MSVC side? It is possible to tweak the environment variable so one can unset it. If we make things consistent with the configure check, we could tweak the default to not be "lz4" if we don't have --with-lz4 in the MSVC configuration. I don't recall seeing in the wild an installer for LZ4 where we would have the command but not the libraries, so perhaps that's not worth the trouble anyway, and we don't have any code paths in the tests where we use LZ4 without --with-lz4. > Also, I can't help wondering why we have both GZIP_PROGRAM and GZIP > variables. I suppose that's a separate issue though. From gzip's man page: "The obsolescent environment variable GZIP can hold a set of default options for gzip." This means that saving the command into a variable with the same name interacts with the command launch. This was discussed here: https://www.postgresql.org/message-id/nr63G0R6veCrLY30QMwxYA2aCawclWJopZiKEDN8HtKwIoIqZ79fVmSE2GxTyPD1Mk9FzRdfLCrc-BSGO6vTlDT_E2-aqtWeEiNn-qsDDzk=@pm.me -- Michael
Re: bailing out in tap tests nearly always a bad idea
Hi, On 2022-02-13 18:32:59 -0500, Tom Lane wrote: > Andres Freund writes: > > Best with a > > central function signalling fatal error, rather than individual uses of die > > or such. > > Huh, doesn't Test::More already provide a sane way to do this? I looked, and didn't see anything. But I'm not a perl person, so I might just have missed something. > If not, why isn't die() good enough? (I don't think you can > realistically expect to prohibit die() anywhere in the TAP tests.) The output of dying isn't great either: t/000_fail.pl Dubious, test returned 25 (wstat 6400, 0x1900) No subtests run It'd be nicer if that showed the actual reason for failing, rather than the unhelpful "Dubious, test returned" stuff. But it's still better than BAIL_OUT. So I thought that putting the failure handling in a central routine would allow us to make the exit nicer in a central place / at a later stage... Greetings, Andres Freund
Re: xml_is_well_formed (was Re: buildfarm warnings)
On 2/13/22 13:59, Tom Lane wrote: > So we're faced with a dilemma: we can't rename either instance without > breaking something. The only way to get rid of the warnings seems to be > to decide that the copy in xml2 has outlived its usefulness and remove > it. I don't think that's a hard argument to make: that version has been > obsolete since 9.1 (a0b7b717a), meaning that only pre-extension versions > of xml2 could need it. In fact, we pulled the trigger on it once before, > in 2016 (20540710e), and then changed our minds not because anyone > lobbied to put it back but just because we gave up on the PGDLLEXPORT > rearrangement that prompted the change then. > > I think that getting rid of these build warnings is sufficient reason > to drop this ancient compatibility function, and now propose that > we do that. > > +1 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: bailing out in tap tests nearly always a bad idea
Andres Freund writes: > Does anybody want to defend / explain the use of BAIL_OUT? If not, I think we > should consider doing a global replace of the use of bailing. +1 > Best with a > central function signalling fatal error, rather than individual uses of die > or such. Huh, doesn't Test::More already provide a sane way to do this? If not, why isn't die() good enough? (I don't think you can realistically expect to prohibit die() anywhere in the TAP tests.) regards, tom lane
Re: buildfarm warnings
On 2/12/22 16:42, Tom Lane wrote: >> I'm of the impression that some people have sql access to BF logs. > Yeah. I'm not sure exactly what the access policy is for that machine; > I know we'll give out logins to committers, but not whether there's > any precedent for non-committers. Yeah, I don't think it's wider than that. The sysadmins own these instances, so the policy is really up to them. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] Avoid open and lock the table Extended Statistics (src/backend/commands/statscmds.c)
Tomas Vondra writes: > On 2/13/22 22:43, Andres Freund wrote: >> Having looked briefly at it, I don't understand what the locking scheme is? > I don't recall the details of the discussion (if at all), but if you try > to do concurrent ALTER STATISTICS, that'll end up with: > ERROR: tuple concurrently updated > reported by CatalogTupleUpdate. AFAIK that's what we do for other object > types that don't have a relation that we might lock (e.g. try to do > CREATE OR REPLACE FUNCTION). Yeah, we generally haven't bothered with a separate locking scheme for objects other than relations. I don't see any big problem with that for objects that are defined by a single catalog entry, since "tuple concurrently updated" failures serve fine (although maybe that error message could be nicer). Extended stats don't quite fit that definition, but at least for updates that only need to touch the pg_statistic_ext row, it's the same story. When we get around to supporting multi-relation statistics, there might need to be some protocol around when ANALYZE can update pg_statistic_ext_data rows. But I don't think that issue exists yet, since only one ANALYZE can run on a particular relation at a time. regards, tom lane
bailing out in tap tests nearly always a bad idea
Hi, A recent cfbot failure reminded me of a topic I wanted to raise: The use of BAIL in tap tests. When a tap test exits with BAIL it doesn't just consider that test failed, but it also stops running further tap tests in the same group: https://metacpan.org/pod/Test::More "Indicates to the harness that things are going so badly all testing should terminate. This includes the running of any additional test scripts." There may be a few cases where stopping testing makes sense, but nearly all of the uses of bail in our tests look like they just want to fail the current test, rather than abort the other tests in the same prove invocation. Besides aborting other tests, it's also just hard to parse the output: https://cirrus-ci.com/task/6243385840238592?logs=test_world#L2329 ... [16:28:12.178] [16:27:23] t/024_archive_recovery.pl ok 3440 ms ( 0.00 usr 0.00 sys + 1.43 cusr 0.64 csys = 2.07 CPU) [16:28:12.178] [16:27:26] t/025_stuck_on_old_timeline.pl ... ok 3024 ms ( 0.00 usr 0.00 sys + 1.21 cusr 0.61 csys = 1.82 CPU) [16:28:12.178] [16:27:30] t/026_overwrite_contrecord.pl ok 3693 ms ( 0.01 usr 0.00 sys + 1.05 cusr 0.42 csys = 1.48 CPU) [16:28:12.178] Bailout called. 
Further testing stopped: command "/tmp/cirrus-ci-build/src/test/recovery/../../../src/test/regress/pg_regress --dlpath="/tmp/cirrus-ci-build/src/test/regress" --bindir= --host=/tmp/gFHAMHHTJ_ --port=52708 --schedule=../regress/parallel_schedule --max-concurrent-tests=20 --inputdir=../regress --outputdir="/tmp/cirrus-ci-build/src/test/recovery/tmp_check"" exited with value 1 [16:28:12.178] FAILED--Further testing stopped: command "/tmp/cirrus-ci-build/src/test/recovery/../../../src/test/regress/pg_regress --dlpath="/tmp/cirrus-ci-build/src/test/regress" --bindir= --host=/tmp/gFHAMHHTJ_ --port=52708 --schedule=../regress/parallel_schedule --max-concurrent-tests=20 --inputdir=../regress --outputdir="/tmp/cirrus-ci-build/src/test/recovery/tmp_check"" exited with value 1 [16:28:12.178] make[2]: *** [Makefile:27: check] Error 255 [16:28:12.178] make[1]: *** [Makefile:48: check-recovery-recurse] Error 2 What failed was 027_stream_regress.pl, due to a trivial regression test output change: diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/guc.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/guc.out --- /tmp/cirrus-ci-build/src/test/regress/expected/guc.out 2022-02-13 16:20:53.931949802 + +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/guc.out 2022-02-13 16:28:00.481307866 + @@ -850,7 +850,8 @@ name --- default_statistics_target -(1 row) + enable_indexskipscan +(2 rows) -- Runtime-computed GUCs should be part of the preset category. SELECT name FROM tab_settings_flags But there's pretty much no way to know from the error output that this failure was in 027_stream_regress.pl (just an example, this is widespread). For some reason prove's output is much worse when the output is redirected to a file. When testing with prove going to a terminal, bail at least prints the test name. interactive: t/000_fail.pl Bailout called. Further testing stopped: I am bailing FAILED--Further testing stopped: I am bailing file: Bailout called. 
Further testing stopped: I am bailing FAILED--Further testing stopped: I am bailing The latter is just about undebuggable, particularly when using PROVE_FLAGS='-j...'. Even just a 'die' is better than bail, output-wise (and better due to not preventing other tests from running): interactive: t/000_fail.pl Dubious, test returned 25 (wstat 6400, 0x1900) No subtests run t/001_stream_rep.pl .. ok file: t/000_fail.pl Dubious, test returned 25 (wstat 6400, 0x1900) No subtests run t/001_stream_rep.pl .. ok Does anybody want to defend / explain the use of BAIL_OUT? If not, I think we should consider doing a global replace of the use of bailing. Best with a central function signalling fatal error, rather than individual uses of die or such. Greetings, Andres Freund
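The semantics described above can be sketched outside of Perl. Below is a hypothetical, minimal model in C of how a TAP harness treats the two failure modes — it is only an illustration of the behavior discussed in this thread, not the real prove/Test::Harness code: an ordinary failure marks one script bad and the run continues, while a "Bail out!" line aborts every remaining script in the same invocation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical harness model, NOT the real prove/Test::Harness code.
 * An ordinary failure ("not ok" in the output) marks one script bad and the
 * run continues; a line starting with "Bail out!" aborts every remaining
 * script in the run, which is what Test::More's BAIL_OUT emits. */
struct tap_script
{
	const char *name;
	const char *output;			/* flattened TAP output of the script */
};

/* Runs the scripts in order; returns the number of scripts actually run. */
static int
run_suite(const struct tap_script *scripts, int nscripts, bool *failed)
{
	int			ran = 0;

	for (int i = 0; i < nscripts; i++)
	{
		ran++;
		failed[i] = (strstr(scripts[i].output, "not ok") != NULL);
		if (strstr(scripts[i].output, "Bail out!") != NULL)
		{
			failed[i] = true;
			break;				/* BAIL_OUT: stop the whole prove invocation */
		}
	}
	return ran;
}
```

With three scripts where the second bails, the third never runs at all — which is why a single regression diff surfacing through BAIL_OUT can silently suppress unrelated tests in the same group.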
Re: [PATCH] Avoid open and lock the table Extended Statistics (src/backend/commands/statscmds.c)
On 2/13/22 22:43, Andres Freund wrote: > Hi, > > On 2022-02-13 18:21:38 -0300, Ranier Vilela wrote: >> Why open and lock the table Extended Statistics if it is not the owner. >> Check and return to avoid this. > > I was about to say that this opens up time-of-check-to-time-of-use type > issues. But it's the wrong thing to lock to prevent those. > I doubt optimizing this is worth any effort - ALTER STATISTICS is rare, not doing it with owner rights is even rarer. > Having looked briefly at it, I don't understand what the locking scheme is? > Shouldn't there be at least some locking against concurrent ALTERs and between > ALTER and ANALYZE etc? > I don't recall the details of the discussion (if at all), but if you try to do concurrent ALTER STATISTICS, that'll end up with: ERROR: tuple concurrently updated reported by CatalogTupleUpdate. AFAIK that's what we do for other object types that don't have a relation that we might lock (e.g. try to do CREATE OR REPLACE FUNCTION). ANALYZE reads the statistics from the catalog, so it should see the last committed stattarget value, I think. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Mark all GUC variable as PGDLLIMPORT
On 02/13/22 15:48, Tom Lane wrote: > much fix any worries there. Or we could provide APIs that let an > extension look up a "struct config_generic *" once and then fetch > directly using that pointer. (But I'd prefer not to, since it'd > constrain the internals more than I think is wise.) There is GetConfigOptionByNum already ... but I'm not sure there's an easy way to ask "what's the num for option foo?". GetNumConfigOptions exists, so there is a brute-force way. Regards, -Chap
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On 2022-02-14 11:23:18 +1300, Thomas Munro wrote: > On Sun, Feb 13, 2022 at 6:29 PM Andres Freund wrote: > > Afaics all the "regress test inside tap test" cases would need to do is to > > pass > > --outputdir=${PostgreSQL::Test::Utils::tmp_check} and you'd get exactly the > > same path as > > REGRESS_OUTPUTDIR currently provides. > > Ahh, right. I assume it still needs perl2host() treatment for MSYS2 > systems, because jacana's log shows TESTDIR is set to a Unixoid path > that I assume pg_regress's runtime can't use. That leads to the > attached. Looks sane to me.
Re: [PATCH] Avoid open and lock the table Extended Statistics (src/backend/commands/statscmds.c)
On Sun, Feb 13, 2022 at 6:43 PM Andres Freund wrote: > Hi, > > On 2022-02-13 18:21:38 -0300, Ranier Vilela wrote: > > Why open and lock the table Extended Statistics if it is not the owner. > > Check and return to avoid this. > > I was about to say that this opens up time-of-check-to-time-of-use type > issues. But it's the wrong thing to lock to prevent those. > > Having looked briefly at it, I don't understand what the locking scheme is? > Shouldn't there be at least some locking against concurrent ALTERs and > between > ALTER and ANALYZE etc? > I checked the pg_statistics_object_ownercheck function and I think the patch is wrong. It seems to me that when using SearchSysCache1, it is necessary to open and lock the table. Better to drop the patch; sorry for the noise. regards, Ranier Vilela
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On Sun, Feb 13, 2022 at 6:29 PM Andres Freund wrote: > Afaics all the "regress test inside tap test" cases would need to do is to > pass > --outputdir=${PostgreSQL::Test::Utils::tmp_check} and you'd get exactly the > same path as > REGRESS_OUTPUTDIR currently provides. Ahh, right. I assume it still needs perl2host() treatment for MSYS2 systems, because jacana's log shows TESTDIR is set to a Unixoid path that I assume pg_regress's runtime can't use. That leads to the attached. 0001-Remove-REGRESS_OUTPUTDIR-environment-variable.not-for-cfbot-patch Description: Binary data
Re: Fix overflow in DecodeInterval
Attached is a new version switching from ints to bools, as requested. - Joe Koshakow From af8f030ad146602b4386f77b5664c6013743569b Mon Sep 17 00:00:00 2001 From: Joseph Koshakow Date: Fri, 11 Feb 2022 14:18:32 -0500 Subject: [PATCH] Check for overflow when decoding an interval When decoding an interval there are various date units which are aliases for some multiple of another unit. For example a week is 7 days and a decade is 10 years. When decoding these specific units, there is no check for overflow, allowing the interval to overflow. This commit adds an overflow check for all of these units. Additionally, fractional date/time units are rolled down into the next smallest unit. For example 0.1 months is 3 days. When adding these fractional units, there is no check for overflow, allowing the interval to overflow. This commit adds an overflow check for all of the fractional units. Additionally, adding the word "ago" to the interval negates every field. However the negative of INT_MIN is still INT_MIN. This commit adds a check to make sure that we don't try to negate a field if it's INT_MIN. 
Signed-off-by: Joseph Koshakow --- src/backend/utils/adt/datetime.c | 100 +++-- src/test/regress/expected/interval.out | 68 + src/test/regress/sql/interval.sql | 18 + 3 files changed, 164 insertions(+), 22 deletions(-) diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 7926258c06..15a1ebbe67 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "access/xact.h" #include "catalog/pg_type.h" +#include "common/int.h" #include "common/string.h" #include "funcapi.h" #include "miscadmin.h" @@ -44,10 +45,13 @@ static int DecodeDate(char *str, int fmask, int *tmask, bool *is2digits, struct pg_tm *tm); static char *AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros); -static void AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, +static bool AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, int scale); -static void AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, +static bool AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, int scale); +static bool AdjustFractMonths(double frac, struct pg_tm *tm, int scale); +static bool AdjustDays(int val, struct pg_tm *tm, int multiplier); +static bool AdjustYears(int val, struct pg_tm *tm, int multiplier); static int DetermineTimeZoneOffsetInternal(struct pg_tm *tm, pg_tz *tzp, pg_time_t *tp); static bool DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t, @@ -499,34 +503,68 @@ AppendTimestampSeconds(char *cp, struct pg_tm *tm, fsec_t fsec) /* * Multiply frac by scale (to produce seconds) and add to *tm & *fsec. * We assume the input frac is less than 1 so overflow is not an issue. + * Returns true if successful, false if tm overflows. 
*/ -static void +static bool AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, int scale) { int sec; if (frac == 0) - return; + return true; frac *= scale; sec = (int) frac; - tm->tm_sec += sec; + if (pg_add_s32_overflow(tm->tm_sec, sec, &tm->tm_sec)) + return false; frac -= sec; *fsec += rint(frac * 1000000); + return true; } /* As above, but initial scale produces days */ -static void +static bool AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, int scale) { int extra_days; if (frac == 0) - return; + return true; frac *= scale; extra_days = (int) frac; - tm->tm_mday += extra_days; + if (pg_add_s32_overflow(tm->tm_mday, extra_days, &tm->tm_mday)) + return false; frac -= extra_days; - AdjustFractSeconds(frac, tm, fsec, SECS_PER_DAY); + return AdjustFractSeconds(frac, tm, fsec, SECS_PER_DAY); +} + +/* As above, but initial scale produces months */ +static bool +AdjustFractMonths(double frac, struct pg_tm *tm, int scale) +{ + int months = rint(frac * MONTHS_PER_YEAR * scale); + + return !pg_add_s32_overflow(tm->tm_mon, months, &tm->tm_mon); +} + +/* + * Multiply val by multiplier (to produce days) and add to *tm. + * Returns true if successful, false if tm overflows. + */ +static bool +AdjustDays(int val, struct pg_tm *tm, int multiplier) +{ + int extra_days; + return !pg_mul_s32_overflow(val, multiplier, &extra_days) && + !pg_add_s32_overflow(tm->tm_mday, extra_days, &tm->tm_mday); +} + +/* As above, but initial val produces years */ +static bool +AdjustYears(int val, struct pg_tm *tm, int multiplier) +{ + int years; + return !pg_mul_s32_overflow(val, multiplier, &years) && + !pg_add_s32_overflow(tm->tm_year, years, &tm->tm_year); } /* Fetch a fractional-second value with suitable error checking */ @@ -3275,56 +3313,69 @@ DecodeInterval(char **field, int *ftype, int nf, int range, case DTK_MINUTE: tm->tm_min += val; - AdjustFractSeconds(fval, tm, fsec, SECS_PER_MINUTE);
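For readers unfamiliar with the pg_add_s32_overflow()/pg_mul_s32_overflow() helpers the patch uses, here is a small self-contained sketch of the same pattern. The function name is modeled on the patch's AdjustDays(), but this is an illustration rather than the committed code; it assumes a GCC/Clang-style compiler, since PostgreSQL's src/include/common/int.h falls back to these __builtin_*_overflow intrinsics when they are available.

```c
#include <limits.h>
#include <stdbool.h>

/* Sketch of the overflow-checked arithmetic pattern in the patch: report
 * overflow via the return value instead of silently wrapping.  Mirrors the
 * shape of the patch's AdjustDays(), but is not the committed code. */
static bool
adjust_days_checked(int val, int multiplier, int *tm_mday)
{
	int			extra_days;

	/* multiply val into days, then add to *tm_mday; fail on any overflow */
	return !__builtin_mul_overflow(val, multiplier, &extra_days) &&
		!__builtin_add_overflow(*tm_mday, extra_days, tm_mday);
}
```

Because the && short-circuits, *tm_mday is left untouched when the multiplication overflows, so a caller can simply report an out-of-range interval.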
Re: Adding CI to our tree
Hi, On 2022-02-13 15:42:13 -0600, Justin Pryzby wrote: > > Note that prove unfortunately serializes the test output to be in the order > > it > > started them, even when tests run concurrently. Extremely unhelpful, but ... > > Are you sure ? Somewhat. I think it's a question of the prove version and some autodetection of what type of environment prove is running in (stdin/stdout/stderr). I don't remember the details, but at some point I pinpointed the source of the serialization, and verified that parallelization makes a significant difference in runtime even without being easily visible :(. But this is all vague memory, so I might be wrong. Reminds me that somebody (ugh, me???) should fix the perl > 5.26 incompatibilities on windows, then we'd also get a newer prove... > > One nice bit is that the output is a *lot* easier to read. > > > > You could try improving the total time by having prove remember slow tests > > and > > use that file to run the slowest tests first next time. --state slow,save or > > such I believe. Of course we'd have to save that state file... > > In a test, this hurt rather than helped (13m 42s). > https://cirrus-ci.com/task/6359167186239488 > > I'm not surprised - it makes sense to run 10 fast tests at once, but usually > doesn't make sense to run 10 slow tests at once (at least a couple of > which are doing something intensive). It was faster (12m16s) to do it > backwards (fastest tests first). > https://cirrus-ci.com/task/5745115443494912 Hm. I know I saw significant reduction in test times locally with meson by starting slow tests earlier, because they're the limiting factor for the *overall* test runtime - but I have more resources than on cirrus. Even locally on a windows VM, with the current buildsystem, I found that moving 027 to earlier within recoverycheck reduced the test time. 
But it's possible that with all tests being scheduled concurrently, starting the slow tests early leads to sufficient resource overcommit to be problematic. > BTW, does it make sense to remove test_regress_parallel_script ? The > pg_upgrade run would do the same things, no ? If so, it might make sense to > run that first. OTOH, you suggested to run the upgrade tests with checksums > enabled, which seems like a good idea. No, I don't think so. The main regression tests are by far the most important thing during normal development. Just relying on main regression test runs embedded in other tests, with different output and config of the main regression test, is IMO just confusing. Greetings, Andres Freund
Re: generic plans and "initial" pruning
(just catching up on this thread) On Thu, 13 Jan 2022 at 07:20, Robert Haas wrote: > Yeah. I don't think it's only non-core code we need to worry about > either. What if I just do EXPLAIN ANALYZE on a prepared query that > ends up pruning away some stuff? IIRC, the pruned subplans are not > shown, so we might escape disaster here, but FWIW if I'd committed > that code I would have pushed hard for showing those and saying "(not > executed)" so it's not too crazy to imagine a world in which > things work that way. FWIW, that would remove the whole point of init run-time pruning. The reason I made two phases of run-time pruning was so that we could get away from having the init plan overhead of nodes we'll never need to scan. If we wanted to show the (never executed) scans in EXPLAIN then we'd need to do the init plan part and allocate all that memory needlessly. Imagine a hash partitioned table on "id" with 1000 partitions. The user does: PREPARE q1 (INT) AS SELECT * FROM parttab WHERE id = $1; EXECUTE q1(123); Assuming a generic plan, if we didn't have init pruning then we have to build a plan containing the scans for all 1000 partitions. There's significant overhead to that compared to just locking the partitions, and initialising 1 scan. If it worked this way then we'd be even further from Amit's goal of reducing the overhead of starting a plan with run-time pruning nodes. I understood at the time it was just the EXPLAIN output that you had concerns with. I thought that was just around the lack of any display of the condition we used for pruning. David
Re: Adding CI to our tree
Hi, On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote: > Oh - I suppose you're right. That's an unfortunate consequence of running a > single prove instance without chdir. I don't think it's chdir that's relevant (that changes into the source directory after all). It's the TESTDIR environment variable. I was thinking that we should make Utils.pm's INIT block responsible for figuring out both the directory a test should run in and the log location, instead of having that in vcregress.pl and Makefile.global.in. Mostly because doing it in the latter means we can't start tests with different TESTDIR and working dir at the same time. If instead we pass the location of the top-level build and top-level source directory from vcregress.pl / Makefile.global, the tap test infrastructure can figure out that stuff itself, on a per-test basis. For msvc builds we probably would need to pass in some information that allows Utils.pm to set up PATH appropriately. I think that might just require knowing that a) msvc build system is used b) Release vs Debug. Greetings, Andres Freund
Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
My recent commit 44fa8488 made vacuumlazy.c always scan the last page of every heap relation, even when it's all-visible, regardless of any other factor. The general idea is to always examine the last page in the relation when that might prevent us from unsuccessfully attempting to truncate the relation -- since even the attempt is costly (in particular, we have to get an AccessExclusiveLock, which is painful with a Hot Standby replica). This created a new problem: if we manually VACUUM a smaller, static table many times, vac_estimate_reltuples() will consider the same scanned page to be a random sample each time. Of course this page is the furthest possible thing from a random sample, because it's the same heap page each time, and because it's generally more likely that the last page will have fewer tuples. Every time you run VACUUM (without changing the table), pg_class.reltuples shrinks, by just a small amount. Run VACUUM like that enough times, and pg_class.reltuples will eventually become quite distorted. It's easy to spot this effect, just by running "VACUUM VERBOSE" in a loop against a static table, and noting how "tuples: ??? remain" tends to shrink over time, even though the contents of the table won't have changed. The "always scan last page" behavior isn't really new, actually. It's closely related to an earlier, slightly different behavior with the same goal -- though one that conditioned scanning the last page on certain other factors that were protective (which was probably never considered by the design). This closely related behavior (essentially the same behavior) was added by commit e8429082 from 2015. I judged that it wasn't worth worrying about scanning an extra page unnecessarily (better to keep things simple), since it's not at all uncommon for VACUUM to scan a few all-visible pages anyway (due to the SKIP_PAGES_THRESHOLD stuff). 
And I haven't changed my mind about that -- I still think that we should just scan the last page in all cases, to keep things simple. But I definitely don't think that this "pg_class.reltuples gets distorted over time" business is okay -- that has to be fixed. There has been at least one very similar bug in the past: the commit message from commit b4b6923e (from 2011) talks about a similar issue with over-sampling from the beginning of the table (not the end), due to a similar bias involving visibility map implementation details that lead to an extremely nonrandom scanned_pages sample. To me that suggests that the true problem here isn't really in vacuumlazy.c -- going back to the old approach (i.e. scanning the last page conditionally) seems like a case of the tail wagging the dog. IMV the real problem is that vac_estimate_reltuples() is totally unconcerned about these kinds of systematic sampling problems. Attached draft patch fixes the issue by making vac_estimate_reltuples() return the old/original reltuples from pg_class (and not care at all about the scanned_tuples value from its vacuumlazy.c caller) when: 1. The total_pages for the table hasn't changed (by even one page) since the last VACUUM (or last ANALYZE). and, 2. The total number of scanned_pages is less than 2% of total_pages. This fixes the observed problem directly. It also makes the code robust against other similar problems that might arise in the future. The risk that comes from trusting that scanned_pages is a truly random sample (given these conditions) is generally very large, while the risk that comes from disbelieving it (given these same conditions) is vanishingly small. -- Peter Geoghegan v1-0001-Avoid-vac_estimate_reltuples-distortion.patch Description: Binary data
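To make the proposed guard concrete, here is a hedged sketch. The function name and the exact fallback behavior are assumed for illustration (the real vac_estimate_reltuples() also dampens the extrapolation using the old estimate, which is omitted here); only the guard itself follows the description above: when the relation size is unchanged since the last VACUUM/ANALYZE and fewer than 2% of its pages were scanned, keep the old pg_class.reltuples rather than trusting a tiny, possibly systematic "sample" such as the always-scanned last page.

```c
/* Hypothetical sketch of the proposed guard, not the attached patch.
 * old_rel_pages/old_rel_tuples are the values from the previous
 * VACUUM/ANALYZE; total_pages/scanned_pages/scanned_tuples come from the
 * current VACUUM. */
static double
estimate_reltuples_sketch(double old_rel_pages, double old_rel_tuples,
						  double total_pages, double scanned_pages,
						  double scanned_tuples)
{
	/* nothing scanned: nothing to learn, keep whatever we had */
	if (scanned_pages == 0)
		return old_rel_tuples;

	/* proposed guard: same size as before plus a tiny scan means the
	 * "sample" is probably nonrandom (e.g. always the last page) */
	if (old_rel_tuples >= 0 &&
		total_pages == old_rel_pages &&
		scanned_pages < 0.02 * total_pages)
		return old_rel_tuples;

	/* otherwise extrapolate the observed tuple density (the real code
	 * blends this with the old estimate; omitted for brevity) */
	return total_pages * (scanned_tuples / scanned_pages);
}
```

With this guard, repeated VACUUMs that scan only the final page of a static 1000-page table leave reltuples alone, instead of nudging it downward on every run.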
Re: [PATCH] Avoid open and lock the table Extended Statistics (src/backend/commands/statscmds.c)
Hi, On 2022-02-13 18:21:38 -0300, Ranier Vilela wrote: > Why open and lock the table Extended Statistics if it is not the owner. > Check and return to avoid this. I was about to say that this opens up time-of-check-to-time-of-use type issues. But it's the wrong thing to lock to prevent those. Having looked briefly at it, I don't understand what the locking scheme is? Shouldn't there be at least some locking against concurrent ALTERs and between ALTER and ANALYZE etc? Greetings, Andres Freund
Re: Adding CI to our tree
On Sat, Feb 12, 2022 at 02:26:25PM -0800, Andres Freund wrote: > On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote: > > I had some success with that, but it doesn't seem to be significantly > > faster - > > it looks a lot like the tests are not actually running in parallel. Note that the total test time is close to the sum of the individual test times. But I think that may be an artifact of how prove is showing/attributing times to each test (which, if so, is misleading). > Note that prove unfortunately serializes the test output to be in the order it > started them, even when tests run concurrently. Extremely unhelpful, but ... Are you sure ? When I run it locally, I see: rm -fr src/test/recovery/tmp_check ; time PERL5LIB=`pwd`/src/test/perl TESTDIR=`pwd`/src/test/recovery PATH=`pwd`/tmp_install/usr/local/pgsql/bin:$PATH PG_REGRESS=`pwd`/src/test/regress/pg_regress REGRESS_SHLIB=`pwd`/src/test/regress/regress.so prove --time -j4 --ext '*.pl' `find src -name t` ... [15:34:48] src/bin/scripts/t/101_vacuumdb_all.pl ... ok 104 ms ( 0.00 usr 0.00 sys + 2.35 cusr 0.47 csys = 2.82 CPU) [15:34:49] src/bin/scripts/t/090_reindexdb.pl .. ok 8894 ms ( 0.06 usr 0.01 sys + 14.45 cusr 3.38 csys = 17.90 CPU) [15:34:50] src/bin/pg_config/t/001_pg_config.pl ok 79 ms ( 0.00 usr 0.01 sys + 0.23 cusr 0.04 csys = 0.28 CPU) [15:34:50] src/bin/pg_waldump/t/001_basic.pl ... ok 35 ms ( 0.00 usr 0.00 sys + 0.26 cusr 0.02 csys = 0.28 CPU) [15:34:51] src/bin/pg_test_fsync/t/001_basic.pl ok 100 ms ( 0.01 usr 0.00 sys + 0.24 cusr 0.04 csys = 0.29 CPU) [15:34:51] src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl ok 177 ms ( 0.02 usr 0.00 sys + 0.26 cusr 0.03 csys = 0.31 CPU) [15:34:55] src/bin/scripts/t/100_vacuumdb.pl ... ok 11267 ms ( 0.12 usr 0.04 sys + 13.47 cusr 3.20 csys = 16.83 CPU) [15:34:57] src/bin/scripts/t/102_vacuumdb_stages.pl ok 5802 ms ( 0.06 usr 0.01 sys + 7.70 cusr 1.37 csys = 9.14 CPU) ... => scripts/ stuff, followed by other stuff, followed by more, slower, scripts/ stuff. 
But I never saw that in cirrus. > Isn't this kind of a good test time? I thought earlier your alltaptests target > took a good bit longer? The original alltaptests runs in 16m 21s. https://cirrus-ci.com/task/6679061752709120 2 weeks ago, it was ~14min with your patch to cache initdb. https://cirrus-ci.com/task/5439320633901056 As I recall, that didn't seem to improve runtime when combined with my parallel patch. > One nice bit is that the output is a *lot* easier to read. > > You could try improving the total time by having prove remember slow tests and > use that file to run the slowest tests first next time. --state slow,save or > such I believe. Of course we'd have to save that state file... In a test, this hurt rather than helped (13m 42s). https://cirrus-ci.com/task/6359167186239488 I'm not surprised - it makes sense to run 10 fast tests at once, but usually doesn't make sense to run 10 slow tests at once (at least a couple of which are doing something intensive). It was faster (12m16s) to do it backwards (fastest tests first). https://cirrus-ci.com/task/5745115443494912 BTW, does it make sense to remove test_regress_parallel_script ? The pg_upgrade run would do the same things, no ? If so, it might make sense to run that first. OTOH, you suggested to run the upgrade tests with checksums enabled, which seems like a good idea. Note that in the attached patches, I changed the msvc "warnings" to use "tee". I don't know how to fix the pipeline test in a less hacky way... You said the docs build should be a separate task, but then said that it'd be okay to remove the dependency. So I did it both ways. There's currently some duplication between the docs patch and code coverage patch. -- Justin >From 05c24a1e679e5ae0dd0cb6504d397f77c8b1fc5c Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 17 Jan 2022 00:53:04 -0600 Subject: [PATCH 1/8] cirrus: include hints how to install OS packages.. 
This is useful for patches during development, but once a feature is merged, new libraries should be added to the OS image files, rather than installed during every CI run forever into the future. --- .cirrus.yml | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index dd96a97efe5..eda8ac9596c 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -73,10 +73,11 @@ task: chown -R postgres:postgres . mkdir -p ${CCACHE_DIR} chown -R postgres:postgres ${CCACHE_DIR} - setup_cores_script: | + setup_os_script: | mkdir -m 770 /tmp/cores chown root:postgres /tmp/cores sysctl kern.corefile='/tmp/cores/%N.%P.core' +#pkg install -y ... # NB: Intentionally build without --with-llvm. The freebsd image
Re: Adding CI to our tree
On Sun, Feb 13, 2022 at 01:23:16PM -0800, Andres Freund wrote: > If you're seeing this on windows on one of your test branches, that's much > more likely to be caused by the alltaptests stuff, than by the change in > artifact instruction. Oh - I suppose you're right. That's an unfortunate consequence of running a single prove instance without chdir. -- Justin
Re: Adding CI to our tree
Hi, On 2022-02-13 15:02:50 -0600, Justin Pryzby wrote: > On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote: > > > What I am excited about is that some of your other changes showed that we > > > don't need separate *_artifacts for separate directories anymore. That > > > used to > > > be the case, but an array of paths is now supported. Putting log, diffs, > > > and > > > regress_log in one directory will be considerably more convenient... > > > > pushed together. > > This change actually complicates things. > > Before, there was log/src/test/recovery/tmp_check/log, with a few files for > 001, a few for 002, a few for 003. That is a lot of output files, but at > least they're all related. > Now, there's a single log/tmp_check/log, which has logs for the entire tap > tests. There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003, etc. Hm? Doesn't look like that to me, and I don't see why it would work that way? This didn't do anything to flatten the directory hierarchy, just combine three hierarchies into one? What I see, and what I expect, is that logs end up in e.g. log/src/test/recovery/tmp_check/log but that that directory contains regress_log_*, as well as *.log? Before one needed to go through the hierarchy multiple times to see both regress_log_ (i.e. tap test log) as well as 0*.log (i.e. server logs). A random example: https://cirrus-ci.com/task/5152523873943552 shows the logs for the failure in log/src/bin/pg_upgrade/tmp_check/log If you're seeing this on windows on one of your test branches, that's much more likely to be caused by the alltaptests stuff, than by the change in artifact instruction. Greetings, Andres Freund
[PATCH] Avoid open and lock the table Extendend Statistics (src/backend/commands/statscmds.c)
Hi, Commit https://github.com/postgres/postgres/commit/6d554e3fcd6fb8be2dbcbd3521e2947ed7a552cb has fixed the bug, but there is still room for improvement. Why open and lock the Extended Statistics table if the user is not the owner? Check and return early to avoid this. Trivial patch attached. regards, Ranier Vilela avoid_open_and_lock_table_if_not_owner.patch Description: Binary data
Re: Mark all GUC variable as PGDLLIMPORT
Andres Freund writes: > On 2022-02-13 15:36:16 -0500, Chapman Flack wrote: >> Also, I think there are some options that are only represented by >> an int, float8, etc., when shown, but whose native internal form >> is something else, like a struct. I was definitely contemplating >> that you could 'subscribe' to one of those too, by passing the >> address of an appropriate struct. But of course a GetConfigOption() >> flavor could work that way too. > I have a very hard time seeing a use-case for this. Nor how it'd even work > with a struct - you can't just copy the struct contents, because of pointers > to objects etc. I don't think there really are options like this anyway. There are a couple of legacy cases like "datestyle" where something that the user sees as one GUC translates to multiple variables under the hood, so you'd have to invent a struct if you wanted to pass them through a mechanism like this. I don't have a big problem with leaving those out of any such solution, though. (I see that datestyle's underlying variables DateStyle and DateOrder are already marked PGDLLIMPORT, and that's fine with me.) A possibly more interesting case is something like search_path --- but again, there are already special-purpose APIs for accessing the interpreted value of that. regards, tom lane
Re: Adding CI to our tree
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote: > > What I am excited about is that some of your other changes showed that we > > don't need separate *_artifacts for separate directories anymore. That used > > to > > be the case, but an array of paths is now supported. Putting log, diffs, and > > regress_log in one directory will be considerably more convenient... > > pushed together. This change actually complicates things. Before, there was log/src/test/recovery/tmp_check/log, with a few files for 001, a few for 002, a few for 003. These are a lot of output files, but at least they're all related. Now, there's a single log/tmp_check/log, which has logs for all the TAP tests. There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003, etc. -- Justin
Re: Mark all GUC variable as PGDLLIMPORT
Chapman Flack writes: > On 02/13/22 15:16, Tom Lane wrote: >> Why not just provide equivalents to GetConfigOption() that can >> deliver int, float8, etc values instead of strings? > And repeat the bsearch to find the option every time the interested > extension wants to check the value, even when what it learns is that > the value has not changed? And make it the job of every piece of > interested extension code to save the last-retrieved value and compare > if it wants to detect that it's changed? When the GUC machinery already > has code that executes exactly when a value is being supplied for > an option? (shrug) You could argue the performance aspect either way. If the core GUC code updates a value 1000 times while the extension consults the result once, you've probably lost money on the deal. As for the bsearch, we could improve on that when and if it's shown to be a bottleneck --- converting to a hash table ought to pretty much fix any worries there. Or we could provide APIs that let an extension look up a "struct config_generic *" once and then fetch directly using that pointer. (But I'd prefer not to, since it'd constrain the internals more than I think is wise.) regards, tom lane
Re: Mark all GUC variable as PGDLLIMPORT
Hi, On 2022-02-13 15:36:16 -0500, Chapman Flack wrote: > Clearly I'm not thinking here of the GUCs that are read-only views of > values that are determined some other way. How many of those are there > that are mutable, and could have their values changed without going > through the GUC mechanisms? Are there any GUCs where one needs this? There are a few GUCs frequently changing values, but it's stuff like transaction_read_only. Where I don't really see a use for constantly checking the value. > Also, I think there are some options that are only represented by > an int, float8, etc., when shown, but whose native internal form > is something else, like a struct. I was definitely contemplating > that you could 'subscribe' to one of those too, by passing the > address of an appropriate struct. But of course a GetConfigOption() > flavor could work that way too. I have a very hard time seeing a use-case for this. Nor how it'd even work with a struct - you can't just copy the struct contents, because of pointers to objects etc. I don't think there really are options like this anyway. Greetings, Andres Freund
Re: fixing bookindex.html bloat
Hi, On 2022-02-13 12:16:18 -0800, Andres Freund wrote: > Waiting for docbook to fix this seems a bit futile, I eventually found a > bugreport about this, from 2016: > https://sourceforge.net/p/docbook/bugs/1384/ Now also reported to the current repo: https://github.com/docbook/xslt10-stylesheets/issues/239 While there have been semi-regular changes/fixes, they've not done a release in years... So even if they fix it, it'll likely not trickle down into distro packages etc anytime soon, if ever. Greetings, Andres Freund
Re: Fix overflow in DecodeInterval
Andres Freund writes: > Do we want to consider backpatching these fixes? I wasn't thinking that we should. It's a behavioral change and people might not be pleased to have it appear in minor releases. regards, tom lane
Re: Mark all GUC variable as PGDLLIMPORT
On 02/13/22 15:16, Tom Lane wrote: > That seems like about 10X more complexity than is warranted, > not only in terms of the infrastructure required, but also in > the intellectual complexity around "just when could that value > change?" > > Why not just provide equivalents to GetConfigOption() that can > deliver int, float8, etc values instead of strings? And repeat the bsearch to find the option every time the interested extension wants to check the value, even when what it learns is that the value has not changed? And make it the job of every piece of interested extension code to save the last-retrieved value and compare if it wants to detect that it's changed? When the GUC machinery already has code that executes exactly when a value is being supplied for an option? Clearly I'm not thinking here of the GUCs that are read-only views of values that are determined some other way. How many of those are there that are mutable, and could have their values changed without going through the GUC mechanisms? Also, I think there are some options that are only represented by an int, float8, etc., when shown, but whose native internal form is something else, like a struct. I was definitely contemplating that you could 'subscribe' to one of those too, by passing the address of an appropriate struct. But of course a GetConfigOption() flavor could work that way too. Regards, -Chap
Re: Fix overflow in DecodeInterval
Hi, On 2022-02-13 09:35:47 -0500, Joseph Koshakow wrote: > I chose int return types to keep all these methods > consistent with DecodeInterval, which returns a > non-zero int to indicate an error. That's different, because it actually returns different types of errors. IMO that difference is actually reason to use a bool for the new cases, because then it's a tad clearer that they don't return DTERR_*. > Though I wasn't sure > if an int or bool would be best, so I'm happy to change > to bool if people think that's better. +1 for bool. > Also I'm realizing now that I've incorrectly been using the > number of the patch to indicate the version, instead of just > sticking a v3 to the front. So sorry about that, all the patches > I sent in this thread are the same patch, just different versions. No worries ;) Do we want to consider backpatching these fixes? If so, I'd argue for skipping 10, because it doesn't have int.h stuff yet. There's also the issue with potentially breaking indexes / constraints? Not that that goes entirely away across major versions... Greetings, Andres Freund
Re: [PATCH] Dereference null return value (NULL_RETURNS) (src/backend/commands/statscmds.c)
On Tue, Aug 25, 2020 at 12:42, Ranier Vilela wrote: > Hi Tom, > > Per Coverity. > > SearchSysCache1 can return NULL, and the function AlterStatistics > has one mistake, a missing check of the return value, which enables a > NULL pointer dereference at heap_modify_tuple. > > There is also room for improvement: > avoid calling SearchSysCache1 and table_open if the user "is not the owner > of the existing statistics object". > After a long time, finally this bug has been fixed. https://github.com/postgres/postgres/commit/6d554e3fcd6fb8be2dbcbd3521e2947ed7a552cb regards, Ranier Vilela
Re: Adding CI to our tree
Hi, On 2022-02-13 15:09:20 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2022-02-13 12:13:17 -0500, Tom Lane wrote: > >> Right. Can we set things up so that it's not too painful to inject > >> custom build options into a CI build? > > > What kind of injection are you thinking about? > > That's exactly what needs to be decided. > [...] > I'd prefer something a bit more out-of-band. I don't know this technology > well enough to propose a way. I don't yet understand the precise use case for adjustments well enough to propose something. Who would like to adjust something for what purpose? The "original" author, for a one-off test? A reviewer / committer, to track down a hunch? If it's about CI runs in a personal repository, one can set additional environment vars from the CI management interface. We can make sure they work (the ci stuff probably overrides CFLAGS, but COPT should work) and document the way to do so. > > A patch author can obviously > > just add options in .cirrus.yml. That's something possible now, that was not > > possible with cfbot applying its own .cirrus.yml > > Fine, but are committers supposed to keep track of the fact that they > shouldn't commit that part of a patch? I'd say it depends on the specific modification - there's some patches where it seems to make sense to adjust/extend CI as part of it and have it merged. But yea, in other cases committers would have to take them out. For more one-off stuff one would presumably not want to spam the list with a full patchset to trigger a cfbot run, nor wait till cfbot gets round to that patch again. When pushing to a personal repo it's of course easy to just have a commit on-top of what's submitted to play around with compile options. Greetings, Andres Freund
Re: Signed vs. Unsigned (some)
On Fri, Jun 11, 2021 at 23:05, Ranier Vilela wrote: > Hi, > > Can removing legitimate warnings be worth it? > > A -1 cast can be wrong when there is an invalid value defined > (InvalidBucket, InvalidBlockNumber). > I think depending on the compiler a -1 cast may be different from > InvalidBucket or InvalidBlockNumber. > > pg_rewind is one special case. > All cases of XLogSegNo (uint64) initialization are zero, but in pg_rewind > -1 was used? > I did not find an InvalidXLogSegNo! > Not tested. > > Trivial patch attached. > After a long time, finally a small part is accepted and fixed. https://github.com/postgres/postgres/commit/302612a6c74fb16f26d094ff47e9c59cf412740c regards, Ranier Vilela
Re: Mark all GUC variable as PGDLLIMPORT
Chapman Flack writes: > On 02/13/22 02:29, Julien Rouhaud wrote: >> Maybe we could have an actually usable GUC API to retrieve values in their >> native format rather than C string for instance, that we could make sure also >> works for cases like max_backend? > I proposed a sketch of such an API for discussion back in [0] (the second > idea in that email, the "what I'd really like" one). > In that scheme, some extension code that was interested in (say, > for some reason) log_statement_sample_rate could say: > static double samprate; > static int gucs_changed = 0; > #define SAMPRATE_CHANGED 1 > ... > ObserveTypedConfigValue("log_statement_sample_rate", > , _changed, SAMPRATE_CHANGED); > ... > and will be subscribed to have the native-format value stored into samprate, > and SAMPRATE_CHANGED ORed into gucs_changed, whenever the value changes. That seems like about 10X more complexity than is warranted, not only in terms of the infrastructure required, but also in the intellectual complexity around "just when could that value change?" Why not just provide equivalents to GetConfigOption() that can deliver int, float8, etc values instead of strings? (In any case we'd need to rethink the GUC "show_hook" APIs, which currently need only deal in string output.) regards, tom lane
fixing bookindex.html bloat
Hi, Sometime last year I was surprised to see (not on a public list unfortunately) that bookindex.html is 657kB, with > 200kB just being repetitions of xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink" Reminded of this, due to a proposal to automatically generate docs as part of cfbot runs (which'd be fairly likely to update bookindex.html), I spent a few painful hours last night trying to track this down. The reasons for the two xmlns= are different. The xmlns="http://www.w3.org/1999/xhtml" is afaict caused by confusion on our part. Some of our stylesheets use xmlns="http://www.w3.org/TR/xhtml1/transitional" others use xmlns="http://www.w3.org/1999/xhtml" It's noteworthy that the docbook xsl stylesheets end up with <html xmlns="http://www.w3.org/1999/xhtml"> so it's a bit pointless to reference http://www.w3.org/TR/xhtml1/transitional afaict. Adding xmlns="http://www.w3.org/1999/xhtml" to stylesheet-html-common.xsl gets rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in bookindex specific content. Changing stylesheet.xsl from transitional to http://www.w3.org/1999/xhtml gets rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in navigation/footer. Of course we should likely change all http://www.w3.org/TR/xhtml1/transitional references, rather than just the one necessary to get rid of the xmlns= spam. So far, so easy. It took me way longer to understand what's causing all the xmlns:xlink= appearances. For a long time I was misdirected because if I remove the in stylesheet-html-common.xsl, the number of xmlns:xlink drastically reduces to a handful. Which made me think that their existence is somehow our fault. And I tried and tried to find the cause. But it turns out that this originally is caused by a still existing buglet in the docbook xsl stylesheets, specifically autoidx.xsl. It doesn't omit xlink in exclude-result-prefixes, but uses ids etc from xlink. 
The reason that we end up with so many more xmlns:xlink is just that without our customization there ends up being a single <div xmlns:xlink="http://www.w3.org/1999/xlink" class="index"> and then everything below that doesn't need the xmlns:xlink anymore. But because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted for each element that autoidx.xsl has "control" over. Waiting for docbook to fix this seems a bit futile; I eventually found a bug report about this, from 2016: https://sourceforge.net/p/docbook/bugs/1384/ But we can easily reduce the "impact" of the issue, by just adding a single xmlns:xlink to , which is sufficient to convince xsltproc to not repeat it. Before: -rw-r--r-- 1 andres andres 683139 Feb 13 04:31 html-broken/bookindex.html After: -rw-r--r-- 1 andres andres 442923 Feb 13 12:03 html/bookindex.html While most of the savings are in bookindex, the rest of the files are reduced by another ~100kB. WIP patch attached. For now I just adjusted the minimal set of xmlns="http://www.w3.org/TR/xhtml1/transitional", but I think we should update all. Greetings, Andres Freund diff --git i/doc/src/sgml/stylesheet-html-common.xsl w/doc/src/sgml/stylesheet-html-common.xsl index d9961089c65..9f69af40a94 100644 --- i/doc/src/sgml/stylesheet-html-common.xsl +++ w/doc/src/sgml/stylesheet-html-common.xsl @@ -4,6 +4,7 @@ %common.entities; ]> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" +xmlns="http://www.w3.org/1999/xhtml" version="1.0"> + xmlns:xlink="http://www.w3.org/1999/xlink"> diff --git i/doc/src/sgml/stylesheet.xsl w/doc/src/sgml/stylesheet.xsl index 0eac594f0cc..24a9481fd49 100644 --- i/doc/src/sgml/stylesheet.xsl +++ w/doc/src/sgml/stylesheet.xsl @@ -1,7 +1,7 @@ xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0' -xmlns="http://www.w3.org/TR/xhtml1/transitional" +xmlns="http://www.w3.org/1999/xhtml" exclude-result-prefixes="#default"> http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
Re: Adding CI to our tree
Andres Freund writes: > On 2022-02-13 12:13:17 -0500, Tom Lane wrote: >> Right. Can we set things up so that it's not too painful to inject >> custom build options into a CI build? > What kind of injection are you thinking about? That's exactly what needs to be decided. > A patch author can obviously > just add options in .cirrus.yml. That's something possible now, that was not > possible with cfbot applying its own .cirrus.yml Fine, but are committers supposed to keep track of the fact that they shouldn't commit that part of a patch? I'd prefer something a bit more out-of-band. I don't know this technology well enough to propose a way. > I'd like to have things like -fanitize=aligned and > -DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's > benefit. Most patch authors won't know about using > -DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling > them. We're *really* not doing well on the "timely review" side of things, so > we at least should not waste time on high latency back-forth for easily > automatically detectable things. I don't personally have an objection to either of those; maybe Robert does. regards, tom lane
Re: Fix overflow in DecodeInterval
Actually found an additional overflow issue in DecodeInterval. I've updated the patch with the fix. Specifically it prevents trying to negate a field if it equals INT_MIN. Let me know if this is best put into another patch. - Joe Koshakow From 09eafa9b496c8461f2dc52ea62c9e833ab10a17f Mon Sep 17 00:00:00 2001 From: Joseph Koshakow Date: Fri, 11 Feb 2022 14:18:32 -0500 Subject: [PATCH] Check for overflow when decoding an interval When decoding an interval there are various date units which are aliases for some multiple of another unit. For example a week is 7 days and a decade is 10 years. When decoding these specific units, there is no check for overflow, allowing the interval to overflow. This commit adds an overflow check for all of these units. Additionally fractional date/time units are rolled down into the next smallest unit. For example 0.1 months is 3 days. When adding these fraction units, there is no check for overflow, allowing the interval to overflow. This commit adds an overflow check for all of the fractional units. Additionally adding the word "ago" to the interval negates every field. However the negative of INT_MIN is still INT_MIN. This commit adds a check to make sure that we don't try and negate a field if it's INT_MIN. 
Signed-off-by: Joseph Koshakow --- src/backend/utils/adt/datetime.c | 108 - src/test/regress/expected/interval.out | 68 src/test/regress/sql/interval.sql | 18 + 3 files changed, 172 insertions(+), 22 deletions(-) diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 7926258c06..49bd12075e 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "access/xact.h" #include "catalog/pg_type.h" +#include "common/int.h" #include "common/string.h" #include "funcapi.h" #include "miscadmin.h" @@ -44,10 +45,13 @@ static int DecodeDate(char *str, int fmask, int *tmask, bool *is2digits, struct pg_tm *tm); static char *AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros); -static void AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, +static int AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, int scale); -static void AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, +static int AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, int scale); +static int AdjustFractMonths(double frac, struct pg_tm *tm, int scale); +static int AdjustDays(int val, struct pg_tm *tm, int multiplier); +static int AdjustYears(int val, struct pg_tm *tm, int multiplier); static int DetermineTimeZoneOffsetInternal(struct pg_tm *tm, pg_tz *tzp, pg_time_t *tp); static bool DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t, @@ -499,34 +503,76 @@ AppendTimestampSeconds(char *cp, struct pg_tm *tm, fsec_t fsec) /* * Multiply frac by scale (to produce seconds) and add to *tm & *fsec. * We assume the input frac is less than 1 so overflow is not an issue. + * Returns 0 if successful, 1 if tm overflows. 
*/ -static void +static int AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, int scale) { int sec; if (frac == 0) - return; + return 0; frac *= scale; sec = (int) frac; - tm->tm_sec += sec; + if (pg_add_s32_overflow(tm->tm_sec, sec, &tm->tm_sec)) + return 1; frac -= sec; *fsec += rint(frac * 1000000); + return 0; } /* As above, but initial scale produces days */ -static void +static int AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, int scale) { int extra_days; if (frac == 0) - return; + return 0; frac *= scale; extra_days = (int) frac; - tm->tm_mday += extra_days; + if (pg_add_s32_overflow(tm->tm_mday, extra_days, &tm->tm_mday)) + return 1; frac -= extra_days; - AdjustFractSeconds(frac, tm, fsec, SECS_PER_DAY); + return AdjustFractSeconds(frac, tm, fsec, SECS_PER_DAY); +} + +/* As above, but initial scale produces months */ +static int +AdjustFractMonths(double frac, struct pg_tm *tm, int scale) +{ + int months = rint(frac * MONTHS_PER_YEAR * scale); + + if (pg_add_s32_overflow(tm->tm_mon, months, &tm->tm_mon)) + return 1; + return 0; +} + +/* + * Multiply val by multiplier (to produce days) and add to *tm. + * Returns 0 if successful, 1 if tm overflows. + */ +static int +AdjustDays(int val, struct pg_tm *tm, int multiplier) +{ + int extra_days; + if (pg_mul_s32_overflow(val, multiplier, &extra_days)) + return 1; + if (pg_add_s32_overflow(tm->tm_mday, extra_days, &tm->tm_mday)) + return 1; + return 0; +} + +/* As above, but initial val produces years */ +static int +AdjustYears(int val, struct pg_tm *tm, int multiplier) +{ + int years; + if (pg_mul_s32_overflow(val, multiplier, &years)) + return 1; + if (pg_add_s32_overflow(tm->tm_year, years, &tm->tm_year)) + return 1; + return 0; } /* Fetch a fractional-second value with suitable error
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
Hi, On 2022-02-13 13:27:08 +0100, Tomas Vondra wrote: > I'm not sure how could this be related to the issue fixed by b779d7d8fd. > That's merely about handling of empty xacts in the output plugin, it has > little (nothing) to do with reorderbuffer - the failures was simply > about (not) printing BEGIN/COMMIT for empty xacts. I'd only read your commit message - which didn't go into that much detail. I just saw that the run didn't yet include that commit and that the failed command specified skip-empty-xacts 1, which your commit described as fixing... > So why would it be triggering an Assert in reorderbuffer? Also, there > was a successful run with the test_decoding changes 80901b3291 (and also > with sequence decoding infrastructure committed a couple days ago) ... It's clearly a bug that only happens with lower likelihood... > Do we have access to the machine? If yes, it'd be interesting to run the > tests before the fix, and see how often it fails. Also, printing the LSN > would be interesting, so that we can correlate it to WAL. We really should provide assert infrastructure that can do comparisons while also printing values... If C just had proper generics :(. Greetings, Andres Freund
Re: Adding CI to our tree
Hi, On 2022-02-13 12:13:17 -0500, Tom Lane wrote: > Robert Haas writes: > > This is exactly why I'm not a huge fan of having ci stuff in the tree. > > It supposes that there's one right way to do a build, but in reality, > > different people want and indeed need to use different options for all > > kinds of reasons. Sure. But why is that an argument against "having ci stuff in the tree"? All it does is to make sure that a certain base-level of testing is easy to achieve for everyone. I don't like working on windows or mac, but my patches often have platform dependent bits. Now it's less likely that I need to manually interact with windows. I don't think we can (or well should) replace the buildfarm with the CI stuff. The buildfarm provides extensive and varied coverage for master/release branches. Which isn't feasible for unmerged development work, including cfbot, from a resource usage POV alone. > > That's the whole value of having things like > > configure and pg_config_manual.h. When we start arguing about whether > > or ci builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably > > into the realm where no choice is objectively better, > Right. Can we set things up so that it's not too painful to inject > custom build options into a CI build? What kind of injection are you thinking about? A patch author can obviously just add options in .cirrus.yml. That's something possible now, that was not possible with cfbot applying its own .cirrus.yml It'd be nice if there were a way to do it more easily for msvc and configure builds together, right now it'd require modifying those tasks in different ways. But that's not really a CI question. I'd like to have things like -fsanitize=alignment and -DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's benefit. Most patch authors won't know about using -DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling them. 
We're *really* not doing well on the "timely review" side of things, so we at least should not waste time on high-latency back-and-forth for easily automatically detectable things. > I should think that at the very least one needs to be able to vary the > configure switches and CPPFLAGS/CFLAGS. Do you mean as part of a patch tested with cfbot, CI running for pushes to your own repository, or ...? Greetings, Andres Freund
xml_is_well_formed (was Re: buildfarm warnings)
I wrote: > This conversation motivated me to take a fresh pass over said filtering > script, and one thing I notice that it's been ignoring is these complaints > from all the AIX animals: > ld: 0711-224 WARNING: Duplicate symbol: .xml_is_well_formed > ld: 0711-224 WARNING: Duplicate symbol: xml_is_well_formed > while building contrib/xml2. Evidently this is because xml2 defines > a symbol also defined in the core. We cannot rename xml2's function > without a user-visible ABI break, but it would be pretty painless > to rename the core function (at the C level, not SQL level). > So I propose we do that. I tried to do that, but it blew up in my face, because it turns out that actually contrib/xml2's extension script has a by-C-name reference to the *core* function: -- deprecated old name for xml_is_well_formed CREATE FUNCTION xml_valid(text) RETURNS bool AS 'xml_is_well_formed' LANGUAGE INTERNAL STRICT STABLE PARALLEL SAFE; The situation with the instance in xml2 is explained by its comment: * Note: this has been superseded by a core function. We still have to * have it in the contrib module so that existing SQL-level references * to the function won't fail; but in normal usage with up-to-date SQL * definitions for the contrib module, this won't be called. So we're faced with a dilemma: we can't rename either instance without breaking something. The only way to get rid of the warnings seems to be to decide that the copy in xml2 has outlived its usefulness and remove it. I don't think that's a hard argument to make: that version has been obsolete since 9.1 (a0b7b717a), meaning that only pre-extensions versions of xml2 could need it. In fact, we pulled the trigger on it once before, in 2016 (20540710e), and then changed our minds not because anyone lobbied to put it back but just because we gave up on the PGDLLEXPORT rearrangement that prompted the change then. 
I think that getting rid of these build warnings is sufficient reason to drop this ancient compatibility function, and now propose that we do that. regards, tom lane
Fix overflow in justify_interval related functions
I mentioned this issue briefly in another thread, but the justify_interval, justify_days, and justify_hours functions all have the potential to overflow. The attached patch fixes this issue. Cheers, Joe Koshakow From 4400f2e6886097e3b75d455aeec1ffa9cbc88510 Mon Sep 17 00:00:00 2001 From: Joseph Koshakow Date: Sun, 13 Feb 2022 13:26:16 -0500 Subject: [PATCH] Check for overflow in justify_interval functions justify_interval, justify_hours, and justify_days didn't check for overflow when moving hours to days or days to hours. This commit adds checks for overflow and returns an appropriate error. --- src/backend/utils/adt/timestamp.c | 20 src/test/regress/expected/interval.out | 8 src/test/regress/sql/interval.sql | 6 ++ 3 files changed, 30 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c index 36f8a84bcc..fb4b3ae0aa 100644 --- a/src/backend/utils/adt/timestamp.c +++ b/src/backend/utils/adt/timestamp.c @@ -2718,11 +2718,17 @@ interval_justify_interval(PG_FUNCTION_ARGS) result->time = span->time; TMODULO(result->time, wholeday, USECS_PER_DAY); - result->day += wholeday; /* could overflow... */ + if (pg_add_s32_overflow(result->day, wholeday, &result->day)) + ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); wholemonth = result->day / DAYS_PER_MONTH; result->day -= wholemonth * DAYS_PER_MONTH; - result->month += wholemonth; + if (pg_add_s32_overflow(result->month, wholemonth, &result->month)) + ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); if (result->month > 0 && (result->day < 0 || (result->day == 0 && result->time < 0))) @@ -2772,7 +2778,10 @@ interval_justify_hours(PG_FUNCTION_ARGS) result->time = span->time; TMODULO(result->time, wholeday, USECS_PER_DAY); - result->day += wholeday; /* could overflow... 
*/ + if (pg_add_s32_overflow(result->day, wholeday, &result->day)) + ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); if (result->day > 0 && result->time < 0) { @@ -2808,7 +2817,10 @@ interval_justify_days(PG_FUNCTION_ARGS) wholemonth = result->day / DAYS_PER_MONTH; result->day -= wholemonth * DAYS_PER_MONTH; - result->month += wholemonth; + if (pg_add_s32_overflow(result->month, wholemonth, &result->month)) + ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); if (result->month > 0 && result->day < 0) { diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out index accd4a7d90..e2bb8095c3 100644 --- a/src/test/regress/expected/interval.out +++ b/src/test/regress/expected/interval.out @@ -396,6 +396,10 @@ SELECT justify_days(interval '6 months 36 days 5 hours 4 minutes 3 seconds') as @ 7 mons 6 days 5 hours 4 mins 3 secs (1 row) +SELECT justify_hours(interval '2147483647 days 24 hrs'); +ERROR: interval out of range +SELECT justify_days(interval '2147483647 months 30 days'); +ERROR: interval out of range -- test justify_interval() SELECT justify_interval(interval '1 month -1 hour') as "1 month -1 hour"; 1 month -1 hour @@ -403,6 +407,10 @@ SELECT justify_interval(interval '1 month -1 hour') as "1 month -1 hour"; @ 29 days 23 hours (1 row) +SELECT justify_interval(interval '2147483647 days 24 hrs'); +ERROR: interval out of range +SELECT justify_interval(interval '2147483647 months 30 days'); +ERROR: interval out of range -- test fractional second input, and detection of duplicate units SET DATESTYLE = 'ISO'; SET IntervalStyle TO postgres; diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql index 6d532398bd..3ebdca5023 100644 --- a/src/test/regress/sql/interval.sql +++ b/src/test/regress/sql/interval.sql @@ -149,10 +149,16 @@ select '1y 10mon -10d -10h -10min -10.01s ago'::interval SELECT justify_hours(interval '6 months 3
days 52 hours 3 minutes 2 seconds') as "6 mons 5 days 4 hours 3 mins 2 seconds"; SELECT justify_days(interval '6 months 36 days 5 hours 4 minutes 3 seconds') as "7 mons 6 days 5 hours 4 mins 3 seconds"; +SELECT justify_hours(interval '2147483647 days 24 hrs'); +SELECT justify_days(interval '2147483647 months 30 days'); + -- test justify_interval() SELECT justify_interval(interval '1 month -1 hour') as "1 month -1 hour"; +SELECT justify_interval(interval '2147483647 days 24 hrs'); +SELECT justify_interval(interval '2147483647 months 30 days'); + -- test fractional second input, and detection of duplicate units SET DATESTYLE = 'ISO'; SET IntervalStyle TO postgres; -- 2.25.1
Re: Mark all GUC variable as PGDLLIMPORT
Hi,

On 02/13/22 02:29, Julien Rouhaud wrote:
> Maybe we could have an actually usable GUC API to retrieve values in their
> native format rather than C string for instance, that we could make sure also
> works for cases like max_backend?

I proposed a sketch of such an API for discussion back in [0] (the second
idea in that email, the "what I'd really like" one).

In that scheme, some extension code that was interested in (say, for some
reason) log_statement_sample_rate could say:

static double samprate;
static int gucs_changed = 0;
#define SAMPRATE_CHANGED 1
...
ObserveTypedConfigValue("log_statement_sample_rate",
                        &samprate, &gucs_changed, SAMPRATE_CHANGED);

... and will be subscribed to have the native-format value stored into
samprate, and SAMPRATE_CHANGED ORed into gucs_changed, whenever the value
changes.

The considerations leading me to that design were:

- avoid subscribing as a 'callback' sort of listener. GUCs can get set (or,
  especially, reset) in delicate situations like error recovery where
  calling out to arbitrary extra code might best be avoided.

- instead, just dump the value in a subscribed location. A copy, of course,
  so no modification there affects the real value.

- but at the same time, OR a flag into a bit set, so subscribing code can
  very cheaply poll for when a value of interest (or any of a bunch of
  values of interest) has changed since last checked.

- do the variable lookup by name once only, and pay no further search cost
  when the subscribing code wants the value.

I never pictured that proposal as the last word on the question, and
different proposals could result from putting different weights on those
objectives, or adding other objectives, but I thought it might serve as a
discussion-starter.

Regards,
-Chap

[0] https://www.postgresql.org/message-id/6123C425.3080409%40anastigmatix.net
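The observe-and-flag scheme sketched above might look roughly like the following standalone illustration. Note that ObserveTypedConfigValue and NotifyObservers are hypothetical names from the proposal, not an existing PostgreSQL API, and the fixed-size registry here is purely for demonstration:

```c
#include <string.h>

/* Hypothetical registry for the proposed "observe typed config value"
 * mechanism: subscribers register a destination pointer and a flag bit;
 * no subscriber code is ever called back. */

#define MAX_OBSERVERS 8

typedef struct GucObserver
{
	const char *name;		/* GUC name, looked up once at registration */
	double	   *dest;		/* where the native-format value is mirrored */
	int		   *flags;		/* subscriber's change bitmask */
	int			flag_bit;	/* bit ORed in whenever the value changes */
} GucObserver;

static GucObserver observers[MAX_OBSERVERS];
static int	n_observers = 0;

static void
ObserveTypedConfigValue(const char *name, double *dest,
						int *flags, int flag_bit)
{
	GucObserver *o = &observers[n_observers++];

	o->name = name;
	o->dest = dest;
	o->flags = flags;
	o->flag_bit = flag_bit;
}

/*
 * What a (hypothetical) GUC assign hook would do after installing a new
 * value: a plain store plus a flag OR.  Since no arbitrary code runs,
 * this is safe even in delicate paths like error recovery.
 */
static void
NotifyObservers(const char *name, double newval)
{
	for (int i = 0; i < n_observers; i++)
	{
		if (strcmp(observers[i].name, name) == 0)
		{
			*observers[i].dest = newval;
			*observers[i].flags |= observers[i].flag_bit;
		}
	}
}
```

The subscriber then polls its own bitmask cheaply instead of re-fetching and re-parsing a C-string value on every use.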
Re: Adding CI to our tree
Robert Haas writes:
> This is exactly why I'm not a huge fan of having CI stuff in the tree.
> It supposes that there's one right way to do a build, but in reality,
> different people want and indeed need to use different options for all
> kinds of reasons. That's the whole value of having things like
> configure and pg_config_manual.h. When we start arguing about whether
> or not CI builds should use -DWRITE_READ_PARSE_PLAN_TREES we're
> inevitably into the realm where no choice is objectively better,

Right. Can we set things up so that it's not too painful to inject custom
build options into a CI build? I should think that at the very least one
needs to be able to vary the configure switches and CPPFLAGS/CFLAGS.

regards, tom lane
Re: buildfarm warnings
I wrote:
> Justin Pryzby writes:
>> Is there any check for warnings from new code, other than those buildfarm
>> members with -Werror ?

> I periodically grep the buildfarm logs for interesting warnings.
> There are a lot of uninteresting ones :-(, so I'm not sure how
> automatable that'd be. I do use a prefiltering script, which
> right now says there are
> 13917 total warnings from 41 buildfarm members
> of which it thinks all but 350 are of no interest.

This conversation motivated me to take a fresh pass over said filtering
script, and one thing I notice that it's been ignoring is these complaints
from all the AIX animals:

ld: 0711-224 WARNING: Duplicate symbol: .xml_is_well_formed
ld: 0711-224 WARNING: Duplicate symbol: xml_is_well_formed

while building contrib/xml2. Evidently this is because xml2 defines a
symbol also defined in the core. We cannot rename xml2's function without a
user-visible ABI break, but it would be pretty painless to rename the core
function (at the C level, not SQL level). So I propose we do that.

I think that when I put in that pattern, there were more problems, but this
seems to be all that's left.
Another thing I've been ignoring on the grounds that
I-don't-feel-like-fixing-this is that the Solaris and OpenIndiana machines
whine about every single reference from loadable modules to the core code,
eg this from haddock while building contrib/fuzzystrmatch:

Undefined                       first referenced
 symbol                             in file
cstring_to_text                 dmetaphone.o
varstr_levenshtein              fuzzystrmatch.o
text_to_cstring                 dmetaphone.o
errstart_cold                   fuzzystrmatch.o
errmsg                          fuzzystrmatch.o
palloc                          dmetaphone.o
ExceptionalCondition            fuzzystrmatch.o
errfinish                       fuzzystrmatch.o
varstr_levenshtein_less_equal   fuzzystrmatch.o
repalloc                        dmetaphone.o
errcode                         fuzzystrmatch.o
errmsg_internal                 fuzzystrmatch.o
pg_detoast_datum_packed         dmetaphone.o
ld: warning: symbol referencing errors

Presumably, this means that the Solaris linker would really like to see the
executable the module will load against, analogously to the BE_DLLLIBS
settings we use on some other platforms such as macOS. It evidently still
works without that, but we might be losing performance from an extra
indirection or the like. And in any case, this amount of noise would be
very distracting if you wanted to do actual development on that platform.
I'm not especially interested in tracking down the correct incantation, but
I'd gladly push a patch if someone submits one.

regards, tom lane
Re: Adding CI to our tree
On Sun, Feb 13, 2022 at 3:30 AM Andres Freund wrote:
> > There's other things that it'd be nice to enable, but maybe these don't need to
> > be on by default. Maybe just have a list of optional compiler flags (and hints
> > when they're useful). Like WRITE_READ_PARSE_PLAN_TREES.
>
> I think it'd be good to enable a reasonable set by default. Particularly for
> newer contributors stuff like forgetting in/out/readfuncs, or not knowing
> about some undefined behaviour, is easy. Probably makes sense to use different
> settings on different tasks.

This is exactly why I'm not a huge fan of having CI stuff in the tree. It
supposes that there's one right way to do a build, but in reality,
different people want and indeed need to use different options for all
kinds of reasons. That's the whole value of having things like configure
and pg_config_manual.h. When we start arguing about whether or not CI
builds should use -DWRITE_READ_PARSE_PLAN_TREES we're inevitably into the
realm where no choice is objectively better, and whoever yells the loudest
will get it the way they want, and then somebody else later will say "well
that's dumb, I don't like that" or even just "well that's not the right
thing for testing MY patch," at which point the previous mailing list
discussion will be cited as "precedent" for what was essentially an
arbitrary decision made by 1 or 2 people.

Mind you, I'm not trying to hold back the tide. I realize that in-tree CI
stuff is very much in vogue. But I think it would be a very healthy thing
if we acknowledged that what goes in there is far more arbitrary than most
of what we put in the tree.

-- Robert Haas
EDB: http://www.enterprisedb.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Sun, Feb 13, 2022 at 1:34 AM Dilip Kumar wrote:
> test4:
> 32 GB shared buffers, template DB size = 10GB, dirty shared buffer=70%
> Head: 47656 ms
> Patch: 79767 ms

This seems like the most surprising result of the bunch. Here, the template
DB is both small enough to fit in shared_buffers and small enough not to
trigger a checkpoint all by itself, and yet the patch loses. Did you
checkpoint between one test and the next, or might this test have been done
after a bunch of WAL had already been written since the last checkpoint, so
that the 10GB pushed it over the edge?

BTW, you have test4 twice in your list of results.

-- Robert Haas
EDB: http://www.enterprisedb.com
Re: pgsql: Add suport for server-side LZ4 base backup compression.
On Sat, Feb 12, 2022 at 11:12 PM Tom Lane wrote: > Eh? copperhead for one is failing for exactly this reason: > > Bailout called. Further testing stopped: failed to execute command "lz4 -d > -m > /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup/base.tar.lz4": > No such file or directory Well, that's because I didn't realize that LZ4 might be set to something that doesn't work at all. So Michael's thing worked, but because it (in my view) fixed the problem in a more surprising place than I would have preferred, I made a commit later that turned out to break the buildfarm. So you can blame either one of us that you like. Thanks, Michael, for preparing a patch. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Fix overflow in DecodeInterval
On Sat, Feb 12, 2022 at 10:51 PM Andres Freund wrote: > Any reason for using int return types? > > particularly since the pg_*_overflow stuff uses bool? I chose int return types to keep all these methods consistent with DecodeInterval, which returns a non-zero int to indicate an error. Though I wasn't sure if an int or bool would be best, so I'm happy to change to bool if people think that's better. Also I'm realizing now that I've incorrectly been using the number of the patch to indicate the version, instead of just sticking a v3 to the front. So sorry about that, all the patches I sent in this thread are the same patch, just different versions. - Joe Koshakow
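For reference, the bool-returning convention under discussion (true on overflow, result delivered through an out parameter, as pg_add_s32_overflow() in src/include/common/int.h does) can be sketched standalone like this. This version assumes a compiler providing __builtin_add_overflow, such as gcc or clang:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the pg_add_s32_overflow()-style convention: returns true on
 * overflow, false on success, and stores the sum through *result.  The
 * caller then reports the error (e.g. "interval out of range") itself.
 */
static bool
add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
	return __builtin_add_overflow(a, b, result);
}
```

A caller can thus write `if (add_s32_overflow(x, y, &x)) ...` and keep the error-reporting policy out of the arithmetic helper, which is what makes the bool shape attractive compared with an int error code.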
Re: Error "initial slot snapshot too large" in create replication slot
On Mon, 07/02/2022 at 13:52 +0530, Dilip Kumar wrote:
> On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi wrote:
> > At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar wrote in
> >
> > me> Mmm. The size of the array cannot be larger than the numbers the
> > me> *Count() functions return. Thus we cannot attach the oversized array
> > me> to ->subxip. (I don't recall clearly but that would lead to assertion
> > me> failure somewhere..)
> >
> > Then, I fixed the v3 error and post v4.
>
> Yeah you are right, SetTransactionSnapshot() has that assertion.
> Anyway after looking again it appears that
> GetMaxSnapshotSubxidCount is the correct size because this is
> PGPROC_MAX_CACHED_SUBXIDS + 1, i.e. it considers top transactions as
> well so we don't need to add them separately.
>
> > SnapBuildInitialSnapshot tries to store XIDs of both top and sub
> > transactions into the snapshot->xip array, but the array is easily
> > overflowed and the CREATE_REPLICATION_SLOT command ends with an error.
> >
> > To fix this, this patch is doing the following things.
> >
> > - Use the subxip array instead of the xip array to allow us to have a
> > larger array for xids. So the snapshot is marked as takenDuringRecovery,
> > which is a kind of abuse but largely reduces the chance of getting the
> > "initial slot snapshot too large" error.
>
> Right. I think the patch looks fine to me.

Good day. I've looked at the patch. Personally, I'd prefer to dynamically
resize the xip array. But I think there is an issue with upgrades if the
replica source is upgraded before the destination, right?

Concerning the patch, I think more comments should be written about the new
usage of `takenDuringRecovery`. Maybe this field should be renamed
altogether? Also, there are checks for `takenDuringRecovery` in
`heapgetpage` and `heapam_scan_sample_next_tuple`. Are these checks
affected by the change? Neither the preceding discussion nor the commit
message answers this for me.

---

regards
Yura Sokolov
Postgres Professional
y.soko...@postgrespro.ru
funny.fal...@gmail.com
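The dynamically-resized-xip-array alternative floated above could be sketched roughly like this, with plain realloc standing in for PostgreSQL's repalloc and all names hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/*
 * Minimal sketch of growing the xid array by doubling instead of failing
 * with "initial slot snapshot too large" when a fixed-size array fills up.
 * In backend code this would use repalloc (which ereports on failure)
 * rather than realloc/abort.
 */
typedef struct XidArray
{
	TransactionId *xip;			/* xids collected so far */
	uint32_t	xcnt;			/* number of valid entries */
	uint32_t	capacity;		/* allocated slots */
} XidArray;

static void
xid_array_add(XidArray *a, TransactionId xid)
{
	if (a->xcnt == a->capacity)
	{
		a->capacity = a->capacity ? a->capacity * 2 : 64;
		a->xip = realloc(a->xip, a->capacity * sizeof(TransactionId));
		if (a->xip == NULL)
			abort();			/* repalloc would ereport instead */
	}
	a->xip[a->xcnt++] = xid;
}
```

Whether such resizing is compatible with the places that assert a maximum snapshot size (e.g. SetTransactionSnapshot, as noted above) is exactly the open question in this thread.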
Re: Avoiding smgrimmedsync() during nbtree index builds
> On Wed, Feb 09, 2022 at 01:49:30PM -0500, Melanie Plageman wrote:
> Hi,
> v5 attached and all email feedback addressed below

Thanks for the patch, it looks quite good. I don't see it in the
discussion, so I'm naturally curious -- why is directmgr not used for the
bloom index, e.g. in blbuildempty?

> On Sun, Jan 16, 2022 at 3:26 PM Justin Pryzby wrote:
> > Separate from this issue, I wonder if it'd be useful to write a DEBUG log
> > showing when btree uses shared_buffers vs fsync. And a regression test which
> > first SETs client_min_messages=debug to capture the debug log to demonstrate
> > when/that new code path is being hit. I'm not sure if that would be good to
> > merge, but it may be useful for now.

I can't find the thread right away, but I vaguely remember a similar
situation where such an approach, as the main way to test the patch, caused
some disagreement. Of course, for the development phase it would indeed be
convenient.
Re: Identify missing publications from publisher while create/alter subscription.
On Fri, Feb 11, 2022 at 7:14 PM Ashutosh Sharma wrote: > > I have spent little time looking at the latest patch. The patch looks > to be in good shape as it has already been reviewed by many people > here, although I did get some comments. Please take a look and let me > know your thoughts. > > > + /* Try to connect to the publisher. */ > + wrconn = walrcv_connect(sub->conninfo, true, sub->name, ); > + if (!wrconn) > + ereport(ERROR, > + (errmsg("could not connect to the publisher: %s", err))); > > I think it would be good to also include the errcode > (ERRCODE_CONNECTION_FAILURE) here? Modified > -- > > @@ -514,6 +671,8 @@ CreateSubscription(ParseState *pstate, > CreateSubscriptionStmt *stmt, > > PG_TRY(); > { > + check_publications(wrconn, publications, > opts.validate_publication); > + > > > Instead of passing the opts.validate_publication argument to > check_publication function, why can't we first check if this option is > set or not and accordingly call check_publication function? For other > options I see that it has been done in the similar way for e.g. check > for opts.connect or opts.refresh or opts.enabled etc. Modified > -- > > Above comment also applies for: > > @@ -968,6 +1130,8 @@ AlterSubscription(ParseState *pstate, > AlterSubscriptionStmt *stmt, > replaces[Anum_pg_subscription_subpublications - 1] = true; > > update_tuple = true; > + connect_and_check_pubs(sub, stmt->publication, > + opts.validate_publication); > Modified > -- > > + > + When true, the command verifies if all the specified publications > + that are being subscribed to are present in the publisher and > throws > + an error if any of the publication doesn't exist. The default is > + false. > > publication -> publications (in the 4th line : throw an error if any > of the publication doesn't exist) > > This applies for both CREATE and ALTER subscription commands. Modified Thanks for the comments, the attached v14 patch has the changes for the same. 
Regards,
Vignesh

From a753ffcc3d9ab2aa87d76c52adff10f9192e3ff3 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C
Date: Sun, 13 Feb 2022 17:48:26 +0530
Subject: [PATCH v14] Identify missing publications from publisher while create/alter subscription.

Creating/altering subscription is successful when we specify a publication which does not exist in the publisher. This patch checks if the specified publications are present in the publisher and throws an error if any of the publication is missing in the publisher.
---
 doc/src/sgml/ref/alter_subscription.sgml  | 13 ++
 doc/src/sgml/ref/create_subscription.sgml | 20 ++-
 src/backend/commands/subscriptioncmds.c   | 201 +++---
 src/bin/psql/tab-complete.c               | 14 +-
 src/test/subscription/t/007_ddl.pl        | 54 ++
 5 files changed, 269 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 0b027cc346..604bf66b65 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -168,6 +168,19 @@ ALTER SUBSCRIPTION name RENAME TO < + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publications doesn't exist. The default is + false. + + + +

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 990a41f1a1..cd87daeae7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -110,12 +110,14 @@ CREATE SUBSCRIPTION subscription_nametrue. Setting this to false will force the values of - create_slot, enabled and - copy_data to false. + create_slot, enabled, + copy_data and + validate_publication to false. (You cannot combine setting connect to false with setting create_slot, enabled, - or copy_data to true.) + copy_data or + validate_publication to true.)
@@ -170,6 +172,18 @@ CREATE SUBSCRIPTION subscription_name + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publications doesn't exist. The default is + false. + + +

diff --git a/src/backend/commands/subscriptioncmds.c
Re: Identify missing publications from publisher while create/alter subscription.
On Thu, Feb 10, 2022 at 3:15 PM Ashutosh Sharma wrote: > > On Wed, Feb 9, 2022 at 11:53 PM Euler Taveira wrote: > > > > On Wed, Feb 9, 2022, at 12:06 PM, Ashutosh Sharma wrote: > > > > Just wondering if we should also be detecting the incorrect conninfo > > set with ALTER SUBSCRIPTION command as well. See below: > > > > -- try creating a subscription with incorrect conninfo. the command fails. > > postgres=# create subscription sub1 connection 'host=localhost > > port=5490 dbname=postgres' publication pub1; > > ERROR: could not connect to the publisher: connection to server at > > "localhost" (::1), port 5490 failed: Connection refused > > Is the server running on that host and accepting TCP/IP connections? > > connection to server at "localhost" (127.0.0.1), port 5490 failed: > > Connection refused > > Is the server running on that host and accepting TCP/IP connections? > > > > That's because by default 'connect' parameter is true. > > > > So can we use this option with the ALTER SUBSCRIPTION command. I think > we can't, which means if the user sets wrong conninfo using ALTER > SUBSCRIPTION command then we don't have the option to detect it like > we have in case of CREATE SUBSCRIPTION command. Since this thread is > trying to add the ability to identify the wrong/missing publication > name specified with the ALTER SUBSCRIPTION command, can't we do the > same for the wrong conninfo? I felt this can be extended once this feature is committed. Thoughts? Regards, Vignesh
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
On 2/13/22 13:27, Tomas Vondra wrote: > > ... > > BTW it's probably just a coincidence, but the cfbot does show a strange > failure on the macOS machine [1], as if we did not wait for all changes > to be applied by replica. I thought it's because of the issue with > tracking LSN for sequences [2], but this is with a reworked test that > does not do just nextval(). So I'm wondering if these two things may be > related ... both things happened on macOS only. > I take this back - this failure was just a thinko in the ROLLBACK test, I've posted an updated patch fixing this. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: logical decoding and replication of sequences
On 2/12/22 20:58, Tomas Vondra wrote: > On 2/12/22 01:34, Tomas Vondra wrote: >> On 2/10/22 19:17, Tomas Vondra wrote: >>> I've polished & pushed the first part adding sequence decoding >>> infrastructure etc. Attached are the two remaining parts. >>> >>> I plan to wait a day or two and then push the test_decoding part. The >>> last part (for built-in replication) will need more work and maybe >>> rethinking the grammar etc. >>> >> >> I've pushed the second part, adding sequences to test_decoding. >> >> Here's the remaining part, rebased, with a small tweak in the TAP test >> to eliminate the issue with not waiting for sequence increments. I've >> kept the tweak in a separate patch, so that we can throw it away easily >> if we happen to resolve the issue. >> > > Hmm, cfbot was not happy about this, so here's a version fixing the > elog() format issue reported by CirrusCI/mingw by ditching the log > message. It was useful for debugging, but otherwise just noise. > There was another elog() making mingw unhappy, so here's a fix for that. This should also fix an issue on the macOS machine. This is a thinko in the tests, because wait_for_catchup() may not wait for all the sequence increments after a rollback. The default mode is "write" which uses pg_current_wal_lsn(), and that may be a bit stale after a rollback. Doing a simple insert after the rollback fixes this (using other LSN, like pg_current_wal_insert_lsn() would work too, but it'd cause long waits in the test). 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom c2ab633ba457f5563769bd313dac6078b41e439b Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 10 Feb 2022 15:18:59 +0100 Subject: [PATCH 1/2] Add support for decoding sequences to built-in replication --- doc/src/sgml/catalogs.sgml | 71 doc/src/sgml/ref/alter_publication.sgml | 24 +- doc/src/sgml/ref/alter_subscription.sgml| 4 +- src/backend/catalog/pg_publication.c| 149 - src/backend/catalog/system_views.sql| 10 + src/backend/commands/publicationcmds.c | 350 +++- src/backend/commands/sequence.c | 79 + src/backend/commands/subscriptioncmds.c | 272 +++ src/backend/executor/execReplication.c | 2 +- src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c | 1 + src/backend/parser/gram.y | 32 ++ src/backend/replication/logical/proto.c | 52 +++ src/backend/replication/logical/tablesync.c | 114 ++- src/backend/replication/logical/worker.c| 56 src/backend/replication/pgoutput/pgoutput.c | 85 - src/backend/utils/cache/relcache.c | 4 +- src/bin/psql/tab-complete.c | 14 +- src/include/catalog/pg_proc.dat | 5 + src/include/catalog/pg_publication.h| 14 + src/include/commands/sequence.h | 1 + src/include/nodes/parsenodes.h | 6 + src/include/replication/logicalproto.h | 19 ++ src/include/replication/pgoutput.h | 1 + src/test/regress/expected/rules.out | 8 + src/test/subscription/t/028_sequences.pl| 201 +++ 26 files changed, 1539 insertions(+), 36 deletions(-) create mode 100644 src/test/subscription/t/028_sequences.pl diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 879d2dbce03..271dc03e5a2 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9540,6 +9540,11 @@ SCRAM-SHA-256$iteration count: prepared transactions + + pg_publication_sequences + publications and their associated sequences + + pg_publication_tables publications and their associated tables @@ -11375,6 +11380,72 @@ SELECT * FROM pg_locks pl LEFT JOIN 
pg_prepared_xacts ppx + + pg_publication_sequences + + + pg_publication_sequences + + + + The view pg_publication_sequences provides + information about the mapping between publications and the sequences they + contain. Unlike the underlying catalog + pg_publication_rel, + this view expands + publications defined as FOR ALL SEQUENCES, so for such + publications there will be a row for each eligible sequence. + + + + pg_publication_sequences Columns + + + + + Column Type + + + Description + + + + + + + + pubname name + (references pg_publication.pubname) + + + Name of publication + + + + + + schemaname name + (references pg_namespace.nspname) + + + Name of schema containing sequence + + + + + + sequencename name + (references pg_class.relname) + + + Name of sequence + + + + + + + pg_publication_tables diff --git
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
On 2/13/22 01:28, Andres Freund wrote:
> Hi,
>
> On February 12, 2022 3:57:28 PM PST, Thomas Munro wrote:
>> Hi,
>>
>> I don't know what happened here, but my little script that scrapes
>> for build farm assertion failures hasn't seen this one before now.
>> It's on a 15 year old macOS release and compiled with a 15 year old
>> GCC version, for some added mystery.
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust=2022-02-12%2022%3A07%3A21
>>
>> 2022-02-13 00:09:52.859 CET [92423:92] pg_regress/replorigin
>> STATEMENT: SELECT data FROM
>> pg_logical_slot_get_changes('regression_slot', NULL, NULL,
>> 'include-xids', '0', 'skip-empty-xacts', '1', 'include-sequences',
>> '0');
>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>> "reorderbuffer.c", Line: 1173, PID: 92423)
>> 0   postgres   0x00592dd0  ExceptionalCondition + 160
>> 1   postgres   0x00391c40  ReorderBufferGetRelids + 304
>> 2   postgres   0x00394794  ReorderBufferAllocate + 1044
>> 3   postgres   0x003834dc  heap_decode + 76
>> 4   postgres   0x003829c8  LogicalDecodingProcessRecord + 168
>> 5   postgres   0x0038a1d4  pg_logical_emit_message_text + 1876
>> 6   postgres   0x00241810  ExecMakeTableFunctionResult + 576
>> 7   postgres   0x002576c4  ExecInitFunctionScan + 1444
>> 8   postgres   0x00242140  ExecScan + 896
>> 9   postgres   0x002397cc  standard_ExecutorRun + 540
>> 10  postgres   0x00427940  EnsurePortalSnapshotExists + 832
>> 11  postgres   0x00428e84  PortalRun + 644
>> 12  postgres   0x004252e0  pg_plan_queries + 3232
>> 13  postgres   0x00426270  PostgresMain + 3424
>> 14  postgres   0x0036b9ec  PostmasterMain + 8060
>> 15  postgres   0x0029f24c  main + 1356
>> 16  postgres   0x2524  start + 68
>
> I think that may be the bug that Tomas fixed a short while ago. He
> said the skip-empty-xacts stuff didn't yet work for sequences. That's
> only likely to be hit on slower machines...

I'm not sure how this could be related to the issue fixed by b779d7d8fd.
That's merely about handling of empty xacts in the output plugin; it has
little (nothing) to do with reorderbuffer - the failure was simply about
(not) printing BEGIN/COMMIT for empty xacts. So why would it be triggering
an Assert in reorderbuffer?

Also, there was a successful run with the test_decoding changes 80901b3291
(and also with the sequence decoding infrastructure committed a couple of
days ago) ...

Do we have access to the machine? If yes, it'd be interesting to run the
tests before the fix, and see how often it fails. Also, printing the LSN
would be interesting, so that we can correlate it to WAL.

BTW it's probably just a coincidence, but the cfbot does show a strange
failure on the macOS machine [1], as if we did not wait for all changes to
be applied by the replica. I thought it was because of the issue with
tracking LSN for sequences [2], but this is with a reworked test that does
not do just nextval(). So I'm wondering if these two things may be related
... both things happened on macOS only.

[1] https://cirrus-ci.com/task/6299472241098752
[2] https://www.postgresql.org/message-id/712cad46-a9c8-1389-aef8-faf0203c9be9%40enterprisedb.com

-- Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Consistently use "startup process"/"WAL sender" instead of "Startup process"/"WAL Sender" in comments and docs.
On Sun, Feb 13, 2022 at 3:13 PM Bharath Rupireddy wrote:
>
> Hi,
>
> In the code comments, it is being used as "Startup process" in the
> middle of the sentences whereas in most of the places "startup
> process" is used. Also, "WAL Sender" is being used instead of "WAL
> sender". Let's be consistent across the docs and code comments.

FWIW, docs need to be held to a higher standard than code comments.

-- John Naylor
EDB: http://www.enterprisedb.com
Re: Non-decimal integer literals
On Wed, Jan 26, 2022 at 10:10 PM Peter Eisentraut wrote:
> [v8 patch]

0001-0004 seem pretty straightforward.

0005:

{realfail1}	{
-				/*
-				 * throw back the [Ee], and figure out whether what
-				 * remains is an {integer} or {decimal}.
-				 */
-				yyless(yyleng - 1);
				SET_YYLLOC();
-				return process_integer_literal(yytext, yylval);
+				yyerror("trailing junk after numeric literal");
			}

realfail1 has been subsumed by integer_junk and decimal_junk, so that
pattern can be removed.

{
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */

It seems Flex doesn't like C comments after the "%%", so this stanza was
indented in 0006. If these are to be committed separately, that fix should
happen here.

0006: Minor nit -- the s/decimal/numeric/ change doesn't seem to have any
notational advantage, but it's not worse, either.

0007: I've attached an addendum to restore the no-backtrack property.

Will the underscore syntax need treatment in the input routines as well?

-- John Naylor
EDB: http://www.enterprisedb.com

diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 827bc4c189..5ddb9a92f0 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s
 
 scan.c: FLEXFLAGS = -CF -p -p
-#scan.c: FLEX_NO_BACKUP=yes
+scan.c: FLEX_NO_BACKUP=yes
 scan.c: FLEX_FIX_WARNING=yes
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 5b574c4233..3b311ac2dd 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -400,9 +400,9 @@
 hexinteger	0[xX](_?{hexdigit})+
 octinteger	0[oO](_?{octdigit})+
 bininteger	0[bB](_?{bindigit})+
-hexfail		0[xX]
-octfail		0[oO]
-binfail		0[bB]
+hexfail		0[xX]_?
+octfail		0[oO]_?
+binfail		0[bB]_?
 numeric		(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail	{decdigit}+\.\.
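As a rough illustration of what the hexinteger pattern 0[xX](_?{hexdigit})+ accepts (at most one underscore before each hex digit, at least one digit, and no trailing underscore), here is a hand-rolled standalone matcher. This is an illustrative sketch, not the flex-generated scanner:

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hand-rolled equivalent of the flex pattern 0[xX](_?{hexdigit})+:
 * after the 0x prefix, each iteration optionally consumes one underscore
 * that must be followed by a hex digit; at least one digit is required.
 */
static bool
matches_hexinteger(const char *s)
{
	int			ndigits = 0;

	if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
		return false;
	s += 2;
	while (*s)
	{
		if (*s == '_')
			s++;				/* _? : at most one separator per digit */
		if (!isxdigit((unsigned char) *s))
			return false;		/* a separator must be followed by a digit */
		s++;
		ndigits++;
	}
	return ndigits > 0;			/* the + : at least one digit */
}
```

This makes the role of the added hexfail pattern 0[xX]_? visible: inputs like "0x" or "0x_" fail the main pattern, and widening hexfail lets the scanner consume them in one token and report "trailing junk" without backtracking.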
Re: pgsql: pg_upgrade: Preserve relfilenodes and tablespace OIDs.
Re: Robert Haas
> pg_upgrade: Preserve relfilenodes and tablespace OIDs.

>  src/bin/pg_dump/pg_dumpall.c | 3 +

--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1066,6 +1066,9 @@ dumpTablespaces(PGconn *conn)
 		/* needed for buildACLCommands() */
 		fspcname = pg_strdup(fmtId(spcname));
 
+		appendPQExpBufferStr(buf, "\n-- For binary upgrade, must preserve pg_table
+		appendPQExpBuffer(buf, "SELECT pg_catalog.binary_upgrade_set_next_pg_table

This needs to be guarded with "if (binary_upgrade)".

Error message during a Debian pg_upgradecluster (-m dump) from 14 to 15:

2022-02-13 12:44:01.272 CET [168032] postgres@template1 LOG:  statement: SELECT pg_catalog.binary_upgrade_set_next_pg_tablespace_oid('16408'::pg_catalog.oid);
2022-02-13 12:44:01.272 CET [168032] postgres@template1 ERROR:  function can only be called when server is in binary upgrade mode
2022-02-13 12:44:01.272 CET [168032] postgres@template1 STATEMENT:  SELECT pg_catalog.binary_upgrade_set_next_pg_tablespace_oid('16408'::pg_catalog.oid);

Christoph
Re: Adding CI to our tree
Hi,

On 2022-02-12 23:19:38 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> > What was the reason behind moving the docs stuff from the compiler warnings
> > task to linux?
>
> I wanted to build docs even if the linux task fails. To allow CFBOT to link to
> them, so someone can always review the docs, in HTML (rather than reading SGML
> with lines prefixed with +).

I'd be ok with running the compiler warnings job without the dependency, if
that's the connection.

> BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet.xsl postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet-man.xsl postgres.sgml

Sure, it just doesn't make a difference:

make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
real	0m34.626s
user	0m34.342s
sys	0m0.298s

make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/
real	0m34.780s
user	0m34.494s
sys	0m0.285s

> > 1) iterate over release branches and see which has the smallest diff
>
> Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch;
> done |sort -n |head -1

> > > Is looking at the .c files in the change really a reliable predictor of
> > > where code coverage changes? I'm doubtful. Consider stuff like removing
> > > the last user of some infrastructure or such. Or adding the first.
> >
> > You're right that it isn't particularly accurate, but it's a step forward
> > if lots of patches use it to check/improve coverage of new code.
>
> Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> -> ~7.15m), but probably acceptable. Although I also would like to enable
> -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
> well (-fsanitize=address is a lot more expensive), they both work best on
> linux.

> There's other things that it'd be nice to enable, but maybe these don't need to
> be on by default. Maybe just have a list of optional compiler flags (and hints
> when they're useful). Like WRITE_READ_PARSE_PLAN_TREES.

I think it'd be good to enable a reasonable set by default. Particularly for
newer contributors, stuff like forgetting in/out/readfuncs, or not knowing
about some undefined behaviour, is easy. Probably makes sense to use
different settings on different tasks.

Greetings,

Andres Freund
Consistently use "startup process"/"WAL sender" instead of "Startup process"/"WAL Sender" in comments and docs.
Hi,

In the code comments, "Startup process" is sometimes used in the middle of
sentences, whereas in most places "startup process" is used. Also, "WAL
Sender" is being used instead of "WAL sender". Let's be consistent across
the docs and code comments. Attaching a patch.

Thoughts?

Regards,
Bharath Rupireddy.

v1-0001-s-Startup-process-startup-process-and-s-WAL-Sende.patch
Description: Binary data