Re: Introduce pg_receivewal gzip compression tests
On Mon, Jul 19, 2021 at 04:03:33PM +0900, Michael Paquier wrote: > Another advantage of this patch is the handling of ".gz" is reduced to > one code path instead of four. That makes a bit easier the > introduction of new compression methods. > > A second thing that was really confusing is that the name of the WAL > segment generated in this code path completely ignored the type of > compression. This led to one confusing error message if failing to > open a segment for write where we'd mention a .partial file rather > than a .gz.partial file. The versions of zlib I used on Windows > looked buggy so I cannot conclude there, but I am sure that this > should allow bowerbird to handle the test correctly. After more testing and more review, I have applied and backpatched this stuff. Another thing I did on HEAD was to enable again the ZLIB portion of the pg_receivewal tests on Windows. bowerdird should stay green (I hope), and it is better to have as much more coverage as possible for all that. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
On Fri, Jul 16, 2021 at 02:08:57PM +0900, Michael Paquier wrote: > This behavior is rather debatable, and it would be more instinctive to > me to just skip any business related to the pre-padding if compression > is enabled, at the cost of one extra callback in WalWriteMethod to > grab the compression level (dir_open_for_write() skips that for > compression) to allow receivelog.c to handle that. But at the same > time few users are going to care about that as pg_receivewal has most > likely always the same set of options, so complicating this code is > not really appealing either. I have chewed on that over the weekend, and skipping the padding logic if we are in compression mode in open_walfile() makes sense, so attached is a patch that I'd like to backpatch. Another advantage of this patch is the handling of ".gz" is reduced to one code path instead of four. That makes a bit easier the introduction of new compression methods. A second thing that was really confusing is that the name of the WAL segment generated in this code path completely ignored the type of compression. This led to one confusing error message if failing to open a segment for write where we'd mention a .partial file rather than a .gz.partial file. The versions of zlib I used on Windows looked buggy so I cannot conclude there, but I am sure that this should allow bowerbird to handle the test correctly. -- Michael diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c index 3952a3f943..7af9009320 100644 --- a/src/bin/pg_basebackup/receivelog.c +++ b/src/bin/pg_basebackup/receivelog.c @@ -88,26 +88,29 @@ static bool open_walfile(StreamCtl *stream, XLogRecPtr startpoint) { Walfile*f; - char fn[MAXPGPATH]; + char *fn; ssize_t size; XLogSegNo segno; XLByteToSeg(startpoint, segno, WalSegSz); XLogFileName(current_walfile_name, stream->timeline, segno, WalSegSz); - snprintf(fn, sizeof(fn), "%s%s", current_walfile_name, - stream->partial_suffix ? stream->partial_suffix : ""); + /* Note that this considers the compression used if necessary */ + fn = stream->walmethod->get_file_name(current_walfile_name, + stream->partial_suffix); /* * When streaming to files, if an existing file exists we verify that it's * either empty (just created), or a complete WalSegSz segment (in which * case it has been created and padded). Anything else indicates a corrupt - * file. + * file. Compressed files have no need for padding, so just ignore this + * case. * * When streaming to tar, no file with this name will exist before, so we * never have to verify a size. */ - if (stream->walmethod->existsfile(fn)) + if (stream->walmethod->compression() == 0 && + stream->walmethod->existsfile(fn)) { size = stream->walmethod->get_file_size(fn); if (size < 0) diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl index 158f7d176f..17fd71a450 100644 --- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl +++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl @@ -72,13 +72,11 @@ $primary->command_ok( my @partial_wals = glob "$stream_dir/*\.partial"; is(scalar(@partial_wals), 1, "one partial WAL segment was created"); -# Check ZLIB compression if available. On Windows, some old versions -# of zlib can cause some instabilities with this test, so disable it -# for now. +# Check ZLIB compression if available. SKIP: { - skip "postgres was not built with ZLIB support, or Windows is involved", 5 - if (!check_pg_config("#define HAVE_LIBZ 1") || $windows_os); + skip "postgres was not built with ZLIB support", 5 + if (!check_pg_config("#define HAVE_LIBZ 1")); # Generate more WAL worth one completed, compressed, segment. $primary->psql('postgres', 'SELECT pg_switch_wal();'); diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c index a15bbb20e7..3c0da70a04 100644 --- a/src/bin/pg_basebackup/walmethods.c +++ b/src/bin/pg_basebackup/walmethods.c @@ -68,20 +68,32 @@ dir_getlasterror(void) return strerror(errno); } +static char * +dir_get_file_name(const char *pathname, const char *temp_suffix) +{ + char *tmppath = pg_malloc0(MAXPGPATH * sizeof(char)); + + snprintf(tmppath, MAXPGPATH, "%s%s%s", + pathname, dir_data->compression > 0 ? ".gz" : "", + temp_suffix ? temp_suffix : ""); + + return tmppath; +} + static Walfile dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_size) { static char tmppath[MAXPGPATH]; + char *filename; int fd; DirectoryMethodFile *f; #ifdef HAVE_LIBZ gzFile gzfp = NULL; #endif - snprintf(tmppath, sizeof(tmppath), "%s/%s%s%s", - dir_data->basedir, pathname, - dir_data->compression > 0 ? ".gz" : "", - temp_suffix ? temp_suffix : ""); + filename = dir_get_file_name(pathname, temp_suffix); + snprintf(tmppath, sizeof(tmppath), "%s/%s", + dir_data->basedir, filename); /* * Open a file for non-compres
Re: Introduce pg_receivewal gzip compression tests
On Fri, Jul 16, 2021 at 08:59:11AM +0900, Michael Paquier wrote: > --compress is used and the sync fails for a non-compressed segment. > Looking at the code it is pretty obvious that open_walfile() is > getting confused with the handling of an existing .partial segment > while walmethods.c uses dir_data->compression in all the places that > matter. So that's a legit bug, that happens only when mixing > pg_receivewal runs for where successive runs use the compression or > non-compression modes. Ditto. After reading the code more carefully, the code is actually able to work even if it could be cleaner: 1) dir_existsfile() would check for the existence of a non-compressed, partial segment all the time. 2) If this non-compressed file was padded, the code would use open_for_write() that would open a compressed, partial segment. 3) The compressed, partial segment would be the one flushed. This behavior is rather debatable, and it would be more instinctive to me to just skip any business related to the pre-padding if compression is enabled, at the cost of one extra callback in WalWriteMethod to grab the compression level (dir_open_for_write() skips that for compression) to allow receivelog.c to handle that. But at the same time few users are going to care about that as pg_receivewal has most likely always the same set of options, so complicating this code is not really appealing either. > I am amazed that the other buildfarm members are not complaining, to > be honest. jacana runs this TAP test with MinGW and ZLIB, and does > not complain. I have spent more time on that with my own environment, and while testing I have bumped on a different issue with zlib, which was really weird. In the same scenario as above, gzdopen() has been failing for me at step 2), causing the test to loop forever. We document to use DLLs for ZLIB coming from zlib.net, but the ones available there are really outdated as far as I can see (found some called zlib.lib/dll myself, breaking Solution.pm). For now I have disabled those tests on Windows to bring back bowerbird to green, but there is something else going on here. We don't do much tests with ZLIB on Windows for pg_basebackup and pg_dump, so there may be some more issues? @Andrew: which version of ZLIB are you using on bowerbird? That's the one in c:\prog\3p64. That's a zdll.lib, right? -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
On Thu, Jul 15, 2021 at 09:35:52PM +0900, Michael Paquier wrote: > For this one, I'll try to test harder on my own host. I am curious to > see if the other Windows members running the TAP tests have anything > to say. Looking at the code of zlib, this would come from gz_zero() > in gzflush(), which could blow up on a write() in gz_comp(). bowerbird has just failed for the second time in a row on EACCESS, so there is more here than meets the eye. Looking at the code, I think I have spotted what it is and the buildfarm logs give a very good hint: # Running: pg_receivewal -D :/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal --verbose --endpos 0/328 --compress 1 pg_receivewal: starting log streaming at 0/200 (timeline 1) pg_receivewal: fatal: could not fsync existing write-ahead log file "00010002.partial": Permission denied not ok 20 - streaming some WAL using ZLIB compression --compress is used and the sync fails for a non-compressed segment. Looking at the code it is pretty obvious that open_walfile() is getting confused with the handling of an existing .partial segment while walmethods.c uses dir_data->compression in all the places that matter. So that's a legit bug, that happens only when mixing pg_receivewal runs for where successive runs use the compression or non-compression modes. I am amazed that the other buildfarm members are not complaining, to be honest. jacana runs this TAP test with MinGW and ZLIB, and does not complain. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Thursday, July 15th, 2021 at 14:35, Michael Paquier wrote: > On Thu, Jul 15, 2021 at 08:35:27PM +0900, Michael Paquier wrote: > > > 2. curculio: > > Looking at the OpenBSD code (usr.bin/compress/main.c), long options > > are supported, where --version does exit(0) without printing > > set_outfile() is doing a discard of the file suffixes it does not > > recognize, and I think that their implementation bumps on .gz.partial > > and generates an exit code of 512 to map with WARNING. I still wish > > to keep this test, and I'd like to think that the contents of > > @zlib_wals are enough in terms of coverage. What do you think? > > After thinking more about this one, I have taken the course to just > remove the .gz.partial segment from the check, a full segment should > be enough in terms of coverage. I prefer this simplification over a > rename of the .partial segment or a tweak of the error code to map > with WARNING. Fair enough. Cheers, //Georgios > --- > > Michael
Re: Introduce pg_receivewal gzip compression tests
On Thu, Jul 15, 2021 at 08:35:27PM +0900, Michael Paquier wrote: > 1) bowerbird on Windows/MSVC: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2021-07-15%2010%3A30%3A36 > pg_receivewal: fatal: could not fsync existing write-ahead log file > "00010002.partial": Permission denied > not ok 20 - streaming some WAL using ZLIB compression > I don't think the existing code can be blamed for that as this means a > failure with gzflush(). Likely a concurrency issue as that's an > EACCES. If that's repeatable, that could point to an actual issue > with pg_receivewal --compress. For this one, I'll try to test harder on my own host. I am curious to see if the other Windows members running the TAP tests have anything to say. Looking at the code of zlib, this would come from gz_zero() in gzflush(), which could blow up on a write() in gz_comp(). > 2) curculio: > > Looking at the OpenBSD code (usr.bin/compress/main.c), long options > are supported, where --version does exit(0) without printing > anything, and --test is supported even if that's not on the man pages. > set_outfile() is doing a discard of the file suffixes it does not > recognize, and I think that their implementation bumps on .gz.partial > and generates an exit code of 512 to map with WARNING. I still wish > to keep this test, and I'd like to think that the contents of > @zlib_wals are enough in terms of coverage. What do you think? After thinking more about this one, I have taken the course to just remove the .gz.partial segment from the check, a full segment should be enough in terms of coverage. I prefer this simplification over a rename of the .partial segment or a tweak of the error code to map with WARNING. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
On Thu, Jul 15, 2021 at 07:48:08AM +, gkokola...@pm.me wrote: > Let us hope that it will prevent some bugs from happening. The buildfarm has two reports. 1) bowerbird on Windows/MSVC: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2021-07-15%2010%3A30%3A36 pg_receivewal: fatal: could not fsync existing write-ahead log file "00010002.partial": Permission denied not ok 20 - streaming some WAL using ZLIB compression I don't think the existing code can be blamed for that as this means a failure with gzflush(). Likely a concurrency issue as that's an EACCES. If that's repeatable, that could point to an actual issue with pg_receivewal --compress. 2) curculio: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2021-07-15%2010%3A30%3A15 # Running: gzip --test /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal/00010002.gz /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal/00010003.gz.partial gzip: /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal/00010003.gz.partial: unknown suffix: ignored not ok 24 - gzip verified the integrity of compressed WAL segments Looking at the OpenBSD code (usr.bin/compress/main.c), long options are supported, where --version does exit(0) without printing anything, and --test is supported even if that's not on the man pages. set_outfile() is doing a discard of the file suffixes it does not recognize, and I think that their implementation bumps on .gz.partial and generates an exit code of 512 to map with WARNING. I still wish to keep this test, and I'd like to think that the contents of @zlib_wals are enough in terms of coverage. What do you think? -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Thursday, July 15th, 2021 at 09:00, Michael Paquier wrote: > On Wed, Jul 14, 2021 at 02:11:09PM +, gkokola...@pm.me wrote: > > > Please find v6 attached. > > Thanks. I have spent some time checking this stuff in details, and > I did some tests on Windows while on it. A run of pgperltidy was > missing. A second thing is that you added one useless WAL segment > switch in the ZLIB block, and two at the end, causing the first two in > the set of three (one in the ZLIB block and one in the final command) > to be no-ops as they followed a previous WAL switch. The final one > was not needed as no WAL is generated after that. > Thank you for the work and comments. > And applied. Let's see if the buildfarm has anything to say. Perhaps > this will even catch some bugs that pre-existed. Let us hope that it will prevent some bugs from happening. Cheers, //Georgios > --- > > Michael
Re: Introduce pg_receivewal gzip compression tests
On Wed, Jul 14, 2021 at 02:11:09PM +, gkokola...@pm.me wrote: > Please find v6 attached. Thanks. I have spent some time checking this stuff in details, and I did some tests on Windows while on it. A run of pgperltidy was missing. A second thing is that you added one useless WAL segment switch in the ZLIB block, and two at the end, causing the first two in the set of three (one in the ZLIB block and one in the final command) to be no-ops as they followed a previous WAL switch. The final one was not needed as no WAL is generated after that. And applied. Let's see if the buildfarm has anything to say. Perhaps this will even catch some bugs that pre-existed. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Wednesday, July 14th, 2021 at 04:17, Michael Paquier wrote: > On Tue, Jul 13, 2021 at 11:16:06AM +, gkokola...@pm.me wrote: > > Agreed. For the record that is why I said v6 :) > Okay, thanks. Please find v6 attached. Cheers, //Georgios > --- > > Michael v6-0001-Introduce-pg_receivewal-gzip-compression-tests.patch Description: Binary data
Re: Introduce pg_receivewal gzip compression tests
On Tue, Jul 13, 2021 at 11:16:06AM +, gkokola...@pm.me wrote: > Agreed. For the record that is why I said v6 :) Okay, thanks. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Tuesday, July 13th, 2021 at 12:26, Michael Paquier wrote: > On Tue, Jul 13, 2021 at 08:28:44AM +, gkokola...@pm.me wrote: > > Sounds great. Let me cook up v6 for this. > > Thanks. Could you use v5 I posted upthread as a base? There were > some improvements in the variable names, the comments and the test > descriptions. Agreed. For the record that is why I said v6 :) Cheers, //Georgios > - > Michael
Re: Introduce pg_receivewal gzip compression tests
On Tue, Jul 13, 2021 at 08:28:44AM +, gkokola...@pm.me wrote: > Sounds great. Let me cook up v6 for this. Thanks. Could you use v5 I posted upthread as a base? There were some improvements in the variable names, the comments and the test descriptions. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Tuesday, July 13th, 2021 at 10:14, Michael Paquier wrote: > On Tue, Jul 13, 2021 at 04:37:53PM +0900, Michael Paquier wrote: > > Poking at this problem, I partially take this statement back as this > requires an initial run of pg_receivewal --endpos to ensure the > creation of one .gz and one .gz.partial. So I guess that this should > be structured as: > > 1. Keep the existing pg_receivewal --endpos. > 2. Add the ZLIB block, with one pg_receivewal --endpos. > 3. Add at the end one extra pg_receivewal --endpos, outside of the ZLIB > block, which should check the creation of a .partial, non-compressed > segment. This should mention that we place this command at the end of > the test for the start streaming point computation. > LZ4 tests would be then between 2) and 3), or 1) and 2). Sounds great. Let me cook up v6 for this. > > -- > > Michael
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Tuesday, July 13th, 2021 at 09:37, Michael Paquier wrote: > On Tue, Jul 13, 2021 at 06:36:59AM +, gkokola...@pm.me wrote: > > > I am sorry this was not so clear. It is indeed running twice the binary > > with different flags. However the goal is not to check the flags, but > > to make certain that the partial file has now been completed. That is > > why there was code asserting that the previous FILENAME.gz.partial file > > after the second invocation is converted to FILENAME.gz. > > The first run you are adding checks the same thing thanks to > pg_switch_wal(), where pg_receivewal completes the generation of > 00010002.gz and finishes with > 00010003.gz.partial. This is correct. It is the 00010003 awl that the rest of the tests are targeting. > > > Additionally the second invocation of pg_receivewal is extending the > > coverage of FindStreamingStart(). > > Hmm. It looks like a waste in runtime once we mix LZ4 in that as that > would mean 5 runs of pg_receivewal, but we really need only three of > them with --endpos: > - One with ZLIB compression. > - One with LZ4 compression. > - One without compression. > > Do you think that we could take advantage of what is now the only run > of pg_receivewal --endpos for that? We could make the ZLIB checks run > first, conditionally, and then let the last command with --endpos > perform a full scan of the contents in $stream_dir with the .gz files > already in place. The addition of LZ4 would be an extra conditional > block similar to what's introduced in ZLIB, running before the last > command without compression. I will admit that for the current patch I am not taking lz4 into account as at the moment I have little idea as to how the lz4 patch will advance with the review rounds. I simply accepted that it will be rebased on top of the patch in the current thread and probably need to modify the current then. But I digress. I would like have some combination of .gz and .gz.parial but I will not take too strong of a stance. I am happy to go with your suggestion. Cheers, //Georgios > > -- > > Michael
Re: Introduce pg_receivewal gzip compression tests
On Tue, Jul 13, 2021 at 04:37:53PM +0900, Michael Paquier wrote: > Hmm. It looks like a waste in runtime once we mix LZ4 in that as that > would mean 5 runs of pg_receivewal, but we really need only three of > them with --endpos: > - One with ZLIB compression. > - One with LZ4 compression. > - One without compression. > > Do you think that we could take advantage of what is now the only run > of pg_receivewal --endpos for that? We could make the ZLIB checks run > first, conditionally, and then let the last command with --endpos > perform a full scan of the contents in $stream_dir with the .gz files > already in place. The addition of LZ4 would be an extra conditional > block similar to what's introduced in ZLIB, running before the last > command without compression. Poking at this problem, I partially take this statement back as this requires an initial run of pg_receivewal --endpos to ensure the creation of one .gz and one .gz.partial. So I guess that this should be structured as: 1) Keep the existing pg_receivewal --endpos. 2) Add the ZLIB block, with one pg_receivewal --endpos. 3) Add at the end one extra pg_receivewal --endpos, outside of the ZLIB block, which should check the creation of a .partial, non-compressed segment. This should mention that we place this command at the end of the test for the start streaming point computation. LZ4 tests would be then between 2) and 3), or 1) and 2). -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
On Tue, Jul 13, 2021 at 06:36:59AM +, gkokola...@pm.me wrote: > I am sorry this was not so clear. It is indeed running twice the binary > with different flags. However the goal is not to check the flags, but > to make certain that the partial file has now been completed. That is > why there was code asserting that the previous FILENAME.gz.partial file > after the second invocation is converted to FILENAME.gz. The first run you are adding checks the same thing thanks to pg_switch_wal(), where pg_receivewal completes the generation of 00010002.gz and finishes with 00010003.gz.partial. > Additionally the second invocation of pg_receivewal is extending the > coverage of FindStreamingStart(). Hmm. It looks like a waste in runtime once we mix LZ4 in that as that would mean 5 runs of pg_receivewal, but we really need only three of them with --endpos: - One with ZLIB compression. - One with LZ4 compression. - One without compression. Do you think that we could take advantage of what is now the only run of pg_receivewal --endpos for that? We could make the ZLIB checks run first, conditionally, and then let the last command with --endpos perform a full scan of the contents in $stream_dir with the .gz files already in place. The addition of LZ4 would be an extra conditional block similar to what's introduced in ZLIB, running before the last command without compression. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Tuesday, July 13th, 2021 at 03:53, Michael Paquier wrote: > On Mon, Jul 12, 2021 at 04:46:29PM +, gkokola...@pm.me wrote: > > > On Monday, July 12th, 2021 at 17:07, gkokola...@pm.me wrote: > > > > > ‐‐‐ Original Message ‐‐‐ > > Are you using outlook? The format of your messages gets blurry on the > > PG website, so does it for me. I am using protonmail's web page. I was not aware of the issue. Thank you for bringing it up to my attention. I shall try to address it. > > > > I am admittedly not so well versed on Windows systems. Thank you for > > > > > > informing me. > > > > > > Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used > > > > > > instead. To the best of my knowledge one should avoid using $ENV{GZIP} > > > > > > because that would translate to the obsolete, yet used environment > > > > > > variable GZIP which holds a set of default options for gzip. In essence > > > > > > it would be equivalent to executing: > > > > > > GZIP=gzip gzip --test > > > > > > which can result to errors similar to: > > > > > > gzip: gzip: non-option in GZIP environment variable > > -# make this available to TAP test scripts > > +# make these available to TAP test scripts > > export TAR > > +export GZIP_PROGRAM=$(GZIP) > > Wow. So this comes from the fact that the command gzip can feed on > > the environment variable from the same name. I was not aware of > > that, and a comment would be in place here. That means complicating a > > bit the test flow for people on Windows, but I am fine to live with > > that as long as this does not fail hard. One extra thing we could do > > is drop this part of the test, but I agree that this is useful to have > > around as a validity check. Great. > > > After a bit more thinking, I went ahead and added on top of v3 a test > > > > verifying that the gzip program can actually be called. > > - system_log($gzip, '--version') != 0); > > > > Checking after that does not hurt, indeed. I am wondering if we > > should do that for TAR. I do not think that this will be a necessity for TAR. TAR after all is discovered by autoconf, which gzip is not. > > Another thing I find unnecessary is the number of the tests. This > > does two rounds of pg_receivewal just to test the long and short > > options of -Z/-compress, which brings only coverage to make sure that > > both option names are handled. That's a high cost for a low amound of > > extra coverage, so let's cut the runtime in half and just use the > > round with --compress. I am sorry this was not so clear. It is indeed running twice the binary with different flags. However the goal is not to check the flags, but to make certain that the partial file has now been completed. That is why there was code asserting that the previous FILENAME.gz.partial file after the second invocation is converted to FILENAME.gz. Additionally the second invocation of pg_receivewal is extending the coverage of FindStreamingStart(). The different flags was an added bonus. > > There was also a bit of confusion with ZLIB and gzip in the variable > > names and the comments, the latter being only the command while the > > compression happens with zlib. With a round of indentation on top of > > all that, I ge tthe attached. > > What do you think? Thank you very much for the patch. I would prefer to keep the parts that tests that .gz.partial are completed on a subsequent run if you agree. Cheers, //Georgios > -- > > Michael
Re: Introduce pg_receivewal gzip compression tests
On Mon, Jul 12, 2021 at 04:46:29PM +, gkokola...@pm.me wrote: > On Monday, July 12th, 2021 at 17:07, wrote: >> ‐‐‐ Original Message ‐‐‐ Are you using outlook? The format of your messages gets blurry on the PG website, so does it for me. >> I am admittedly not so well versed on Windows systems. Thank you for >> informing me. >> Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used >> instead. To the best of my knowledge one should avoid using $ENV{GZIP} >> because that would translate to the obsolete, yet used environment >> variable GZIP which holds a set of default options for gzip. In essence >> it would be equivalent to executing: >> GZIP=gzip gzip --test >> which can result to errors similar to: >> gzip: gzip: non-option in GZIP environment variable -# make this available to TAP test scripts +# make these available to TAP test scripts export TAR +export GZIP_PROGRAM=$(GZIP) Wow. So this comes from the fact that the command gzip can feed on the environment variable from the same name. I was not aware of that, and a comment would be in place here. That means complicating a bit the test flow for people on Windows, but I am fine to live with that as long as this does not fail hard. One extra thing we could do is drop this part of the test, but I agree that this is useful to have around as a validity check. > After a bit more thinking, I went ahead and added on top of v3 a test > verifying that the gzip program can actually be called. + system_log($gzip, '--version') != 0); Checking after that does not hurt, indeed. I am wondering if we should do that for TAR. Another thing I find unnecessary is the number of the tests. This does two rounds of pg_receivewal just to test the long and short options of -Z/-compress, which brings only coverage to make sure that both option names are handled. That's a high cost for a low amound of extra coverage, so let's cut the runtime in half and just use the round with --compress. There was also a bit of confusion with ZLIB and gzip in the variable names and the comments, the latter being only the command while the compression happens with zlib. With a round of indentation on top of all that, I ge tthe attached. What do you think? -- Michael From 44d971f8d4187fa34dc7426b629eafeeb0af53c2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 13 Jul 2021 10:51:20 +0900 Subject: [PATCH v5] Add some tests for pg_receivewal --compress --- src/bin/pg_basebackup/Makefile | 6 ++- src/bin/pg_basebackup/t/020_pg_receivewal.pl | 46 +++- 2 files changed, 50 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile index 66e0070f1a..459d514183 100644 --- a/src/bin/pg_basebackup/Makefile +++ b/src/bin/pg_basebackup/Makefile @@ -18,8 +18,12 @@ subdir = src/bin/pg_basebackup top_builddir = ../../.. include $(top_builddir)/src/Makefile.global -# make this available to TAP test scripts +# make these available to TAP test scripts export TAR +# Note that GZIP cannot be used directly as this environment variable is +# used by the command "gzip" to pass down options, so stick with a different +# name. +export GZIP_PROGRAM=$(GZIP) override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl index a547c97ef1..9fa1a3378b 100644 --- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl +++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl @@ -5,7 +5,7 @@ use strict; use warnings; use TestLib; use PostgresNode; -use Test::More tests => 19; +use Test::More tests => 23; program_help_ok('pg_receivewal'); program_version_ok('pg_receivewal'); @@ -66,6 +66,50 @@ $primary->command_ok( ], 'streaming some WAL with --synchronous'); +# Check ZLIB compression if available. +SKIP: +{ + skip "postgres was not build with ZLIB support", 4 + if (!check_pg_config("#define HAVE_LIBZ 1")); + + # Generate more WAL. + $primary->psql('postgres', 'SELECT pg_switch_wal();'); + $nextlsn = + $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); + chomp($nextlsn); + $primary->psql('postgres', + 'INSERT INTO test_table VALUES (generate_series(100,200));'); + $primary->psql('postgres', 'SELECT pg_switch_wal();'); + + $primary->command_ok( + [ + 'pg_receivewal', '-D', $stream_dir, '--verbose', + '--endpos', $nextlsn, '--compress', '1' + ], + "streaming some WAL using ZLIB compression"); + + # Verify that the stored files are generated with the expected names. + my @zlib_wals = glob "$stream_dir/*.gz"; + is(scalar(@zlib_wals), 1, + "one WAL segment compressed with ZLIB was created"); + my @zlib_partial_wals = glob "$stream_dir/*.gz.partial"; + is(scalar(@zlib_partial_wals), + 1, "one partial WAL segment compressed with ZLIB was created"); + + # There are two files com
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Monday, July 12th, 2021 at 17:07, wrote: > ‐‐‐ Original Message ‐‐‐ > > On Monday, July 12th, 2021 at 13:04, Michael Paquier mich...@paquier.xyz > wrote: > > > On Mon, Jul 12, 2021 at 09:42:32AM +, gkokola...@pm.me wrote: > > > > > This to my understanding means that gzip is expected to exist. > > > > > > If this is correct, then simply checking for the headers should > > > > > > suffice, since that is the only dependency for the files to be > > > > > > created. > > > > You cannot expect this to work on Windows when it comes to MSVC for > > > > example, as gzip may not be in the environment PATH so the test would > > > > fail hard. Let's just rely on $ENV{GZIP} instead, and skip the test > > > > if it is not defined. > > I am admittedly not so well versed on Windows systems. Thank you for > > informing me. > > Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used > > instead. To the best of my knowledge one should avoid using $ENV{GZIP} > > because that would translate to the obsolete, yet used environment > > variable GZIP which holds a set of default options for gzip. In essence > > it would be equivalent to executing: > > GZIP=gzip gzip --test > > which can result to errors similar to: > > gzip: gzip: non-option in GZIP environment variable > After a bit more thinking, I went ahead and added on top of v3 a test verifying that the gzip program can actually be called. Please find v4 attached. Cheers, //Georgios > > > Michael v4-0001-Introduce-pg_receivewal-gzip-compression-tests.patch Description: Binary data
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Monday, July 12th, 2021 at 13:04, Michael Paquier wrote: > On Mon, Jul 12, 2021 at 09:42:32AM +, gkokola...@pm.me wrote: > > > This to my understanding means that gzip is expected to exist. > > > > If this is correct, then simply checking for the headers should > > > > suffice, since that is the only dependency for the files to be > > > > created. > > You cannot expect this to work on Windows when it comes to MSVC for > > example, as gzip may not be in the environment PATH so the test would > > fail hard. Let's just rely on $ENV{GZIP} instead, and skip the test > > if it is not defined. I am admittedly not so well versed on Windows systems. Thank you for informing me. Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used instead. To the best of my knowledge one should avoid using $ENV{GZIP} because that would translate to the obsolete, yet used environment variable GZIP which holds a set of default options for gzip. In essence it would be equivalent to executing: GZIP=gzip gzip --test which can result to errors similar to: gzip: gzip: non-option in GZIP environment variable Cheers, //Georgios > > > Michael v3-0001-Introduce-pg_receivewal-gzip-compression-tests.patch Description: Binary data
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Monday, July 12th, 2021 at 13:00, Gilles Darold wrote: > Le 12/07/2021 à 12:27, gkokola...@pm.me a écrit : > > > > > > Shouldn't this be coded as a loop going through @gzip_wals? > > > > > > > > > > I would hope that there is only one gz file created. There is a line > > > > > > > > further up that tests exactly that. > > > > > > > > - is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); > > > > > > > > Let me amend that. The line should be instead: > > > > > > is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created"); > > > > > > To properly test that there is one entry. > > > > > > Let me provide with v2 to fix this. > > The following tests are not correct in Perl even if Perl returns the > > right value. > > + is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created"); > > + is (scalar(keys @gzip_partial_wals), 1, > > + "one partial gzip compressed WAL was created"); > > Function keys or values are used only with hashes but here you are using > > arrays. To obtain the length of the array you can just use the scalar > > function as Perl returns the length of the array when it is called in a > > scalar context. Please use the following instead: > > + is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); > > + is (scalar(@gzip_partial_wals), 1, > > + "one partial gzip compressed WAL was created"); You are absolutely correct. I had used that in v1, yet since it got called out I doubted myself, assumed I was wrong and the rest is history. I shall ament the amendment for v3 of the patch. Cheers, //Georgios > > > - > > Gilles Darold > > http://www.darold.net/
Re: Introduce pg_receivewal gzip compression tests
On Mon, Jul 12, 2021 at 09:42:32AM +, gkokola...@pm.me wrote: > This to my understanding means that gzip is expected to exist. > If this is correct, then simply checking for the headers should > suffice, since that is the only dependency for the files to be > created. You cannot expect this to work on Windows when it comes to MSVC for example, as gzip may not be in the environment PATH so the test would fail hard. Let's just rely on $ENV{GZIP} instead, and skip the test if it is not defined. -- Michael signature.asc Description: PGP signature
Re: Introduce pg_receivewal gzip compression tests
Le 12/07/2021 à 12:27, gkokola...@pm.me a écrit : Shouldn't this be coded as a loop going through @gzip_wals? >>> I would hope that there is only one gz file created. There is a line >>> >>> further up that tests exactly that. >>> >>> - is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); >> Let me amend that. The line should be instead: >> >> is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created"); >> >> To properly test that there is one entry. >> >> Let me provide with v2 to fix this. The following tests are not correct in Perl even if Perl returns the right value. + is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created"); + is (scalar(keys @gzip_partial_wals), 1, + "one partial gzip compressed WAL was created"); Function keys or values are used only with hashes but here you are using arrays. To obtain the length of the array you can just use the scalar function as Perl returns the length of the array when it is called in a scalar context. Please use the following instead: + is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); + is (scalar(@gzip_partial_wals), 1, + "one partial gzip compressed WAL was created"); -- Gilles Darold http://www.darold.net/
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Monday, July 12th, 2021 at 11:56, wrote: > ‐‐‐ Original Message ‐‐‐ > > On Monday, July 12th, 2021 at 11:42, gkokola...@pm.me wrote: > > > ‐‐‐ Original Message ‐‐‐ > > > > On Monday, July 12th, 2021 at 08:42, Michael Paquier mich...@paquier.xyz > > wrote: > > > > > On Fri, Jul 09, 2021 at 11:26:58AM +, Georgios wrote: > > > > > > > As suggested on a different thread [1], pg_receivewal can increase it's > > > > test > > > > > > > > coverage. There exists a non trivial amount of code that handles gzip > > > > > > > > compression. The current patch introduces tests that cover creation of > > > > gzip > > > > > > > > compressed WAL files and the handling of gzip partial segments. Finally > > > > the > > > > > > > > integrity of the compressed files is verified. > > > > > > - # Verify compressed file's integrity > > > > > > > > > - my $gzip_is_valid = system_log('gzip', '--test', > > > $gzip_wals[0]); > > > > > > > > > - is($gzip_is_valid, 0, "program gzip verified file's > > > integrity"); > > > > > > > > > > > > libz and gzip are usually split across different packages, hence there > > > > > > is no guarantee that this command is always available (same comment as > > > > > > for LZ4 from a couple of days ago). > > > > Of course. Though while going for it, I did find in Makefile.global.in: > > > > TAR = @TAR@ > > > > XGETTEXT = @XGETTEXT@ > > > > GZIP = gzip > > > > BZIP2 = bzip2 > > > > DOWNLOAD = wget -O $@ --no-use-server-timestamps > > > > Which is also used by GNUmakefile.in > > > > distcheck: dist > > > > rm -rf $(dummy) > > > > mkdir $(dummy) > > > > $(GZIP) -d -c $(distdir).tar.gz | $(TAR) xf - > > > > install_prefix=`cd $(dummy) && pwd`; \ > > > > This to my understanding means that gzip is expected to exist. > > > > If this is correct, then simply checking for the headers should > > > > suffice, since that is the only dependency for the files to be > > > > created. > > > > If this is wrong, then I will add the discovery code as in the > > > > other patch. > > > > > - [ > > > > > > > > > - 'pg_receivewal', '-D', $stream_dir, > > > '--verbose', > > > > > > > > > - '--endpos', $nextlsn, '-Z', '5' > > > > > > > > > - ], > > > > > > > > > > > > I would keep the compression level to a minimum here, to limit CPU > > > > > > usage but still compress something faster. > > > > > > - # Verify compressed file's integrity > > > > > > > > > - my $gzip_is_valid = system_log('gzip', '--test', > > > $gzip_wals[0]); > > > > > > > > > - is($gzip_is_valid, 0, "program gzip verified file's > > > integrity"); > > > > > > > > > > > > Shouldn't this be coded as a loop going through @gzip_wals? > > > > I would hope that there is only one gz file created. There is a line > > > > further up that tests exactly that. > > > > - is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); > > Let me amend that. The line should be instead: > > is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created"); > > To properly test that there is one entry. > > Let me provide with v2 to fix this. Please find v2 attached with the above. Cheers, //Georgios > > Cheers, > > //Georgios > > > Then there should also be a partial gz file which is tested further > > ahead. > > > > Cheers, > > > > //Georgios > > > > > > > Michael v2-0001-Introduce-pg_receivewal-gzip-compression-tests.patch Description: Binary data
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Monday, July 12th, 2021 at 11:42, wrote: > ‐‐‐ Original Message ‐‐‐ > > On Monday, July 12th, 2021 at 08:42, Michael Paquier mich...@paquier.xyz > wrote: > > > On Fri, Jul 09, 2021 at 11:26:58AM +, Georgios wrote: > > > > > As suggested on a different thread [1], pg_receivewal can increase it's > > > test > > > > > > coverage. There exists a non trivial amount of code that handles gzip > > > > > > compression. The current patch introduces tests that cover creation of > > > gzip > > > > > > compressed WAL files and the handling of gzip partial segments. Finally > > > the > > > > > > integrity of the compressed files is verified. > > > > - # Verify compressed file's integrity > > > > > > - my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]); > > > > > > - is($gzip_is_valid, 0, "program gzip verified file's integrity"); > > > > > > > > libz and gzip are usually split across different packages, hence there > > > > is no guarantee that this command is always available (same comment as > > > > for LZ4 from a couple of days ago). > > Of course. Though while going for it, I did find in Makefile.global.in: > > TAR = @TAR@ > > XGETTEXT = @XGETTEXT@ > > GZIP = gzip > > BZIP2 = bzip2 > > DOWNLOAD = wget -O $@ --no-use-server-timestamps > > Which is also used by GNUmakefile.in > > distcheck: dist > > rm -rf $(dummy) > > mkdir $(dummy) > > $(GZIP) -d -c $(distdir).tar.gz | $(TAR) xf - > > install_prefix=`cd $(dummy) && pwd`; \ > > This to my understanding means that gzip is expected to exist. > > If this is correct, then simply checking for the headers should > > suffice, since that is the only dependency for the files to be > > created. > > If this is wrong, then I will add the discovery code as in the > > other patch. > > > - [ > > > > > > - 'pg_receivewal', '-D', $stream_dir, > > '--verbose', > > > > > > - '--endpos', $nextlsn, '-Z', '5' > > > > > > - ], > > > > > > > > I would keep the compression level to a minimum here, to limit CPU > > > > usage but still compress something faster. > > > > - # Verify compressed file's integrity > > > > > > - my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]); > > > > > > - is($gzip_is_valid, 0, "program gzip verified file's integrity"); > > > > > > > > Shouldn't this be coded as a loop going through @gzip_wals? > > I would hope that there is only one gz file created. There is a line > > further up that tests exactly that. > > - is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); Let me amend that. The line should be instead: is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created"); To properly test that there is one entry. Let me provide with v2 to fix this. Cheers, //Georgios > > Then there should also be a partial gz file which is tested further ahead. > > Cheers, > > //Georgios > > > Michael
Re: Introduce pg_receivewal gzip compression tests
‐‐‐ Original Message ‐‐‐ On Monday, July 12th, 2021 at 08:42, Michael Paquier wrote: > On Fri, Jul 09, 2021 at 11:26:58AM +, Georgios wrote: > > > As suggested on a different thread [1], pg_receivewal can increase it's test > > > > coverage. There exists a non trivial amount of code that handles gzip > > > > compression. The current patch introduces tests that cover creation of gzip > > > > compressed WAL files and the handling of gzip partial segments. Finally the > > > > integrity of the compressed files is verified. > > - # Verify compressed file's integrity > > > - my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]); > > > - is($gzip_is_valid, 0, "program gzip verified file's integrity"); > > > > libz and gzip are usually split across different packages, hence there > > is no guarantee that this command is always available (same comment as > > for LZ4 from a couple of days ago). Of course. Though while going for it, I did find in Makefile.global.in: TAR = @TAR@ XGETTEXT = @XGETTEXT@ GZIP= gzip BZIP2 = bzip2 DOWNLOAD = wget -O $@ --no-use-server-timestamps Which is also used by GNUmakefile.in distcheck: dist rm -rf $(dummy) mkdir $(dummy) $(GZIP) -d -c $(distdir).tar.gz | $(TAR) xf - install_prefix=`cd $(dummy) && pwd`; \ This to my understanding means that gzip is expected to exist. If this is correct, then simply checking for the headers should suffice, since that is the only dependency for the files to be created. If this is wrong, then I will add the discovery code as in the other patch. > > - [ > > > - 'pg_receivewal', '-D', $stream_dir, '--verbose', > > > - '--endpos', $nextlsn, '-Z', '5' > > > - ], > > > > I would keep the compression level to a minimum here, to limit CPU > > usage but still compress something faster. > > - # Verify compressed file's integrity > > > - my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]); > > > - is($gzip_is_valid, 0, "program gzip verified file's integrity"); > > > > Shouldn't this be coded as a loop going through @gzip_wals? I would hope that there is only one gz file created. There is a line further up that tests exactly that. + is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created"); Then there should also be a partial gz file which is tested further ahead. Cheers, //Georgios > --- > > Michael
Re: Introduce pg_receivewal gzip compression tests
On Fri, Jul 09, 2021 at 11:26:58AM +, Georgios wrote: > As suggested on a different thread [1], pg_receivewal can increase it's test > coverage. There exists a non trivial amount of code that handles gzip > compression. The current patch introduces tests that cover creation of gzip > compressed WAL files and the handling of gzip partial segments. Finally the > integrity of the compressed files is verified. + # Verify compressed file's integrity + my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]); + is($gzip_is_valid, 0, "program gzip verified file's integrity"); libz and gzip are usually split across different packages, hence there is no guarantee that this command is always available (same comment as for LZ4 from a couple of days ago). + [ + 'pg_receivewal', '-D', $stream_dir, '--verbose', + '--endpos', $nextlsn, '-Z', '5' + ], I would keep the compression level to a minimum here, to limit CPU usage but still compress something faster. + # Verify compressed file's integrity + my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]); + is($gzip_is_valid, 0, "program gzip verified file's integrity"); Shouldn't this be coded as a loop going through @gzip_wals? -- Michael signature.asc Description: PGP signature
Introduce pg_receivewal gzip compression tests
Hi, As suggested on a different thread [1], pg_receivewal can increase it's test coverage. There exists a non trivial amount of code that handles gzip compression. The current patch introduces tests that cover creation of gzip compressed WAL files and the handling of gzip partial segments. Finally the integrity of the compressed files is verified. I hope you find this useful. Cheers, //Georgios [1] https://www.postgresql.org/message-id/flat/ZCm1J5vfyQ2E6dYvXz8si39HQ2gwxSZ3IpYaVgYa3lUwY88SLapx9EEnOf5uEwrddhx2twG7zYKjVeuP5MwZXCNPybtsGouDsAD1o2L_I5E%3D%40pm.me v1-0001-Introduce-pg_receivewal-gzip-compression-tests.patch Description: Binary data