Add LZ4 compression in pg_dump

2022-02-25 Thread Georgios
Hi,

please find attached a patchset which adds lz4 compression in pg_dump.

The first commit does the heavy lifting required for additional compression 
methods.
It expands testing coverage for the already supported gzip compression. Commit
bf9aa490db introduced cfp in compress_io.{c,h} with the intent of unifying
compression related code and allow for the introduction of additional archive
formats. However, pg_backup_archiver.c was not using that API. This commit
teaches pg_backup_archiver.c about cfp and is using it through out.

Furthermore, compression was chosen based on the value of the level passed
as an argument during the invocation of pg_dump or some hardcoded defaults. This
does not scale for more than one compression methods. Now the method used for
compression can be explicitly requested during command invocation, or set during
hardcoded defaults. Then it is stored in the relevant structs and passed in the
relevant functions, along side compression level which has lost it's special
meaning. The method for compression is not yet stored in the actual archive.
This is done in the next commit which does introduce a new method.

The previously named CompressionAlgorithm enum is changed for
CompressionMethod so that it matches better similar variables found through out
the code base.

In a fashion similar to the binary for pg_basebackup, the method for compression
is passed using the already existing -Z/--compress parameter of pg_dump. The
legacy format and behaviour is maintained. Additionally, the user can explicitly
pass a requested method and optionaly the level to be used after a 
semicolon,e.g. --compress=gzip:6

The second commit adds LZ4 compression in pg_dump and pg_restore.

Within compress_io.{c,h} there are two distinct APIs exposed, the streaming API
and a file API. The first one, is aimed at inlined use cases and thus simple
lz4.h calls can be used directly. The second one is generating output, or is
parsing input, which can be read/generated via the lz4 utility.

In the later case, the API is using an opaque wrapper around a file stream,
which aquired via fopen() or gzopen() respectively. It would then provide
wrappers around fread(), fwrite(), fgets(), fgetc(), feof(), and fclose(); or
their gz equivallents. However the LZ4F api does not provide this functionality.
So this has been implemented localy.

In order to maintain the API compatibility a new structure LZ4File is
introduced. It is responsible for keeping state and any yet unused generated
content. The later is required when the generated decompressed output, exceeds
the caller's buffer capacity.

Custom compressed archives need to now store the compression method in their
header. This requires a bump in the version number. The level of compression is
still stored in the dump, though admittedly is of no apparent use.

The series is authored by me. Rachel Heaton helped out with the expansion
of the testing coverage, testing in different platforms and providing debug 
information
on those, as well as native speaker wording.

Cheers,
//GeorgiosFrom 8dfe12b08378571add6e1d178bed97a61e988593 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Mon, 22 Nov 2021 09:58:49 +
Subject: [PATCH v1 1/2] Prepare pg_dump for additional compression methods

This commmit does the heavy lifting required for additional compression methods.
It expands testing coverage for the already supported gzip compression. Commit
bf9aa490db introduced cfp in compress_io.{c,h} with the intent of unifying
compression related code and allow for the introduction of additional archive
formats. However, pg_backup_archiver.c was not using that API. This commit
teaches pg_backup_archiver.c about cfp and is using it through out.

Furthermore, compression was chosen based on the value of the level passed
as an argument during the invocation of pg_dump or some hardcoded defaults. This
does not scale for more than one compression methods. Now the method used for
compression can be explicitly requested during command invocation, or set during
hardcoded defaults. Then it is stored in the relevant structs and passed in the
relevant functions, along side compression level which has lost it's special
meaning. The method for compression is not yet stored in the actual archive.
This is done in the next commit which does introduce a new method.

The previously named CompressionAlgorithm enum is changed for
CompressionMethod so that it matches better similar variables found through out
the code base.

In a fashion similar to the binary for pg_basebackup, the method for compression
is passed using the already existing -Z/--compress parameter of pg_dump. The
legacy format and behaviour is maintained. Additionally, the user can explicitly
pass a requested method and optionaly the level to be used after a semicolon,
e.g. --compress=gzip:6
---
 doc/src/sgml/ref/pg_dump.sgml |  30 +-
 src/bin/pg_dump/Makefile  |   2 +
 src/bin/pg_dump/compress_io.c |

Re: Add LZ4 compression in pg_dump

2022-03-24 Thread Greg Stark
It seems development on this has stalled. If there's no further work
happening I guess I'll mark the patch returned with feedback. Feel
free to resubmit it to the next CF when there's progress.




Re: Add LZ4 compression in pg_dump

2022-03-25 Thread Justin Pryzby
On Fri, Mar 25, 2022 at 01:20:47AM -0400, Greg Stark wrote:
> It seems development on this has stalled. If there's no further work
> happening I guess I'll mark the patch returned with feedback. Feel
> free to resubmit it to the next CF when there's progress.

Since it's a reasonably large patch (and one that I had myself started before)
and it's only been 20some days since (minor) review comments, and since the
focus right now is on committing features, and not reviewing new patches, and
this patch is new one month ago, and its 0002 not intended for pg15, therefor
I'm moving it to the next CF, where I hope to work with its authors to progress
it.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-03-25 Thread gkokolatos



--- Original Message ---

On Saturday, March 26th, 2022 at 12:13 AM, Rachel Heaton 
 wrote:

> On Fri, Mar 25, 2022 at 6:22 AM Justin Pryzby pry...@telsasoft.com wrote:
>
> > On Fri, Mar 25, 2022 at 01:20:47AM -0400, Greg Stark wrote:
> >
> > > It seems development on this has stalled. If there's no further work
> > > happening I guess I'll mark the patch returned with feedback. Feel
> > > free to resubmit it to the next CF when there's progress.

We had some progress yet we didn't want to distract the list with too many
emails. Of course, it seemed stalled to the outside observer, yet I simply
wanted to set the record straight and say that we are actively working on it.

> >
> > Since it's a reasonably large patch (and one that I had myself started 
> > before)
> > and it's only been 20some days since (minor) review comments, and since the
> > focus right now is on committing features, and not reviewing new patches, 
> > and
> > this patch is new one month ago, and its 0002 not intended for pg15, 
> > therefor
> > I'm moving it to the next CF, where I hope to work with its authors to 
> > progress
> > it.

Thank you. It is much appreciated. We will sent updates when the next commitfest
starts in July as to not distract from the 15 work. Then, we can take it from 
there.

>
> Hi Folks,
>
> Here is an updated patchset from Georgios, with minor assistance from myself.
> The comments above should be addressed, but please let us know if

A small amendment to the above statement. This patchset does not include the
refactoring of compress_io suggested by Mr Paquier in the same thread, as it is
missing documentation. An updated version will be sent to include those changes
on the next commitfest.

> there are other things to go over. A functional change in this
> patchset is when `--compress=none` is passed to pg_dump, it will not
> compress for directory type (previously, it would use gzip if
> present). The previous default behavior is retained.
>
> - Rachel




Re: Add LZ4 compression in pg_dump

2022-03-25 Thread Michael Paquier
On Fri, Mar 25, 2022 at 11:43:17PM +, gkokola...@pm.me wrote:
> On Saturday, March 26th, 2022 at 12:13 AM, Rachel Heaton 
>  wrote:
>> Here is an updated patchset from Georgios, with minor assistance from myself.
>> The comments above should be addressed, but please let us know if
> 
> A small amendment to the above statement. This patchset does not include the
> refactoring of compress_io suggested by Mr Paquier in the same thread, as it 
> is
> missing documentation. An updated version will be sent to include those 
> changes
> on the next commitfest.

The refactoring using callbacks would make the code much cleaner IMO
in the long term, with zstd waiting in the queue.  Now, I see some
pieces of the patch set that could be merged now without waiting for
the development cycle of 16 to begin, as of 0001 to add more tests and
0002.

I have a question about 0002, actually.  What has led you to the
conclusion that this code is dead and could be removed?
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-03-25 Thread Justin Pryzby
On Sat, Mar 26, 2022 at 02:57:50PM +0900, Michael Paquier wrote:
> I have a question about 0002, actually.  What has led you to the
> conclusion that this code is dead and could be removed?

See 0001 and the manpage.

+   'pg_dump: compression is not supported by tar archive format');

When I submitted a patch to support zstd, I spent awhile trying to make
compression work with tar, but it's a significant effort and better done
separately.




Re: Add LZ4 compression in pg_dump

2022-03-26 Thread Justin Pryzby
LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.

I ran into that on an ubuntu LTS, so I don't think it's so old that it
shouldn't be handled more gracefully.  LZ4 should either have an explicit
version check, or else shouldn't depend on that feature (or should define a
safe fallback version if the library header doesn't define it).

https://packages.ubuntu.com/liblz4-1

0003: typo: of legacy => or legacy

There are a large number of ifdefs being added here - it'd be nice to minimize
that.  basebackup was organized to use separate files, which is one way.

$ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
src/bin/pg_dump/compress_io.c:19

In last year's CF entry, I had made a union within CompressorState.  LZ4
doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer,
ZSTD_CStream).

0002: I wonder if you're able to re-use any of the basebackup parsing stuff
from commit ffd53659c.  You're passing both the compression method *and* level.
I think there should be a structure which includes both.  In the future, that
can also handle additional options.  I hope to re-use these same things for
wal_compression=method:level.

You renamed this:

|-   COMPR_ALG_LIBZ
|-} CompressionAlgorithm;
|+   COMPRESSION_GZIP,
|+} CompressionMethod;

..But I don't think that's an improvement.  If you were to change it, it should
say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
structs and typedefs.  zlib is not idential to gzip, which uses a different
header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect.

The cf* changes in pg_backup_archiver could be split out into a separate
commit.  It's strictly a code simplification - not just preparation for more
compression algorithms.  The commit message should "See also:
bf9aa490db24b2334b3595ee33653bf2fe39208c".

The changes in 0002 for cfopen_write seem insufficient:
|+   if (compressionMethod == COMPRESSION_NONE)
|+   fp = cfopen(path, mode, compressionMethod, 0);
|else
|{
| #ifdef HAVE_LIBZ
|char   *fname;
| 
|fname = psprintf("%s.gz", path);
|-   fp = cfopen(fname, mode, compression);
|+   fp = cfopen(fname, mode, compressionMethod, compressionLevel);
|free_keep_errno(fname);
| #else

The only difference between the LIBZ and uncompressed case is the file
extension, and it'll be the only difference with LZ4 too.  So I suggest to
first handle the file extension, and the rest of the code path is not
conditional on the compression method.  I don't think cfopen_write even needs
HAVE_LIBZ - can't you handle that in cfopen_internal() ?

This patch rejects -Z0, which ought to be accepted:
./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set

Your 0003 patch shouldn't reference LZ4:
+#ifndef HAVE_LIBLZ4
+   if (*compressionMethod == COMPRESSION_LZ4)
+   supports_compression = false;
+#endif

The 0004 patch renames zlibOutSize to outsize - I think the patch series should
be constructed such as to minimize the size of the method-specific patches.  I
say this anticipating also adding support for zstd.  The preliminary patches
should have all the boring stuff.  It would help for reviewing to keep the 
patches split up, or to enumerate all the boring things that are being renamed
(like change OutputContext to cfp, rename zlibOutSize, ...).

0004: The include should use  and not "lz4.h"

freebsd/cfbot is failing.

I suggested off-list to add an 0099 patch to change LZ4 to the default, to
exercise it more on CI.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-03-26 Thread Justin Pryzby
On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> You're passing both the compression method *and* level.  I think there should
> be a structure which includes both.  In the future, that can also handle
> additional options.

I'm not sure if there's anything worth saving, but I did that last year with
0003-Support-multiple-compression-algs-levels-opts.patch
I sent a rebased copy off-list.
https://www.postgresql.org/message-id/flat/20210104025321.ga9...@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391

|   fatal("not built with LZ4 support");
|   fatal("not built with lz4 support");

Please use consistent capitalization of "lz4" - then the compiler can optimize
away duplicate strings.

> 0004: The include should use  and not "lz4.h"

Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.




Re: Add LZ4 compression in pg_dump

2022-03-26 Thread Daniel Gustafsson
> On 26 Mar 2022, at 17:21, Justin Pryzby  wrote:

> I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> exercise it more on CI.

No need to change the defaults in autoconf for that.  The CFBot uses the cirrus
file in the tree so changing what the job includes can be easily done (assuming
the CFBot hasn't changed this recently which I think it hasn't).  I used that
trick in the NSS patchset to add a completely new job for --with-ssl=nss beside
the --with-ssl=openssl job.

--
Daniel Gustafsson   https://vmware.com/





Re: Add LZ4 compression in pg_dump

2022-03-26 Thread Justin Pryzby
On Sun, Mar 27, 2022 at 12:37:27AM +0100, Daniel Gustafsson wrote:
> > On 26 Mar 2022, at 17:21, Justin Pryzby  wrote:
> 
> > I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> > exercise it more on CI.
> 
> No need to change the defaults in autoconf for that.  The CFBot uses the 
> cirrus
> file in the tree so changing what the job includes can be easily done 
> (assuming
> the CFBot hasn't changed this recently which I think it hasn't).  I used that
> trick in the NSS patchset to add a completely new job for --with-ssl=nss 
> beside
> the --with-ssl=openssl job.

I think you misunderstood - I'm suggesting not only to use with-lz4 (which was
always true since 93d973494), but to change pg_dump -Fc and -Fd to use LZ4 by
default (the same as I suggested for toast_compression, wal_compression, and
again in last year's patch to add zstd compression to pg_dump, for which
postgres was not ready).

@@ -781,6 +807,11 @@ main(int argc, char **argv)
compress.alg = COMPR_ALG_LIBZ;
compress.level = Z_DEFAULT_COMPRESSION;
 #endif
+
+#ifdef USE_ZSTD
+   compress.alg = COMPR_ALG_ZSTD; // Set default for 
testing purposes
+   compress.level = ZSTD_CLEVEL_DEFAULT;
+#endif





Re: Add LZ4 compression in pg_dump

2022-03-27 Thread Robert Haas
On Sat, Mar 26, 2022 at 12:22 PM Justin Pryzby  wrote:
> 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> from commit ffd53659c.  You're passing both the compression method *and* 
> level.
> I think there should be a structure which includes both.  In the future, that
> can also handle additional options.  I hope to re-use these same things for
> wal_compression=method:level.

Yeah, we should really try to use that infrastructure instead of
inventing a bunch of different ways to do it. It might require some
renaming here and there, and I'm not sure whether we really want to
try to rush all this into the current release, but I think we should
find a way to get it done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add LZ4 compression in pg_dump

2022-03-27 Thread Justin Pryzby
On Sun, Mar 27, 2022 at 10:13:00AM -0400, Robert Haas wrote:
> On Sat, Mar 26, 2022 at 12:22 PM Justin Pryzby  wrote:
> > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> > from commit ffd53659c.  You're passing both the compression method *and* 
> > level.
> > I think there should be a structure which includes both.  In the future, 
> > that
> > can also handle additional options.  I hope to re-use these same things for
> > wal_compression=method:level.
> 
> Yeah, we should really try to use that infrastructure instead of
> inventing a bunch of different ways to do it. It might require some
> renaming here and there, and I'm not sure whether we really want to
> try to rush all this into the current release, but I think we should
> find a way to get it done.

It seems like something a whole lot like parse_compress_options() should be in
common/.  Nobody wants to write it again, and I couldn't convince myself to
copy it when I looked at using it for wal_compression.

Maybe it should take an argument which specifies the default algorithm to use
for input of a numeric "level".  And reject such input if not specified, since
wal_compression has never taken a "level", so it's not useful or desirable to
have that default to some new algorithm.

I could write this down if you want, although I'm not sure how/if you intend
other people to use bc_algorithm and bc_algorithm.  I don't think it's
important to do for v15, but it seems like it could be done after featue
freeze.  pg_dump+lz4 is targetting v16, although there's a cleanup patch that
could also go in before branching.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-03-27 Thread Daniel Gustafsson
> On 27 Mar 2022, at 00:51, Justin Pryzby  wrote:
> 
> On Sun, Mar 27, 2022 at 12:37:27AM +0100, Daniel Gustafsson wrote:
>>> On 26 Mar 2022, at 17:21, Justin Pryzby  wrote:
>> 
>>> I suggested off-list to add an 0099 patch to change LZ4 to the default, to
>>> exercise it more on CI.
>> 
>> No need to change the defaults in autoconf for that.  The CFBot uses the 
>> cirrus
>> file in the tree so changing what the job includes can be easily done 
>> (assuming
>> the CFBot hasn't changed this recently which I think it hasn't).  I used that
>> trick in the NSS patchset to add a completely new job for --with-ssl=nss 
>> beside
>> the --with-ssl=openssl job.
> 
> I think you misunderstood - I'm suggesting not only to use with-lz4 (which was
> always true since 93d973494), but to change pg_dump -Fc and -Fd to use LZ4 by
> default (the same as I suggested for toast_compression, wal_compression, and
> again in last year's patch to add zstd compression to pg_dump, for which
> postgres was not ready).

Right, I clearly misunderstood, thanks for the clarification.

--
Daniel Gustafsson   https://vmware.com/





Re: Add LZ4 compression in pg_dump

2022-03-28 Thread Robert Haas
On Sun, Mar 27, 2022 at 12:06 PM Justin Pryzby  wrote:
> Maybe it should take an argument which specifies the default algorithm to use
> for input of a numeric "level".  And reject such input if not specified, since
> wal_compression has never taken a "level", so it's not useful or desirable to
> have that default to some new algorithm.

That sounds odd to me. Wouldn't it be rather confusing if a bare
integer meant gzip for one case and lz4 for another?

> I could write this down if you want, although I'm not sure how/if you intend
> other people to use bc_algorithm and bc_algorithm.  I don't think it's
> important to do for v15, but it seems like it could be done after featue
> freeze.  pg_dump+lz4 is targetting v16, although there's a cleanup patch that
> could also go in before branching.

Well, I think the first thing we should do is get rid of enum
WalCompressionMethod and use enum WalCompression instead. They've got
the same elements and very similar names, but the WalCompressionMethod
ones just have names like COMPRESSION_NONE, which is too generic,
whereas WalCompressionMethod uses WAL_COMPRESSION_NONE, which is
better. Then I think we should also rename the COMPR_ALG_* constants
in pg_dump.h to names like DUMP_COMPRESSION_*. Once we do that we've
got rid of all the unprefixed things that purport to be a list of
compression algorithms.

Then, if people are willing to adopt the syntax that the
backup_compression.c/h stuff supports as a project standard (+1 from
me) we can go the other way and rename that stuff to be more generic,
taking backup out of the name.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add LZ4 compression in pg_dump

2022-03-28 Thread Michael Paquier
On Mon, Mar 28, 2022 at 08:36:15AM -0400, Robert Haas wrote:
> Well, I think the first thing we should do is get rid of enum
> WalCompressionMethod and use enum WalCompression instead. They've got
> the same elements and very similar names, but the WalCompressionMethod
> ones just have names like COMPRESSION_NONE, which is too generic,
> whereas WalCompressionMethod uses WAL_COMPRESSION_NONE, which is
> better. Then I think we should also rename the COMPR_ALG_* constants
> in pg_dump.h to names like DUMP_COMPRESSION_*. Once we do that we've
> got rid of all the unprefixed things that purport to be a list of
> compression algorithms.

Yes, having a centralized enum for the compression method would make
sense, along with the routines to parse and get the compression method
names.  At least that would be one step towards more unity in
src/common/.

> Then, if people are willing to adopt the syntax that the
> backup_compression.c/h stuff supports as a project standard (+1 from
> me) we can go the other way and rename that stuff to be more generic,
> taking backup out of the name.

I am not sure about the specification part which is only used by base
backups that has no client-server requirements, so option values would
still require their own grammar.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-03-29 Thread Michael Paquier
On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
> See 0001 and the manpage.
> 
> +   'pg_dump: compression is not supported by tar archive 
> format');
> 
> When I submitted a patch to support zstd, I spent awhile trying to make
> compression work with tar, but it's a significant effort and better done
> separately.

Wow.  This stuff is old enough to vote (c3e18804), dead since its
introduction.  There is indeed an argument for removing that, it is
not good to keep around that that has never been stressed and/or
used.  Upon review, the cleanup done looks correct, as we have never
been able to generate .dat.gz files in for a dump in the tar format.

+   command_fails_like(
+   [ 'pg_dump', '--compress', '1', '--format', 'tar' ],
This addition depending on HAVE_LIBZ is a good thing as a reminder of
any work that could be done in 0002.  Now that's waiting for 20 years
so I would not hold my breath on this support.  I think that this
could be just applied first, with 0002 on top of it, as a first
improvement.

+   compress_cmd => [
+   $ENV{'GZIP_PROGRAM'},
Patch 0001 is missing and update of pg_dump's Makefile to pass down
this environment variable to the test scripts, no?

+   compress_cmd => [
+   $ENV{'GZIP_PROGRAM'},
+   '-f',
[...]
+   $ENV{'GZIP_PROGRAM'},
+   '-k', '-d',
-f and -d are available everywhere I looked at, but is -k/--keep a
portable choice with a gzip command?  I don't see this option in
OpenBSD, for one.  So this test is going to cause problems on those
buildfarm machines, at least.  Couldn't this part be replaced by a
simple --test to check that what has been compressed is in correct
shape?  We know that this works, based on our recent experiences with
the other tests.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 1:03 AM Michael Paquier  wrote:
> > Then, if people are willing to adopt the syntax that the
> > backup_compression.c/h stuff supports as a project standard (+1 from
> > me) we can go the other way and rename that stuff to be more generic,
> > taking backup out of the name.
>
> I am not sure about the specification part which is only used by base
> backups that has no client-server requirements, so option values would
> still require their own grammar.

I don't know what you mean by this. I think the specification stuff
could be reused in a lot of places. If you can ask for a base backup
with zstd:level=3,long=1,fancystuff=yes or whatever we end up with,
why not enable exactly the same for every other place that uses
compression? I don't know what "client-server requirements" is or what
that has to do with this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add LZ4 compression in pg_dump

2022-03-29 Thread Michael Paquier
On Tue, Mar 29, 2022 at 09:46:27AM +, gkokola...@pm.me wrote:
> On Tuesday, March 29th, 2022 at 9:27 AM, Michael Paquier 
>  wrote:
>> On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
>> Wow. This stuff is old enough to vote (c3e18804), dead since its
>> introduction. There is indeed an argument for removing that, it is
>> not good to keep around that that has never been stressed and/or
>> used. Upon review, the cleanup done looks correct, as we have never
>> been able to generate .dat.gz files in for a dump in the tar format.
> 
> Correct. My driving force behind it was to ease up the cleanup/refactoring
> work that follows, by eliminating the callers of the GZ*() macros.

Makes sense to me.

>> + command_fails_like(
>>
>> + [ 'pg_dump', '--compress', '1', '--format', 'tar' ],
>> This addition depending on HAVE_LIBZ is a good thing as a reminder of
>> any work that could be done in 0002. Now that's waiting for 20 years
>> so I would not hold my breath on this support. I think that this
>> could be just applied first, with 0002 on top of it, as a first
>> improvement.
> 
> Excellent, thank you.

I have applied the test for --compress and --format=tar, separating it
from the rest.

While moving on with 0002, I have noticed the following in
_StartBlob():
if (AH->compression != 0)
sfx = ".gz";
else
sfx = "";

Shouldn't this bit also be simplified, adding a fatal() like the other
code paths, for safety?

>> + compress_cmd => [
>> + $ENV{'GZIP_PROGRAM'},
>> + '-f',
>> [...]
>> + $ENV{'GZIP_PROGRAM'},
>> + '-k', '-d',
>> -f and -d are available everywhere I looked at, but is -k/--keep a
>> portable choice with a gzip command? I don't see this option in
>> OpenBSD, for one. So this test is going to cause problems on those
>> buildfarm machines, at least. Couldn't this part be replaced by a
>> simple --test to check that what has been compressed is in correct
>> shape? We know that this works, based on our recent experiences with
>> the other tests.
> 
> I would argue that the simple '--test' will not do in this case, as the
> TAP tests do need a file named .sql to compare the contents with.
> This file is generated either directly by pg_dump itself, or by running
> pg_restore on pg_dump's output. In the case of compression pg_dump will
> generate a .sql. which can not be
> used in the comparison tests. So the intention of this block, is not to
> simply test for validity, but to also decompress pg_dump's output for it
> to be able to be used.

Ahh, I see, thanks.  I would add a comment about that in the area of
compression_gzip_plain_format.

+   my $supports_compression = check_pg_config("#define HAVE_LIBZ 1");

This part could be moved within the if block a couple of lines down.

+   my $compress_program = $ENV{GZIP_PROGRAM};

It seems to me that it is enough to rely on {compress_cmd}, hence
there should be no need for $compress_program, no?

It seems to me that we should have a description for compress_cmd at
the top of 002_pg_dump.pl (close to "Definition of the pg_dump runs to
make").  There is an order dependency with restore_cmd.

> I updated the patch to simply remove the '-k' flag.

Okay.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-03-29 Thread Michael Paquier
On Tue, Mar 29, 2022 at 09:14:03AM -0400, Robert Haas wrote:
> I don't know what you mean by this. I think the specification stuff
> could be reused in a lot of places. If you can ask for a base backup
> with zstd:level=3,long=1,fancystuff=yes or whatever we end up with,
> why not enable exactly the same for every other place that uses
> compression? I don't know what "client-server requirements" is or what
> that has to do with this.

Oh. I think that I got confused here.  I saw the backup component in
the file name and this has been associated with the client/server
choice that can be done in the options of pg_basebackup.  But
parse_bc_specification() does not include any knowledge about that:
pg_basebackup does this job in parse_compress_options().  I agree that
it looks possible to reuse that stuff in more places than just base
backups.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-03-30 Thread Michael Paquier
On Wed, Mar 30, 2022 at 03:32:55PM +, gkokola...@pm.me wrote:
> On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier 
>  wrote:
>> While moving on with 0002, I have noticed the following in
>>
>> _StartBlob():
>> if (AH->compression != 0)
>> sfx = ".gz";
>> else
>> sfx = "";
>>
>> Shouldn't this bit also be simplified, adding a fatal() like the other
>> code paths, for safety?
> 
> Agreed. Fixed.

Okay.  0002 looks fine as-is, and I don't mind the extra fatal()
calls.  These could be asserts but that's not a big deal one way or
the other.  And the cleanup is now applied.

>> + my $compress_program = $ENV{GZIP_PROGRAM};
>>
>> It seems to me that it is enough to rely on {compress_cmd}, hence
>> there should be no need for $compress_program, no?
> 
> Maybe not. We don't want to the tests to fail if the utility is not
> installed. That becomes even more evident as more methods are added.
> However I realized that the presence of the environmental variable does
> not guarrantee that the utility is actually installed. In the attached,
> the existance of the utility is based on the return value of system_log().

Hmm.  [.. thinks ..]  The thing that's itching me here is that you
align the concept of compression with gzip, but that's not going to be
true once more compression options are added to pg_dump, and that
would make $supports_compression and $compress_program_exists
incorrect.  Perhaps the right answer would be to rename all that with
a suffix like "_gzip" to make a difference?  Or would there be enough
control with a value of "compression_gzip" instead of "compression" in
test_key?

+my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h',
+ '>', '/dev/null') == 0);
Do we need this command execution at all?  In all the other tests, we
rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do
the same here.

A last thing is that we should perhaps make a clear difference between
the check that looks at if the code has been built with zlib and the
check for the presence of GZIP_PROGRAM, as it can be useful in some
environments to be able to run pg_dump built with zlib, even if the
GZIP_PROGRAM command does not exist (I don't think this can be the
case, but other tests are flexible).  As of now, the patch relies on
pg_dump enforcing uncompression if building under --without-zlib even
if --compress/-Z is used, but that also means that those compression
tests overlap with the existing tests in this case.  Wouldn't it be
more consistent to check after $supports_compression when executing
the dump command for test_key = "compression[_gzip]"?  This would mean
keeping GZIP_PROGRAM as sole check when executing the compression
command.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-04-01 Thread gkokolatos
On Thursday, March 31st, 2022 at 4:34 AM, Michael Paquier  
wrote:
> On Wed, Mar 30, 2022 at 03:32:55PM +, gkokola...@pm.me wrote:
> > On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier 
> > mich...@paquier.xyz wrote:
>
> Okay. 0002 looks fine as-is, and I don't mind the extra fatal()
> calls. These could be asserts but that's not a big deal one way or
> the other. And the cleanup is now applied.

Thank you very much.

> > > + my $compress_program = $ENV{GZIP_PROGRAM};
> > > It seems to me that it is enough to rely on {compress_cmd}, hence
> > > there should be no need for $compress_program, no?
> >
> > Maybe not. We don't want to the tests to fail if the utility is not
> > installed. That becomes even more evident as more methods are added.
> > However I realized that the presence of the environmental variable does
> > not guarrantee that the utility is actually installed. In the attached,
> > the existance of the utility is based on the return value of system_log().
>
> Hmm. [.. thinks ..] The thing that's itching me here is that you
> align the concept of compression with gzip, but that's not going to be
> true once more compression options are added to pg_dump, and that
> would make $supports_compression and $compress_program_exists
> incorrect. Perhaps the right answer would be to rename all that with
> a suffix like "_gzip" to make a difference? Or would there be enough
> control with a value of "compression_gzip" instead of "compression" in
> test_key?

I understand the itch. Indeed when LZ4 is added as compression method, this
block changes slightly. I went with the minimum amount changed. Please find
in 0001 of the attached this variable renamed as $gzip_program_exist. I thought
that as prefix it will match better the already used $ENV{GZIP_PROGRAM}.

> +my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h',
> + '>', '/dev/null') == 0);
>
> Do we need this command execution at all? In all the other tests, we
> rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do
> the same here.

You are very correct that we are using the simple version, and that is what
it was included in the previous versions of the current patch. However, I
did notice that the variable is hard-coded in Makefile.global.in and it does
not go through configure. By now, gzip is considered an essential package
in most installations, and this hard-code makes sense. Though I did remove
the utility from my system, (apt remove gzip) and tried the test with the
simple "if (!defined $gzip || $gzip eq '');", which predictably failed. For
this, I went with the system call, it is not too expensive and is rather
reliable.

It is true that the rest of the TAP tests that use this, e.g. in pg_basebackup,
also failed. There is an argument to go simple and I will be happy to revert
to the previous version.

> A last thing is that we should perhaps make a clear difference between
> the check that looks at if the code has been built with zlib and the
> check for the presence of GZIP_PROGRAM, as it can be useful in some
> environments to be able to run pg_dump built with zlib, even if the
> GZIP_PROGRAM command does not exist (I don't think this can be the
> case, but other tests are flexible).

You are very correct. We do that already in the current patch. Note that we skip
the test only when we specifically have to execute a compression command. Not
all compression tests define such command, exactly so that we can test those
cases as well. The point of using an external utility program is in order to
extend the coverage in previously untested yet supported scenarios, e.g. manual
compression of the *.toc files.

Also in the case where it will actually skip the compression command because the
gzip program is not present, it will execute the pg_dump command first.

> As of now, the patch relies on
> pg_dump enforcing uncompression if building under --without-zlib even
> if --compress/-Z is used, but that also means that those compression
> tests overlap with the existing tests in this case. Wouldn't it be
> more consistent to check after $supports_compression when executing
> the dump command for test_key = "compression[_gzip]"? This would mean
> keeping GZIP_PROGRAM as sole check when executing the compression
> command.

I can see the overlap case. Yet, I understand the test_key as serving different
purpose, as it is a key of %tests and %full_runs. I do not expect the database
content of the generated dump to change based on which compression method is 
used.

In the next round, I can see one explitcly requesting --compress=none to 
override
defaults. There is a benefit to group the tests for this scenario under the same
test_key, i.e. compression.

Also there will be cases where if the program exists, yet the codebase is 
compiled
without support for the method. Then compress_cmd or the restore_cmd that 
follows
will fail. For example, in the plain output, if we try to uncompress the 
generated
the test 

Re: Add LZ4 compression in pg_dump

2022-04-04 Thread Michael Paquier
On Fri, Apr 01, 2022 at 03:06:40PM +, gkokola...@pm.me wrote:
> I understand the itch. Indeed when LZ4 is added as compression method, this
> block changes slightly. I went with the minimum amount changed. Please find
> in 0001 of the attached this variable renamed as $gzip_program_exist. I 
> thought
> that as prefix it will match better the already used $ENV{GZIP_PROGRAM}.

Hmm.  I have spent some time on that, and upon review I really think
that we should skip the tests marked as dedicated to the gzip
compression entirely if the build is not compiled with this option,
rather than letting the code run a dump for nothing in some cases,
relying on the default to uncompress the contents in others.  In the
latter case, it happens that we have already some checks like
defaults_custom_format, but you already mentioned that.

We should also skip the later parts of the tests if the compression
program does not exist as we rely on it, but only if the command does
not exist.  This will count for LZ4.

> I can see the overlap case. Yet, I understand the test_key as serving 
> different
> purpose, as it is a key of %tests and %full_runs. I do not expect the database
> content of the generated dump to change based on which compression method is 
> used.

Contrary to the current LZ4 tests in pg_dump, what we have here is a
check for a command-level run and not a data-level check.  So what's
introduced is a new concept, and we need a new way to control if the
tests should be entirely skipped or not, particularly if we finish by
not using test_key to make the difference.  Perhaps the best way to
address that is to have a new keyword in the $runs structure.  The
attached defines a new compile_option, that can be completed later for
new compression methods introduced in the tests.  So the idea is to
mark all the tests related to compression with the same test_key, and
the tests can be skipped depending on what compile_option requires.

> In the attached version, I propose that the compression_cmd is converted into
> a hash. It contains two keys, the program and the arguments. Maybe it is 
> easier
> to read than before or than simply grabbing the first element of the array.

Splitting the program and its arguments makes sense.

At the end I am finishing with the attached.  I also saw an overlap
with the addition of --jobs for the directory format vs not using the
option, so I have removed the case where --jobs was not used in the
directory format.
--
Michael
From b705f77862c9e41a149ce078df638032903bce3d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 5 Apr 2022 10:30:06 +0900
Subject: [PATCH v6] Extend compression coverage for pg_dump, pg_restore

---
 src/bin/pg_dump/Makefile |  2 +
 src/bin/pg_dump/t/002_pg_dump.pl | 93 +++-
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index 302f7e02d6..2f524b09bf 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -16,6 +16,8 @@ subdir = src/bin/pg_dump
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
+export GZIP_PROGRAM=$(GZIP)
+
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index af5d6fa5a3..dda3649373 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -20,12 +20,22 @@ my $tempdir   = PostgreSQL::Test::Utils::tempdir;
 # test_key indicates that a given run should simply use the same
 # set of like/unlike tests as another run, and which run that is.
 #
+# compile_option indicates if the commands run depend on a compilation
+# option, if any.  This controls if tests are skipped if a dependency
+# is not satisfied.
+#
 # dump_cmd is the pg_dump command to run, which is an array of
 # the full command and arguments to run.  Note that this is run
 # using $node->command_ok(), so the port does not need to be
 # specified and is pulled from $PGPORT, which is set by the
 # PostgreSQL::Test::Cluster system.
 #
+# compress_cmd is the utility command for (de)compression, if any.
+# Note that this should generally be used on pg_dump's output
+# either to generate a text file to run the through the tests, or
+# to test pg_restore's ability to parse manually compressed files
+# that otherwise pg_dump does not compress on its own (e.g. *.toc).
+#
 # restore_cmd is the pg_restore command to run, if any.  Note
 # that this should generally be used when the pg_dump goes to
 # a non-text file and that the restore can then be used to
@@ -54,6 +64,58 @@ my %pgdump_runs = (
 			"$tempdir/binary_upgrade.dump",
 		],
 	},
+
+	# Do not use --no-sync to give test coverage for data sync.
+	compression_gzip_custom => {
+		test_key => 'compression',
+		compile_option => 'gzip',
+		dump_cmd => [
+			'pg_dump',  '--format=custom',
+			'--co

Re: Add LZ4 compression in pg_dump

2022-04-05 Thread gkokolatos



--- Original Message ---

On Tuesday, April 5th, 2022 at 3:34 AM, Michael Paquier  
wrote:
> On Fri, Apr 01, 2022 at 03:06:40PM +, gkokola...@pm.me wrote:

> Splitting the program and its arguments makes sense.

Great.

> At the end I am finishing with the attached. I also saw an overlap
> with the addition of --jobs for the directory format vs not using the
> option, so I have removed the case where --jobs was not used in the
> directory format.

Thank you. I agree with the attached and I will carry it forward to the
rest of the patchset.

Cheers,
//Georgios

> --
> Michael




Re: Add LZ4 compression in pg_dump

2022-04-05 Thread Michael Paquier
On Tue, Apr 05, 2022 at 07:13:35AM +, gkokola...@pm.me wrote:
> Thank you. I agree with the attached and I will carry it forward to the
> rest of the patchset.

No need to carry it forward anymore, I think ;)
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-04-05 Thread gkokolatos



--- Original Message ---

On Tuesday, April 5th, 2022 at 12:55 PM, Michael Paquier  
wrote:

> On Tue, Apr 05, 2022 at 07:13:35AM +, gkokola...@pm.me wrote:
> No need to carry it forward anymore, I think ;)

Thank you for committing!

Cheers,
//Georgios

> --
> Michael




Re: Add LZ4 compression in pg_dump

2023-01-08 Thread Justin Pryzby
On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote:
> There's a couple of lz4 bits which shouldn't be present in 002: file
> extension and comments.

There were "LZ4" comments and file extension stuff in the preparatory
commit.  But now it seems like you *removed* them in the LZ4 commit
(where it actually belongs) rather than *moving* it from the
prior/parent commit *to* the lz4 commit.  I recommend to run something
like "git diff @{1}" whenever doing this kind of patch surgery.

+   if (AH->compression_spec.algorithm != PG_COMPRESSION_NONE &&
+   AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&

This looks wrong/redundant.  The gzip part should be removed, right ?

Maybe other places that check if (compression==PG_COMPRESSION_GZIP)
should maybe change to say compression!=NONE?

_PrepParallelRestore() references ".gz", so I think it needs to be
retrofitted to handle .lz4.  Ideally, that's built into a struct or list
of file extensions to try.  Maybe compression.h should have a function
to return the file extension of a given algorithm.  I'm planning to send
a patch for zstd, and hoping its changes will be minimized by these
preparatory commits.

+ errno = errno ? : ENOSPC;

"?:" is a GNU extension (not the ternary operator, but the ternary
operator with only 2 args).  It's not in use anywhere else in postgres.
You could instead write it with 3 "errno"s or as "if (errno==0):
errno=ENOSPC"

You wrote "eol_flag == false" and "eol_flag == 0" and true.  But it's
cleaner to test it as a boolean: if (eol_flag) / if (!eol_flag).

Both LZ4File_init() and its callers check "inited".  Better to do it in
one place than 3.  It's a static function, so I think there's no
performance concern.

Gzip_close() still has a useless save_errno (or rebase issue?).

I think it's confusing to have two functions, one named
InitCompressLZ4() and InitCompressorLZ4().

pg_compress_specification is being passed by value, but I think it
should be passed as a pointer, as is done everywhere else.

pg_compress_algorithm is being writen directly into the pg_dump header.
Currently, I think that's not an externally-visible value (it could be
renumbered, theoretically even in a minor release).  Maybe there should
be a "private" enum for encoding the pg_dump header, similar to
WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ?  Or else a comment there
should warn that the values are encoded in pg_dump, and must never be
changed.

+ Verify that data files where compressed
typo: s/where/were/

Also:
s/occurance/occurrence/
s/begining/beginning/
s/Verfiy/Verify/
s/nessary/necessary/

BTW I noticed that cfdopen() was accidentally committed to compress_io.h
in master without being defined anywhere.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-14 Thread Justin Pryzby
On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
> On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote:
> > There's a couple of lz4 bits which shouldn't be present in 002: file
> > extension and comments.

> BTW I noticed that cfdopen() was accidentally committed to compress_io.h
> in master without being defined anywhere.

This was resolved in 69fb29d1a (so now needs to be re-added for this
patch series).

> pg_compress_specification is being passed by value, but I think it
> should be passed as a pointer, as is done everywhere else.

ISTM that was an issue with 5e73a6048, affecting a few public and
private functions.  I wrote a pre-preparatory patch which changes to
pass by reference.

And addressed a handful of other issues I reported as separate fixup
commits.  And changed to use LZ4 by default for CI.

I also rebased my 2 year old patch to support zstd in pg_dump.  I hope
it can finally added for v16.  I'll send it for the next CF if these
patches progress.

One more thing: some comments still refer to the cfopen API, which this
patch removes.

> There were "LZ4" comments and file extension stuff in the preparatory
> commit.  But now it seems like you *removed* them in the LZ4 commit
> (where it actually belongs) rather than *moving* it from the
> prior/parent commit *to* the lz4 commit.  I recommend to run something
> like "git diff @{1}" whenever doing this kind of patch surgery.

TODO

> Maybe other places that check if (compression==PG_COMPRESSION_GZIP)
> should maybe change to say compression!=NONE?
> 
> _PrepParallelRestore() references ".gz", so I think it needs to be
> retrofitted to handle .lz4.  Ideally, that's built into a struct or list
> of file extensions to try.  Maybe compression.h should have a function
> to return the file extension of a given algorithm.  I'm planning to send
> a patch for zstd, and hoping its changes will be minimized by these
> preparatory commits.

TODO

> I think it's confusing to have two functions, one named
> InitCompressLZ4() and InitCompressorLZ4().

TODO

> pg_compress_algorithm is being writen directly into the pg_dump header.
> Currently, I think that's not an externally-visible value (it could be
> renumbered, theoretically even in a minor release).  Maybe there should
> be a "private" enum for encoding the pg_dump header, similar to
> WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ?  Or else a comment there
> should warn that the values are encoded in pg_dump, and must never be
> changed.

Michael, WDYT ?

-- 
Justin
>From 3105d480ab82093ca2873e423782f5b2edd9fbb7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 14 Jan 2023 10:58:23 -0600
Subject: [PATCH 1/7] pg_dump: pass pg_compress_specification as a pointer..

..as is done everywhere else.
---
 src/bin/pg_dump/compress_io.c | 30 +--
 src/bin/pg_dump/compress_io.h |  8 +++
 src/bin/pg_dump/pg_backup.h   |  2 +-
 src/bin/pg_dump/pg_backup_archiver.c  | 12 +--
 src/bin/pg_dump/pg_backup_custom.c|  6 +++---
 src/bin/pg_dump/pg_backup_directory.c |  8 +++
 src/bin/pg_dump/pg_dump.c |  2 +-
 7 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 7a2c80bbc4c..62a9527fa48 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -94,19 +94,19 @@ static void WriteDataToArchiveNone(ArchiveHandle *AH, CompressorState *cs,
 
 /* Allocate a new compressor */
 CompressorState *
-AllocateCompressor(const pg_compress_specification compression_spec,
+AllocateCompressor(const pg_compress_specification *compression_spec,
    WriteFunc writeF)
 {
 	CompressorState *cs;
 
 #ifndef HAVE_LIBZ
-	if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
+	if (compression_spec->algorithm == PG_COMPRESSION_GZIP)
 		pg_fatal("this build does not support compression with %s", "gzip");
 #endif
 
 	cs = (CompressorState *) pg_malloc0(sizeof(CompressorState));
 	cs->writeF = writeF;
-	cs->compression_spec = compression_spec;
+	cs->compression_spec = *compression_spec; // XXX
 
 	/*
 	 * Perform compression algorithm specific initialization.
@@ -125,12 +125,12 @@ AllocateCompressor(const pg_compress_specification compression_spec,
  */
 void
 ReadDataFromArchive(ArchiveHandle *AH,
-	const pg_compress_specification compression_spec,
+	const pg_compress_specification *compression_spec,
 	ReadFunc readF)
 {
-	if (compression_spec.algorithm == PG_COMPRESSION_NONE)
+	if (compression_spec->algorithm == PG_COMPRESSION_NONE)
 		ReadDataFromArchiveNone(AH, readF);
-	if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
+	if (compression_spec->algorithm == PG_COMPRESSION_GZIP)
 	{
 #ifdef HAVE_LIBZ
 		ReadDataFromArchiveZlib(AH, readF);
@@ -432,13 +432,13 @@ cfopen_read(const char *path, const char *mode)
 	if (hasSuffix(path, ".gz"))
 	{
 		compression_spec.algorithm = PG_COMPRESSION_GZIP;
-		fp = cfopen(path, mode, com

Re: Add LZ4 compression in pg_dump

2023-01-15 Thread Michael Paquier
On Sat, Jan 14, 2023 at 03:43:09PM -0600, Justin Pryzby wrote:
> On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
>> pg_compress_specification is being passed by value, but I think it
>> should be passed as a pointer, as is done everywhere else.
> 
> ISTM that was an issue with 5e73a6048, affecting a few public and
> private functions.  I wrote a pre-preparatory patch which changes to
> pass by reference.

The functions changed by 0001 are cfopen[_write](),
AllocateCompressor() and ReadDataFromArchive().  Why is it a good idea
to change these interfaces which basically exist to handle inputs?  Is
there some benefit in changing compression_spec within the internals
of these routines before going back one layer down to their callers?
Changing the compression_spec on-the-fly in these internal paths could
be risky, actually, no?

> And addressed a handful of other issues I reported as separate fixup
> commits.  And changed to use LZ4 by default for CI.

Are your slight changes shaped as of 0003-f.patch, 0005-f.patch and
0007-f.patch on top of the original patches sent by Georgios?

> I also rebased my 2 year old patch to support zstd in pg_dump.  I hope
> it can finally added for v16.  I'll send it for the next CF if these
> patches progress.

Good idea to see if what you have done for zstd fits with what's
presented here.

>> pg_compress_algorithm is being writen directly into the pg_dump header.

Do you mean that this is what happens once the patch series 0001~0008
sent upthread is applied on HEAD?

>> Currently, I think that's not an externally-visible value (it could be
>> renumbered, theoretically even in a minor release).  Maybe there should
>> be a "private" enum for encoding the pg_dump header, similar to
>> WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ?  Or else a comment there
>> should warn that the values are encoded in pg_dump, and must never be
>> changed.
> 
> Michael, WDYT ?

Changing the order of the members in an enum would cause an ABI
breakage, so that would not happen, and we tend to be very careful
about that.  Appending new members would be fine, though.  FWIW, I'd
rather avoid adding more enums that would just be exact maps to
pg_compress_algorithm.

-   /*
-* For now the compression type is implied by the level.  This will need
-* to change once support for more compression algorithms is added,
-* requiring a format bump.
-*/
-   WriteInt(AH, AH->compression_spec.level);
+   AH->WriteBytePtr(AH, AH->compression_spec.algorithm);

I may be missing something here, but it seems to me that you ought to
store as well the level in the dump header, or it would not be
possible to report in the dump's description what was used?  Hence,
K_VERS_1_15 should imply that we have both the method compression and
the compression level.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-01-15 Thread Justin Pryzby
On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
> On Sat, Jan 14, 2023 at 03:43:09PM -0600, Justin Pryzby wrote:
> > On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
> >> pg_compress_specification is being passed by value, but I think it
> >> should be passed as a pointer, as is done everywhere else.
> > 
> > ISTM that was an issue with 5e73a6048, affecting a few public and
> > private functions.  I wrote a pre-preparatory patch which changes to
> > pass by reference.
> 
> The functions changed by 0001 are cfopen[_write](),
> AllocateCompressor() and ReadDataFromArchive().  Why is it a good idea
> to change these interfaces which basically exist to handle inputs?

I changed to pass pg_compress_specification as a pointer, since that's
the usual convention for structs, as followed by the existing uses of
pg_compress_specification.

> Is there some benefit in changing compression_spec within the
> internals of these routines before going back one layer down to their
> callers?  Changing the compression_spec on-the-fly in these internal
> paths could be risky, actually, no?

I think what you're saying is that if the spec is passed as a pointer,
then the called functions shouldn't set spec->algorithm=something.

I agree that if they need to do that, they should use a local variable.
Which looks to be true for the functions that were changed in 001.

> > And addressed a handful of other issues I reported as separate fixup
> > commits.  And changed to use LZ4 by default for CI.
> 
> Are your slight changes shaped as of 0003-f.patch, 0005-f.patch and
> 0007-f.patch on top of the original patches sent by Georgios?

Yes, the original patches, rebased as needed on top of HEAD and 001...

> >> pg_compress_algorithm is being writen directly into the pg_dump header.
> 
> Do you mean that this is what happens once the patch series 0001~0008
> sent upthread is applied on HEAD?

Yes

> -   /*
> -* For now the compression type is implied by the level.  This will need
> -* to change once support for more compression algorithms is added,
> -* requiring a format bump.
> -*/
> -   WriteInt(AH, AH->compression_spec.level);
> +   AH->WriteBytePtr(AH, AH->compression_spec.algorithm);
> 
> I may be missing something here, but it seems to me that you ought to
> store as well the level in the dump header, or it would not be
> possible to report in the dump's description what was used?  Hence,
> K_VERS_1_15 should imply that we have both the method compression and
> the compression level.

Maybe.  But the "level" isn't needed for decompression for any case I'm
aware of.

Also, dumps with the default compression level currently say:
"Compression: -1", which does't seem valuable.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-15 Thread gkokolatos
Oh, I didn’t realize you took over Justin? Why? After almost a year of work?

This is rather disheartening.

On Mon, Jan 16, 2023 at 02:56, Justin Pryzby  wrote:

> On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
>> On Sat, Jan 14, 2023 at 03:43:09PM -0600, Justin Pryzby wrote:
>> > On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
>> >> pg_compress_specification is being passed by value, but I think it
>> >> should be passed as a pointer, as is done everywhere else.
>> >
>> > ISTM that was an issue with 5e73a6048, affecting a few public and
>> > private functions. I wrote a pre-preparatory patch which changes to
>> > pass by reference.
>>
>> The functions changed by 0001 are cfopen[_write](),
>> AllocateCompressor() and ReadDataFromArchive(). Why is it a good idea
>> to change these interfaces which basically exist to handle inputs?
>
> I changed to pass pg_compress_specification as a pointer, since that's
> the usual convention for structs, as followed by the existing uses of
> pg_compress_specification.
>
>> Is there some benefit in changing compression_spec within the
>> internals of these routines before going back one layer down to their
>> callers? Changing the compression_spec on-the-fly in these internal
>> paths could be risky, actually, no?
>
> I think what you're saying is that if the spec is passed as a pointer,
> then the called functions shouldn't set spec->algorithm=something.
>
> I agree that if they need to do that, they should use a local variable.
> Which looks to be true for the functions that were changed in 001.
>
>> > And addressed a handful of other issues I reported as separate fixup
>> > commits. And changed to use LZ4 by default for CI.
>>
>> Are your slight changes shaped as of 0003-f.patch, 0005-f.patch and
>> 0007-f.patch on top of the original patches sent by Georgios?
>
> Yes, the original patches, rebased as needed on top of HEAD and 001...
>
>> >> pg_compress_algorithm is being writen directly into the pg_dump header.
>>
>> Do you mean that this is what happens once the patch series 0001~0008
>> sent upthread is applied on HEAD?
>
> Yes
>
>> - /*
>> - * For now the compression type is implied by the level. This will need
>> - * to change once support for more compression algorithms is added,
>> - * requiring a format bump.
>> - */
>> - WriteInt(AH, AH->compression_spec.level);
>> + AH->WriteBytePtr(AH, AH->compression_spec.algorithm);
>>
>> I may be missing something here, but it seems to me that you ought to
>> store as well the level in the dump header, or it would not be
>> possible to report in the dump's description what was used? Hence,
>> K_VERS_1_15 should imply that we have both the method compression and
>> the compression level.
>
> Maybe. But the "level" isn't needed for decompression for any case I'm
> aware of.
>
> Also, dumps with the default compression level currently say:
> "Compression: -1", which does't seem valuable.
>
> --
> Justin

Re: Add LZ4 compression in pg_dump

2023-01-15 Thread Justin Pryzby
On Mon, Jan 16, 2023 at 02:27:56AM +, gkokola...@pm.me wrote:
> Oh, I didn’t realize you took over Justin? Why? After almost a year of work?
> 
> This is rather disheartening.

I believe you've misunderstood my intent here.  I sent rebased versions
of your patches with fixup commits implementing fixes that I'd
previously sent.  I don't think that's unusual.  I hope your patches
will be included in v16, and I hope to facilitate that.  I don't mean
any offense.  Actually, the fixups are provided as separate patches so
you can adopt the changes easily into your branch.  

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-15 Thread Michael Paquier
On Sun, Jan 15, 2023 at 07:56:25PM -0600, Justin Pryzby wrote:
> On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
>> The functions changed by 0001 are cfopen[_write](),
>> AllocateCompressor() and ReadDataFromArchive().  Why is it a good idea
>> to change these interfaces which basically exist to handle inputs?
> 
> I changed to pass pg_compress_specification as a pointer, since that's
> the usual convention for structs, as followed by the existing uses of
> pg_compress_specification.

Okay, but what do we gain here?  It seems to me that this introduces
the risk that a careless change in one of the internal routines if
they change slight;ly compress_spec, hence impacting any of their
callers?  Or is that fixing an actual bug (except if I am missing your
point, that does not seem to be the case)?  

>> Is there some benefit in changing compression_spec within the
>> internals of these routines before going back one layer down to their
>> callers?  Changing the compression_spec on-the-fly in these internal
>> paths could be risky, actually, no?
> 
> I think what you're saying is that if the spec is passed as a pointer,
> then the called functions shouldn't set spec->algorithm=something.

Yes.  HEAD makes sure of that, 0001 would not prevent that.  So I am a
bit confused in seeing how this is a benefit.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-01-18 Thread Tomas Vondra
Hi,

On 1/16/23 16:14, gkokola...@pm.me wrote:
> Hi,
> 
> I admit I am completely at lost as to what is expected from me anymore.
> 

:-(

I understand it's frustrating not to know why a patch is not moving
forward. Particularly when is seems fairly straightforward ...

Let me briefly explain my personal (and admittedly very subjective) view
on picking what patches to review/commit. I'm sure other committers have
other criteria, but maybe this will help.

There are always more patches than I can review/commit, so I have to
prioritize, and pick which patches to look at. For me, it's mostly about
cost/benefit of the patch. The cost is e.g. the amount of time I need to
spend to review/commit the stuff, maybe read the thread, etc. Benefits
is mainly the new features/improvements.

It's oversimplified, we could talk about various bits that contribute to
the costs and benefits, but this is what it boils down.

There's always the aspect of time - patches A and B have roughly the
same benefits, but with A we get it "immediately" while B requires
additional parts that we don't have ready yet (and if they don't make it
we get no benefit), I'll probably pick A.

Unfortunately, this plays against this patch - I'm certainly in favor of
adding lz4 (and other compression algos) into pg_dump, but if I commit
0001 we get little benefit, and the other parts actually adding lz4/zstd
are treated as "WIP / for completeness" so it's unclear when we'd get to
commit them.

So if I could recommend one thing, it'd be to get at least one of those
WIP patches into a shape that's likely committable right after 0001.

> I had posted v19-0001 for a committer's consideration and v19-000{2,3} for 
> completeness.
> Please find a rebased v20 attached.
> 

I took a quick look at 0001, so a couple comments (sorry if some of this
was already discussed in the thread):

1) I don't think a "refactoring" patch should reference particular
compression algorithms (lz4/zstd), and in particular I don't think we
should have "not yet implemented" messages. We only have a couple other
places doing that, when we didn't have a better choice. But here we can
simply reject the algorithm when parsing the options, we don't need to
do that in a dozen other places.

2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
"none" at the end. It might make backpatches harder.

3) While building, I get bunch of warnings about missing cfdopen()
prototype and pg_backup_archiver.c not knowing about cfdopen() and
adding an implicit prototype (so I doubt it actually works).

4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
better to have a "union" of correct types?

5) cfopen/cfdopen are missing comments. cfopen_internal has an updated
comment, but that's a static function while cfopen/cfdopen are the
actual API.

> Also please let me know if I should silently step away from it and let other 
> people lead
> it. I would be glad to comply either way.
> 

Please don't. I promise to take a look at this patch again.

Thanks for doing all the work.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyReadDataFromArchiveZlib


compress_io.c:605:1: warning: no previous prototype for ‘cfdopen’ [-Wmissing-prototypes]
  605 | cfdopen(int fd, const char *mode,
  | ^~~
pg_backup_archiver.c: In function ‘SetOutput’:
pg_backup_archiver.c:1528:26: warning: implicit declaration of function ‘cfdopen’; did you mean ‘cfopen’? [-Wimplicit-function-declaration]
 1528 | AH->OF = cfdopen(dup(fn), mode, compression_spec);
  |  ^~~
  |  cfopen
pg_backup_archiver.c:1528:24: warning: assignment to ‘void *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
 1528 | AH->OF = cfdopen(dup(fn), mode, compression_spec);
  |^
pg_backup_archiver.c: In function ‘_allocAH’:
pg_backup_archiver.c:2236:16: warning: assignment to ‘void *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
 2236 | AH->OF = cfdopen(dup(fileno(stdout)), PG_BINARY_A, out_compress_spec);
  |^


Re: Add LZ4 compression in pg_dump

2023-01-19 Thread Tomas Vondra
On 1/18/23 20:05, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra 
>  wrote:
> 
> 
>>
>>
>> Hi,
>>
>> On 1/16/23 16:14, gkokola...@pm.me wrote:
>>
>>> Hi,
>>>
>>> I admit I am completely at lost as to what is expected from me anymore.
>>
> 
>>
>> Unfortunately, this plays against this patch - I'm certainly in favor of
>> adding lz4 (and other compression algos) into pg_dump, but if I commit
>> 0001 we get little benefit, and the other parts actually adding lz4/zstd
>> are treated as "WIP / for completeness" so it's unclear when we'd get to
>> commit them.
> 
> Thank you for your kindness and for taking the time to explain.
>  
>> So if I could recommend one thing, it'd be to get at least one of those
>> WIP patches into a shape that's likely committable right after 0001.
> 
> This was clearly my fault. I misunderstood a suggestion upthread to focus
> on the first patch of the series and ignore documentation and comments on
> the rest.
> 
> Please find v21 to contain 0002 and 0003 in a state which I no longer consider
> as WIP but worthy of proper consideration. Some guidance on where is best to 
> add
> documentation in 0002 for the function pointers in CompressFileHandle will
> be welcomed.
> 

This is internal-only API, not meant for use by regular users and/or
extension authors, so I don't think we need sgml docs. I'd just add
regular code-level documentation to compress_io.h.

For inspiration see docs for "struct ReorderBuffer" in reorderbuffer.h,
or "struct _archiveHandle" in pg_backup_archiver.h.

Or what other kind of documentation you had in mind?

>>
>>> I had posted v19-0001 for a committer's consideration and v19-000{2,3} for 
>>> completeness.
>>> Please find a rebased v20 attached.
>>
>>
>> I took a quick look at 0001, so a couple comments (sorry if some of this
>> was already discussed in the thread):
> 
> Much appreciated!
> 
>>
>> 1) I don't think a "refactoring" patch should reference particular
>> compression algorithms (lz4/zstd), and in particular I don't think we
>> should have "not yet implemented" messages. We only have a couple other
>> places doing that, when we didn't have a better choice. But here we can
>> simply reject the algorithm when parsing the options, we don't need to
>> do that in a dozen other places.
> 
> I have now removed lz4/zstd from where they were present with the exception
> of pg_dump.c which is responsible for parsing.
> 

I'm not sure I understand why leave the lz4/zstd in this place?

>> 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
>> "none" at the end. It might make backpatches harder.
> 
> Agreed. However a 'default' is needed in order to avoid compilation warnings.
> Also note that 0002 completely does away with cases within WriteDataToArchive.
> 

OK, although that's also a consequence of using a "switch" instead of
plan "if" branches.

Furthermore, I'm not sure we really need the pg_fatal() about invalid
compression method in these default blocks. I mean, how could we even
get to these places when the build does not support the algorithm? All
of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...)
happens looong after the compressor was initialized and the method
checked, no? So maybe either this should simply do Assert(false) or use
a different error message.

>> 3) While building, I get bunch of warnings about missing cfdopen()
>> prototype and pg_backup_archiver.c not knowing about cfdopen() and
>> adding an implicit prototype (so I doubt it actually works).
> 
> Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got removed
> in 69fb29d1af. v20 failed to properly take 69fb29d1af in consideration. Note
> that cfdopen is removed in 0002 which explains why cfbot didn't complain.
>  

OK.

>> 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
>> FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
>> better to have a "union" of correct types?
> 
> Please find and updated comment and a union in place of the void *. Also
> note that 0002 completely does away with cfp in favour of a new struct
> CompressFileHandle. I maintained the void * there because it is used by
> private methods of the compressors. 0003 contains such an example with
> LZ4CompressorState.
> 

I wonder if this (and also the previous item) makes sense to keep 0001
and 0002 or to combine them. The "intermediate" state is a bit annoying.

>> 5) cfopen/cfdopen are missing comments. cfopen_internal has an updated
>> comment, but that's a static function while cfopen/cfdopen are the
>> actual API.
> 
> Added comments to cfopen/cfdopen.
> 

OK.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-01-19 Thread gkokolatos





--- Original Message ---
On Thursday, January 19th, 2023 at 4:45 PM, Tomas Vondra 
 wrote:


> 
> 
> On 1/18/23 20:05, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra 
> > tomas.von...@enterprisedb.com wrote:
> > 
> > > Hi,
> > > 
> > > On 1/16/23 16:14, gkokola...@pm.me wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I admit I am completely at lost as to what is expected from me anymore.
> > 
> > 
> > 
> > > Unfortunately, this plays against this patch - I'm certainly in favor of
> > > adding lz4 (and other compression algos) into pg_dump, but if I commit
> > > 0001 we get little benefit, and the other parts actually adding lz4/zstd
> > > are treated as "WIP / for completeness" so it's unclear when we'd get to
> > > commit them.
> > 
> > Thank you for your kindness and for taking the time to explain.
> > 
> > > So if I could recommend one thing, it'd be to get at least one of those
> > > WIP patches into a shape that's likely committable right after 0001.
> > 
> > This was clearly my fault. I misunderstood a suggestion upthread to focus
> > on the first patch of the series and ignore documentation and comments on
> > the rest.
> > 
> > Please find v21 to contain 0002 and 0003 in a state which I no longer 
> > consider
> > as WIP but worthy of proper consideration. Some guidance on where is best 
> > to add
> > documentation in 0002 for the function pointers in CompressFileHandle will
> > be welcomed.
> 
> 
> This is internal-only API, not meant for use by regular users and/or
> extension authors, so I don't think we need sgml docs. I'd just add
> regular code-level documentation to compress_io.h.
> 
> For inspiration see docs for "struct ReorderBuffer" in reorderbuffer.h,
> or "struct _archiveHandle" in pg_backup_archiver.h.
> 
> Or what other kind of documentation you had in mind?

This is exactly what I was after. I was between compress_io.c and compress_io.h.
Thank you.

> > > > I had posted v19-0001 for a committer's consideration and v19-000{2,3} 
> > > > for completeness.
> > > > Please find a rebased v20 attached.
> > > 
> > > I took a quick look at 0001, so a couple comments (sorry if some of this
> > > was already discussed in the thread):
> > 
> > Much appreciated!
> > 
> > > 1) I don't think a "refactoring" patch should reference particular
> > > compression algorithms (lz4/zstd), and in particular I don't think we
> > > should have "not yet implemented" messages. We only have a couple other
> > > places doing that, when we didn't have a better choice. But here we can
> > > simply reject the algorithm when parsing the options, we don't need to
> > > do that in a dozen other places.
> > 
> > I have now removed lz4/zstd from where they were present with the exception
> > of pg_dump.c which is responsible for parsing.
> 
> 
> I'm not sure I understand why leave the lz4/zstd in this place?

You are right, it is not obvious. Those were added in 5e73a60488 which is
already committed in master and I didn't want to backtrack. Of course, I am
not opposing in doing so if you wish.

> 
> > > 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
> > > "none" at the end. It might make backpatches harder.
> > 
> > Agreed. However a 'default' is needed in order to avoid compilation 
> > warnings.
> > Also note that 0002 completely does away with cases within 
> > WriteDataToArchive.
> 
> 
> OK, although that's also a consequence of using a "switch" instead of
> plan "if" branches.
> 
> Furthermore, I'm not sure we really need the pg_fatal() about invalid
> compression method in these default blocks. I mean, how could we even
> get to these places when the build does not support the algorithm? All
> of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...)
> happens looong after the compressor was initialized and the method
> checked, no? So maybe either this should simply do Assert(false) or use
> a different error message.

I like Assert(false).

> > > 3) While building, I get bunch of warnings about missing cfdopen()
> > > prototype and pg_backup_archiver.c not knowing about cfdopen() and
> > > adding an implicit prototype (so I doubt it actually works).
> > 
> > Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got 
> > removed
> > in 69fb29d1af. v20 failed to properly take 69fb29d1af in consideration. Note
> > that cfdopen is removed in 0002 which explains why cfbot didn't complain.
> 
> 
> OK.
> 
> > > 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
> > > FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
> > > better to have a "union" of correct types?
> > 
> > Please find and updated comment and a union in place of the void *. Also
> > note that 0002 completely does away with cfp in favour of a new struct
> > CompressFileHandle. I maintained the void * there because it is used by
> > private methods of the compressors. 0003 conta

Re: Add LZ4 compression in pg_dump

2023-01-19 Thread Tomas Vondra
Hi,

On 1/19/23 17:42, gkokola...@pm.me wrote:
> 
> --- Original Message ---
> On Thursday, January 19th, 2023 at 4:45 PM, Tomas Vondra 
>  wrote:
>>
>> On 1/18/23 20:05, gkokola...@pm.me wrote:
>>
>>> --- Original Message ---
>>> On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra 
>>> tomas.von...@enterprisedb.com wrote:
>>
>> I'm not sure I understand why leave the lz4/zstd in this place?
> 
> You are right, it is not obvious. Those were added in 5e73a60488 which is
> already committed in master and I didn't want to backtrack. Of course, I am
> not opposing in doing so if you wish.
> 

Ah, I didn't realize it was already added by earlier commit. In that
case let's not worry about it.

>>
 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
 "none" at the end. It might make backpatches harder.
>>>
>>> Agreed. However a 'default' is needed in order to avoid compilation 
>>> warnings.
>>> Also note that 0002 completely does away with cases within 
>>> WriteDataToArchive.
>>
>>
>> OK, although that's also a consequence of using a "switch" instead of
>> plan "if" branches.
>>
>> Furthermore, I'm not sure we really need the pg_fatal() about invalid
>> compression method in these default blocks. I mean, how could we even
>> get to these places when the build does not support the algorithm? All
>> of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...)
>> happens looong after the compressor was initialized and the method
>> checked, no? So maybe either this should simply do Assert(false) or use
>> a different error message.
> 
> I like Assert(false).
> 

OK, good. Do you agree we should never actually get there, if the
earlier checks work correctly?

>>
 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
 FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
 better to have a "union" of correct types?
>>>
>>> Please find and updated comment and a union in place of the void *. Also
>>> note that 0002 completely does away with cfp in favour of a new struct
>>> CompressFileHandle. I maintained the void * there because it is used by
>>> private methods of the compressors. 0003 contains such an example with
>>> LZ4CompressorState.
>>
>>
>> I wonder if this (and also the previous item) makes sense to keep 0001
>> and 0002 or to combine them. The "intermediate" state is a bit annoying.
> 
> Agreed. It was initially submitted as one patch. Then it was requested to be
> split up in two parts, one to expand the use of the existing API and one to
> replace with the new interface. Unfortunately the expansion of usage of the
> existing API requires some tweaking, but that is not a very good reason for
> the current patch set. I should have done a better job there.
> 
> Please find v22 attach which combines back 0001 and 0002. It is missing the
> documentation that was discussed above as I wanted to give a quick feedback.
> Let me know if you think that the combined version is the one to move forward
> with.
> 

Thanks, I'll take a look.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-01-19 Thread Tomas Vondra
On 1/19/23 18:55, Tomas Vondra wrote:
> Hi,
> 
> On 1/19/23 17:42, gkokola...@pm.me wrote:
>>
>> ...
>>
>> Agreed. It was initially submitted as one patch. Then it was requested to be
>> split up in two parts, one to expand the use of the existing API and one to
>> replace with the new interface. Unfortunately the expansion of usage of the
>> existing API requires some tweaking, but that is not a very good reason for
>> the current patch set. I should have done a better job there.
>>
>> Please find v22 attach which combines back 0001 and 0002. It is missing the
>> documentation that was discussed above as I wanted to give a quick feedback.
>> Let me know if you think that the combined version is the one to move forward
>> with.
>>
> 
> Thanks, I'll take a look.
> 

After taking a look and thinking about it a bit more, I think we should
keep the two parts separate. I think Michael (or whoever proposed) the
split was right, it makes the patches easier to grok.

Sorry for the noise, hopefully we can just revert to the last version.

While reading the thread, I also noticed this:

> By the way, I think that this 0002 should drop all the default clauses
> in the switches for the compression method so as we'd catch any
> missing code paths with compiler warnings if a new compression method
> is added in the future.

Now I realize why there were "not yet implemented" errors for lz4/zstd
in all the switches, and why after removing them you had to add a
default branch.

We DON'T want a default branch, because the idea is that after adding a
new compression algorithm, we get warnings about switches not handling
it correctly.

So I guess we should walk back this change too :-( It's probably easier
to go back to v20 from January 16, and redo the couple remaining things
I commented on.


FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048,
but without implementation, was not a great idea. It mostly defeats the
idea of getting the compiler warnings - all the places already handle
PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd
have to grep for the options, inspect all the places or something like
that anyway. The warnings would only work for entirely new methods.

However, I now also realize the compressor API in 0002 replaces all of
this with calls to a generic API callback, so trying to improve this was
pretty silly from me.


Please, fix the couple remaining details in v20, add the docs for the
callbacks, and I'll try to polish it and get it committed.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-01-20 Thread gkokolatos






--- Original Message ---
On Friday, January 20th, 2023 at 12:34 AM, Tomas Vondra 
 wrote:


> 
> 
> On 1/19/23 18:55, Tomas Vondra wrote:
> 
> > Hi,
> > 
> > On 1/19/23 17:42, gkokola...@pm.me wrote:
> > 
> > > ...
> > > 
> > > Agreed. It was initially submitted as one patch. Then it was requested to 
> > > be
> > > split up in two parts, one to expand the use of the existing API and one 
> > > to
> > > replace with the new interface. Unfortunately the expansion of usage of 
> > > the
> > > existing API requires some tweaking, but that is not a very good reason 
> > > for
> > > the current patch set. I should have done a better job there.
> > > 
> > > Please find v22 attach which combines back 0001 and 0002. It is missing 
> > > the
> > > documentation that was discussed above as I wanted to give a quick 
> > > feedback.
> > > Let me know if you think that the combined version is the one to move 
> > > forward
> > > with.
> > 
> > Thanks, I'll take a look.
> 
> 
> After taking a look and thinking about it a bit more, I think we should
> keep the two parts separate. I think Michael (or whoever proposed) the
> split was right, it makes the patches easier to grok.
> 

Excellent. I will attempt a better split this time round.

> 
> While reading the thread, I also noticed this:
> 
> > By the way, I think that this 0002 should drop all the default clauses
> > in the switches for the compression method so as we'd catch any
> > missing code paths with compiler warnings if a new compression method
> > is added in the future.
> 
> 
> Now I realize why there were "not yet implemented" errors for lz4/zstd
> in all the switches, and why after removing them you had to add a
> default branch.
> 
> We DON'T want a default branch, because the idea is that after adding a
> new compression algorithm, we get warnings about switches not handling
> it correctly.
> 
> So I guess we should walk back this change too :-( It's probably easier
> to go back to v20 from January 16, and redo the couple remaining things
> I commented on.
> 

Sure.

> 
> FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048,
> but without implementation, was not a great idea. It mostly defeats the
> idea of getting the compiler warnings - all the places already handle
> PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd
> have to grep for the options, inspect all the places or something like
> that anyway. The warnings would only work for entirely new methods.
> 
> However, I now also realize the compressor API in 0002 replaces all of
> this with calls to a generic API callback, so trying to improve this was
> pretty silly from me.

I can try to do a better job at splitting things up.

> 
> Please, fix the couple remaining details in v20, add the docs for the
> callbacks, and I'll try to polish it and get it committed.

Excellent. Allow me an attempt to polish and expect a new version soon.

Cheers,
//Georgios

> 
> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-01-23 Thread Justin Pryzby
On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:
> Please find attached v23 which reintroduces the split.
> 
> 0001 is reworked to have a reduced footprint than before. Also in an attempt
> to facilitate the readability, 0002 splits the API's and the uncompressed
> implementation in separate files.

Thanks for updating the patch.  Could you address the review comments I
sent here ?
https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com

Thanks,
-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-24 Thread Justin Pryzby
On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote:
> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby 
>  wrote:
> > On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:
> > 
> > > Please find attached v23 which reintroduces the split.
> > > 
> > > 0001 is reworked to have a reduced footprint than before. Also in an 
> > > attempt
> > > to facilitate the readability, 0002 splits the API's and the uncompressed
> > > implementation in separate files.
> > 
> > Thanks for updating the patch. Could you address the review comments I
> > sent here ?
> > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
> 
> Please find v24 attached.

Thanks for updating the patch.

In 001, RestoreArchive() does:

> -#ifndef HAVE_LIBZ
> -   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
> -   AH->PrintTocDataPtr != NULL)
> +   supports_compression = false;
> +   if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
> +   AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> +   supports_compression = true;
> +
> +   if (AH->PrintTocDataPtr != NULL)
> {
> for (te = AH->toc->next; te != AH->toc; te = te->next)
> {
> if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> -   pg_fatal("cannot restore from compressed 
> archive (compression not supported in this installation)");
> +   {
> +#ifndef HAVE_LIBZ
> +   if (AH->compression_spec.algorithm == 
> PG_COMPRESSION_GZIP)
> +   supports_compression = false;
> +#endif
> +   if (supports_compression == false)
> +   pg_fatal("cannot restore from 
> compressed archive (compression not supported in this installation)");
> +   }
> }
> }
> -#endif

This first checks if the algorithm is implemented, and then checks if
the algorithm is supported by the current build - that confused me for a
bit.  It seems unnecessary to check for unimplemented algorithms before
looping.  That also requires referencing both GZIP and LZ4 in two
places.

I think it could be written to avoid the need to change for added
compression algorithms:

+   if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
+   {
+   /* Check if the compression algorithm is 
supported */
+   pg_compress_specification spec;
+   
parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
+   if (spec->parse_error != NULL)
+   pg_fatal(spec->parse_error);
+   }

Or maybe add a new function to compression.c to indicate whether a given
algorithm is supported.

That would also indicate *which* compression library isn't supported.

Other than that, I think 001 is ready.

002/003 use these names, which I think are too similar - initially I
didn't even realize there were two separate functions (each with a
second stub function to handle the case of unsupported compression):

+extern void InitCompressorGzip(CompressorState *cs, const 
pg_compress_specification compression_spec);

  
+extern void InitCompressGzip(CompressFileHandle *CFH, const 
pg_compress_specification compression_spec);



+extern void InitCompressorLZ4(CompressorState *cs, const 
pg_compress_specification compression_spec);

 
+extern void InitCompressLZ4(CompressFileHandle *CFH, const 
pg_compress_specification compression_spec);

   

typo:
s/not build with/not built with/

Should AllocateCompressor() set cs->compression_spec, rather than doing
it in each compressor ?

Thanks for considering.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Tomas Vondra


On 1/25/23 16:37, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby 
>  wrote:
> 
> 
>>
>>
>> On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote:
>>
>>> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby 
>>> pry...@telsasoft.com wrote:
>>>
 On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:

> Please find attached v23 which reintroduces the split.
>
> 0001 is reworked to have a reduced footprint than before. Also in an 
> attempt
> to facilitate the readability, 0002 splits the API's and the uncompressed
> implementation in separate files.

 Thanks for updating the patch. Could you address the review comments I
 sent here ?
 https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
>>>
>>> Please find v24 attached.
>>
>>
>> Thanks for updating the patch.
>>
>> In 001, RestoreArchive() does:
>>
>>> -#ifndef HAVE_LIBZ
>>> - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
>>> - AH->PrintTocDataPtr != NULL)
>>> + supports_compression = false;
>>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
>>> + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
>>> + supports_compression = true;
>>> +
>>> + if (AH->PrintTocDataPtr != NULL)
>>> {
>>> for (te = AH->toc->next; te != AH->toc; te = te->next)
>>> {
>>> if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>>> - pg_fatal("cannot restore from compressed archive (compression not 
>>> supported in this installation)");
>>> + {
>>> +#ifndef HAVE_LIBZ
>>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
>>> + supports_compression = false;
>>> +#endif
>>> + if (supports_compression == false)
>>> + pg_fatal("cannot restore from compressed archive (compression not 
>>> supported in this installation)");
>>> + }
>>> }
>>> }
>>> -#endif
>>
>>
>> This first checks if the algorithm is implemented, and then checks if
>> the algorithm is supported by the current build - that confused me for a
>> bit. It seems unnecessary to check for unimplemented algorithms before
>> looping. That also requires referencing both GZIP and LZ4 in two
>> places.
> 
> I am not certain that it is unnecessary, at least not in the way that is
> described. The idea is that new compression methods can be added, without
> changing the archive's version number. It is very possible that it is
> requested to restore an archive compressed with a method not implemented
> in the current binary. The first check takes care of that and sets
> supports_compression only for the supported versions. It is possible to
> enter the loop with supports_compression already set to false, for example
> because the archive was compressed with ZSTD, triggering the fatal error.
> 
> Of course, one can throw the error before entering the loop, yet I think
> that it does not help the readability of the code. IMHO it is easier to
> follow if the error is thrown once during that check.
> 

Actually, I don't understand why 0001 moves the check into the loop. I
mean, why not check HAVE_LIBZ before the loop?

>>
>> I think it could be written to avoid the need to change for added
>> compression algorithms:
>>
>> + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>>
>> + {
>> + /* Check if the compression algorithm is supported */
>> + pg_compress_specification spec;
>> + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
>>
>> + if (spec->parse_error != NULL)
>>
>> + pg_fatal(spec->parse_error);
>>
>> + }
> 
> I am not certain how that would work in the example with ZSTD above.
> If I am not wrong, parse_compress_specification() will not throw an error
> if the codebase supports ZSTD, yet this specific pg_dump binary will not
> support it because ZSTD is not implemented. parse_compress_specification()
> is not aware of that and should not be aware of it, should it?
> 

Not sure. What happens in a similar situation now? That is, when trying
to deal with an archive gzip-compressed in a build without libz?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Justin Pryzby
On Wed, Jan 25, 2023 at 03:37:12PM +, gkokola...@pm.me wrote:
> Of course, one can throw the error before entering the loop, yet I think
> that it does not help the readability of the code. IMHO it is easier to
> follow if the error is thrown once during that check.

> If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(),
> and remove altogether this check. On the other hand, I like the belts
> and suspenders approach because there are no more checks after this point.

While looking at this, I realized that commit 5e73a6048 introduced a
regression:

@@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)

-   if (AH->compression != 0)
-   pg_log_warning("archive is compressed, but this installation 
does not support compression -- no data will be available");
+   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
+   pg_fatal("archive is compressed, but this installation does not 
support compression");

Before, it was possible to restore non-data chunks of a dump file, even
if the current build didn't support its compression.  But that's now
impossible - and it makes the code we're discussing in RestoreArchive()
unreachable.

I don't think we can currently test for that, since it requires creating a dump
using a build --with compression and then trying to restore using a build
--without compression.  The coverage report disagrees with me, though...
https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901

> > I think it could be written to avoid the need to change for added
> > compression algorithms:
...
> 
> I am not certain how that would work in the example with ZSTD above.
> If I am not wrong, parse_compress_specification() will not throw an error
> if the codebase supports ZSTD, yet this specific pg_dump binary will not
> support it because ZSTD is not implemented. parse_compress_specification()
> is not aware of that and should not be aware of it, should it?

You're right.

I think the 001 patch should try to remove hardcoded references to
LIBZ/GZIP, such that the later patches don't need to update those same
places for LZ4.  For example in ReadHead() and RestoreArchive(), and
maybe other places dealing with file extensions.  Maybe that could be
done by adding a function specific to pg_dump indicating whether or not
an algorithm is implemented and supported.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread gkokolatos






--- Original Message ---
On Wednesday, January 25th, 2023 at 6:28 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> On 1/25/23 16:37, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby 
> > pry...@telsasoft.com wrote:
> > 
> > > On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote:
> > > 
> > > > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby 
> > > > pry...@telsasoft.com wrote:
> > > > 
> > > > > On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:
> > > > > 
> > > > > > Please find attached v23 which reintroduces the split.
> > > > > > 
> > > > > > 0001 is reworked to have a reduced footprint than before. Also in 
> > > > > > an attempt
> > > > > > to facilitate the readability, 0002 splits the API's and the 
> > > > > > uncompressed
> > > > > > implementation in separate files.
> > > > > 
> > > > > Thanks for updating the patch. Could you address the review comments I
> > > > > sent here ?
> > > > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
> > > > 
> > > > Please find v24 attached.
> > > 
> > > Thanks for updating the patch.
> > > 
> > > In 001, RestoreArchive() does:
> > > 
> > > > -#ifndef HAVE_LIBZ
> > > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
> > > > - AH->PrintTocDataPtr != NULL)
> > > > + supports_compression = false;
> > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
> > > > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > > > + supports_compression = true;
> > > > +
> > > > + if (AH->PrintTocDataPtr != NULL)
> > > > {
> > > > for (te = AH->toc->next; te != AH->toc; te = te->next)
> > > > {
> > > > if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > > > - pg_fatal("cannot restore from compressed archive (compression not 
> > > > supported in this installation)");
> > > > + {
> > > > +#ifndef HAVE_LIBZ
> > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > > > + supports_compression = false;
> > > > +#endif
> > > > + if (supports_compression == false)
> > > > + pg_fatal("cannot restore from compressed archive (compression not 
> > > > supported in this installation)");
> > > > + }
> > > > }
> > > > }
> > > > -#endif
> > > 
> > > This first checks if the algorithm is implemented, and then checks if
> > > the algorithm is supported by the current build - that confused me for a
> > > bit. It seems unnecessary to check for unimplemented algorithms before
> > > looping. That also requires referencing both GZIP and LZ4 in two
> > > places.
> > 
> > I am not certain that it is unnecessary, at least not in the way that is
> > described. The idea is that new compression methods can be added, without
> > changing the archive's version number. It is very possible that it is
> > requested to restore an archive compressed with a method not implemented
> > in the current binary. The first check takes care of that and sets
> > supports_compression only for the supported versions. It is possible to
> > enter the loop with supports_compression already set to false, for example
> > because the archive was compressed with ZSTD, triggering the fatal error.
> > 
> > Of course, one can throw the error before entering the loop, yet I think
> > that it does not help the readability of the code. IMHO it is easier to
> > follow if the error is thrown once during that check.
> 
> 
> Actually, I don't understand why 0001 moves the check into the loop. I
> mean, why not check HAVE_LIBZ before the loop?

The intention is to be able to restore archives that don't contain
data. In that case compression becomes irrelevant as only the data in
an archive is compressed.

> 
> > > I think it could be written to avoid the need to change for added
> > > compression algorithms:
> > > 
> > > + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > > 
> > > + {
> > > + /* Check if the compression algorithm is supported */
> > > + pg_compress_specification spec;
> > > + parse_compress_specification(AH->compression_spec.algorithm, NULL, 
> > > &spec);
> > > 
> > > + if (spec->parse_error != NULL)
> > > 
> > > + pg_fatal(spec->parse_error);
> > > 
> > > + }
> > 
> > I am not certain how that would work in the example with ZSTD above.
> > If I am not wrong, parse_compress_specification() will not throw an error
> > if the codebase supports ZSTD, yet this specific pg_dump binary will not
> > support it because ZSTD is not implemented. parse_compress_specification()
> > is not aware of that and should not be aware of it, should it?
> 
> 
> Not sure. What happens in a similar situation now? That is, when trying
> to deal with an archive gzip-compressed in a build without libz?


In case that there are no data chunks, the archive will be restored.

Cheers,
//Georgios


> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread gkokolatos






--- Original Message ---
On Wednesday, January 25th, 2023 at 7:00 PM, Justin Pryzby 
 wrote:


> 
> 
> On Wed, Jan 25, 2023 at 03:37:12PM +, gkokola...@pm.me wrote:
> 

> While looking at this, I realized that commit 5e73a6048 introduced a
> regression:
> 
> @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)
> 
> - if (AH->compression != 0)
> 
> - pg_log_warning("archive is compressed, but this installation does not 
> support compression -- no data will be available");
> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> 
> + pg_fatal("archive is compressed, but this installation does not support 
> compression");
> 
> Before, it was possible to restore non-data chunks of a dump file, even
> if the current build didn't support its compression. But that's now
> impossible - and it makes the code we're discussing in RestoreArchive()
> unreachable.

Nice catch!

Cheers,
//Georgios

> --
> Justin




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Michael Paquier
On Wed, Jan 25, 2023 at 07:57:18PM +, gkokola...@pm.me wrote:
> Nice catch!

Let me see..
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Michael Paquier
On Wed, Jan 25, 2023 at 12:00:20PM -0600, Justin Pryzby wrote:
> While looking at this, I realized that commit 5e73a6048 introduced a
> regression:
> 
> @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)
> 
> -   if (AH->compression != 0)
> -   pg_log_warning("archive is compressed, but this installation 
> does not support compression -- no data will be available");
> +   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> +   pg_fatal("archive is compressed, but this installation does 
> not support compression");
> 
> Before, it was possible to restore non-data chunks of a dump file, even
> if the current build didn't support its compression.  But that's now
> impossible - and it makes the code we're discussing in RestoreArchive()
> unreachable.

Right.  The impacts the possibility of looking at the header data,
which is useful with pg_restore -l for example.  On a dump that's been
compressed, pg_restore <= 15 would always print the TOC entries with
or without compression support.  On HEAD, this code prevents the
header lookup.  All *nix or BSD platforms should have support for
zlib, I hope..  Still that could be an issue on Windows, and this
would prevent folks to check the contents of the dumps after saving it
on a WIN32 host, so let's undo that.

So, I have been testing the attached with four sets of binaries from
15/HEAD and with[out] zlib support, and this brings HEAD back to the
pre-15 state (header information able to show up, still failure when
attempting to restore the dump's data without zlib).

> I don't think we can currently test for that, since it requires creating a 
> dump
> using a build --with compression and then trying to restore using a build
> --without compression.

Right, the location of the data is in the header, and I don't see how
you would be able to do that without two sets of binaries at hand, but
our tests run under the assumption that you have only one.  Well,
that's not entirely true as well, as you could create a TAP test like
pg_upgrade that relies on a environment variable pointing to a second
set of binaries.  That's not worth the complication involved, IMO.

> The coverage report disagrees with me, though...
> https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901

Isn't that one of the tests like compression_gzip_plain?

Thoughts?
--
Michael
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index ba5e6acbbb..cb4386f871 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3784,7 +3784,7 @@ ReadHead(ArchiveHandle *AH)
 
 #ifndef HAVE_LIBZ
 	if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
-		pg_fatal("archive is compressed, but this installation does not support compression");
+		pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
 #endif
 
 	if (AH->version >= K_VERS_1_4)


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Justin Pryzby
On Thu, Jan 26, 2023 at 02:49:27PM +0900, Michael Paquier wrote:
> On Wed, Jan 25, 2023 at 12:00:20PM -0600, Justin Pryzby wrote:
> > While looking at this, I realized that commit 5e73a6048 introduced a
> > regression:
> > 
> > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)
> > 
> > -   if (AH->compression != 0)
> > -   pg_log_warning("archive is compressed, but this 
> > installation does not support compression -- no data will be available");
> > +   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > +   pg_fatal("archive is compressed, but this installation does 
> > not support compression");
> > 
> > Before, it was possible to restore non-data chunks of a dump file, even
> > if the current build didn't support its compression.  But that's now
> > impossible - and it makes the code we're discussing in RestoreArchive()
> > unreachable.
> 
> Right.  The impacts the possibility of looking at the header data,
> which is useful with pg_restore -l for example.

It's not just header data - it's schema and (I think) everything other
than table data.

> > The coverage report disagrees with me, though...
> > https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901
> 
> Isn't that one of the tests like compression_gzip_plain?

I'm not sure what you mean.  Plain dump is restored with psql and not
with pg_restore.

My line number was wrong:
https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#390

What test would hit that code without rebuilding ?

394 : #ifndef HAVE_LIBZ
395 : if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP 
&&

> Thoughts?
>  #ifndef HAVE_LIBZ
>   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> - pg_fatal("archive is compressed, but this installation does not 
> support compression");
> + pg_log_warning("archive is compressed, but this installation 
> does not support compression -- no data will be available");

Your patch is fine for now, but these errors should eventually specify
*which* compression algorithm is unavailable.  I think that should be a
part of the 001 patch, ideally in a way that minimizes the number of
places which need to be updated when adding an algorithm.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-26 Thread Michael Paquier
On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote:
> I gave this a little bit of thought. I think that ReadHead should not
> emit a warning, or at least not this warning as it is slightly misleading.
> It implies that it will automatically turn off data restoration, which is
> false. Further ahead, the code will fail with a conflicting error message
> stating that the compression is not available.
> 
> Instead, it would be cleaner both for the user and the maintainer to
> move the check in RestoreArchive and make it the sole responsible for
> this logic.

-pg_fatal("cannot restore from compressed archive (compression not 
supported in this installation)");
+pg_fatal("cannot restore data from compressed archive (compression not 
supported in this installation)");
Hmm.  I don't mind changing this part as you suggest.

-#ifndef HAVE_LIBZ
-   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
-   pg_fatal("archive is compressed, but this installation does not 
support compression");
-#endif
However I think that we'd better keep the warning, as it can offer a
hint when using pg_restore -l not built with compression support if
looking at a dump that has been compressed.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-01-26 Thread Justin Pryzby
On Wed, Jan 25, 2023 at 07:57:18PM +, gkokola...@pm.me wrote:
> On Wednesday, January 25th, 2023 at 7:00 PM, Justin Pryzby 
>  wrote:
> > While looking at this, I realized that commit 5e73a6048 introduced a
> > regression:
> > 
> > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)
> > 
> > - if (AH->compression != 0)
> > 
> > - pg_log_warning("archive is compressed, but this installation does not 
> > support compression -- no data will be available");
> > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > 
> > + pg_fatal("archive is compressed, but this installation does not support 
> > compression");
> > 
> > Before, it was possible to restore non-data chunks of a dump file, even
> > if the current build didn't support its compression. But that's now
> > impossible - and it makes the code we're discussing in RestoreArchive()
> > unreachable.

On Thu, Jan 26, 2023 at 08:53:28PM +0900, Michael Paquier wrote:
> On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote:
> > I gave this a little bit of thought. I think that ReadHead should not
> > emit a warning, or at least not this warning as it is slightly misleading.
> > It implies that it will automatically turn off data restoration, which is
> > false. Further ahead, the code will fail with a conflicting error message
> > stating that the compression is not available.
> > 
> > Instead, it would be cleaner both for the user and the maintainer to
> > move the check in RestoreArchive and make it the sole responsible for
> > this logic.
> 
> -pg_fatal("cannot restore from compressed archive (compression not 
> supported in this installation)");
> +pg_fatal("cannot restore data from compressed archive (compression not 
> supported in this installation)");
> Hmm.  I don't mind changing this part as you suggest.
> 
> -#ifndef HAVE_LIBZ
> -   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> -   pg_fatal("archive is compressed, but this installation does 
> not support compression");
> -#endif
> However I think that we'd better keep the warning, as it can offer a
> hint when using pg_restore -l not built with compression support if
> looking at a dump that has been compressed.

Yeah.  But the original log_warning text was better, and should be
restored:

-   if (AH->compression != 0)
-   pg_log_warning("archive is compressed, but this installation 
does not support compression -- no data will be available");

That commit also added this to pg-dump.c:

+   case PG_COMPRESSION_ZSTD:
+   pg_fatal("compression with %s is not yet supported", 
"ZSTD");
+   break;
+   case PG_COMPRESSION_LZ4:
+   pg_fatal("compression with %s is not yet supported", 
"LZ4");
+   break;

In 002, that could be simplified by re-using the supports_compression()
function.  (And maybe the same in WriteDataToArchive()?)

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-26 Thread Michael Paquier
On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote:
> Yeah.  But the original log_warning text was better, and should be
> restored:
> 
> -   if (AH->compression != 0)
> -   pg_log_warning("archive is compressed, but this installation 
> does not support compression -- no data will be available");

Yeah, this one's on me.  So I have gone with the simplest solution and
applied a fix to restore the original behavior, with the same warning
showing up.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-01-26 Thread Justin Pryzby
On Mon, Jan 16, 2023 at 11:54:46AM +0900, Michael Paquier wrote:
> On Sun, Jan 15, 2023 at 07:56:25PM -0600, Justin Pryzby wrote:
> > On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
> >> The functions changed by 0001 are cfopen[_write](),
> >> AllocateCompressor() and ReadDataFromArchive().  Why is it a good idea
> >> to change these interfaces which basically exist to handle inputs?
> > 
> > I changed to pass pg_compress_specification as a pointer, since that's
> > the usual convention for structs, as followed by the existing uses of
> > pg_compress_specification.
> 
> Okay, but what do we gain here?  It seems to me that this introduces
> the risk that a careless change in one of the internal routines if
> they change slight;ly compress_spec, hence impacting any of their
> callers?  Or is that fixing an actual bug (except if I am missing your
> point, that does not seem to be the case)?  

To circle back to this: I was not saying there's any bug.  The proposed
change was only to follow normal and existing normal conventions for
passing structs.  It could also be a pointer to const.  It's fine with
me if you say that it's intentional how it's written already.

> >> Is there some benefit in changing compression_spec within the
> >> internals of these routines before going back one layer down to their
> >> callers?  Changing the compression_spec on-the-fly in these internal
> >> paths could be risky, actually, no?
> > 
> > I think what you're saying is that if the spec is passed as a pointer,
> > then the called functions shouldn't set spec->algorithm=something.
> 
> Yes.  HEAD makes sure of that, 0001 would not prevent that.  So I am a
> bit confused in seeing how this is a benefit.




Re: Add LZ4 compression in pg_dump

2023-01-27 Thread Justin Pryzby
On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote:
> That commit also added this to pg-dump.c:
> 
> +   case PG_COMPRESSION_ZSTD:
> +   pg_fatal("compression with %s is not yet supported", 
> "ZSTD");
> +   break;
> +   case PG_COMPRESSION_LZ4:
> +   pg_fatal("compression with %s is not yet supported", 
> "LZ4");
> +   break;
> 
> In 002, that could be simplified by re-using the supports_compression()
> function.  (And maybe the same in WriteDataToArchive()?)

The first patch aims to minimize references to ".gz" and "GZIP" and
ZLIB.  pg_backup_directory.c comments still refers to ".gz".  I think
the patch should ideally change to refer to "the compressed file
extension" (similar to compress_io.c), avoiding the need to update it
later.

I think the file extension stuff could be generalized, so it doesn't
need to be updated in multiple places (pg_backup_directory.c and
compress_io.c).  Maybe it's useful to add a function to return the
extension of a given compression method.  It could go in compression.c,
and be useful in basebackup.

For the 2nd patch:

I might be in the minority, but I still think some references to "gzip"
should say "zlib":

+} GzipCompressorState;
+
+/* Private routines that support gzip compressed data I/O */
+static void
+DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush)

In my mind, three things here are misleading, because it doesn't use
gzip headers:

| GzipCompressorState, DeflateCompressorGzip, "gzip compressed".

This comment is about exactly that:

  * underlying stream. The second API is a wrapper around fopen/gzopen and
  * friends, providing an interface similar to those, but abstracts away
  * the possible compression. Both APIs use libz for the compression, but
  * the second API uses gzip headers, so the resulting files can be easily
  * manipulated with the gzip utility.

AIUI, Michael says that it's fine that the user-facing command-line
options use "-Z gzip" (even though the "custom" format doesn't use gzip
headers).  I'm okay with that, as long as that's discussed/understood.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-01-31 Thread gkokolatos






--- Original Message ---
On Friday, January 27th, 2023 at 6:23 PM, Justin Pryzby  
wrote:


> 
> 
> On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote:
> 
> > That commit also added this to pg-dump.c:
> > 
> > + case PG_COMPRESSION_ZSTD:
> > + pg_fatal("compression with %s is not yet supported", "ZSTD");
> > + break;
> > + case PG_COMPRESSION_LZ4:
> > + pg_fatal("compression with %s is not yet supported", "LZ4");
> > + break;
> > 
> > In 002, that could be simplified by re-using the supports_compression()
> > function. (And maybe the same in WriteDataToArchive()?)
> 
> 
> The first patch aims to minimize references to ".gz" and "GZIP" and
> ZLIB. pg_backup_directory.c comments still refers to ".gz". I think
> the patch should ideally change to refer to "the compressed file
> extension" (similar to compress_io.c), avoiding the need to update it
> later.
> 
> I think the file extension stuff could be generalized, so it doesn't
> need to be updated in multiple places (pg_backup_directory.c and
> compress_io.c). Maybe it's useful to add a function to return the
> extension of a given compression method. It could go in compression.c,
> and be useful in basebackup.
> 
> For the 2nd patch:
> 
> I might be in the minority, but I still think some references to "gzip"
> should say "zlib":
> 
> +} GzipCompressorState;
> +
> +/* Private routines that support gzip compressed data I/O */
> +static void
> +DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush)
> 
> In my mind, three things here are misleading, because it doesn't use
> gzip headers:
> 
> | GzipCompressorState, DeflateCompressorGzip, "gzip compressed".
> 
> This comment is about exactly that:
> 
> * underlying stream. The second API is a wrapper around fopen/gzopen and
> * friends, providing an interface similar to those, but abstracts away
> * the possible compression. Both APIs use libz for the compression, but
> * the second API uses gzip headers, so the resulting files can be easily
> * manipulated with the gzip utility.
> 
> AIUI, Michael says that it's fine that the user-facing command-line
> options use "-Z gzip" (even though the "custom" format doesn't use gzip
> headers). I'm okay with that, as long as that's discussed/understood.
> 

Thank you for the input Justin. I am currently waiting for input from a
third person to get some conclusion. I thought that it should be stated
before my inactiveness is considered as indifference, which is not.

Cheers,
//Georgios

> --
> Justin




Re: Add LZ4 compression in pg_dump

2022-11-02 Thread Justin Pryzby
Checking if you'll be able to submit new patches soon ?




Re: Add LZ4 compression in pg_dump

2022-11-06 Thread gkokolatos
On Wed, Nov 2, 2022 at 14:28, Justin Pryzby  wrote:

> Checking if you'll be able to submit new patches soon ?

Thank you for checking up. Expect new versions within this commitfest cycle.

Re: Add LZ4 compression in pg_dump

2023-02-07 Thread Justin Pryzby
On Tue, Jan 31, 2023 at 09:00:56AM +, gkokola...@pm.me wrote:
> > In my mind, three things here are misleading, because it doesn't use
> > gzip headers:
> > 
> > | GzipCompressorState, DeflateCompressorGzip, "gzip compressed".
> > 
> > This comment is about exactly that:
> > 
> > * underlying stream. The second API is a wrapper around fopen/gzopen and
> > * friends, providing an interface similar to those, but abstracts away
> > * the possible compression. Both APIs use libz for the compression, but
> > * the second API uses gzip headers, so the resulting files can be easily
> > * manipulated with the gzip utility.
> > 
> > AIUI, Michael says that it's fine that the user-facing command-line
> > options use "-Z gzip" (even though the "custom" format doesn't use gzip
> > headers). I'm okay with that, as long as that's discussed/understood.
> 
> Thank you for the input Justin. I am currently waiting for input from a
> third person to get some conclusion. I thought that it should be stated
> before my inactiveness is considered as indifference, which is not.

I'm not sure what there is to lose by making the names more accurate -
especially since they're private/internal-only.

Tomas marked himself as a committer, so maybe could comment.

It'd be nice to also come to some conclusion about whether -Fc -Z gzip
is confusing (due to not actually using gzip).

BTW, do you intend to merge this for v16 ?  I verified in earlier patch
versions that tests all pass with lz4 as the default compression method.
And checked that gzip output is compatible with before, and that old
dumps restore correctly, and there's no memory leaks or other errors.

-- 
Justin




RE: Add LZ4 compression in pg_dump

2023-02-15 Thread shiy.f...@fujitsu.com
On Fri, Jan 27, 2023 2:04 AM gkokola...@pm.me  wrote:
> 
> --- Original Message ---
> On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier
>  wrote:
> 
> 
> >
> >
> > On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote:
> >
> > > I gave this a little bit of thought. I think that ReadHead should not
> > > emit a warning, or at least not this warning as it is slightly misleading.
> > > It implies that it will automatically turn off data restoration, which is
> > > false. Further ahead, the code will fail with a conflicting error message
> > > stating that the compression is not available.
> > >
> > > Instead, it would be cleaner both for the user and the maintainer to
> > > move the check in RestoreArchive and make it the sole responsible for
> > > this logic.
> >
> >
> > - pg_fatal("cannot restore from compressed archive (compression not
> supported in this installation)");
> > + pg_fatal("cannot restore data from compressed archive (compression not
> supported in this installation)");
> > Hmm. I don't mind changing this part as you suggest.
> >
> > -#ifndef HAVE_LIBZ
> > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> >
> > - pg_fatal("archive is compressed, but this installation does not support
> compression");
> > -#endif
> > However I think that we'd better keep the warning, as it can offer a
> > hint when using pg_restore -l not built with compression support if
> > looking at a dump that has been compressed.
> 
> Fair enough. Please find v27 attached.
> 

Hi,

I am interested in this feature and tried the patch. While reading the comments,
I noticed some minor things that could possibly be improved (in v27-0003 patch).

1.
+   /*
+* Open a file for writing.
+*
+* 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already
+* initialized CompressFileHandle.
+*/
+   int (*open_write_func) (const char *path, const 
char *mode,
+   
CompressFileHandle *CFH);

There is a redundant single quote in front of 'w'.

2.
/*
 * Callback function for WriteDataToArchive. Writes one block of (compressed)
 * data to the archive.
 */
/*
 * Callback function for ReadDataFromArchive. To keep things simple, we
 * always read one compressed block at a time.
 */

Should the function names in the comments be updated?

WriteDataToArchive
->
writeData

ReadDataFromArchive
->
readData

3.
+   Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0);

Could we use PG_BINARY_R instead of "r" and "rb" here?

Regards,
Shi Yu



Re: Add LZ4 compression in pg_dump

2023-02-19 Thread Justin Pryzby
Some little updates since I last checked:

+ * This file also includes the implementation when compression is none for
+ * both API's.

=> this comment is obsolete.

s/deffer/infer/ ?
or determine ?
This typo occurs multiple times.

currently this includes only ".gz"
=> remove this phase from the 002 patch (or at least update it in 003).

deferred by iteratively
=> inferred?

s/Requrires/Requires/
twice.

s/occured/occurred/

s/disc/disk/ ?
Probably unimportant, but "disc" isn't used anywhere else.

"compress file handle"
=> maybe these should say "compressed"

supports_compression():
Since this is an exported function, it should probably be called
pgdump_supports_compresion.




Re: Add LZ4 compression in pg_dump

2023-02-23 Thread Tomas Vondra
Thanks for v30 with the updated commit messages. I've pushed 0001 after
fixing a comment typo and removing (I think) an unnecessary change in an
error message.

I'll give the buildfarm a bit of time before pushing 0002 and 0003.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-23 Thread Tomas Vondra
On 2/23/23 16:26, Tomas Vondra wrote:
> Thanks for v30 with the updated commit messages. I've pushed 0001 after
> fixing a comment typo and removing (I think) an unnecessary change in an
> error message.
> 
> I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> 

I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
and marked the CF entry as committed. Thanks for the patch!

I wonder how difficult would it be to add the zstd compression, so that
we don't have the annoying "unsupported" cases.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-23 Thread Justin Pryzby
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> and marked the CF entry as committed. Thanks for the patch!

A big thanks from me to everyone involved.

> I wonder how difficult would it be to add the zstd compression, so that
> we don't have the annoying "unsupported" cases.

I'll send a patch soon.  I first submitted patches for that 2 years ago
(before PGDG was ready to add zstd).
https://commitfest.postgresql.org/31/2888/

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-02-23 Thread Michael Paquier
On Thu, Feb 23, 2023 at 07:51:16PM -0600, Justin Pryzby wrote:
> On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>> and marked the CF entry as committed. Thanks for the patch!
> 
> A big thanks from me to everyone involved.

Wow, nice!  The APIs are clear to follow.

> I'll send a patch soon.  I first submitted patches for that 2 years ago
> (before PGDG was ready to add zstd).
> https://commitfest.postgresql.org/31/2888/

Thanks.  It should be straight-forward to see that in 16, I guess.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-02-24 Thread gkokolatos






--- Original Message ---
On Friday, February 24th, 2023 at 5:35 AM, Michael Paquier 
 wrote:


> 
> 
> On Thu, Feb 23, 2023 at 07:51:16PM -0600, Justin Pryzby wrote:
> 
> > On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> > 
> > > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> > > and marked the CF entry as committed. Thanks for the patch!
> > 
> > A big thanks from me to everyone involved.
> 
> 
> Wow, nice! The APIs are clear to follow.

I am out of words, thank you all so very much. I learned a lot. 

> 
> > I'll send a patch soon. I first submitted patches for that 2 years ago
> > (before PGDG was ready to add zstd).
> > https://commitfest.postgresql.org/31/2888/
> 
> 
> Thanks. It should be straight-forward to see that in 16, I guess.
> --
> Michael




Re: Add LZ4 compression in pg_dump

2023-02-24 Thread Justin Pryzby
I have some fixes (attached) and questions while polishing the patch for
zstd compression.  The fixes are small and could be integrated with the
patch for zstd, but could be applied independently.

- I'm unclear about get_error_func().  That's called in three places
  from pg_backup_directory.c, after failures from write_func(), to
  supply an compression-specific error message to pg_fatal().  But it's
  not being used outside of directory format, nor for errors for other
  function pointers, or even for all errors in write_func().  Is there
  some reason why each compression method's write_func() shouldn't call
  pg_fatal() directly, with its compression-specific message ?

- I still think supports_compression() should be renamed, or made into a
  static function in the necessary file.  The main reason is that it's
  more clear what it indicates - whether compression is "implemented by
  pgdump" and not whether compression is "supported by this postgres
  build".  It also seems possible that we'd want to add a function
  called something like supports_compression(), indicating whether the
  algorithm is supported by the current build.  It'd be better if pgdump
  didn't subjugate that name.

- Finally, the "Nothing to do in the default case" comment comes from
  Michael's commit 5e73a6048:

+   /*
+* Custom and directory formats are compressed by default with gzip when
+* available, not the others.
+*/
+   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
+   !user_compression_defined)
{
 #ifdef HAVE_LIBZ
-   if (archiveFormat == archCustom || archiveFormat == 
archDirectory)
-   compressLevel = Z_DEFAULT_COMPRESSION;
-   else
+   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
+
&compression_spec);
+#else
+   /* Nothing to do in the default case */
 #endif
-   compressLevel = 0;
}


As the comment says: for -Fc and -Fd, the compression is set to zlib, if
enabled, and when not otherwise specified by the user.

Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when
zlib was unavailable.

But I'm not sure why there's now an empty "#else".  I also don't know
what "the default case" refers to.

Maybe the best thing here is to move the preprocessor #if, since it's no
longer in the middle of a runtime conditional:

 #ifdef HAVE_LIBZ
+   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
+   !user_compression_defined)
+   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
+&compression_spec);
 #endif

...but that elicits a warning about "variable set but not used"...

-- 
Justin
>From e31901414a8509317297972d1033c2e08629d903 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH] f!fixes for LZ4

---
 doc/src/sgml/ref/pg_dump.sgml| 4 ++--
 src/bin/pg_dump/compress_io.c| 2 +-
 src/bin/pg_dump/compress_io.h| 4 ++--
 src/bin/pg_dump/compress_lz4.c   | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 4 ++--
 src/bin/pg_dump/pg_dump.c| 2 --
 src/bin/pg_dump/t/002_pg_dump.pl | 6 +++---
 7 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905fb..6fbe49f7ede 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -331,7 +331,7 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +655,7 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..9239dbb2755 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -83,7 +83,7 @@
  * used by the caller in an error message.
  */
 char *
-supports_compression(const pg_compress_specification compression_spec)
+pgdump_supports_compression(const pg_compress_specification compression_spec)
 {
 	const pg_compress_algorithm	algorithm = compression_spec.algorithm;
 	

Re: Add LZ4 compression in pg_dump

2023-02-25 Thread Justin Pryzby
On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.

One more - WriteDataToArchiveGzip() says:

+   if (cs->compression_spec.level == 0)
+   pg_fatal("requested to compress the archive yet no level was 
specified");

That was added at e9960732a.  

But if you specify gzip:0, the compression level is already enforced by
validate_compress_specification(), before hitting gzip.c:

| pg_dump: error: invalid compression specification: compression algorithm 
"gzip" expects a compression level between 1 and 9 (default at -1)

5e73a6048 intended that to work as before, and you *can* specify -Z0:

The change is backward-compatible, hence specifying only an integer
leads to no compression for a level of 0 and gzip compression when the
level is greater than 0.

$ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
--compress 0 |file -
/dev/stdin: ASCII text

Right now, I think that pg_fatal in gzip.c is dead code - that was first
added in the patch version sent on 21 Dec 2022.

-- 
Justin
>From 07b446803ec89ccd0954583640f314fa7f77048f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH] f!fixes for LZ4

---
 doc/src/sgml/ref/pg_dump.sgml| 4 ++--
 src/bin/pg_dump/compress_gzip.c  | 7 ---
 src/bin/pg_dump/compress_io.c| 2 +-
 src/bin/pg_dump/compress_io.h| 4 ++--
 src/bin/pg_dump/compress_lz4.c   | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 4 ++--
 src/bin/pg_dump/pg_dump.c| 2 --
 src/bin/pg_dump/t/002_pg_dump.pl | 6 +++---
 8 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905fb..6fbe49f7ede 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -331,7 +331,7 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +655,7 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..af68fd27a12 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,13 +123,6 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..9239dbb2755 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -83,7 +83,7 @@
  * used by the caller in an error message.
  */
 char *
-supports_compression(const pg_compress_specification compression_spec)
+pgdump_supports_compression(const pg_compress_specification compression_spec)
 {
 	const pg_compress_algorithm	algorithm = compression_spec.algorithm;
 	bool		supported = false;
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..46815fa2ebe 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -21,7 +21,7 @@
 #define ZLIB_OUT_SIZE	4096
 #define ZLIB_IN_SIZE	4096
 
-extern char *supports_compression(const pg_compress_specification compression_spec);
+extern char *pgdump_supports_compression(const pg_compress_specification compression_spec);
 
 /*
  * Prototype for callback function used in writeData()
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infe

Re: Add LZ4 compression in pg_dump

2023-02-26 Thread Tomas Vondra



On 2/25/23 06:02, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.
> 
> - I'm unclear about get_error_func().  That's called in three places
>   from pg_backup_directory.c, after failures from write_func(), to
>   supply an compression-specific error message to pg_fatal().  But it's
>   not being used outside of directory format, nor for errors for other
>   function pointers, or even for all errors in write_func().  Is there
>   some reason why each compression method's write_func() shouldn't call
>   pg_fatal() directly, with its compression-specific message ?
> 

I think there are a couple more places that might/should call
get_error_func(). For example ahwrite() in pg_backup_archiver.c now
simply does

if (bytes_written != size * nmemb)
WRITE_ERROR_EXIT;

but perhaps it should call get_error_func() too. There are probably
other places that call write_func() and should use get_error_func().

> - I still think supports_compression() should be renamed, or made into a
>   static function in the necessary file.  The main reason is that it's
>   more clear what it indicates - whether compression is "implemented by
>   pgdump" and not whether compression is "supported by this postgres
>   build".  It also seems possible that we'd want to add a function
>   called something like supports_compression(), indicating whether the
>   algorithm is supported by the current build.  It'd be better if pgdump
>   didn't subjugate that name.
> 

If we choose to rename this to have pgdump_ prefix, fine with me. But I
don't think there's a realistic chance of conflict, as it's restricted
to pgdump header etc. And it's not part of an API, so I guess we could
rename that in the future if needed.

> - Finally, the "Nothing to do in the default case" comment comes from
>   Michael's commit 5e73a6048:
> 
> +   /*
> +* Custom and directory formats are compressed by default with gzip 
> when
> +* available, not the others.
> +*/
> +   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> +   !user_compression_defined)
> {
>  #ifdef HAVE_LIBZ
> -   if (archiveFormat == archCustom || archiveFormat == 
> archDirectory)
> -   compressLevel = Z_DEFAULT_COMPRESSION;
> -   else
> +   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> +
> &compression_spec);
> +#else
> +   /* Nothing to do in the default case */
>  #endif
> -   compressLevel = 0;
> }
> 
> 
> As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> enabled, and when not otherwise specified by the user.
> 
> Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when
> zlib was unavailable.
> 
> But I'm not sure why there's now an empty "#else".  I also don't know
> what "the default case" refers to.
> 
> Maybe the best thing here is to move the preprocessor #if, since it's no
> longer in the middle of a runtime conditional:
> 
>  #ifdef HAVE_LIBZ
> +   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> +   !user_compression_defined)
> +   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> +&compression_spec);
>  #endif
> 
> ...but that elicits a warning about "variable set but not used"...
> 

Not sure, I need to think about this a bit.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-26 Thread Justin Pryzby
On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression.  The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> 
> One more - WriteDataToArchiveGzip() says:

One more again.

The LZ4 path is using non-streaming mode, which compresses each block
without persistent state, giving poor compression for -Fc compared with
-Fp.  If the data is highly compressible, the difference can be orders
of magnitude.

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
12351763
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
21890708

That's not true for gzip:

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
2118869
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
2115832

The function ought to at least use streaming mode, so each block/row
isn't compressioned in isolation.  003 is a simple patch to use
streaming mode, which improves the -Fc case:

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
15178283

However, that still flushes the compression buffer, writing a block
header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
still outputs ~10% *more* data than with no compression at all.  And
that's for compressible data.

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
12890296
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
11890296

I think this should use the LZ4F API with frames, which are buffered to
avoid outputting a header for every single row.  The LZ4F format isn't
compatible with the LZ4 format, so (unlike changing to the streaming
API) that's not something we can change in a bugfix release.  I consider
this an Opened Item.

With the LZ4F API in 004, -Fp and -Fc are essentially the same size
(like gzip).  (Oh, and the output is three times smaller, too.)

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
4155448
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
4156548

-- 
Justin
>From 3a980f956bf314fb161fbf0a76f62ed0c2c35bfe Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH 1/4] f!fixes for LZ4

---
 src/bin/pg_dump/compress_gzip.c |  8 
 src/bin/pg_dump/compress_io.h   |  2 +-
 src/bin/pg_dump/compress_lz4.c  | 12 
 src/bin/pg_dump/pg_dump.c   |  2 --
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..52f41c2e58c 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,17 +123,9 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
-		/* Just be paranoid - maybe End is called after Start, with no Write */
 		zp->next_out = gzipcs->outbuf;
 		zp->avail_out = gzipcs->outsize;
 	}
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..cdb15951ea9 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm
  * from 'path', either by examining its suffix or by appending the supported
  * suffixes in 'path'.
  */
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index fe1014e6e77..7dacfeae469 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -105,12 +105,8 @@ EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs)
 	LZ4CompressorState *LZ4cs;
 
 	LZ4cs = (LZ4CompressorState *) cs->private_data;
-	if (LZ4cs)
-	{
-		pg_free(LZ4cs->outbuf);
-		pg_free(LZ4cs);
-		cs->private_data = NULL;
-	}
+	pg_free(LZ4cs->outbuf);
+	pg_free(LZ4cs);
 }
 
 
@@ -161,8 +157,8 @@ typedef struct LZ4File
 } LZ4File;
 
 /*
- * LZ4 equivalent to feof() or gzeof(). The end of file is reached if there
- * is no decompressed output in the overflow buffer and the end of the file
+ * LZ4 equivalent to feof() or gzeof().  Return true iff there is no
+ * decompressed output in the overflow buffer and the end of the backing file
  * is

Re: Add LZ4 compression in pg_dump

2023-02-27 Thread gkokolatos






--- Original Message ---
On Sunday, February 26th, 2023 at 3:59 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 2/25/23 06:02, Justin Pryzby wrote:
> 
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> > 
> > - I'm unclear about get_error_func(). That's called in three places
> > from pg_backup_directory.c, after failures from write_func(), to
> > supply an compression-specific error message to pg_fatal(). But it's
> > not being used outside of directory format, nor for errors for other
> > function pointers, or even for all errors in write_func(). Is there
> > some reason why each compression method's write_func() shouldn't call
> > pg_fatal() directly, with its compression-specific message ?
> 
> 
> I think there are a couple more places that might/should call
> get_error_func(). For example ahwrite() in pg_backup_archiver.c now
> simply does
> 
> if (bytes_written != size * nmemb)
> WRITE_ERROR_EXIT;
> 
> but perhaps it should call get_error_func() too. There are probably
> other places that call write_func() and should use get_error_func().

Agreed, calling get_error_func() would be preferable to a fatal error. It
should be the caller of the api who decides how to proceed.

> 
> > - I still think supports_compression() should be renamed, or made into a
> > static function in the necessary file. The main reason is that it's
> > more clear what it indicates - whether compression is "implemented by
> > pgdump" and not whether compression is "supported by this postgres
> > build". It also seems possible that we'd want to add a function
> > called something like supports_compression(), indicating whether the
> > algorithm is supported by the current build. It'd be better if pgdump
> > didn't subjugate that name.
> 
> 
> If we choose to rename this to have pgdump_ prefix, fine with me. But I
> don't think there's a realistic chance of conflict, as it's restricted
> to pgdump header etc. And it's not part of an API, so I guess we could
> rename that in the future if needed.

Agreed, it is very unrealistic that one will include that header file anywhere
but within pg_dump. Also. I think that adding a prefix, "pgdump", "pg_dump",
or similar does not add value and subtracts readability. 

> 
> > - Finally, the "Nothing to do in the default case" comment comes from
> > Michael's commit 5e73a6048:
> > 
> > + /*
> > + * Custom and directory formats are compressed by default with gzip when
> > + * available, not the others.
> > + /
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > {
> > #ifdef HAVE_LIBZ
> > - if (archiveFormat == archCustom || archiveFormat == archDirectory)
> > - compressLevel = Z_DEFAULT_COMPRESSION;
> > - else
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > +#else
> > + / Nothing to do in the default case */
> > #endif
> > - compressLevel = 0;
> > }
> > 
> > As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> > enabled, and when not otherwise specified by the user.
> > 
> > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when
> > zlib was unavailable.
> > 
> > But I'm not sure why there's now an empty "#else". I also don't know
> > what "the default case" refers to.
> > 
> > Maybe the best thing here is to move the preprocessor #if, since it's no
> > longer in the middle of a runtime conditional:
> > 
> > #ifdef HAVE_LIBZ
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > #endif
> > 
> > ...but that elicits a warning about "variable set but not used"...
> 
> 
> Not sure, I need to think about this a bit.

Not having warnings is preferable, isn't it? I can understand the confusion
on the message though. Maybe a phrasing like:
/* Nothing to do for the default case when LIBZ is not available */
is easier to understand.

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-27 Thread gkokolatos






--- Original Message ---
On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby 
 wrote:


> 
> 
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> 
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.


Please find some comments on the rest of the fixes patch that Tomas has not
commented on.

can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.

+1

 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.

I am not a native English speaker. Yet I think that if one adds commas
in one of the options, then one should add commas to all the options.
Namely, the aboveis missing a comma between gzip and lz4. However I
think that not having any commas still works grammatically and
syntactically.

-   /*
-* A level of zero simply copies the input one block at the 
time. This
-* is probably not what the user wanted when calling this 
interface.
-*/
-   if (cs->compression_spec.level == 0)
-   pg_fatal("requested to compress the archive yet no 
level was specified");


I disagree with change. WriteDataToArchiveGzip() is far away from
what ever the code in pg_dump.c is doing. Any non valid values for
level will emit an error in when the proper gzip/zlib code is
called. A zero value however, will not emit such error. Having the
extra check there is a future proof guarantee in a very low cost.
Furthermore, it quickly informs the reader of the code about that
specific value helping with readability and comprehension.

If any change is required, something for which I vote strongly
against, I would at least recommend to replace it with an
assertion.

- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm

:+1:

-   # Skip command-level tests for gzip if there is no support for it.
+   # Skip command-level tests for gzip/lz4 if they're not supported.

We will be back at that again soon. Maybe change to:

Skip command-level test for unsupported compression methods

To include everything.


-   ($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
!$supports_gzip) ||
-   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
!$supports_lz4))
+   (($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
!$supports_gzip) ||
+   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
!$supports_lz4)))

Good catch, :+1:

Cheers,
//Georgios

> --
> Justin




Re: Add LZ4 compression in pg_dump

2023-02-28 Thread Justin Pryzby
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> On 2/23/23 16:26, Tomas Vondra wrote:
> > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > fixing a comment typo and removing (I think) an unnecessary change in an
> > error message.
> > 
> > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> > 
> 
> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> and marked the CF entry as committed. Thanks for the patch!

I found that e9960732a broke writing of empty gzip-compressed data,
specifically LOs.  pg_dump succeeds, but then the restore fails:

postgres=# SELECT lo_create(1234);
lo_create | 1234

$ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc 
|./src/bin/pg_dump/pg_restore -f /dev/null -v 
pg_restore: implied data-only restore
pg_restore: executing BLOB 1234
pg_restore: processing BLOBS
pg_restore: restoring large object with OID 1234
pg_restore: error: could not uncompress data: (null)

The inline patch below fixes it, but you won't be able to apply it
directly, as it's on top of other patches which rename the functions
back to "Zlib" and rearranges the functions to their original order, to
allow running:

git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c 
./src/bin/pg_dump/compress_gzip.c

The current function order avoids 3 lines of declarations, but it's
obviously pretty useful to be able to run that diff command.  I already
argued for not calling the functions "Gzip" on the grounds that the name
was inaccurate.

I'd want to create an empty large object in src/test/sql/largeobject.sql
to exercise this tested during pgupgrade.  But unfortunately that
doesn't use -Fc, so this isn't hit.  Empty input is an important enough
test case to justify a tap test, if there's no better way.

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index f3f5e87c9a8..68f3111b2fe 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -55,6 +55,32 @@ InitCompressorZlib(CompressorState *cs,
gzipcs = (ZlibCompressorState *) 
pg_malloc0(sizeof(ZlibCompressorState));
 
cs->private_data = gzipcs;
+
+   if (cs->writeF)
+   {
+   z_streamp   zp;
+   zp = gzipcs->zp = (z_streamp) pg_malloc0(sizeof(z_stream));
+   zp->zalloc = Z_NULL;
+   zp->zfree = Z_NULL;
+   zp->opaque = Z_NULL;
+
+   /*
+* outsize is the buffer size we tell zlib it can output to.  We
+* actually allocate one extra byte because some routines want 
to append a
+* trailing zero byte to the zlib output.
+*/
+
+   gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
+   gzipcs->outsize = ZLIB_OUT_SIZE;
+
+   if (deflateInit(gzipcs->zp, cs->compression_spec.level) != Z_OK)
+   pg_fatal("could not initialize compression library: %s",
+   zp->msg);
+
+   /* Just be paranoid - maybe End is called after Start, with no 
Write */
+   zp->next_out = gzipcs->outbuf;
+   zp->avail_out = gzipcs->outsize;
+   }
 }
 
 static void
@@ -63,7 +89,7 @@ EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs)
ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
z_streamp   zp;
 
-   if (gzipcs->zp)
+   if (cs->writeF != NULL)
{
zp = gzipcs->zp;
zp->next_in = NULL;
@@ -131,29 +157,6 @@ WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState 
*cs,
   const void *data, size_t dLen)
 {
ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
-   z_streamp   zp;
-
-   if (!gzipcs->zp)
-   {
-   zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
-   zp->zalloc = Z_NULL;
-   zp->zfree = Z_NULL;
-   zp->opaque = Z_NULL;
-
-   /*
-* outsize is the buffer size we tell zlib it can output to.  We
-* actually allocate one extra byte because some routines want 
to
-* append a trailing zero byte to the zlib output.
-*/
-   gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
-   gzipcs->outsize = ZLIB_OUT_SIZE;
-
-   if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
-   pg_fatal("could not initialize compression library: 
%s", zp->msg);
-
-   zp->next_out = gzipcs->outbuf;
-   zp->avail_out = gzipcs->outsize;
-   }
 
gzipcs->zp->next_in = (void *) unconstify(void *, data);
gzipcs->zp->avail_in = dLen;
>From 1c707279596f3cffde9c97b450dcbef0b6ddbd94 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 26 Feb 2023 17:12:03 -0600
Subject: [PATCH

Re: Add LZ4 compression in pg_dump

2023-02-28 Thread Michael Paquier
On Tue, Feb 28, 2023 at 05:58:34PM -0600, Justin Pryzby wrote:
> I found that e9960732a broke writing of empty gzip-compressed data,
> specifically LOs.  pg_dump succeeds, but then the restore fails:

The number of issues you have been reporting here begins to worries
me..  How many of them have you found?  Is it right to assume that all
of them have basically 03d02f5 as oldest origin point?
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-03-01 Thread gkokolatos





--- Original Message ---
On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby  
wrote:



> I found that e9960732a broke writing of empty gzip-compressed data,
> specifically LOs. pg_dump succeeds, but then the restore fails:
> 
> postgres=# SELECT lo_create(1234);
> lo_create | 1234
> 
> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc 
> |./src/bin/pg_dump/pg_restore -f /dev/null -v
> pg_restore: implied data-only restore
> pg_restore: executing BLOB 1234
> pg_restore: processing BLOBS
> pg_restore: restoring large object with OID 1234
> pg_restore: error: could not uncompress data: (null)
> 

Thank you for looking. This was an untested case.

> The inline patch below fixes it, but you won't be able to apply it
> directly, as it's on top of other patches which rename the functions
> back to "Zlib" and rearranges the functions to their original order, to
> allow running:
> 
> git diff --diff-algorithm=minimal -w 
> e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
> 

Please find a patch attached that can be applied directly.

> The current function order avoids 3 lines of declarations, but it's
> obviously pretty useful to be able to run that diff command. I already
> argued for not calling the functions "Gzip" on the grounds that the name
> was inaccurate.

I have no idea why we are back on the naming issue. I stand by the name
because in my humble opinion helps the code reader. There is a certain
uniformity when the compression_spec.algorithm and the compressor
functions match as the following code sample shows.

if (compression_spec.algorithm == PG_COMPRESSION_NONE) 
InitCompressorNone(cs, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
InitCompressorGzip(cs, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
InitCompressorLZ4(cs, compression_spec);
 
When the reader wants to see what happens when the PG_COMPRESSION_XXX
is set, has to simply search for the XXX part. I think that this is
justification enough for the use of the names.

> 
> I'd want to create an empty large object in src/test/sql/largeobject.sql
> to exercise this tested during pgupgrade. But unfortunately that
> doesn't use -Fc, so this isn't hit. Empty input is an important enough
> test case to justify a tap test, if there's no better way.

Please find in the attached a test case that exercises this codepath.

Cheers,
//GeorgiosFrom 95450f0e7e90f0a1a3cdfc12c760a9520bd2995f Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Wed, 1 Mar 2023 12:42:32 +
Subject: [PATCH vX] Properly gzip compress when no data is available

When creating dumps with the Compressor API, it is possible to only call the
Allocate and End compressor functions without ever writing any data. The gzip
implementation wrongly assumed that the write function would be called and
defered the initialization of the internal compression system for the first
write call. The End call would only finilize the internal compression system if
that was ever initialized.

The problem with that approach is that it violated the expectations of the
internal compression system during decompression.

This commit initializes the internal compression system during the Allocate
call, under the condition that a write function was provided by the caller.
Given that decompression does not need to keep track of any state, the
compressor's private_data member is now populated only during compression.

Tests are added to cover this scenario.

Initial patch by Justin Pruzby.

Reported-by: Justin Pryzby
---
 src/bin/pg_dump/compress_gzip.c  | 118 +--
 src/bin/pg_dump/t/002_pg_dump.pl |  27 ++-
 2 files changed, 91 insertions(+), 54 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4..f5d32cf059 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -32,9 +32,48 @@ typedef struct GzipCompressorState
 	size_t		outsize;
 } GzipCompressorState;
 
+
 /* Private routines that support gzip compressed data I/O */
 static void
-DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush)
+DeflateCompressorInit(CompressorState *cs)
+{
+	GzipCompressorState *gzipcs;
+	z_streamp	zp;
+
+	gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState));
+	zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
+	zp->zalloc = Z_NULL;
+	zp->zfree = Z_NULL;
+	zp->opaque = Z_NULL;
+
+	/*
+	 * outsize is the buffer size we tell zlib it can output to.  We
+	 * actually allocate one extra byte because some routines want to
+	 * append a trailing zero byte to the zlib output.
+	 */
+	gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
+	gzipcs->outsize = ZLIB_OUT_SIZE;
+
+	/*
+	 * A level of zero simply copies the input one block at the time. This
+	 * is probably n

Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra
On 3/1/23 08:24, Michael Paquier wrote:
> On Tue, Feb 28, 2023 at 05:58:34PM -0600, Justin Pryzby wrote:
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs.  pg_dump succeeds, but then the restore fails:
> 
> The number of issues you have been reporting here begins to worries
> me..  How many of them have you found?  Is it right to assume that all
> of them have basically 03d02f5 as oldest origin point?

AFAICS a lot of the issues are more a discussion about wording in a
couple places, whether it's nicer to do A or B, name the functions
differently or what.

I'm aware of three genuine issues that I intend to fix shortly:

1) incorrect "if" condition in a TAP test

2) failure when compressing empty LO (which we had no test for)

3) change in handling "compression level = 0" (which I believe should be
made to behave like before)


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 3/1/23 14:39, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby 
>  wrote:
> 
> 
> 
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs. pg_dump succeeds, but then the restore fails:
>>
>> postgres=# SELECT lo_create(1234);
>> lo_create | 1234
>>
>> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc 
>> |./src/bin/pg_dump/pg_restore -f /dev/null -v
>> pg_restore: implied data-only restore
>> pg_restore: executing BLOB 1234
>> pg_restore: processing BLOBS
>> pg_restore: restoring large object with OID 1234
>> pg_restore: error: could not uncompress data: (null)
>>
> 
> Thank you for looking. This was an untested case.
> 

Yeah :-(

>> The inline patch below fixes it, but you won't be able to apply it
>> directly, as it's on top of other patches which rename the functions
>> back to "Zlib" and rearranges the functions to their original order, to
>> allow running:
>>
>> git diff --diff-algorithm=minimal -w 
>> e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
>>
> 
> Please find a patch attached that can be applied directly.
> 
>> The current function order avoids 3 lines of declarations, but it's
>> obviously pretty useful to be able to run that diff command. I already
>> argued for not calling the functions "Gzip" on the grounds that the name
>> was inaccurate.
> 
> I have no idea why we are back on the naming issue. I stand by the name
> because in my humble opinion helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match as the following code sample shows.
> 
> if (compression_spec.algorithm == PG_COMPRESSION_NONE) 
> InitCompressorNone(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
> InitCompressorGzip(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
> InitCompressorLZ4(cs, compression_spec);
>  
> When the reader wants to see what happens when the PG_COMPRESSION_XXX
> is set, has to simply search for the XXX part. I think that this is
> justification enough for the use of the names.
> 

I don't recall the previous discussion about the naming, but I'm not
sure why would it be inaccurate. We call it 'gzip' pretty much
everywhere, and I agree with Georgios there's it helps to make this
consistent with the PG_COMPRESSION_ stuff.

The one thing that concerned me while reviewing it earlier was that it
might make the backpatcheing harder. But that's mostly irrelevant due to
all the other changes I think.

>>
>> I'd want to create an empty large object in src/test/sql/largeobject.sql
>> to exercise this tested during pgupgrade. But unfortunately that
>> doesn't use -Fc, so this isn't hit. Empty input is an important enough
>> test case to justify a tap test, if there's no better way.
> 
> Please find in the attached a test case that exercises this codepath.
> 

Thanks. That seems correct to me, but I find it somewhat confusing,
because we now have

 DeflateCompressorInit vs. InitCompressorGzip

 DeflateCompressorEnd vs. EndCompressorGzip

 DeflateCompressorData - The name doesn't really say what it does (would
 be better to have a verb in there, I think).

I wonder if we can make this somehow clearer?

Also, InitCompressorGzip says this:

   /*
* If the caller has defined a write function, prepare the necessary
* state. Avoid initializing during the first write call, because End
* may be called without ever writing any data.
*/
if (cs->writeF)
DeflateCompressorInit(cs);

Does it actually make sense to not have writeF defined in some cases?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 2/27/23 15:56, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby 
>  wrote:
> 
> 
>>
>>
>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>>
>>> I have some fixes (attached) and questions while polishing the patch for
>>> zstd compression. The fixes are small and could be integrated with the
>>> patch for zstd, but could be applied independently.
> 
> 
> Please find some comments on the rest of the fixes patch that Tomas has not
> commented on.
> 
> can be compressed with the gzip or
> -   lz4tool.
> +   lz4 tools.
> 
> +1
> 
>  The compression method can be set to gzip or
> -lz4 or none for no compression.
> +lz4, or none for no 
> compression.
> 
> I am not a native English speaker. Yet I think that if one adds commas
> in one of the options, then one should add commas to all the options.
> Namely, the aboveis missing a comma between gzip and lz4. However I
> think that not having any commas still works grammatically and
> syntactically.
> 

I pushed a fix with most of these wording changes. As for this comma, I
believe the correct style is

   a, b, or c

At least that's what the other places in the pg_dump.sgml file do.

> -   ($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
> !$supports_gzip) ||
> -   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
> !$supports_lz4))
> +   (($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
> !$supports_gzip) ||
> +   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
> !$supports_lz4)))
> 

Pushed a fix for this too.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 2/25/23 15:05, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>> I have some fixes (attached) and questions while polishing the patch for
>> zstd compression.  The fixes are small and could be integrated with the
>> patch for zstd, but could be applied independently.
> 
> One more - WriteDataToArchiveGzip() says:
> 
> +   if (cs->compression_spec.level == 0)
> +   pg_fatal("requested to compress the archive yet no level was 
> specified");
> 
> That was added at e9960732a.  
> 
> But if you specify gzip:0, the compression level is already enforced by
> validate_compress_specification(), before hitting gzip.c:
> 
> | pg_dump: error: invalid compression specification: compression algorithm 
> "gzip" expects a compression level between 1 and 9 (default at -1)
> 
> 5e73a6048 intended that to work as before, and you *can* specify -Z0:
> 
> The change is backward-compatible, hence specifying only an integer
> leads to no compression for a level of 0 and gzip compression when the
> level is greater than 0.
> 
> $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
> --compress 0 |file -
> /dev/stdin: ASCII text
> 

FWIW I agree we should make this backwards-compatible - accept "0" and
treat it as no compression.

Georgios, can you prepare a patch doing that?


regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 2/27/23 05:49, Justin Pryzby wrote:
> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>>> I have some fixes (attached) and questions while polishing the patch for
>>> zstd compression.  The fixes are small and could be integrated with the
>>> patch for zstd, but could be applied independently.
>>
>> One more - WriteDataToArchiveGzip() says:
> 
> One more again.
> 
> The LZ4 path is using non-streaming mode, which compresses each block
> without persistent state, giving poor compression for -Fc compared with
> -Fp.  If the data is highly compressible, the difference can be orders
> of magnitude.
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
> 12351763
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> 21890708
> 
> That's not true for gzip:
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
> 2118869
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
> 2115832
> 
> The function ought to at least use streaming mode, so each block/row
> isn't compressioned in isolation.  003 is a simple patch to use
> streaming mode, which improves the -Fc case:
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> 15178283
> 
> However, that still flushes the compression buffer, writing a block
> header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
> still outputs ~10% *more* data than with no compression at all.  And
> that's for compressible data.
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
> 12890296
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
> 11890296
> 
> I think this should use the LZ4F API with frames, which are buffered to
> avoid outputting a header for every single row.  The LZ4F format isn't
> compatible with the LZ4 format, so (unlike changing to the streaming
> API) that's not something we can change in a bugfix release.  I consider
> this an Opened Item.
> 
> With the LZ4F API in 004, -Fp and -Fc are essentially the same size
> (like gzip).  (Oh, and the output is three times smaller, too.)
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
> 4155448
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
> 4156548
> 

Thanks. Those are definitely interesting improvements/optimizations!

I suggest we track them as a separate patch series - please add them to
the CF app (I guess you'll have to add them to 2023-07 at this point,
but we can get them in, I think).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 01:39:14PM +, gkokola...@pm.me wrote:
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby 
>  wrote:
> 
> > The current function order avoids 3 lines of declarations, but it's
> > obviously pretty useful to be able to run that diff command. I already
> > argued for not calling the functions "Gzip" on the grounds that the name
> > was inaccurate.
> 
> I have no idea why we are back on the naming issue. I stand by the name
> because in my humble opinion helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match as the following code sample shows.

I mentioned that it's because this allows usefully running "diff"
against the previous commits.

> if (compression_spec.algorithm == PG_COMPRESSION_NONE) 
> InitCompressorNone(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
> InitCompressorGzip(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
> InitCompressorLZ4(cs, compression_spec);
>  
> When the reader wants to see what happens when the PG_COMPRESSION_XXX
> is set, has to simply search for the XXX part. I think that this is
> justification enough for the use of the names.

You're right about that.

But (with the exception of InitCompressorGzip), I'm referring to the
naming of internal functions, static to gzip.c, so renaming can't be
said to cause a loss of clarity.

> > I'd want to create an empty large object in src/test/sql/largeobject.sql
> > to exercise this tested during pgupgrade. But unfortunately that
> > doesn't use -Fc, so this isn't hit. Empty input is an important enough
> > test case to justify a tap test, if there's no better way.
> 
> Please find in the attached a test case that exercises this codepath.

Thanks for writing it.

This patch could be an opportunity to improve the "diff" output, without
renaming anything.

The old order of functions was:
-InitCompressorZlib
-EndCompressorZlib
-DeflateCompressorZlib
-WriteDataToArchiveZlib
-ReadDataFromArchiveZlib

If you put DeflateCompressorEnd immediately after DeflateCompressorInit,
diff works nicely.  You'll have to add at least one declaration, which
seems very worth it.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-02 Thread gkokolatos





--- Original Message ---
On Wednesday, March 1st, 2023 at 5:20 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 2/25/23 15:05, Justin Pryzby wrote:
> 
> > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > 
> > > I have some fixes (attached) and questions while polishing the patch for
> > > zstd compression. The fixes are small and could be integrated with the
> > > patch for zstd, but could be applied independently.
> > 
> > One more - WriteDataToArchiveGzip() says:
> > 
> > + if (cs->compression_spec.level == 0)
> > + pg_fatal("requested to compress the archive yet no level was specified");
> > 
> > That was added at e9960732a.
> > 
> > But if you specify gzip:0, the compression level is already enforced by
> > validate_compress_specification(), before hitting gzip.c:
> > 
> > | pg_dump: error: invalid compression specification: compression algorithm 
> > "gzip" expects a compression level between 1 and 9 (default at -1)
> > 
> > 5e73a6048 intended that to work as before, and you can specify -Z0:
> > 
> > The change is backward-compatible, hence specifying only an integer
> > leads to no compression for a level of 0 and gzip compression when the
> > level is greater than 0.
> > 
> > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
> > --compress 0 |file -
> > /dev/stdin: ASCII text
> 
> 
> FWIW I agree we should make this backwards-compatible - accept "0" and
> treat it as no compression.
> 
> Georgios, can you prepare a patch doing that?

Please find a patch attached. However I am a bit at a loss, the backwards
compatible behaviour has not changed. Passing -Z0/--compress=0 does produce
a non compressed output. So I am not really certain as to what broke and
needs fixing.

What commit 5e73a6048 did fail to do, is test the backwards compatible
behaviour. The attached amends it.

Cheers,
//Georgios

> 
> 
> regards
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL CompanyFrom 99c2da94ecbeacf997270dd26fc5c0a63ffcedd4 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 2 Mar 2023 16:10:03 +
Subject: [PATCH vX] Add test for backwards compatible -Z0 option in pg_dump

Commit 5e73a6048 expanded pg_dump with the ability to use compression
specifications. A commonly shared code which lets the user control in an
extended way the method, level, and other details of a desired compression.

Prior to this commit, pg_dump could only accept an integer for the
-Z/--compress option. An integer value of 0 had the special meaning of non
compression. Commit 5e73a6048 respected and maintained this behaviour for
backwards compatibility.

However no tests covered this scenario. The current commit adds coverage for
this case.
---
 src/bin/pg_dump/t/002_pg_dump.pl | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 7b5a6e190c..ec7aaab884 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -76,6 +76,19 @@ my %pgdump_runs = (
 		],
 	},
 
+	# Verify that the backwards compatible option -Z0 produces
+	# non compressed output
+	compression_none_plain => {
+		test_key   => 'compression',
+		# Enforce this test when compile option is available
+		compile_option => 'gzip',
+		dump_cmd   => [
+			'pg_dump',  '--format=plain',
+			'-Z0', "--file=$tempdir/compression_none_plain.sql",
+			'postgres',
+		],
+	},
+
 	# Do not use --no-sync to give test coverage for data sync.
 	compression_gzip_custom => {
 		test_key   => 'compression',
-- 
2.34.1



Re: Add LZ4 compression in pg_dump

2023-03-02 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote:
> On 2/25/23 15:05, Justin Pryzby wrote:
> > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> >> I have some fixes (attached) and questions while polishing the patch for
> >> zstd compression.  The fixes are small and could be integrated with the
> >> patch for zstd, but could be applied independently.
> > 
> > One more - WriteDataToArchiveGzip() says:
> > 
> > +   if (cs->compression_spec.level == 0)
> > +   pg_fatal("requested to compress the archive yet no level was 
> > specified");
> > 
> > That was added at e9960732a.  
> > 
> > But if you specify gzip:0, the compression level is already enforced by
> > validate_compress_specification(), before hitting gzip.c:
> > 
> > | pg_dump: error: invalid compression specification: compression algorithm 
> > "gzip" expects a compression level between 1 and 9 (default at -1)
> > 
> > 5e73a6048 intended that to work as before, and you *can* specify -Z0:
> > 
> > The change is backward-compatible, hence specifying only an integer
> > leads to no compression for a level of 0 and gzip compression when the
> > level is greater than 0.
> > 
> > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
> > --compress 0 |file -
> > /dev/stdin: ASCII text
> 
> FWIW I agree we should make this backwards-compatible - accept "0" and
> treat it as no compression.
> 
> Georgios, can you prepare a patch doing that?

I think maybe Tomas misunderstood.  What I was trying to say is that -Z
0 *is* accepted to mean no compression.  This part wasn't quoted, but I
said:

> Right now, I think that pg_fatal in gzip.c is dead code - that was first
> added in the patch version sent on 21 Dec 2022.

If you run the diff command that I've been talking about, you'll see
that InitCompressorZlib was almost unchanged - e9960732 is essentially a
refactoring.  I don't think it's desirable to add a pg_fatal() in a
function that's otherwise nearly-unchanged.  The fact that it's
nearly-unchanged is a good thing: it simplifies reading of what changed.
If someone wants to add a pg_fatal() in that code path, it'd be better
done in its own commit, with a separate message explaining the change.

If you insist on changing anything here, you might add an assertion (as
you said earlier) along with a comment like
/* -Z 0 uses the "None" compressor rather than zlib with no compression */

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-02 Thread Tomas Vondra



On 3/2/23 18:18, Justin Pryzby wrote:
> On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote:
>> On 2/25/23 15:05, Justin Pryzby wrote:
>>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
 I have some fixes (attached) and questions while polishing the patch for
 zstd compression.  The fixes are small and could be integrated with the
 patch for zstd, but could be applied independently.
>>>
>>> One more - WriteDataToArchiveGzip() says:
>>>
>>> +   if (cs->compression_spec.level == 0)
>>> +   pg_fatal("requested to compress the archive yet no level was 
>>> specified");
>>>
>>> That was added at e9960732a.  
>>>
>>> But if you specify gzip:0, the compression level is already enforced by
>>> validate_compress_specification(), before hitting gzip.c:
>>>
>>> | pg_dump: error: invalid compression specification: compression algorithm 
>>> "gzip" expects a compression level between 1 and 9 (default at -1)
>>>
>>> 5e73a6048 intended that to work as before, and you *can* specify -Z0:
>>>
>>> The change is backward-compatible, hence specifying only an integer
>>> leads to no compression for a level of 0 and gzip compression when the
>>> level is greater than 0.
>>>
>>> $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
>>> --compress 0 |file -
>>> /dev/stdin: ASCII text
>>
>> FWIW I agree we should make this backwards-compatible - accept "0" and
>> treat it as no compression.
>>
>> Georgios, can you prepare a patch doing that?
> 
> I think maybe Tomas misunderstood.  What I was trying to say is that -Z
> 0 *is* accepted to mean no compression.  This part wasn't quoted, but I
> said:
> 

Ah, I see. Well, I also tried but with "-Z gzip:0" (and not -Z 0), and
that does fail:

  error: invalid compression specification: compression algorithm "gzip"
  expects a compression level between 1 and 9 (default at -1)

It's a bit weird these two cases behave differently, when both translate
to the same default compression method (gzip).

>> Right now, I think that pg_fatal in gzip.c is dead code - that was first
>> added in the patch version sent on 21 Dec 2022.
> 
> If you run the diff command that I've been talking about, you'll see
> that InitCompressorZlib was almost unchanged - e9960732 is essentially a
> refactoring.  I don't think it's desirable to add a pg_fatal() in a
> function that's otherwise nearly-unchanged.  The fact that it's
> nearly-unchanged is a good thing: it simplifies reading of what changed.
> If someone wants to add a pg_fatal() in that code path, it'd be better
> done in its own commit, with a separate message explaining the change.
> 
> If you insist on changing anything here, you might add an assertion (as
> you said earlier) along with a comment like
> /* -Z 0 uses the "None" compressor rather than zlib with no compression */
> 

Yeah, a comment would be helpful.

Also, after thinking about it a bit more maybe having the unreachable
pg_fatal() is not a good thing, as it will just confuse people (I'd
certainly assume having such check means there's a way in which case it
might trigger.). Maybe an assert would be better?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-07 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 04:52:49PM +0100, Tomas Vondra wrote:
> Thanks. That seems correct to me, but I find it somewhat confusing,
> because we now have
> 
>  DeflateCompressorInit vs. InitCompressorGzip
> 
>  DeflateCompressorEnd vs. EndCompressorGzip
> 
>  DeflateCompressorData - The name doesn't really say what it does (would
>  be better to have a verb in there, I think).
> 
> I wonder if we can make this somehow clearer?

To move things along, I updated Georgios' patch:

Rename DeflateCompressorData() to DeflateCompressorCommon();
Rearrange functions to their original order allowing a cleaner diff to the 
prior code;
Change pg_fatal() to an assertion+comment;
Update the commit message and fix a few typos;

> Also, InitCompressorGzip says this:
> 
>/*
> * If the caller has defined a write function, prepare the necessary
> * state. Avoid initializing during the first write call, because End
> * may be called without ever writing any data.
> */
> if (cs->writeF)
> DeflateCompressorInit(cs);
>
> Does it actually make sense to not have writeF defined in some cases?

InitCompressor is being called for either reading or writing, either of
which could be null:

src/bin/pg_dump/pg_backup_custom.c: ctx->cs = 
AllocateCompressor(AH->compression_spec,
src/bin/pg_dump/pg_backup_custom.c- 
 NULL,
src/bin/pg_dump/pg_backup_custom.c- 
 _CustomWriteFunc);
--
src/bin/pg_dump/pg_backup_custom.c: cs = 
AllocateCompressor(AH->compression_spec,
src/bin/pg_dump/pg_backup_custom.c- 
_CustomReadFunc, NULL);

It's confusing that the comment says "Avoid initializing...".  What it
really means is "Initialize eagerly...".  But that makes more sense in
the context of the commit message for this bugfix than in a comment.  So
I changed that too.

+   /* If deflation was initialized, finalize it */ 
 
+   if (cs->private_data)   
 
+   DeflateCompressorEnd(AH, cs);   
 

Maybe it'd be more clear if this used "if (cs->writeF)", like in the
init function ?

-- 
Justin
>From 5c027aa86e8591db140093c48a58aafba7a6c28c Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Wed, 1 Mar 2023 12:42:32 +
Subject: [PATCH] pg_dump: fix gzip compression of empty data

When creating dumps with the Compressor API, it is possible to only call
the Allocate and End compressor functions without ever writing any data.
Since e9960732a, the gzip implementation wrongly assumed that the write
function would always be called and deferred the initialization of the
internal compression system until the first write call.

The problem with that approach is that the End call would not finalize
the internal compression system if it hadn't been initialized.

This commit initializes the internal compression system during the
Allocate call, whenever a write function was provided by the caller.
Given that decompression does not need to keep track of any state, the
compressor's private_data member is now populated only during
compression.

In passing, rearrange the functions to their original order, to allow
usefully comparing with the previous code, like:
git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c

Also replace an unreachable pg_fatal() with an assert+comment.  I
(Justin) argued that the new fatal shouldn't have been introduced in a
refactoring commit, so this is a compromise.

Report and initial patch by Justin Pryzby, test case by Georgios
Kokolatos.

https://www.postgresql.org/message-id/20230228235834.GC30529%40telsasoft.com
---
 src/bin/pg_dump/compress_gzip.c  | 137 ++-
 src/bin/pg_dump/t/002_pg_dump.pl |  23 ++
 2 files changed, 101 insertions(+), 59 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..3c9ac55c266 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -32,9 +32,75 @@ typedef struct GzipCompressorState
 	size_t		outsize;
 } GzipCompressorState;
 
+
 /* Private routines that support gzip compressed data I/O */
+static void DeflateCompressorInit(CompressorState *cs);
+static void DeflateCompressorEnd(ArchiveHandle *AH, CompressorState *cs);
+static void DeflateCompressorCommon(ArchiveHandle *AH, CompressorState *cs,
+	bool flush);
+static void EndCompressorGzip(ArchiveHandle *AH, CompressorState *cs);
+static void WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
+   const void *data, size_t dLen);

Re: Add LZ4 compression in pg_dump

2022-11-20 Thread Justin Pryzby
On Fri, Aug 05, 2022 at 02:23:45PM +, Georgios Kokolatos wrote:
> Thank you for your work during commitfest.
> 
> The patch is still in development. Given vacation status, expect the next 
> patches to be ready for the November commitfest.
> For now it has moved to the September one. Further action will be taken then 
> as needed.

On Sun, Nov 06, 2022 at 02:53:12PM +, gkokola...@pm.me wrote:
> On Wed, Nov 2, 2022 at 14:28, Justin Pryzby  wrote:
> > Checking if you'll be able to submit new patches soon ?
> 
> Thank you for checking up. Expect new versions within this commitfest cycle.

Hi,

I think this patch record should be closed for now.  You can re-open the
existing patch record once a patch is ready to be reviewed.

The commitfest is a time for committing/reviewing patches that were
previously submitted, but there's no new patch since July.  Making a
patch available for review at the start of the commitfest seems like a
requirement for current patch records (same as for new patch records).

I wrote essentially the same patch as your early patches 2 years ago
(before postgres was ready to consider new compression algorithms), so
I'm happy to review a new patch when it's available, regardless of its
status in the cfapp.

BTW, some of my own review comments from March weren't addressed.
Please check.  Also, in February, I asked if you knew how to use
cirrusci to run checks on cirrusci, but the patches still had
compilation errors and warnings on various OS.

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3571

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-11-20 Thread Michael Paquier
On Sun, Nov 20, 2022 at 11:26:11AM -0600, Justin Pryzby wrote:
> I think this patch record should be closed for now.  You can re-open the
> existing patch record once a patch is ready to be reviewed.

Indeed.  As of things are, this is just a dead entry in the CF which
would be confusing.  I have marked it as RwF.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-11-22 Thread gkokolatos






--- Original Message ---
On Monday, November 21st, 2022 at 12:13 AM, Michael Paquier 
 wrote:


> 
> 
> On Sun, Nov 20, 2022 at 11:26:11AM -0600, Justin Pryzby wrote:
> 
> > I think this patch record should be closed for now. You can re-open the
> > existing patch record once a patch is ready to be reviewed.
> 
> 
> Indeed. As of things are, this is just a dead entry in the CF which
> would be confusing. I have marked it as RwF.

Thank you for closing it.

For the record I am currently working on it simply unsure if I should submit
WIP patches and add noise to the list or wait until it is in a state that I
feel that the comments have been addressed.

A new version that I feel that is in a decent enough state for review should
be ready within this week. I am happy to drop the patch if you think I should
not work on it though.

Cheers,
//Georgios

> --
> Michael




Re: Add LZ4 compression in pg_dump

2022-11-22 Thread Michael Paquier
On Tue, Nov 22, 2022 at 10:00:47AM +, gkokola...@pm.me wrote:
> A new version that I feel that is in a decent enough state for review should
> be ready within this week. I am happy to drop the patch if you think I should
> not work on it though.

If you can post a new version of the patch, that's fine, of course.
I'll be happy to look over it more.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-11-23 Thread Justin Pryzby
On Tue, Nov 22, 2022 at 10:00:47AM +, gkokola...@pm.me wrote:
> For the record I am currently working on it simply unsure if I should submit
> WIP patches and add noise to the list or wait until it is in a state that I
> feel that the comments have been addressed.
> 
> A new version that I feel that is in a decent enough state for review should
> be ready within this week. I am happy to drop the patch if you think I should
> not work on it though.

I hope you'll want to continue work on it.  The patch record is like a
request for review, so it's closed if there's nothing ready to review.

I think you should re-send patches (and update the CF app) as often as
they're ready for more review.  Your 001 commit (which is almost the
same as what I wrote 2 years ago) still needs to account for some review
comments, and the whole patch set ought to pass cirrusci tests.  At that
point, you'll be ready for another round of review, even if there's
known TODO/FIXME items in later patches.

BTW I saw that you updated your branch on github.  You'll need to make
the corresponding changes to ./meson.build that you made to ./Makefile.
https://wiki.postgresql.org/wiki/Meson_for_patch_authors
https://wiki.postgresql.org/wiki/Meson  

   

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-11-28 Thread Michael Paquier
On Mon, Nov 28, 2022 at 04:32:43PM +, gkokola...@pm.me wrote:
> The focus of this version of this series is 0001 and 0002.
> 
> Admittedly 0001 could be presented in a separate thread though given its size 
> and
> proximity to the topic, I present it here.

I don't mind.  This was a hole in meson.build, so nice catch!  I have
noticed a second defect with pg_verifybackup for all the commands, and
applied both at the same time.

> In an earlier review you spotted the similarity between pg_dump's and 
> pg_receivewal's
> parsing of compression options. However there exists a substantial difference 
> in the
> behaviour of the two programs; one treats the lack of support for the 
> requested
> algorithm as a fatal error, whereas the other does not. The existing 
> functions in
> common/compression.c do not account for the later. 0002 proposes an 
> implementation
> for this. It's usefulness is shown in 0003.

In what does it matter?  The logic in compression.c provides an error
when looking at a spec or validating it, but the caller is free to
consume it as it wants because this is shared between the frontend and
the backend, and that includes consuming it as a warning rather than a
ahrd failure.  If we don't want to issue an error and force
non-compression if attempting to use a compression method not
supported in pg_dump, that's fine by me as a historical behavior, but
I don't see why these routines have any need to be split more as
proposed in 0002.

Saying that, I do agree that it would be nice to remove the
duplication between the option parsing of pg_basebackup and
pg_receivewal.  Your patch is very close to that, actually, and it
occured to me that if we move the check on "server-" and "client-" in
pg_basebackup to be just before the integer-only check then we can
consolidate the whole thing.

Attached is an alternative that does not sacrifice the pluggability of
the existing routines while allowing 0003~ to still use them (I don't
really want to move around the checks on the supported build options
now in parse_compress_specification(), that was hard enough to settle
on this location).  On top of that, pg_basebackup is able to cope with
the case of --compress=0 already, enforcing "none" (BaseBackup could
be simplified a bit more before StartLogStreamer).  This refactoring
shaves a little bit of code.

> Please consider 0003-0005 as work in progress. They are differences from v7 
> yet they
> may contain unaddressed comments for now.

Okay.
--
Michael
From 6fb2aa609348ad7df6f9c12da60c07aa96243965 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 29 Nov 2022 15:17:27 +0900
Subject: [PATCH v9] Make the pg_receivewal compression parsing function common

Also and relax parsing errors in the helper functions and re-introduce those as
an independed function.

As it is shown in the rest of the patch series, there is a lot of duplication
between pg_dump's parsing of compression options and pg_receivewal's. Now the
core work is done in common. However pg_dump would not error out if the
requested compression algorithm is not supported by the build, whereas other
callers will error out. Also it seems a bit weird for only one of the parsing
functions for compressions to error out on missing support and that one to not
be the one responsible for identifying the compression algorithm.

A new function is added to test the support of the algorithm allowing the user
to tune the behaviour.
---
 src/include/common/compression.h  |  2 +
 src/common/compression.c  | 63 +++
 src/bin/pg_basebackup/pg_basebackup.c | 49 -
 src/bin/pg_basebackup/pg_receivewal.c | 61 --
 4 files changed, 73 insertions(+), 102 deletions(-)

diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 5d680058ed..46855b1a3b 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -33,6 +33,8 @@ typedef struct pg_compress_specification
 	char	   *parse_error;	/* NULL if parsing was OK, else message */
 } pg_compress_specification;
 
+extern void parse_compress_options(const char *option, char **algorithm,
+   char **detail);
 extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
 extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
 
diff --git a/src/common/compression.c b/src/common/compression.c
index df5b627834..5274ba5ba8 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -356,3 +356,66 @@ validate_compress_specification(pg_compress_specification *spec)
 
 	return NULL;
 }
+
+#ifdef FRONTEND
+
+/*
+ * Basic parsing of a value specified through a command-line option, commonly
+ * -Z/--compress.
+ *
+ * The parsing consists of a METHOD:DETAIL string fed later to
+ * parse_compress_specification().  This only extracts METHOD and DETAIL.
+ * If only an integer is found, the method is implied by the value specifi

Re: Add LZ4 compression in pg_dump

2022-11-28 Thread Michael Paquier
On Tue, Nov 29, 2022 at 03:19:17PM +0900, Michael Paquier wrote:
> Attached is an alternative that does not sacrifice the pluggability of
> the existing routines while allowing 0003~ to still use them (I don't
> really want to move around the checks on the supported build options
> now in parse_compress_specification(), that was hard enough to settle
> on this location).  On top of that, pg_basebackup is able to cope with
> the case of --compress=0 already, enforcing "none" (BaseBackup could
> be simplified a bit more before StartLogStreamer).  This refactoring
> shaves a little bit of code.

One thing that I forgot to mention is that this refactoring would
treat things like server-N, client-N as valid grammars (in this case N
takes precedence over an optional detail string), implying that N = 0
is "none" and N > 0 is gzip, so that makes for an extra grammar flavor
without impacting the existing ones.  I am not sure that it is worth
documenting, still worth mentioning.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-11-29 Thread Michael Paquier
On Tue, Nov 29, 2022 at 12:10:46PM +, gkokola...@pm.me wrote:
> Thank you. Please advice if is preferable to split 0002 in two parts.
> I think not but I will happily do so if you think otherwise.

This one makes me curious.  What kind of split are you talking about?
If it makes the code review and the git history cleaner and easier, I
am usually a lot in favor of such incremental changes.  As far as I
can see, there is the switch from the compression integer to
compression specification as one thing.  The second thing is the
refactoring of cfclose() and these routines, paving the way for 0003.
Hmm, it may be cleaner to move the switch to the compression spec in
one patch, and move the logic around cfclose() to its own, paving the
way to 0003.

By the way, I think that this 0002 should drop all the default clauses
in the switches for the compression method so as we'd catch any
missing code paths with compiler warnings if a new compression method
is added in the future.

Anyway, I have applied 0001, adding you as a primary author because
you did most of it with only tweaks from me for pg_basebackup.  The
docs of pg_basebackup have been amended to mention the slight change
in grammar, affecting the case where we do not have a detail string.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-11-30 Thread Michael Paquier
On Wed, Nov 30, 2022 at 05:11:44PM +, gkokola...@pm.me wrote:
> Fair enough. The atteched v11 does that. 0001 introduces compression
> specification and is using it throughout. 0002 paves the way to the
> new interface by homogenizing the use of cfp. 0003 introduces the new
> API and stores the compression algorithm in the custom format header
> instead of the compression level integer. Finally 0004 adds support for
> LZ4.

I have been looking at 0001, and..  Hmm.  I am really wondering
whether it would not be better to just nuke this warning into orbit.
This stuff enforces non-compression even if -Z has been used to a
non-default value.  This has been moved to its current location by
cae2bb1 as of this thread:
https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp

However, this is only active if -Z is used when not building with
zlib.  At the end, it comes down to whether we want to prioritize the
portability of pg_dump commands specifying a -Z/--compress across
environments knowing that these may or may not be built with zlib,
vs the amount of simplification/uniformity we would get across the
binaries in the tree once we switch everything to use the compression
specifications.  Now that pg_basebackup and pg_receivewal are managed
by compression specifications, and that we'd want more compression
options for pg_dump, I would tend to do the latter and from now on
complain if attempting to do a pg_dump -Z under --without-zlib with a
compression level > 0.  zlib is also widely available, and we don't
document the fact that non-compression is enforced in this case,
either.  (Two TAP tests with the custom format had to be tweaked.)

As per the patch, it is true that we do not need to bump the format of
the dump archives, as we can still store only the compression level
and guess the method from it.  I have added some notes about that in
ReadHead and WriteHead to not forget.

Most of the changes are really-straight forward, and it has resisted
my tests, so I think that this is in a rather-commitable shape as-is.
--
Michael
From a4fa522d0259e8969cde32798a917321cced0415 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 1 Dec 2022 11:03:41 +0900
Subject: [PATCH v12] Teach pg_dump about compress_spec and use it throughout.

Align pg_dump with the rest of the binaries which use common compression. It is
teaching pg_dump.c about the common compression definitions and interfaces. Then
it propagates those throughout the code.
---
 src/bin/pg_dump/compress_io.c   | 107 
 src/bin/pg_dump/compress_io.h   |  20 ++--
 src/bin/pg_dump/pg_backup.h |   7 +-
 src/bin/pg_dump/pg_backup_archiver.c|  76 +-
 src/bin/pg_dump/pg_backup_archiver.h|  10 +-
 src/bin/pg_dump/pg_backup_custom.c  |   6 +-
 src/bin/pg_dump/pg_backup_directory.c   |  13 ++-
 src/bin/pg_dump/pg_backup_tar.c |  11 +-
 src/bin/pg_dump/pg_dump.c   |  67 +++-
 src/bin/pg_dump/t/001_basic.pl  |  34 +--
 src/bin/pg_dump/t/002_pg_dump.pl|   3 +-
 src/test/modules/test_pg_dump/t/001_base.pl |  16 +++
 doc/src/sgml/ref/pg_dump.sgml   |  34 +--
 src/tools/pgindent/typedefs.list|   1 -
 14 files changed, 242 insertions(+), 163 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 62f940ff7a..8f0d6d6210 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -64,7 +64,7 @@
 /* typedef appears in compress_io.h */
 struct CompressorState
 {
-	CompressionAlgorithm comprAlg;
+	pg_compress_specification compression_spec;
 	WriteFunc	writeF;
 
 #ifdef HAVE_LIBZ
@@ -74,9 +74,6 @@ struct CompressorState
 #endif
 };
 
-static void ParseCompressionOption(int compression, CompressionAlgorithm *alg,
-   int *level);
-
 /* Routines that support zlib compressed data I/O */
 #ifdef HAVE_LIBZ
 static void InitCompressorZlib(CompressorState *cs, int level);
@@ -93,57 +90,30 @@ static void ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF);
 static void WriteDataToArchiveNone(ArchiveHandle *AH, CompressorState *cs,
    const char *data, size_t dLen);
 
-/*
- * Interprets a numeric 'compression' value. The algorithm implied by the
- * value (zlib or none at the moment), is returned in *alg, and the
- * zlib compression level in *level.
- */
-static void
-ParseCompressionOption(int compression, CompressionAlgorithm *alg, int *level)
-{
-	if (compression == Z_DEFAULT_COMPRESSION ||
-		(compression > 0 && compression <= 9))
-		*alg = COMPR_ALG_LIBZ;
-	else if (compression == 0)
-		*alg = COMPR_ALG_NONE;
-	else
-	{
-		pg_fatal("invalid compression code: %d", compression);
-		*alg = COMPR_ALG_NONE;	/* keep compiler quiet */
-	}
-
-	/* The level is just the passed-in value. */
-	if (level)
-		*level = compression;
-}
-
 /* Public interface routines */
 
 /* Allo

Re: Add LZ4 compression in pg_dump

2022-12-01 Thread gkokolatos





--- Original Message ---
On Thursday, December 1st, 2022 at 3:05 AM, Michael Paquier 
 wrote:


> 
> 
> On Wed, Nov 30, 2022 at 05:11:44PM +, gkokola...@pm.me wrote:
> 
> > Fair enough. The atteched v11 does that. 0001 introduces compression
> > specification and is using it throughout. 0002 paves the way to the
> > new interface by homogenizing the use of cfp. 0003 introduces the new
> > API and stores the compression algorithm in the custom format header
> > instead of the compression level integer. Finally 0004 adds support for
> > LZ4.
> 
> 
> I have been looking at 0001, and.. Hmm. I am really wondering
> whether it would not be better to just nuke this warning into orbit.
> This stuff enforces non-compression even if -Z has been used to a
> non-default value. This has been moved to its current location by
> cae2bb1 as of this thread:
> https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp
> 
> However, this is only active if -Z is used when not building with
> zlib. At the end, it comes down to whether we want to prioritize the
> portability of pg_dump commands specifying a -Z/--compress across
> environments knowing that these may or may not be built with zlib,
> vs the amount of simplification/uniformity we would get across the
> binaries in the tree once we switch everything to use the compression
> specifications. Now that pg_basebackup and pg_receivewal are managed
> by compression specifications, and that we'd want more compression
> options for pg_dump, I would tend to do the latter and from now on
> complain if attempting to do a pg_dump -Z under --without-zlib with a
> compression level > 0. zlib is also widely available, and we don't
> document the fact that non-compression is enforced in this case,
> either. (Two TAP tests with the custom format had to be tweaked.)

Fair enough. Thank you for looking. However I have a small comment
on your new patch.

-   /* Custom and directory formats are compressed by default, others not */
-   if (compressLevel == -1)
-   {
-#ifdef HAVE_LIBZ
-   if (archiveFormat == archCustom || archiveFormat == 
archDirectory)
-   compressLevel = Z_DEFAULT_COMPRESSION;
-   else
-#endif
-   compressLevel = 0;
-   }


Nuking the warning from orbit and changing the behaviour around disabling
the requested compression when the libraries are not present, should not
mean that we need to change the behaviour of default values for different
formats. Please find v13 attached which reinstates it.

Which in itself it got me looking and wondering why the tests succeeded.
The only existing test covering that path is `defaults_dir_format` in
`002_pg_dump.pl`. However as the test is currently written it does not
check whether the output was compressed. The restore command would succeed
in either case. A simple `gzip -t -r` against the directory will not
suffice to test it, because there exist files which are never compressed
in this format (.toc). A little bit more involved test case would need
to be written, yet before I embark to this journey, I would like to know
if you would agree to reinstate the defaults for those formats.

> 
> As per the patch, it is true that we do not need to bump the format of
> the dump archives, as we can still store only the compression level
> and guess the method from it. I have added some notes about that in
> ReadHead and WriteHead to not forget.

Agreed. A minor suggestion if you may.

 #ifndef HAVE_LIBZ
-   if (AH->compression != 0)
+   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
pg_log_warning("archive is compressed, but this installation 
does not support compression -- no data will be available");
 #endif

It would seem a more consistent to error out in this case. We do error
in all other cases where the compression is not available.

> 
> Most of the changes are really-straight forward, and it has resisted
> my tests, so I think that this is in a rather-commitable shape as-is.

Thank you.

Cheers,
//Georgios

> --
> MichaelFrom 16e10b38cc8eb6eb5b1ffc15365d7e6ce23eef0a Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 1 Dec 2022 08:58:51 +
Subject: [PATCH v13] Teach pg_dump about compress_spec and use it throughout.

Align pg_dump with the rest of the binaries which use common compression. It is
teaching pg_dump.c about the common compression definitions and interfaces. Then
it propagates those throughout the code.
---
 doc/src/sgml/ref/pg_dump.sgml   |  34 +--
 src/bin/pg_dump/compress_io.c   | 107 
 src/bin/pg_dump/compress_io.h   |  20 ++--
 src/bin/pg_dump/pg_backup.h |   7 +-
 src/bin/pg_dump/pg_backup_archiver.c|  78 +-
 src/bin/pg_dump/pg_backup_archiver.h|  10 +-
 src/bin/pg_dump/pg_backup_custom.c  |   6 +-
 src/bin/pg_dump/pg_back

Re: Add LZ4 compression in pg_dump

2022-12-01 Thread Michael Paquier
On Thu, Dec 01, 2022 at 02:58:35PM +, gkokola...@pm.me wrote:
> Nuking the warning from orbit and changing the behaviour around disabling
> the requested compression when the libraries are not present, should not
> mean that we need to change the behaviour of default values for different
> formats. Please find v13 attached which reinstates it.

Gah, thanks!  And this default behavior is documented as dependent on
the compilation as well.

> Which in itself it got me looking and wondering why the tests succeeded.
> The only existing test covering that path is `defaults_dir_format` in
> `002_pg_dump.pl`. However as the test is currently written it does not
> check whether the output was compressed. The restore command would succeed
> in either case. A simple `gzip -t -r` against the directory will not
> suffice to test it, because there exist files which are never compressed
> in this format (.toc). A little bit more involved test case would need
> to be written, yet before I embark to this journey, I would like to know
> if you would agree to reinstate the defaults for those formats.

On top of my mind, I briefly recall that -r is not that portable.  And
the toc format makes the files generated non-deterministic as these
use OIDs..

[.. thinks ..]

We are going to need a new thing here, as compress_cmd cannot be
directly used.  What if we used only an array of glob()-able elements?
Let's say "expected_contents" that could include a "dir_path/*.gz"
conditional on $supports_gzip?  glob() can only be calculated when the
test is run as the file names cannot be known beforehand :/

>> As per the patch, it is true that we do not need to bump the format of
>> the dump archives, as we can still store only the compression level
>> and guess the method from it. I have added some notes about that in
>> ReadHead and WriteHead to not forget.
> 
> Agreed. A minor suggestion if you may.
> 
>  #ifndef HAVE_LIBZ
> -   if (AH->compression != 0)
> +   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> pg_log_warning("archive is compressed, but this installation 
> does not support compression -- no data will be available");
>  #endif
> 
> It would seem a more consistent to error out in this case. We do error
> in all other cases where the compression is not available.

Makes sense.

I have gone through the patch again, and applied it.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-12-02 Thread Michael Paquier
On Fri, Dec 02, 2022 at 04:15:10PM +, gkokola...@pm.me wrote:
> You are very correct. However one can glob after the fact. Please find
> 0001 of the attached v14 which attempts to implement it.

+   if ($pgdump_runs{$run}->{glob_pattern})
+   {
+   my $glob_pattern = $pgdump_runs{$run}->{glob_pattern};
+   my @glob_output = glob($glob_pattern);
+   is(scalar(@glob_output) > 0, 1, "glob pattern matched")
+   }

While this is correct in checking that the contents are compressed
under --with-zlib, this also removes the coverage where we make sure
that this command is able to complete under --without-zlib without
compressing any of the table data files.  Hence my point from
upthread: this test had better not use compile_option, but change
glob_pattern depending on if the build uses zlib or not.

In order to check this behavior with defaults_custom_format, perhaps
we could just remove the -Z6 from it or add an extra command for its
default behavior?
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-12-04 Thread Michael Paquier
On Sat, Dec 03, 2022 at 11:45:30AM +0900, Michael Paquier wrote:
> While this is correct in checking that the contents are compressed
> under --with-zlib, this also removes the coverage where we make sure
> that this command is able to complete under --without-zlib without
> compressing any of the table data files.  Hence my point from
> upthread: this test had better not use compile_option, but change
> glob_pattern depending on if the build uses zlib or not.

In short, I mean something like the attached.  I have named the flag
content_patterns, and switched it to an array so as we can check that
toc.dat is always uncompression and that the other data files are
always uncompressed.

> In order to check this behavior with defaults_custom_format, perhaps
> we could just remove the -Z6 from it or add an extra command for its
> default behavior?

This is slightly more complicated as there is just one file generated
for the compression and non-compression cases, so I have let that as
it is now.
--
Michael
From 5c583358caed5598fec9abea6750ff7fbd98d269 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 5 Dec 2022 16:04:57 +0900
Subject: [PATCH v15] Provide coverage for pg_dump default compression for dir
 format

The restore program will succeed regardless of whether the dumped output was
compressed or not. This commit implements a portable way to check the contents
of the directory via perl's build in filename expansion.
---
 src/bin/pg_dump/t/002_pg_dump.pl | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 709db0986d..9796d2667f 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -36,6 +36,9 @@ my $tempdir = PostgreSQL::Test::Utils::tempdir;
 # to test pg_restore's ability to parse manually compressed files
 # that otherwise pg_dump does not compress on its own (e.g. *.toc).
 #
+# content_patterns is an optional array consisting of strings compilable
+# with glob() to check the files generated after a dump.
+#
 # restore_cmd is the pg_restore command to run, if any.  Note
 # that this should generally be used when the pg_dump goes to
 # a non-text file and that the restore can then be used to
@@ -46,6 +49,10 @@ my $tempdir = PostgreSQL::Test::Utils::tempdir;
 # database and then pg_dump *that* database (or something along
 # those lines) to validate that part of the process.
 
+my $supports_icu  = ($ENV{with_icu} eq 'yes');
+my $supports_lz4  = check_pg_config("#define USE_LZ4 1");
+my $supports_gzip = check_pg_config("#define HAVE_LIBZ 1");
+
 my %pgdump_runs = (
 	binary_upgrade => {
 		dump_cmd => [
@@ -213,6 +220,9 @@ my %pgdump_runs = (
 	},
 
 	# Do not use --no-sync to give test coverage for data sync.
+	# By default, the directory format compresses its contents
+	# when the code is compiled with gzip support, and lets things
+	# uncompressed when not compiled with it.
 	defaults_dir_format => {
 		test_key => 'defaults',
 		dump_cmd => [
@@ -224,6 +234,11 @@ my %pgdump_runs = (
 			"--file=$tempdir/defaults_dir_format.sql",
 			"$tempdir/defaults_dir_format",
 		],
+		content_patterns => ["$tempdir/defaults_dir_format/toc.dat",
+ $supports_gzip ?
+ "$tempdir/defaults_dir_format/*.dat.gz" :
+ "$tempdir/defaults_dir_format/*.dat",
+		],
 	},
 
 	# Do not use --no-sync to give test coverage for data sync.
@@ -3920,10 +3935,6 @@ if ($collation_check_stderr !~ /ERROR: /)
 	$collation_support = 1;
 }
 
-my $supports_icu  = ($ENV{with_icu} eq 'yes');
-my $supports_lz4  = check_pg_config("#define USE_LZ4 1");
-my $supports_gzip = check_pg_config("#define HAVE_LIBZ 1");
-
 # ICU doesn't work with some encodings
 my $encoding = $node->safe_psql('postgres', 'show server_encoding');
 $supports_icu = 0 if $encoding eq 'SQL_ASCII';
@@ -4153,6 +4164,16 @@ foreach my $run (sort keys %pgdump_runs)
 		command_ok(\@full_compress_cmd, "$run: compression commands");
 	}
 
+	if ($pgdump_runs{$run}->{content_patterns})
+	{
+		my $content_patterns = $pgdump_runs{$run}->{content_patterns};
+		foreach my $content_pattern (@{$content_patterns})
+		{
+			my @glob_output = glob($content_pattern);
+			is(scalar(@glob_output) > 0, 1, "$run: content check for $content_pattern");
+		}
+	}
+
 	if ($pgdump_runs{$run}->{restore_cmd})
 	{
 		$node->command_ok(\@{ $pgdump_runs{$run}->{restore_cmd} },
-- 
2.38.1



signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-12-05 Thread Michael Paquier
On Mon, Dec 05, 2022 at 12:48:28PM +, gkokola...@pm.me wrote:
> I also took the liberty of applying the test pattern when it the dump
> is explicitly compressed.

Sticking with glob_patterns is fine by me.

> I was thinking a bit more about this. I think that we can use the list
> TOC option of pg_restore. This option will first print out the header
> info which contains the compression. Perl utils already support to 
> parse the generated output of a command. Please find an attempt to do
> so in the attached. The benefits of having some testing for this case
> become a bit more obvious in 0004 of the patchset, when lz4 is
> introduced.

This is where the fun is.  What you are doing here is more complete,
and we would make sure that the custom and data directory would always
see their contents compressed by default.  And it would have caught
the bug you mentioned upthread for the custom format.

I have kept things as you proposed at the end, added a few comments,
documented the new command_like and an extra command_like for
defaults_dir_format.  Glad to see this addressed, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-12-17 Thread Justin Pryzby
001: still refers to "gzip", which is correct for -Fp and -Fd but not
for -Fc, for which it's more correct to say "zlib".  That affects the
name of the function, structures, comments, etc.  I'm not sure if it's
an issue to re-use the basebackup compression routines here.  Maybe we
should accept "-Zn" for zlib output (-Fc), but reject "gzip:9", which
I'm sure some will find confusing, as it does not output.  Maybe 001
should be split into a patch to re-use the existing "cfp" interface
(which is a clear win), and 002 to re-use the basebackup interfaces for
user input and constants, etc.

001 still doesn't compile on freebsd, and 002 doesn't compile on
windows.  Have you checked test results from cirrusci on your private
github account ?

002 says:
+   save_errno = errno; 


+   errno = save_errno; 



I suppose that's intended to wrap the preceding library call.

002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
doesn't store the passed-in compression_spec.

003 still uses  and not "lz4.h".

Earlier this year I also suggested to include an 999 patch to change to
use LZ4 as the default compression, to exercise the new code under CI.
I suggest to re-open the cf patch entry after that passes tests on all
platforms and when it's ready for more review.

BTW, some of these review comments are the same as what I sent earlier
this year.

https://www.postgresql.org/message-id/20220326162156.GI28503%40telsasoft.com
https://www.postgresql.org/message-id/20220705151328.GQ13040%40telsasoft.com

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-12-18 Thread Michael Paquier
On Sat, Dec 17, 2022 at 05:26:15PM -0600, Justin Pryzby wrote:
> 001: still refers to "gzip", which is correct for -Fp and -Fd but not
> for -Fc, for which it's more correct to say "zlib".

Or should we begin by changing all these existing "not built with zlib 
support" error strings to the more generic "this build does not
support compression with %s" to reduce the number of messages to
translate?  That would bring consistency with the other tools dealing
with compression.

> That affects the
> name of the function, structures, comments, etc.  I'm not sure if it's
> an issue to re-use the basebackup compression routines here.  Maybe we
> should accept "-Zn" for zlib output (-Fc), but reject "gzip:9", which
> I'm sure some will find confusing, as it does not output.  Maybe 001
> should be split into a patch to re-use the existing "cfp" interface
> (which is a clear win), and 002 to re-use the basebackup interfaces for
> user input and constants, etc.
> 
> 001 still doesn't compile on freebsd, and 002 doesn't compile on
> windows.  Have you checked test results from cirrusci on your private
> github account ?

FYI, I have re-added an entry to the CF app to get some automated
coverage:
https://commitfest.postgresql.org/41/3571/

On MinGW, a complain about the open() callback, which I guess ought to
be avoided with a rename:
[00:16:37.254] compress_gzip.c:356:38: error: macro "open" passed 4 arguments, 
but takes just 3
[00:16:37.254]   356 |  ret = CFH->open(fname, -1, mode, CFH);
[00:16:37.254]   |  ^
[00:16:37.254] In file included from ../../../src/include/c.h:1309,
[00:16:37.254]  from ../../../src/include/postgres_fe.h:25,
[00:16:37.254]  from compress_gzip.c:15:

On MSVC, some declaration conflicts, for a similar issue:
[00:12:31.966] ../src/bin/pg_dump/compress_io.c(193): error C2371: '_read': 
redefinition; different basic types
[00:12:31.966] C:\Program Files (x86)\Windows 
Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(252): note: see declaration of 
'_read'
[00:12:31.966] ../src/bin/pg_dump/compress_io.c(210): error C2371: '_write': 
redefinition; different basic types
[00:12:31.966] C:\Program Files (x86)\Windows 
Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(294): note: see declaration of 
'_write'

> 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
> doesn't store the passed-in compression_spec.

Hmm.  This looks like a gap in the existing tests that we'd better fix
first.  This CI is green on Linux.

> 003 still uses  and not "lz4.h".

This should be , not "lz4.h".
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2022-12-19 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 05:03:21PM +, gkokola...@pm.me wrote:
> > > 001 still doesn't compile on freebsd, and 002 doesn't compile on
> > > windows. Have you checked test results from cirrusci on your private
> > > github account ?
> 
> There are still known gaps in 0002 and 0003, for example documentation,
> and I have not been focusing too much on those. You are right, it is helpful
> and kind to try to reduce the noise. The attached should have hopefully
> tackled the ci errors.

Yep.  Are you using cirrusci under your github account ?

> > FYI, I have re-added an entry to the CF app to get some automated
> > coverage:
> > https://commitfest.postgresql.org/41/3571/
> 
> Much obliged. Should I change the state to "ready for review" when post a
> new version or should I leave that to the senior personnel?   

It's better to update it to reflect what you think its current status
is.  If you think it's ready for review.

> > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
> > > doesn't store the passed-in compression_spec.
> 
> I am afraid I have not been able to reproduce this error. I tried both
> debian and freebsd after I addressed the compilation warnings. Which
> error did you get? Is it still present in the attached?

It's not that there's an error - it's that compression isn't working.

$ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c
659956
$ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c
637192

$ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c
1954890
$ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c
1954890

-- 
Justin




Re: Add LZ4 compression in pg_dump

2022-12-19 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 01:06:00PM +0900, Michael Paquier wrote:
> On Sat, Dec 17, 2022 at 05:26:15PM -0600, Justin Pryzby wrote:
> > 001: still refers to "gzip", which is correct for -Fp and -Fd but not
> > for -Fc, for which it's more correct to say "zlib".
> 
> Or should we begin by changing all these existing "not built with zlib 
> support" error strings to the more generic "this build does not
> support compression with %s" to reduce the number of messages to
> translate?  That would bring consistency with the other tools dealing
> with compression.

That's fine, but it doesn't touch on the issue I'm talking about, which
is that zlib != gzip.

BTW I noticed that that also affects the pg_dump file itself; 002
changes the file format to say "gzip", but that's wrong for -Fc, which
does not use gzip headers, which could be surprising to someone who
specified "gzip".

-- 
Justin




  1   2   >