On 3/11/23 11:50, gkokola...@pm.me wrote:
> ------- Original Message -------
> On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin 
> <exclus...@gmail.com> wrote:
> 
>> Hello,
>> 23.02.2023 23:24, Tomas Vondra wrote:
>>
>>> On 2/23/23 16:26, Tomas Vondra wrote:
>>>
>>>> Thanks for v30 with the updated commit messages. I've pushed 0001 after
>>>> fixing a comment typo and removing (I think) an unnecessary change in an
>>>> error message.
>>>>
>>>> I'll give the buildfarm a bit of time before pushing 0002 and 0003.
>>>
>>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>>> and marked the CF entry as committed. Thanks for the patch!
>>>
>>> I wonder how difficult would it be to add the zstd compression, so that
>>> we don't have the annoying "unsupported" cases.
>>
>>
>> With the patch 0003 committed, a single warning -Wtype-limits appeared in the
>> master branch:
>> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
>> compress_lz4.c: In function ‘LZ4File_gets’:
>> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is
>> always false [-Wtype-limits]
>> 492 | if (dsize < 0)
>> |
> 
> Thank you Alexander. Please find attached an attempt to address it.
> 
>> (I wonder, is it accidental that there no other places that triggers
>> the warning, or some buildfarm animals had this check enabled before?)
> 
> I can not answer about the buildfarms. Do you think that adding an explicit
> check for this warning in meson would help? I am a bit uncertain as I think
> that type-limits are included in extra.
> 
> @@ -1748,6 +1748,7 @@ common_warning_flags = [
>    '-Wshadow=compatible-local',
>    # This was included in -Wall/-Wformat in older GCC versions
>    '-Wformat-security',
> +  '-Wtype-limits',
>  ]
> 
>>
>> It is not a false positive as can be proved by the 002_pg_dump.pl modified as
>> follows:
>> - program => $ENV{'LZ4'},
>>
>> + program => 'mv',
>>
>> args => [
>>
>> - '-z', '-f', '--rm',
>> "$tempdir/compression_lz4_dir/blobs.toc",
>> "$tempdir/compression_lz4_dir/blobs.toc.lz4",
>> ],
>> },
> 
> Correct, it is not a false positive. The existing testing framework provides
> limited support for exercising error branches. Especially so when those are
> dependent on generated output. 
> 
>> A diagnostic logging added shows:
>> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615
>>
>> and pg_restore fails with:
>> error: invalid line in large object TOC file
>> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": 
>> "????"
> 
> It is a good thing that the restore fails with bad input. Yet it should
> have failed earlier. The attached makes certain it does fail earlier. 
> 

Thanks for the patch.

I did look if there are other places that might have the same issue, and
I think there are - see attached 0002. For example LZ4File_write is
declared to return size_t, but then it also does

        if (LZ4F_isError(status))
        {
            fs->errcode = status;
            return -1;
        }

That won't work :-(

And these issues may not be restricted to lz4 code - Gzip_write is
declared to return size_t, but it does

    return gzwrite(gzfp, ptr, size);

and gzwrite returns int. Although, maybe that's correct, because
gzwrite() is "0 on error" so maybe this is fine ...

However, Gzip_read assigns gzread() to size_t, and that does not seem
great. It probably will still trigger the following pg_fatal() because
it'd be very lucky to match the expected 'size', but it's confusing.


I wonder whether CompressorState should use int or size_t for the
read_func/write_func callbacks. I guess no option is perfect, i.e. no
data type will work for all compression libraries we might use (lz4 uses
int while lz4f uses size_t, to there's that).

It's a bit weird the "open" functions return int and the read/write
size_t. Maybe we should stick to int, which is what the old functions
(cfwrite etc.) did.


But I think the actual problem here is that the API does not clearly
define how errors are communicated. I mean, it's nice to return the
value returned by the library function without "mangling" it by
conversion to size_t, but what if the libraries communicate errors in
different way? Some may return "0" while others may return "-1".

I think the right approach is to handle all library errors and not just
let them through. So Gzip_write() needs to check the return value, and
either call pg_fatal() or translate it to an error defined by the API.

For example we might say "returns 0 on error" and then translate all
library-specific errors to that.


While looking at the code I realized a couple function comments don't
say what's returned in case of error, etc. So 0004 adds those.

0003 is a couple minor assorted comments/questions:

- Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c?

- Why are LZ4 buffer sizes different (ZLIB has both 4kB)?

- I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible
for LZ4F_compressBound to return value this small (especially for 16kB
input buffer)?



regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 72cb710c08c4617fa491fe4824c4f99d3d3402fb Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Mon, 13 Mar 2023 20:42:33 +0100
Subject: [PATCH 4/4] comment improvements

---
 src/bin/pg_dump/compress_lz4.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 94f28d6806..19a9cf2df2 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -190,10 +190,15 @@ LZ4File_get_error(CompressFileHandle *CFH)
 }
 
 /*
- * Prepare an already alloc'ed LZ4File struct for subsequent calls.
+ * Prepare an already alloc'ed LZ4File struct for subsequent calls (either
+ * compression or decompression).
  *
- * It creates the necessary contexts for the operations. When compressing,
- * it additionally writes the LZ4 header in the output stream.
+ * It creates the necessary contexts for the operations. When compressing data
+ * (indicated by compressing=true), it additionally writes the LZ4 header in the
+ * output stream.
+ *
+ * Returns 0 on success. In case of a failure returns 1, and stores the error
+ * code in fs->errcode.
  */
 static int
 LZ4File_init(LZ4File *fs, int size, bool compressing)
@@ -206,6 +211,7 @@ LZ4File_init(LZ4File *fs, int size, bool compressing)
 	fs->compressing = compressing;
 	fs->inited = true;
 
+	/* When compressing, write LZ4 header to the output stream. */
 	if (fs->compressing)
 	{
 		fs->buflen = LZ4F_compressBound(LZ4_IN_SIZE, &fs->prefs);
@@ -248,7 +254,7 @@ LZ4File_init(LZ4File *fs, int size, bool compressing)
 			return 1;
 		}
 
-		fs->buflen = size > LZ4_OUT_SIZE ? size : LZ4_OUT_SIZE;
+		fs->buflen = (size > LZ4_OUT_SIZE) ? size : LZ4_OUT_SIZE;
 		fs->buffer = pg_malloc(fs->buflen);
 
 		fs->overflowalloclen = fs->buflen;
@@ -262,7 +268,10 @@ LZ4File_init(LZ4File *fs, int size, bool compressing)
 /*
  * Read already decompressed content from the overflow buffer into 'ptr' up to
  * 'size' bytes, if available. If the eol_flag is set, then stop at the first
- * occurrence of the new line char prior to 'size' bytes.
+ * occurrence of the newline char prior to 'size' bytes.
+ *
+ * Returns the number of bytes read from the overflow buffer (and copied into
+ * the 'ptr' buffer), or 0 if the overflow buffer is empty.
  *
  * Any unread content in the overflow buffer is moved to the beginning.
  */
@@ -304,6 +313,9 @@ LZ4File_read_overflow(LZ4File *fs, void *ptr, int size, bool eol_flag)
  * at an overflow buffer within LZ4File. Of course, when the function is
  * called, it will first try to consume any decompressed content already
  * present in the overflow buffer, before decompressing new content.
+ *
+ * Returns the number of bytes of decompressed data copied into the ptr
+ * buffer, or -1 in case of error.
  */
 static int
 LZ4File_read_internal(LZ4File *fs, void *ptr, int ptrsize, bool eol_flag)
-- 
2.39.2

From e9d1e4dcbf17200f34cdb857c7961fb0df1e8435 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Mon, 13 Mar 2023 20:42:21 +0100
Subject: [PATCH 3/4] questions

---
 src/bin/pg_dump/compress_io.h  | 1 +
 src/bin/pg_dump/compress_lz4.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index cdb15951ea..ae32a4de1c 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -18,6 +18,7 @@
 #include "pg_backup_archiver.h"
 
 /* Initial buffer sizes used in zlib compression. */
+/* XXX shouldn't this be moved to compress_gzip.c? */
 #define ZLIB_OUT_SIZE	4096
 #define ZLIB_IN_SIZE	4096
 
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 9ab57ceff3..94f28d6806 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -20,6 +20,8 @@
 #include <lz4.h>
 #include <lz4frame.h>
 
+/* Initial buffer sizes used in zlib compression. */
+/* XXX Why is this different from GZIP values? That uses 4kB for both. */
 #define LZ4_OUT_SIZE	(4 * 1024)
 #define LZ4_IN_SIZE		(16 * 1024)
 
@@ -207,6 +209,10 @@ LZ4File_init(LZ4File *fs, int size, bool compressing)
 	if (fs->compressing)
 	{
 		fs->buflen = LZ4F_compressBound(LZ4_IN_SIZE, &fs->prefs);
+		/*
+		 * XXX Does this actually do something meaningful? With LZ4_IN_SIZE=16kB
+		 * I get buflen=143600 (roughly), so can it ever be smaller than 22?
+		 */
 		if (fs->buflen < LZ4F_HEADER_SIZE_MAX)
 			fs->buflen = LZ4F_HEADER_SIZE_MAX;
 
-- 
2.39.2

From cd361f4bc631a33eb7374bf8b292976aaf07799b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Mon, 13 Mar 2023 20:41:38 +0100
Subject: [PATCH 2/4] more size_t places

---
 src/bin/pg_dump/compress_gzip.c | 6 ++++--
 src/bin/pg_dump/compress_lz4.c  | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4..6b042cab5b 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -233,12 +233,13 @@ InitCompressorGzip(CompressorState *cs,
  *----------------------
  */
 
-static size_t
+static size_t	/* XXX issue size_t vs. int? */
 Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 	size_t		ret;
 
+	/* XXX this is probably wrong because gzread is "-1 for error" but this breaks that. */
 	ret = gzread(gzfp, ptr, size);
 	if (ret != size && !gzeof(gzfp))
 	{
@@ -252,11 +253,12 @@ Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
 	return ret;
 }
 
-static size_t
+static size_t	/* XXX issue size_t vs. int? */
 Gzip_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
+	/* XXX this is probably OK, because gzwrite is "or 0 in case of error" per https://zlib.net/manual.html#Basic */
 	return gzwrite(gzfp, ptr, size);
 }
 
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index cc039f0b47..9ab57ceff3 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -430,7 +430,7 @@ LZ4File_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 		if (LZ4F_isError(status))
 		{
 			fs->errcode = status;
-			return -1;
+			return -1;	/* FIXME size_t */
 		}
 
 		if (fwrite(fs->buffer, 1, status, fs->fp) != status)
-- 
2.39.2

From 79860a9600f4e677f10be39db507f38c711812cf Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Mon, 13 Mar 2023 20:39:37 +0100
Subject: [PATCH 1/4] Respect return type of LZ4File_read_internal

The function LZ4File_gets() was storing the return value of
LZ4File_read_internal in a variable of the wrong type, disregarding sign-es.
As a consequence, LZ4File_gets() would not take the error path when it should.

In an attempt to improve readability, spell out the significance of a negative
return value of LZ4File_read_internal() in LZ4File_read().

Reported-by: Alexander Lakhin <exclus...@gmail.com>
---
 src/bin/pg_dump/compress_lz4.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 63e794cdc6..cc039f0b47 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -453,7 +453,7 @@ LZ4File_read(void *ptr, size_t size, CompressFileHandle *CFH)
 	int			ret;
 
 	ret = LZ4File_read_internal(fs, ptr, size, false);
-	if (ret != size && !LZ4File_eof(CFH))
+	if (ret < 0 || (ret != size && !LZ4File_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	return ret;
@@ -486,14 +486,14 @@ static char *
 LZ4File_gets(char *ptr, int size, CompressFileHandle *CFH)
 {
 	LZ4File    *fs = (LZ4File *) CFH->private_data;
-	size_t		dsize;
+	int			ret;
 
-	dsize = LZ4File_read_internal(fs, ptr, size, true);
-	if (dsize < 0)
+	ret = LZ4File_read_internal(fs, ptr, size, true);
+	if (ret < 0)
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	/* Done reading */
-	if (dsize == 0)
+	if (ret == 0)
 		return NULL;
 
 	return ptr;
-- 
2.39.2

Reply via email to