Re: Inconvenience of pg_read_binary_file()
At Sat, 30 Jul 2022 10:24:39 -0400, Tom Lane wrote in > Alvaro Herrera writes: > > On 2022-Jul-30, Michael Paquier wrote: > >> PG_VERSION would be simpler. Looking at postmaster.pid would require > >> a lookup at external_pid_file, and as it is not set by default you > >> would need to add some extra logic in the tests where > >> external_pid_file = NULL <=> PGDATA/postmaster.pid. > > > Hmm, no? as I recall external_pid_file is an *additional* PID file; it > > doesn't supplant postmaster.pid. > > Right. postmaster.pid absolutely should be there if the postmaster > is up (and if it ain't, you're going to have lots of other difficulty > in running the regression tests...). It doesn't feel quite as static > as PG_VERSION, though. Thanks for committing it. Also the revised test (being suggested by Michael) looks good. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Inconvenience of pg_read_binary_file()
Alvaro Herrera writes: > On 2022-Jul-30, Michael Paquier wrote: >> PG_VERSION would be simpler. Looking at postmaster.pid would require >> a lookup at external_pid_file, and as it is not set by default you >> would need to add some extra logic in the tests where >> external_pid_file = NULL <=> PGDATA/postmaster.pid. > Hmm, no? as I recall external_pid_file is an *additional* PID file; it > doesn't supplant postmaster.pid. Right. postmaster.pid absolutely should be there if the postmaster is up (and if it ain't, you're going to have lots of other difficulty in running the regression tests...). It doesn't feel quite as static as PG_VERSION, though. regards, tom lane
Re: Inconvenience of pg_read_binary_file()
On 2022-Jul-30, Michael Paquier wrote: > PG_VERSION would be simpler. Looking at postmaster.pid would require > a lookup at external_pid_file, and as it is not set by default you > would need to add some extra logic in the tests where > external_pid_file = NULL <=> PGDATA/postmaster.pid. Hmm, no? as I recall external_pid_file is an *additional* PID file; it doesn't supplant postmaster.pid. postmaster.opts is also an option. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "E pur si muove" (Galileo Galilei)
Re: Inconvenience of pg_read_binary_file()
On Fri, Jul 29, 2022 at 11:35:36PM -0400, Tom Lane wrote: > Hm. I considered reading PG_VERSION instead, or postmaster.pid. > PG_VERSION would be a very boring test case, but it should certainly > be present in $PGDATA. PG_VERSION would be simpler. Looking at postmaster.pid would require a lookup at external_pid_file, and as it is not set by default you would need to add some extra logic in the tests where external_pid_file = NULL <=> PGDATA/postmaster.pid. -- Michael signature.asc Description: PGP signature
Re: Inconvenience of pg_read_binary_file()
Michael Paquier writes: > On Fri, Jul 29, 2022 at 03:44:25PM -0400, Tom Lane wrote: >> Pushed after some fooling with the docs and test cases. (Notably, >> I do not think we can assume that pg_hba.conf exists in $PGDATA; some >> installations keep it elsewhere. I used postgresql.auto.conf instead.) > Are you sure that this last part is a good idea? We don't force the > creation of postgresql.auto.conf when starting a server, so this > impacts the portability of the tests with installcheck if one decides > to remove it from the data folder, and it sounds plausible to me that > some distributions do exactly that.. Hm. I considered reading PG_VERSION instead, or postmaster.pid. PG_VERSION would be a very boring test case, but it should certainly be present in $PGDATA. regards, tom lane
Re: Inconvenience of pg_read_binary_file()
On Fri, Jul 29, 2022 at 03:44:25PM -0400, Tom Lane wrote: > Pushed after some fooling with the docs and test cases. (Notably, > I do not think we can assume that pg_hba.conf exists in $PGDATA; some > installations keep it elsewhere. I used postgresql.auto.conf instead.) Are you sure that this last part is a good idea? We don't force the creation of postgresql.auto.conf when starting a server, so this impacts the portability of the tests with installcheck if one decides to remove it from the data folder, and it sounds plausible to me that some distributions do exactly that.. I guess that you could rely on config_file or hba_file instead. -- Michael signature.asc Description: PGP signature
Re: Inconvenience of pg_read_binary_file()
Kyotaro Horiguchi writes: > Please find the attached. I added some regression tests for both > pg_read_file() and pg_read_binary_file(). Yeah, I definitely find this way cleaner even if it's a bit more verbose. I think that the PG_RETURN_NULL code paths are not reachable in the wrappers that don't have missing_ok. I concur with your decision to write them all the same, though. Pushed after some fooling with the docs and test cases. (Notably, I do not think we can assume that pg_hba.conf exists in $PGDATA; some installations keep it elsewhere. I used postgresql.auto.conf instead.) regards, tom lane
Re: Inconvenience of pg_read_binary_file()
Thanks for taking a look! At Thu, 28 Jul 2022 16:22:17 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > - Simplified the implementation (by complexifying argument handling..). > > - REVOKEd EXECUTE from the new functions. > > - Edited the signature of the two functions. > > >> - pg_read_file ( filename text [, offset bigint, length bigint [, > >> missing_ok boolean ]] ) → text > >> + pg_read_file ( filename text [, offset bigint, length bigint ] [, > >> missing_ok boolean ] ) → text > > I'm okay with allowing this variant of the functions. Since there's > no implicit cast between bigint and bool, plus the fact that you > can't give just offset without length, there shouldn't be much risk > of confusion as to which variant to invoke. Grad to hear that. > I don't really like the implementation style though. That mess of > PG_NARGS tests is illegible code already and this makes it worse. Ah..., I have to admit that I faintly felt that feeling while on it... > I think it'd be way cleaner to have all the PG_GETARG calls in the > bottom SQL-callable functions (which are already one-per-signature) > and then pass them on to a common function that has an ordinary C > call signature, along the lines of > > static Datum > pg_read_file_common(text *filename_t, > int64 seek_offset, int64 bytes_to_read, > bool read_to_eof, bool missing_ok) > { > if (read_to_eof) > bytes_to_read = -1;// or just Assert that it's -1 ? I prefer assertion since that parameter cannot be passed by users. > else if (bytes_to_read < 0) > ereport(...); > ... > } This function cannot return NULL directly. Without the ability to return NULL, it is pointless for the function to return Datum. In the attached the function returns text*. > Datum > pg_read_file_off_len(PG_FUNCTION_ARGS) > { > text *filename_t = PG_GETARG_TEXT_PP(0); > int64 seek_offset = PG_GETARG_INT64(1); > int64 bytes_to_read = PG_GETARG_INT64(2); > > return pg_read_file_common(filename_t, seek_offset, bytes_to_read, >false, false); As the result this function need to return NULL or TEXT_P according to the returned value from pg_read_file_common. + if (!ret) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(ret); > } # I'm tempted to call read_text_file() directly from each SQL functions.. Please find the attached. I added some regression tests for both pg_read_file() and pg_read_binary_file(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 0edfc4f411b9b1fa5731c5f28d6225256f7077e1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 30 Jun 2022 10:30:35 +0900 Subject: [PATCH v4] Add an argument variation to pg_read_file Let the functions pg_read_file and pg_read_binary_file have the argument variation of f(filename, missing_ok) so that the functions can read the whole file tolerating the file to be missing. --- doc/src/sgml/func.sgml | 4 +- src/backend/catalog/system_functions.sql | 4 + src/backend/utils/adt/genfile.c | 205 +-- src/include/catalog/pg_proc.dat | 11 +- src/test/regress/expected/misc_functions.out | 77 +++ src/test/regress/sql/misc_functions.sql | 31 +++ 6 files changed, 260 insertions(+), 72 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 21f8ab73e2..49ebf6dea9 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28636,7 +28636,7 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size pg_read_file -pg_read_file ( filename text , offset bigint, length bigint , missing_ok boolean ) +pg_read_file ( filename text , offset bigint, length bigint , missing_ok boolean ) text @@ -28661,7 +28661,7 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size pg_read_binary_file -pg_read_binary_file ( filename text , offset bigint, length bigint , missing_ok boolean ) +pg_read_binary_file ( filename text , offset bigint, length bigint , missing_ok boolean ) bytea diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 73da687d5d..30a048f6b0 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -659,12 +659,16 @@ REVOKE EXECUTE ON FUNCTION pg_ls_tmpdir(oid) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_file(text) FROM public; +REVOKE EXECUTE ON FUNCTION pg_read_file(text,boolean) FROM public; + REVOKE EXECUTE ON FUNCTION pg_read_file(text,bigint,bigint) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_file(text,bigint,bigint,boolean) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text) FROM public; +REVOKE EXECUTE ON FUNCTION
Re: Inconvenience of pg_read_binary_file()
Kyotaro Horiguchi writes: > - Simplified the implementation (by complexifying argument handling..). > - REVOKEd EXECUTE from the new functions. > - Edited the signature of the two functions. >> - pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok >> boolean ]] ) $B"*(B text >> + pg_read_file ( filename text [, offset bigint, length bigint ] [, >> missing_ok boolean ] ) $B"*(B text I'm okay with allowing this variant of the functions. Since there's no implicit cast between bigint and bool, plus the fact that you can't give just offset without length, there shouldn't be much risk of confusion as to which variant to invoke. I don't really like the implementation style though. That mess of PG_NARGS tests is illegible code already and this makes it worse. I think it'd be way cleaner to have all the PG_GETARG calls in the bottom SQL-callable functions (which are already one-per-signature) and then pass them on to a common function that has an ordinary C call signature, along the lines of static Datum pg_read_file_common(text *filename_t, int64 seek_offset, int64 bytes_to_read, bool read_to_eof, bool missing_ok) { if (read_to_eof) bytes_to_read = -1;// or just Assert that it's -1 ? else if (bytes_to_read < 0) ereport(...); ... } Datum pg_read_file_off_len(PG_FUNCTION_ARGS) { text *filename_t = PG_GETARG_TEXT_PP(0); int64 seek_offset = PG_GETARG_INT64(1); int64 bytes_to_read = PG_GETARG_INT64(2); return pg_read_file_common(filename_t, seek_offset, bytes_to_read, false, false); } regards, tom lane
Re: Inconvenience of pg_read_binary_file()
At Tue, 07 Jun 2022 17:29:31 +0900 (JST), Kyotaro Horiguchi wrote in > pg_read_file(text, bool) makes sense to me, but it doesn't seem like > to be able to share C function with other variations. > pg_read_binary_file() need to accept some out-of-range value for > offset or length to signal that offset and length are not specified. In this version all the polypmorphic variations share the same body function. I tempted to add tail-reading feature but it would be another feature. > (function comments needs to be edited and docs are needed) - Simplified the implementation (by complexifying argument handling..). - REVOKEd EXECUTE from the new functions. - Edited the signature of the two functions. > - pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok > boolean ]] ) → text > + pg_read_file ( filename text [, offset bigint, length bigint ] [, > missing_ok boolean ] ) → text And registered this to the next CF. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 5d344fb56cded75e22cb4e56bcf1c10a31f5d4bb Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 30 Jun 2022 10:30:35 +0900 Subject: [PATCH v3] Add an argument variation to pg_read_file Let the functions pg_read_file and pg_read_binary_file have the argument variation of f(filename, missing_ok) so that the functions can read the whole file tolerating the file to be missing. --- doc/src/sgml/func.sgml | 4 +-- src/backend/catalog/system_functions.sql | 4 +++ src/backend/utils/adt/genfile.c | 33 +++- src/include/catalog/pg_proc.dat | 6 + 4 files changed, 39 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7b652460a1..34d76b74e4 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28593,7 +28593,7 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size pg_read_file -pg_read_file ( filename text , offset bigint, length bigint , missing_ok boolean ) +pg_read_file ( filename text , offset bigint, length bigint , missing_ok boolean ) text @@ -28618,7 +28618,7 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size pg_read_binary_file -pg_read_binary_file ( filename text , offset bigint, length bigint , missing_ok boolean ) +pg_read_binary_file ( filename text , offset bigint, length bigint , missing_ok boolean ) bytea diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 73da687d5d..30a048f6b0 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -659,12 +659,16 @@ REVOKE EXECUTE ON FUNCTION pg_ls_tmpdir(oid) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_file(text) FROM public; +REVOKE EXECUTE ON FUNCTION pg_read_file(text,boolean) FROM public; + REVOKE EXECUTE ON FUNCTION pg_read_file(text,bigint,bigint) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_file(text,bigint,bigint,boolean) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text) FROM public; +REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text,boolean) FROM public; + REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text,bigint,bigint) FROM public; REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text,bigint,bigint,boolean) FROM public; diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c index 2bf5219256..1a09431a79 100644 --- a/src/backend/utils/adt/genfile.c +++ b/src/backend/utils/adt/genfile.c @@ -278,6 +278,9 @@ pg_read_file(PG_FUNCTION_ARGS) * * No superuser check done here- instead privileges are handled by the * GRANT system. + * + * The two-argument variation of this function supposes the second argument as + * to be a Boolean. */ Datum pg_read_file_v2(PG_FUNCTION_ARGS) @@ -290,7 +293,9 @@ pg_read_file_v2(PG_FUNCTION_ARGS) text *result; /* handle optional arguments */ - if (PG_NARGS() >= 3) + if (PG_NARGS() == 2) + missing_ok = PG_GETARG_BOOL(1); + else if (PG_NARGS() >= 3) { seek_offset = PG_GETARG_INT64(1); bytes_to_read = PG_GETARG_INT64(2); @@ -299,9 +304,10 @@ pg_read_file_v2(PG_FUNCTION_ARGS) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("requested length cannot be negative"))); + + if (PG_NARGS() >= 4) + missing_ok = PG_GETARG_BOOL(3); } - if (PG_NARGS() >= 4) - missing_ok = PG_GETARG_BOOL(3); filename = convert_and_check_filename(filename_t); @@ -326,6 +332,8 @@ pg_read_binary_file(PG_FUNCTION_ARGS) bytea *result; /* handle optional arguments */ + if (PG_NARGS() == 2) + missing_ok = PG_GETARG_BOOL(1); if (PG_NARGS() >= 3) { seek_offset = PG_GETARG_INT64(1); @@ -335,9 +343,10 @@ pg_read_binary_file(PG_FUNCTION_ARGS) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
Re: Inconvenience of pg_read_binary_file()
At Tue, 7 Jun 2022 16:33:53 +0900, Michael Paquier wrote in > On Tue, Jun 07, 2022 at 04:05:20PM +0900, Kyotaro Horiguchi wrote: > > If I want to read a file that I'm not sure of the existence but I want > > to read the whole file if exists, I would call > > pg_read_binary_file('path', 0, -1, true) but unfortunately this > > doesn't work. > > Yeah, the "normal" cases that I have seen in the past just used an > extra call to pg_stat_file() to retrieve the size of the file before > reading it, but arguably it does not help if the file gets extended > between the stat() call and the read() call (we don't need to care > about this case with pg_rewind that has been the reason why the > missing_ok argument was introduced first, for one, as file extensions > don't matter as we'd replay from the LSN point where the rewind > began, adding the new blocks at replay). Sure. > There is also an argument for supporting negative values rather than > just -1. For example -2 could mean to read all the file except the > last byte. Or you could have an extra flavor, as of > pg_read_file(text, bool) to read the whole by default. Supporting > just -1 as special value for the amount of data to read would be > confusing IMO. So I would tend to choose for a set of arguments based > on (text, bool). I'm not sure about the negative length smaller than -1, since I don't find an apprpriate offset that represents (last byte + 1). pg_read_file(text, bool) makes sense to me, but it doesn't seem like to be able to share C function with other variations. pg_read_binary_file() need to accept some out-of-range value for offset or length to signal that offset and length are not specified. In the attached pg_read(_binary)_file_all() is modifiedf so that they have the different body from pg_read(_binary)_file(). (function comments needs to be edited and docs are needed) regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c index 2bf5219256..62e16b014e 100644 --- a/src/backend/utils/adt/genfile.c +++ b/src/backend/utils/adt/genfile.c @@ -349,7 +349,6 @@ pg_read_binary_file(PG_FUNCTION_ARGS) PG_RETURN_NULL(); } - /* * Wrapper functions for the 1 and 3 argument variants of pg_read_file_v2() * and pg_read_binary_file(). @@ -367,7 +366,21 @@ pg_read_file_off_len(PG_FUNCTION_ARGS) Datum pg_read_file_all(PG_FUNCTION_ARGS) { - return pg_read_file_v2(fcinfo); + text *filename_t = PG_GETARG_TEXT_PP(0); + bool missing_ok = false; + char *filename; + text *result; + + if (PG_NARGS() >= 2) + missing_ok = PG_GETARG_BOOL(1); + + filename = convert_and_check_filename(filename_t); + + result = read_text_file(filename, 0, -1, missing_ok); + if (result) + PG_RETURN_TEXT_P(result); + else + PG_RETURN_NULL(); } Datum @@ -379,7 +392,21 @@ pg_read_binary_file_off_len(PG_FUNCTION_ARGS) Datum pg_read_binary_file_all(PG_FUNCTION_ARGS) { - return pg_read_binary_file(fcinfo); + text *filename_t = PG_GETARG_TEXT_PP(0); + bool missing_ok = false; + char *filename; + bytea *result; + + if (PG_NARGS() >= 2) + missing_ok = PG_GETARG_BOOL(1); + + filename = convert_and_check_filename(filename_t); + + result = read_binary_file(filename, 0, -1, missing_ok); + if (result) + PG_RETURN_BYTEA_P(result); + else + PG_RETURN_NULL(); } /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 87aa571a33..e180286749 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6423,6 +6423,9 @@ { oid => '3826', descr => 'read text from a file', proname => 'pg_read_file', provolatile => 'v', prorettype => 'text', proargtypes => 'text', prosrc => 'pg_read_file_all' }, +{ oid => '8530', descr => 'read text from a file', + proname => 'pg_read_file', provolatile => 'v', prorettype => 'text', + proargtypes => 'text bool', prosrc => 'pg_read_file_all' }, { oid => '3827', descr => 'read bytea from a file', proname => 'pg_read_binary_file', provolatile => 'v', prorettype => 'bytea', proargtypes => 'text int8 int8', prosrc => 'pg_read_binary_file_off_len' }, @@ -6432,6 +6435,9 @@ { oid => '3828', descr => 'read bytea from a file', proname => 'pg_read_binary_file', provolatile => 'v', prorettype => 'bytea', proargtypes => 'text', prosrc => 'pg_read_binary_file_all' }, +{ oid => '8529', descr => 'read bytea from a file', + proname => 'pg_read_binary_file', provolatile => 'v', prorettype => 'bytea', + proargtypes => 'text bool', prosrc => 'pg_read_binary_file_all' }, { oid => '2625', descr => 'list all files in a directory', proname => 'pg_ls_dir', prorows => '1000', proretset => 't', provolatile => 'v', prorettype => 'text', proargtypes => 'text',
Re: Inconvenience of pg_read_binary_file()
On Tue, Jun 07, 2022 at 04:05:20PM +0900, Kyotaro Horiguchi wrote: > If I want to read a file that I'm not sure of the existence but I want > to read the whole file if exists, I would call > pg_read_binary_file('path', 0, -1, true) but unfortunately this > doesn't work. Yeah, the "normal" cases that I have seen in the past just used an extra call to pg_stat_file() to retrieve the size of the file before reading it, but arguably it does not help if the file gets extended between the stat() call and the read() call (we don't need to care about this case with pg_rewind that has been the reason why the missing_ok argument was introduced first, for one, as file extensions don't matter as we'd replay from the LSN point where the rewind began, adding the new blocks at replay). There is also an argument for supporting negative values rather than just -1. For example -2 could mean to read all the file except the last byte. Or you could have an extra flavor, as of pg_read_file(text, bool) to read the whole by default. Supporting just -1 as special value for the amount of data to read would be confusing IMO. So I would tend to choose for a set of arguments based on (text, bool). -- Michael signature.asc Description: PGP signature
Inconvenience of pg_read_binary_file()
If I want to read a file that I'm not sure of the existence but I want to read the whole file if exists, I would call pg_read_binary_file('path', 0, -1, true) but unfortunately this doesn't work. Does it make sense to change the function so as to accept the parameter specification above? Or the arguments could be ('path', null, null, true) but (0,-1) is simpler considering the characteristics of the function. (We could also rearrange the the parameter order as "filename, missing_ok, offset, length" but that is simply confusing..) If it is, pg_read_file() is worth receive the same modification and I'll post the version containing doc part. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c index 2bf5219256..219203be1c 100644 --- a/src/backend/utils/adt/genfile.c +++ b/src/backend/utils/adt/genfile.c @@ -331,10 +331,10 @@ pg_read_binary_file(PG_FUNCTION_ARGS) seek_offset = PG_GETARG_INT64(1); bytes_to_read = PG_GETARG_INT64(2); - if (bytes_to_read < 0) + if (bytes_to_read < -1) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("requested length cannot be negative"))); + errmsg("invalid requested length"))); } if (PG_NARGS() >= 4) missing_ok = PG_GETARG_BOOL(3);