Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-26 Thread Michael Paquier
On Mon, Jul 26, 2021 at 05:46:22PM +0900, Kyotaro Horiguchi wrote:
> Thanks for revising and committing! I'm fine with all of the recent
> discussions on the committed part. Though I don't think it works for
> "live" command line options, making the omitting policy symmetric
> looks good. I see the same done in several similar use of strto[il].

OK, applied this one.  So for now we are done here.

> The change in 020_pg_receivewal.pl results in a chain of four
> successive failures, which is fine. But the last failure (#23) happens
> for a bit dubious reason.

Yes, I saw that as well.  I'll address that separately.
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-26 Thread Kyotaro Horiguchi
At Mon, 26 Jul 2021 15:01:35 +0900, Michael Paquier  wrote 
in 
> On Sat, Jul 24, 2021 at 07:41:12PM +0900, Michael Paquier wrote:
> > I have looked at that over the last couple of days, and applied it
> > after some small fixes, including an indentation.
> 
> One thing that we forgot here is the handling of trailing
> whitespaces.  Attached is small patch to adjust that, with one
> positive and one negative tests.
> 
> > The int64 and float
> > parts are extra types we could handle, but I have not looked yet at
> > how much benefits we'd get in those cases.
> 
> I have looked at these two but there is really less benefits, so for
> now I am not planning to do more in this area.  For float options,
> pg_basebackup --max-rate could be one target on top of the three set
> of options in pgbench, but it needs to handle units :(

Thanks for revising and committing! I'm fine with all of the recent
discussions on the committed part. Though I don't think it works for
"live" command line options, making the omitting policy symmetric
looks good. I see the same done in several similar use of strto[il].

The change in 020_pg_receivewal.pl results in a chain of four
successive failures, which is fine. But the last failure (#23) happens
for a bit dubious reason.

> Use of uninitialized value in pattern match (m//) at t/020_pg_receivewal.pl 
> line 114.
> not ok 23 - one partial WAL segment is now completed

It might not be worth amending, but we don't need to use m/// there
and eq works fine.

020_pg_receivewal.pl: 114
-   is($zlib_wals[0] =~ m/$partial_wals[0]/,
+   is($zlib_wals[0] eq $partial_wals[0],

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-26 Thread Michael Paquier
On Sat, Jul 24, 2021 at 07:41:12PM +0900, Michael Paquier wrote:
> I have looked at that over the last couple of days, and applied it
> after some small fixes, including an indentation.

One thing that we forgot here is the handling of trailing
whitespaces.  Attached is small patch to adjust that, with one
positive and one negative tests.

> The int64 and float
> parts are extra types we could handle, but I have not looked yet at
> how much benefits we'd get in those cases.

I have looked at these two but there is really less benefits, so for
now I am not planning to do more in this area.  For float options,
pg_basebackup --max-rate could be one target on top of the three set
of options in pgbench, but it needs to handle units :(
--
Michael
From 2c38b36841965fc458dab69d846bcc0dde07aca2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 26 Jul 2021 14:41:15 +0900
Subject: [PATCH] Skip trailing whitespaces when parsing integer options

strtoint(), via strtol(), would skip any leading whitespace but the same
rule was not applied for trailing whitespaces, which was a bit
inconsistent.  Some tests are added while on it.
---
 src/fe_utils/option_utils.c  | 9 -
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 4 +++-
 src/bin/pg_dump/t/001_basic.pl   | 3 ++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 3e7e512ad9..bcfe7365fd 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -57,7 +57,14 @@ option_parse_int(const char *optarg, const char *optname,
 	errno = 0;
 	val = strtoint(optarg, , 10);
 
-	if (*endptr)
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail.
+	 */
+	while (*endptr != '\0' && isspace((unsigned char) *endptr))
+		endptr++;
+
+	if (*endptr != '\0')
 	{
 		pg_log_error("invalid value \"%s\" for option %s",
 	 optarg, optname);
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 158f7d176f..47c4ecb073 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -88,10 +88,12 @@ SKIP:
 	$primary->psql('postgres',
 		'INSERT INTO test_table VALUES (generate_series(100,200));');
 
+	# Note the trailing whitespace after the value of --compress, that is
+	# a valid value.
 	$primary->command_ok(
 		[
 			'pg_receivewal', '-D', $stream_dir,  '--verbose',
-			'--endpos',  $nextlsn, '--compress', '1'
+			'--endpos',  $nextlsn, '--compress', '1 '
 		],
 		"streaming some WAL using ZLIB compression");
 
diff --git a/src/bin/pg_dump/t/001_basic.pl b/src/bin/pg_dump/t/001_basic.pl
index 59de6df7ba..d1a7e1db40 100644
--- a/src/bin/pg_dump/t/001_basic.pl
+++ b/src/bin/pg_dump/t/001_basic.pl
@@ -101,8 +101,9 @@ command_fails_like(
 	qr/\Qpg_dump: error: parallel backup only supported by the directory format\E/,
 	'pg_dump: parallel backup only supported by the directory format');
 
+# Note the trailing whitespace for the value of --jobs, that is valid.
 command_fails_like(
-	[ 'pg_dump', '-j', '-1' ],
+	[ 'pg_dump', '-j', '-1 ' ],
 	qr/\Qpg_dump: error: -j\/--jobs must be in range\E/,
 	'pg_dump: -j/--jobs must be in range');
 
-- 
2.32.0



signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-24 Thread Michael Paquier
On Thu, Jul 22, 2021 at 02:32:35PM +0900, Michael Paquier wrote:
> Okay, done those parts as per the attached.  While on it, I noticed an
> extra one for pg_dump --rows-per-insert.  I am counting 25 translated
> strings cut in total.
> 
> Any objections to this first step?

I have looked at that over the last couple of days, and applied it
after some small fixes, including an indentation.  The int64 and float
parts are extra types we could handle, but I have not looked yet at
how much benefits we'd get in those cases.
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-22 Thread Michael Paquier
On Thu, Jul 22, 2021 at 09:42:00AM -0400, Alvaro Herrera wrote:
> May I suggest for the second sentence something like "If the parsing is
> successful, returns true and stores the result in *result if that's
> given; if parsing fails, returns false"

Sounds fine to me.  Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-21, Michael Paquier wrote:

> +/*
> + * option_parse_int
> + *
> + * Parse an integer for a given option.  Returns true if the parsing
> + * could be done with optionally *result holding the parsed value, and
> + * false on failure.
> + */

May I suggest for the second sentence something like "If the parsing is
successful, returns true and stores the result in *result if that's
given; if parsing fails, returns false"

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-21 Thread Michael Paquier
On Thu, Jul 22, 2021 at 01:19:41AM +1200, David Rowley wrote:
> On Thu, 22 Jul 2021 at 00:44, Michael Paquier  wrote:
>>
>> On Thu, Jul 22, 2021 at 12:32:39AM +1200, David Rowley wrote:
>> > I see both of these are limited to 64 on windows. Won't those fail on 
>> > Windows?
>>
>> Yes, thanks, they would.  I would just cut the range numbers from the
>> expected output here.  This does not matter in terms of coverage
>> either.
> 
> Sounds good.
> 
>> x> I also wondered if it would be worth doing #define MAX_JOBS  somewhere
>> > away from the option parsing code.  This part is pretty ugly:
>>
>> Agreed as well.  pg_dump and pg_restore have their own idea of
>> parallelism in parallel.{c.h}.  What about putting MAX_JOBS in
>> parallel.h then?
> 
> parallel.h looks ok to me.

Okay, done those parts as per the attached.  While on it, I noticed an
extra one for pg_dump --rows-per-insert.  I am counting 25 translated
strings cut in total.

Any objections to this first step?
--
Michael
From 36be37c0dd0810b65d75e9807b61c244d4f56333 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 22 Jul 2021 14:31:31 +0900
Subject: [PATCH v5] Introduce and use routine for parsing of int32 options

---
 src/include/fe_utils/option_utils.h|  3 ++
 src/fe_utils/option_utils.c| 39 ++
 src/bin/pg_amcheck/pg_amcheck.c|  8 ++-
 src/bin/pg_basebackup/pg_basebackup.c  | 17 +++---
 src/bin/pg_basebackup/pg_receivewal.c  | 23 +++--
 src/bin/pg_basebackup/pg_recvlogical.c | 25 -
 src/bin/pg_checksums/Makefile  |  3 ++
 src/bin/pg_checksums/pg_checksums.c|  9 ++--
 src/bin/pg_dump/parallel.h | 13 +
 src/bin/pg_dump/pg_dump.c  | 46 -
 src/bin/pg_dump/pg_restore.c   | 22 ++--
 src/bin/pg_dump/t/001_basic.pl | 20 
 src/bin/pgbench/pgbench.c  | 60 +++---
 src/bin/pgbench/t/002_pgbench_no_server.pl | 36 -
 src/bin/scripts/reindexdb.c|  9 ++--
 src/bin/scripts/vacuumdb.c | 31 ---
 16 files changed, 174 insertions(+), 190 deletions(-)

diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d653cb94e3..e999d56ec0 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -19,5 +19,8 @@ typedef void (*help_handler) (const char *progname);
 extern void handle_help_version_opts(int argc, char *argv[],
 	 const char *fixed_progname,
 	 help_handler hlp);
+extern bool option_parse_int(const char *optarg, const char *optname,
+			 int min_range, int max_range,
+			 int *result);
 
 #endif			/* OPTION_UTILS_H */
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index e19a495dba..a09d57db71 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -12,6 +12,8 @@
 
 #include "postgres_fe.h"
 
+#include "common/logging.h"
+#include "common/string.h"
 #include "fe_utils/option_utils.h"
 
 /*
@@ -36,3 +38,40 @@ handle_help_version_opts(int argc, char *argv[],
 		}
 	}
 }
+
+/*
+ * option_parse_int
+ *
+ * Parse integer value for an option.  Returns true if the parsing could
+ * be done with optionally *result holding the parsed value, and false on
+ * failure.
+ */
+bool
+option_parse_int(const char *optarg, const char *optname,
+ int min_range, int max_range,
+ int *result)
+{
+	char	   *endptr;
+	int			val;
+
+	errno = 0;
+	val = strtoint(optarg, , 10);
+
+	if (*endptr)
+	{
+		pg_log_error("invalid value \"%s\" for option \"%s\"",
+	 optarg, optname);
+		return false;
+	}
+
+	if (errno == ERANGE || val < min_range || val > max_range)
+	{
+		pg_log_error("\"%s\" must be in range %d..%d",
+	 optname, min_range, max_range);
+		return false;
+	}
+
+	if (result)
+		*result = val;
+	return true;
+}
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 4bde16fb4b..ee8aa71bdf 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -12,6 +12,7 @@
  */
 #include "postgres_fe.h"
 
+#include 
 #include 
 
 #include "catalog/pg_am_d.h"
@@ -326,12 +327,9 @@ main(int argc, char *argv[])
 append_btree_pattern(, optarg, encoding);
 break;
 			case 'j':
-opts.jobs = atoi(optarg);
-if (opts.jobs < 1)
-{
-	pg_log_error("number of parallel jobs must be at least 1");
+if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+	  ))
 	exit(1);
-}
 break;
 			case 'p':
 port = pg_strdup(optarg);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8bb0acf498..9ea98481d8 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -31,6 +31,7 @@
 #include "common/file_utils.h"
 #include "common/logging.h"
 #include "common/string.h"
+#include "fe_utils/option_utils.h"
 

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-21 Thread David Rowley
On Thu, 22 Jul 2021 at 00:44, Michael Paquier  wrote:
>
> On Thu, Jul 22, 2021 at 12:32:39AM +1200, David Rowley wrote:
> > I see both of these are limited to 64 on windows. Won't those fail on 
> > Windows?
>
> Yes, thanks, they would.  I would just cut the range numbers from the
> expected output here.  This does not matter in terms of coverage
> either.

Sounds good.

> x> I also wondered if it would be worth doing #define MAX_JOBS  somewhere
> > away from the option parsing code.  This part is pretty ugly:
>
> Agreed as well.  pg_dump and pg_restore have their own idea of
> parallelism in parallel.{c.h}.  What about putting MAX_JOBS in
> parallel.h then?

parallel.h looks ok to me.

David




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-21 Thread Michael Paquier
On Thu, Jul 22, 2021 at 12:32:39AM +1200, David Rowley wrote:
> I see both of these are limited to 64 on windows. Won't those fail on Windows?

Yes, thanks, they would.  I would just cut the range numbers from the
expected output here.  This does not matter in terms of coverage
either.

x> I also wondered if it would be worth doing #define MAX_JOBS  somewhere
> away from the option parsing code.  This part is pretty ugly:

Agreed as well.  pg_dump and pg_restore have their own idea of
parallelism in parallel.{c.h}.  What about putting MAX_JOBS in
parallel.h then?
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-21 Thread David Rowley
On Wed, 21 Jul 2021 at 23:50, Michael Paquier  wrote:
> Hacking on that, I am finishing with the attached.  It is less
> ambitious, still very useful as it removes a dozen of custom error
> messages in favor of the two ones introduced in option_utils.c.  On
> top of that this reduces a bit the code:
>  15 files changed, 156 insertions(+), 169 deletions(-)
>
> What do you think?

This is just a driveby review, but I think that it's good to see no
increase in the number of lines of code to make these improvements.
It's also good to see the added consistency introduced by this patch.

I didn't test the patch, so this is just from reading through.

I wondered about the TAP tests here:

command_fails_like(
[ 'pg_dump', '-j', '-1' ],
qr/\Qpg_dump: error: -j\/--jobs must be in range 0..2147483647\E/,
'pg_dump: invalid number of parallel jobs');

command_fails_like(
[ 'pg_restore', '-j', '-1', '-f -' ],
qr/\Qpg_restore: error: -j\/--jobs must be in range 0..2147483647\E/,
'pg_restore: invalid number of parallel jobs');

I see both of these are limited to 64 on windows. Won't those fail on Windows?

I also wondered if it would be worth doing #define MAX_JOBS  somewhere
away from the option parsing code.  This part is pretty ugly:

/*
* On Windows we can only have at most MAXIMUM_WAIT_OBJECTS
* (= 64 usually) parallel jobs because that's the maximum
* limit for the WaitForMultipleObjects() call.
*/
if (!option_parse_int(optarg, "-j/--jobs", 0,
#ifdef WIN32
  MAXIMUM_WAIT_OBJECTS,
#else
  INT_MAX,
#endif
  ))
exit(1);
break;

David




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-21 Thread Michael Paquier
On Wed, Jul 21, 2021 at 05:02:29PM +0900, Kyotaro Horiguchi wrote:
> The difference is your suggestion is making the function output the
> message within.  I guess that the reason for the original proposal is
> different style of message is required in several places.

That's one step toward having a maximum number of frontend tools to
use the central logging APIs of src/common/.

> 1. Some "bare" options (that is not preceded by a hyphen option) like
>  PID of pg_ctl kill doesn't fit the format.  \pset parameters of
>  pg_ctl is the same.

Yep.  I was reviewing this one, but I have finished by removing it.
The argument 2 just below also came into my mind.

> 2. pg_ctl, pg_upgrade use their own error reporting mechanisms.

Yeah, for this reason I don't think that it is a good idea to switch
those areas to use the parsing of option_utils.c.  Perhaps we should
consider switching pg_upgrade to have a better logging infra, but
there are also reasons behind what we have now.  pg_ctl is out of
scope as it needs to cover WIN32 event logging.

> 3. parameters that take real numbers doesn't fit the scheme specifying
>  range borders. For example boundary values may or may not be included
>  in the range.

This concerns only pgbench, which I'd be fine to let as-is.

> 4. Most of the errors are PG_LOG_ERROR, but a few ones are
>  PG_LOG_FATAL.

I would take it that pgbench is inconsistent with the rest.  Note that
pg_dump uses fatal(), but that's just a wrapper to pg_log_error().

> loglevel specifies the loglevel to use to emit error messages. If it
> is the newly added item PG_LOG_OMIT, the function never emits an error
> message. Addition to that, the return type is changed to an enum which
> indicates what trouble the given string has. The caller can print
> arbitrary error messages consulting the value. (killproc in pg_ctl.c)

I am not much a fan of that.  If we do so, what's the point in having
a dependency to logging.c anyway in option_utils.c?  This OMIT option
only exists to bypass the specific logging needs where this gets
added.  That does not seem a design adapted to me in the long term,
neither am I a fan of specific error codes for a code path that's just
going to be used to parse command options.

> I added two more similar functions option_parse_long/double. The
> former is a clone of _int. The latter doesn't perform explicit range
> checks for the reason described above.

These have a limited impact, so I would limit things to int32 for
now.

> Maybe we need to make pg_upgrade use the common-logging features
> instead of its own, but it is not included in this patch.

Maybe.  That would be good in the long term, though its case is very
particular.

> The attached patch needs more polish but should be enough to tell what
> I have in my mind.

This breaks some of the TAP tests of pgbench and pg_dump, at short
sight.

The checks for the port value in pg_receivewal and pg_recvlogical is
strange to have.  We don't care about that in any other tools.

The number of checks for --jobs and workers could be made more
consistent across the board, but I have let that out for now.

Hacking on that, I am finishing with the attached.  It is less
ambitious, still very useful as it removes a dozen of custom error
messages in favor of the two ones introduced in option_utils.c.  On
top of that this reduces a bit the code:
 15 files changed, 156 insertions(+), 169 deletions(-) 

What do you think?
--
Michael
From 3aaf98a9e53d90ca2ac8c2734ae77487aa03843f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 21 Jul 2021 20:42:13 +0900
Subject: [PATCH v4] Introduce and use routine for parsing of int32 options

---
 src/include/fe_utils/option_utils.h|  3 ++
 src/fe_utils/option_utils.c| 39 ++
 src/bin/pg_amcheck/pg_amcheck.c|  8 ++-
 src/bin/pg_basebackup/pg_basebackup.c  | 17 +++---
 src/bin/pg_basebackup/pg_receivewal.c  | 23 +++--
 src/bin/pg_basebackup/pg_recvlogical.c | 25 -
 src/bin/pg_checksums/Makefile  |  3 ++
 src/bin/pg_checksums/pg_checksums.c|  9 ++--
 src/bin/pg_dump/pg_dump.c  | 41 +++
 src/bin/pg_dump/pg_restore.c   | 31 +--
 src/bin/pg_dump/t/001_basic.pl |  8 +--
 src/bin/pgbench/pgbench.c  | 60 +++---
 src/bin/pgbench/t/002_pgbench_no_server.pl | 18 +++
 src/bin/scripts/reindexdb.c|  9 ++--
 src/bin/scripts/vacuumdb.c | 31 ---
 15 files changed, 156 insertions(+), 169 deletions(-)

diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d653cb94e3..e999d56ec0 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -19,5 +19,8 @@ typedef void (*help_handler) (const char *progname);
 extern void handle_help_version_opts(int argc, char *argv[],
 	 const char *fixed_progname,
 	 

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-21 Thread Kyotaro Horiguchi
At Thu, 15 Jul 2021 16:19:07 +0900, Michael Paquier  wrote 
in 
> On Wed, Jul 14, 2021 at 11:02:47AM -0400, Alvaro Herrera wrote:
> > No, this doesn't work.  When the first word is something that is
> > not to be translated (a literal parameter name), let's use a %s (but of
> > course the values must be taken out of the phrase too).  But when it is
> > a translatable phrase, it must be present a full phrase, no
> > substitution:
> > 
> > pg_log_error("%s must be in range %s..%s", "extra_float_digits", "-15", 
> > "3");
> > pg_log..("compression level must be in range %s..%s", "0", "9"))
> > 
> > I think the purity test is whether you want to use a _() around the
> > argument, then you have to include it into the original message.
> 
> After thinking about that, it seems to me that we don't have that much
> context to lose once we build those error messages to show the option
> name of the command.  And the patch, as proposed, finishes with the

Agreed.

> same error message patterns all over the place, which would be a
> recipe for more inconsistencies in the future.  I think that we should
> centralize all that, say with a small-ish routine in a new file called
> src/fe_utils/option_parsing.c that uses strtoint() as follows:
> bool option_parse_int(const char *optarg,
> const char *optname,
> int min_range,
> int max_range,
> int *result);
>
> Then this routine may print two types of errors through
> pg_log_error():
> - Incorrect range, using min_range/max_range.
> - Junk input.
> The boolean status is here to let the caller do any custom exit()
> actions he wishes if there one of those two failures.  pg_dump has
> some of that with exit_nicely(), for one.

It is substantially the same suggestion with [1] in the original
thread.  The original proposal in the old thread was

> bool pg_strtoint64_range(arg, , min, max, _message)

The difference is your suggestion is making the function output the
message within.  I guess that the reason for the original proposal is
different style of message is required in several places.

Actually there are several irregulars.

1. Some "bare" options (that is not preceded by a hyphen option) like
 PID of pg_ctl kill doesn't fit the format.  \pset parameters of
 pg_ctl is the same.

2. pg_ctl, pg_upgrade use their own error reporting mechanisms.

3. parameters that take real numbers doesn't fit the scheme specifying
 range borders. For example boundary values may or may not be included
 in the range.

4. Most of the errors are PG_LOG_ERROR, but a few ones are
 PG_LOG_FATAL.

That being said, most of the caller sites are satisfied by
fixed-formed messages.

So I changed the interface as the following to deal with the above issues:

+extern optparse_result option_parse_int(int loglevel,
+   
const char *optarg, const char *optname,
+   
int min_range, int max_range,
+   
int *result);

loglevel specifies the loglevel to use to emit error messages. If it
is the newly added item PG_LOG_OMIT, the function never emits an error
message. Addition to that, the return type is changed to an enum which
indicates what trouble the given string has. The caller can print
arbitrary error messages consulting the value. (killproc in pg_ctl.c)

Other points:

I added two more similar functions option_parse_long/double. The
former is a clone of _int. The latter doesn't perform explicit range
checks for the reason described above.

Maybe we need to make pg_upgraded use the common-logging features
instead of its own, but it is not included in this patch.

pgbench's -L option accepts out-of-range values for internal
variable. As the added comment says, we can limit the value with the
large exact number but I limited it to 3600s since I can't imagine
people needs larger latency limit than that.

Similarly to the above, -R option can take for example 1E-300, which
leads to int64 overflow later. Similar to -L, I don't think people
don't need less than 1E-5 or so as this parameter.


The attached patch needs more polish but should be enough to tell what
I have in my mind.

regards.

1: 
https://www.postgresql.org/message-id/CAKJS1f94kkuB_53LcEf0HF%2BuxMiTCk5FtLx9gSsXcCByUKMz1g%40mail.gmail.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From aaf81ac0340e9df3b74e18c1492e5984c0412fb5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 8 Jul 2021 15:08:01 +0900
Subject: [PATCH v3 1/3] Be strict in numeric parameters on command line

Some numeric command line parameters are tolerant of valid values
followed by garbage like "123xyz".  Be strict to reject such invalid
values. Do the same for psql meta command parameters.
---
 src/bin/pg_amcheck/pg_amcheck.c|   9 +-
 src/bin/pg_basebackup/pg_basebackup.c  |  17 ++-
 

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-15 Thread Michael Paquier
On Wed, Jul 14, 2021 at 11:02:47AM -0400, Alvaro Herrera wrote:
> On 2021-Jul-14, Kyotaro Horiguchi wrote:
>>> Should we take this occasion to reduce the burden of translators and
>>> reword that as "%s must be in range %d..%d"?  That could be a separate
>>> patch.
> 
> Yes, please, let's do it here.

Okay.

> No, this doesn't work.  When the first word is something that is
> not to be translated (a literal parameter name), let's use a %s (but of
> course the values must be taken out of the phrase too).  But when it is
> a translatable phrase, it must be present a full phrase, no
> substitution:
> 
>   pg_log_error("%s must be in range %s..%s", "extra_float_digits", "-15", 
> "3");
>   pg_log..("compression level must be in range %s..%s", "0", "9"))
> 
> I think the purity test is whether you want to use a _() around the
> argument, then you have to include it into the original message.

After thinking about that, it seems to me that we don't have that much
context to lose once we build those error messages to show the option
name of the command.  And the patch, as proposed, finishes with the
same error message patterns all over the place, which would be a
recipe for more inconsistencies in the future.  I think that we should
centralize all that, say with a small-ish routine in a new file called
src/fe_utils/option_parsing.c that uses strtoint() as follows:
bool option_parse_int(const char *optarg,
const char *optname,
int min_range,
int max_range,
int *result);

Then this routine may print two types of errors through
pg_log_error():
- Incorrect range, using min_range/max_range.
- Junk input.
The boolean status is here to let the caller do any custom exit()
actions he wishes if there one of those two failures.  pg_dump has
some of that with exit_nicely(), for one.
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Kyotaro Horiguchi wrote:

> > >  pg_log_error("extra_float_digits must be in range 
> > > -15..3");
> > >  exit_nicely(1);
> > 
> > Should we take this occasion to reduce the burden of translators and
> > reword that as "%s must be in range %d..%d"?  That could be a separate
> > patch.

Yes, please, let's do it here.

> The first %s is not always an invariable symbol name so it could
> result in concatenating several phrases into one, like this.
> 
>  pg_log..("%s must be in range %s..%s", _("compression level"), "0", "9"))
> 
> It is translatable at least into Japanese but I'm not sure about other
> languages.

No, this doesn't work.  When the first word is something that is
not to be translated (a literal parameter name), let's use a %s (but of
course the values must be taken out of the phrase too).  But when it is
a translatable phrase, it must be present a full phrase, no
substitution:

pg_log_error("%s must be in range %s..%s", "extra_float_digits", "-15", 
"3");
pg_log..("compression level must be in range %s..%s", "0", "9"))

I think the purity test is whether you want to use a _() around the
argument, then you have to include it into the original message.

Thanks

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-13 Thread Kyotaro Horiguchi
Thanks for the discussion.

At Tue, 13 Jul 2021 09:28:30 +0900, Michael Paquier  wrote 
in 
> On Fri, Jul 09, 2021 at 04:50:28PM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 9 Jul 2021 10:29:07 +0900, Michael Paquier  
> > wrote in 
> >> Er, wait.  We've actually allowed negative values for pg_ctl
> >> --timeout or the subcommand kill!?
> >
> > --timeout accepts values less than 1, which values cause the command
> > ends with the timeout error before checking for the ending state. Not
> > destructive but useless as a behavior.  We have two choices here. One
> > is simply inhibiting zero or negative timeouts, and another is
> > allowing zero as timeout and giving it the same meaning to
> > --no-wait. The former is strictly right behavior but the latter is
> > casual and convenient. I took the former way in this version.
> 
> Yeah, that's not useful.

^^; Ok, I'm fine with taking the second way. Changed it.

"-t 0" means "-W" in the attached and that behavior is described in
the doc part. The environment variable PGCTLTIMEOUT accepts the same
range of values. I added a warning that notifies that -t 0 overrides
explicit -w .

> $ pg_ctl -w -t 0 start
> pg_ctl: WARNING: -w is ignored because timeout is set to 0
> server starting


> >> This one in pg_upgrade is incomplete.  Perhaps the missing comment
> >> should tell that negative job values are checked later on?
> > 
> > Zero or negative job numbers mean non-parallel mode and don't lead to
> > an error.  If we don't need to preserve that behavior (I personally
> > don't think it is ether useful and/or breaks someone's existing
> > usage.), it is sensible to do range-check here.
> 
> Hmm.  It would be good to preserve some compatibility here, but I can
> see more benefits in being consistent between all the tools, and make
> people aware that such commands are not generated more carefully.

Ageed.

> >  case 'j':
> > -opts.jobs = atoi(optarg);
> > -if (opts.jobs < 1)
> > +errno = 0;
> > +opts.jobs = strtoint(optarg, , 10);
> > +if (*endptr || errno == ERANGE || opts.jobs < 1)
> >  {
> >  pg_log_error("number of parallel jobs must be at least 
> > 1");
> >  exit(1);
> 
> specifying a value that triggers ERANGE could be confusing for values
> higher than INT_MAX with pg_amcheck --jobs:
> $ pg_amcheck --jobs=40
> pg_amcheck: error: number of parallel jobs must be at least 1
> I think that this should be reworded as "invalid number of parallel
> jobs \"$s\"", or "number of parallel jobs must be in range %d..%d".
> Perhaps we could have a combination of both?  Using the first message
> is useful with junk, non-numeric values or trailing characters.  The
> second is useful to make users aware that the value is numeric, but
> not quite right.

Yeah, I noticed that but ignored as a kind of impossible, or
something-needless-to-say:p

> "invalid number of parallel jobs \"$s\""
> "number of parallel jobs must be in range %d..%d"

The resulting combined message looks like this:

> "number of parallel jobs must be an integer in range 1..2147483647"

I don't think it's not great that the message explicitly suggests a
limit like INT_MAX, which is far above the practical limit.  So, (even
though I avoided to do that but) in the attached, I changed my mind to
split most of the errors into two messages to avoid suggesting such
impractical limits.

$ pg_amcheck -j -1
$ pg_amcheck -j 1x
$ pg_amcheck -j 10x
pg_amcheck: error: number of parallel jobs must be an integer greater than 
zero: ""
$ pg_amcheck -j 10
pg_amcheck: error: number of parallel jobs out of range: "10"

If you feel it's too-much or pointless to split that way, I'll happy
to change it the "%d..%d" form.


Still I used the "%d..%d" notation for some parameters that has a
finite range by design.  They look like the following:
(%d..%d doesn't work well for real numbers.)

"sampling rate must be an real number between 0.0 and 1.0: \"%s\""

I'm not sure what to do for numWorkers of pg_dump/restore.  The limit
for numWorkers are lowered on Windows to quite low value, which would
be worth showing, but otherwise the limit is INT_MAX. The attached
makes pg_dump respond to -j 100 with the following error message which
doesn't suggest the finite limit 64 on Windows.

$ pg_dump -j 100
pg_dump: error: number of parallel jobs out of range: "100"


> > @@ -587,8 +602,10 @@ main(int argc, char **argv)
> >  
> >  case 8:
> >  have_extra_float_digits = true;
> > -extra_float_digits = atoi(optarg);
> > -if (extra_float_digits < -15 || extra_float_digits > 3)
> > +errno = 0;
> > +extra_float_digits = strtoint(optarg, , 10);
> > +if (*endptr || errno == ERANGE ||
> > +extra_float_digits < -15 || extra_float_digits > 

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-12 Thread Michael Paquier
On Fri, Jul 09, 2021 at 04:50:28PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 9 Jul 2021 10:29:07 +0900, Michael Paquier  
> wrote in 
>> Er, wait.  We've actually allowed negative values for pg_ctl
>> --timeout or the subcommand kill!?
>
> --timeout accepts values less than 1, which values cause the command
> ends with the timeout error before checking for the ending state. Not
> destructive but useless as a behavior.  We have two choices here. One
> is simply inhibiting zero or negative timeouts, and another is
> allowing zero as timeout and giving it the same meaning to
> --no-wait. The former is strictly right behavior but the latter is
> casual and convenient. I took the former way in this version.

Yeah, that's not useful.

>> This one in pg_upgrade is incomplete.  Perhaps the missing comment
>> should tell that negative job values are checked later on?
> 
> Zero or negative job numbers mean non-parallel mode and don't lead to
> an error.  If we don't need to preserve that behavior (I personally
> don't think it is ether useful and/or breaks someone's existing
> usage.), it is sensible to do range-check here.

Hmm.  It would be good to preserve some compatibility here, but I can
see more benefits in being consistent between all the tools, and make
people aware that such commands are not generated more carefully.

>  case 'j':
> -opts.jobs = atoi(optarg);
> -if (opts.jobs < 1)
> +errno = 0;
> +opts.jobs = strtoint(optarg, , 10);
> +if (*endptr || errno == ERANGE || opts.jobs < 1)
>  {
>  pg_log_error("number of parallel jobs must be at least 
> 1");
>  exit(1);

specifying a value that triggers ERANGE could be confusing for values
higher than INT_MAX with pg_amcheck --jobs:
$ pg_amcheck --jobs=40
pg_amcheck: error: number of parallel jobs must be at least 1
I think that this should be reworded as "invalid number of parallel
jobs \"$s\"", or "number of parallel jobs must be in range %d..%d".
Perhaps we could have a combination of both?  Using the first message
is useful with junk, non-numeric values or trailing characters.  The
second is useful to make users aware that the value is numeric, but
not quite right.

> --- a/src/bin/pg_checksums/pg_checksums.c
>  case 'f':
> -if (atoi(optarg) == 0)
> +errno = 0;
> +if (strtoint(optarg, , 10) == 0
> +|| *endptr || errno == ERANGE)
>  {
>  pg_log_error("invalid filenode specification, must be 
> numeric: %s", optarg);
>  exit(1);

The confusion is equal here with pg_checksums -f:
$ ./pg_checksums --f 40
pg_checksums: error: invalid filenode specification, must be numeric: 4
We could say "invalid file specification: \"%s\"".  Another idea to be
crystal-clear about the range requirements is to use that:
"file specification must be in range %d..%d"

> @@ -587,8 +602,10 @@ main(int argc, char **argv)
>  
>  case 8:
>  have_extra_float_digits = true;
> -extra_float_digits = atoi(optarg);
> -if (extra_float_digits < -15 || extra_float_digits > 3)
> +errno = 0;
> +extra_float_digits = strtoint(optarg, , 10);
> +if (*endptr || errno == ERANGE ||
> +extra_float_digits < -15 || extra_float_digits > 3)
>  {
>  pg_log_error("extra_float_digits must be in range 
> -15..3");
>  exit_nicely(1);

Should we take this occasion to reduce the burden of translators and
reword that as "%s must be in range %d..%d"?  That could be a separate
patch.

>  case 'p':
> -if ((old_cluster.port = atoi(optarg)) <= 0)
> -pg_fatal("invalid old port number\n");
> +errno = 0;
> +if ((old_cluster.port = strtoint(optarg, , 10)) <= 0 
> ||
> +*endptr || errno == ERANGE)
> +pg_fatal("invalid old port number %s\n", optarg);
>  break;

You may want to use columns here, or specify the port range:
"invalid old port number: %s" or "old port number must be in range
%d..%d".

>  case 'P':
> -if ((new_cluster.port = atoi(optarg)) <= 0)
> -pg_fatal("invalid new port number\n");
> +errno = 0;
> +if ((new_cluster.port = strtoint(optarg, , 10)) <= 0 
> ||
> +*endptr || errno == ERANGE)
> +pg_fatal("invalid new port number %s\n", optarg);
>  break;

Ditto.

> +if (*endptr || errno == ERANGE || concurrentCons <= 0)
>  {
> -pg_log_error("number of parallel jobs must be at least 
> 1");
> +

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-09 Thread Kyotaro Horiguchi
Thank you for the comments.

At Fri, 9 Jul 2021 10:29:07 +0900, Michael Paquier  wrote 
in 
> On Thu, Jul 08, 2021 at 05:30:23PM +0900, Kyotaro Horiguchi wrote:
> > [1] is trying to expose pg_strtoint16/32 to frontend, but I don't see
> > much point in doing that in conjunction with [2] or this thread. Since
> > the integral parameter values of pg-commands are in int, which the
> > exising function strtoint() is sufficient to read.  So even [2] itself
> > doesn't need to utilize [1].
> 
> It sounds sensible from here to just use strtoint(), some strtol(),
> son strtod() and call it a day as these are already available.

Thanks.

> > -wait_seconds = atoi(optarg);
> > +errno = 0;
> > +wait_seconds = strtoint(optarg, , 10);
> > +if (*endptr || errno == ERANGE || wait_seconds < 0)
> > +{
> > +pg_log_error("invalid timeout \"%s\"", optarg);
> > +exit(1);
> > +}
> > [ ... ]
> > -killproc = atol(argv[++optind]);
> > +errno = 0;
> > +killproc = strtol(argv[++optind], , 10);
> > +if (*endptr || errno == ERANGE || killproc < 0)
> > +{
> > +pg_log_error("invalid process ID \"%s\"", 
> > argv[optind]);
> > +exit(1);
> > +}
> 
> Er, wait.  We've actually allowed negative values for pg_ctl
> --timeout or the subcommand kill!?

For killproc, leading minus sign suggests that it is an command line
option. Since pg_ctl doesn't have such an option, that values is
recognized as invalid *options*, even with the additional check.  The
additional check is useless in that sense. My intension is just to
make the restriction explicit so I won't feel sad even if it should be
removed.

$ pg_ctl kill HUP -12345
cg_ctl: infalid option -- '1'

--timeout accepts values less than 1, which values cause the command
ends with the timeout error before checking for the ending state. Not
destructive but useless as a behavior.  We have two choices here. One
is simply inhibiting zero or negative timeouts, and another is
allowing zero as timeout and giving it the same meaning to
--no-wait. The former is strictly right behavior but the latter is
casual and convenient. I took the former way in this version.

$ pg_ctl -w -t 0 start
pg_ctl: error: invalid timeout value "0", use --no-wait to finish without 
waiting

The same message is shown for non-decimal values but that would not matters.

> >  case 'j':
> > -user_opts.jobs = atoi(optarg);
> > +errno = 0;
> > +user_opts.jobs = strtoint(optarg, , 10);
> > +/**/
> > +if (*endptr || errno == ERANGE)
> > +pg_fatal("invalid number of jobs %s\n", optarg);
> > +
> >  break;
> 
> This one in pg_upgrade is incomplete.  Perhaps the missing comment
> should tell that negative job values are checked later on?

Zero or negative job numbers mean non-parallel mode and don't lead to
an error.  If we don't need to preserve that behavior (I personally
don't think it is ether useful and/or breaks someone's existing
usage.), it is sensible to do range-check here.

I noticed that I overlooked PGCTLTIMEOUT.

The attached is:

- disallowed less-than-one numbers as jobs in pg_upgrade. 
- disallowed less-than-one timeout in pg_ctl
- Use strtoint for PGCTLTIMEOUT in pg_ctl is 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 0818de16f9febea3d90ae0404b4f5b3643f6cbe0 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 8 Jul 2021 15:08:01 +0900
Subject: [PATCH v2 1/2] Be strict in numeric parameters on command line

Some numeric command line parameters are tolerant of valid values
followed by garbage like "123xyz".  Be strict to reject such invalid
values. Do the same for psql meta command parameters.
---
 src/bin/pg_amcheck/pg_amcheck.c|  6 ++-
 src/bin/pg_basebackup/pg_basebackup.c  | 13 +++--
 src/bin/pg_basebackup/pg_receivewal.c  | 18 +--
 src/bin/pg_basebackup/pg_recvlogical.c | 17 +--
 src/bin/pg_checksums/pg_checksums.c|  7 ++-
 src/bin/pg_ctl/pg_ctl.c| 18 ++-
 src/bin/pg_dump/pg_dump.c  | 39 ---
 src/bin/pg_dump/pg_restore.c   | 17 ---
 src/bin/pg_upgrade/option.c| 21 ++--
 src/bin/pgbench/pgbench.c  | 66 --
 src/bin/psql/command.c | 52 ++--
 src/bin/scripts/reindexdb.c| 10 ++--
 src/bin/scripts/vacuumdb.c | 23 +
 13 files changed, 219 insertions(+), 88 deletions(-)

diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 4bde16fb4b..71a82f9b75 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-08 Thread Michael Paquier
On Thu, Jul 08, 2021 at 05:30:23PM +0900, Kyotaro Horiguchi wrote:
> Looked through the three threads.

Thanks!

> [1] is trying to expose pg_strtoint16/32 to frontend, but I don't see
> much point in doing that in conjunction with [2] or this thread. Since
> the integral parameter values of pg-commands are in int, which the
> exising function strtoint() is sufficient to read.  So even [2] itself
> doesn't need to utilize [1].

It sounds sensible from here to just use strtoint(), some strtol(),
son strtod() and call it a day as these are already available.

> -wait_seconds = atoi(optarg);
> +errno = 0;
> +wait_seconds = strtoint(optarg, , 10);
> +if (*endptr || errno == ERANGE || wait_seconds < 0)
> +{
> +pg_log_error("invalid timeout \"%s\"", optarg);
> +exit(1);
> +}
> [ ... ]
> -killproc = atol(argv[++optind]);
> +errno = 0;
> +killproc = strtol(argv[++optind], , 10);
> +if (*endptr || errno == ERANGE || killproc < 0)
> +{
> +pg_log_error("invalid process ID \"%s\"", argv[optind]);
> +exit(1);
> +}

Er, wait.  We've actually allowed negative values for pg_ctl
--timeout or the subcommand kill!?

>  case 'j':
> -user_opts.jobs = atoi(optarg);
> +errno = 0;
> +user_opts.jobs = strtoint(optarg, , 10);
> +/**/
> +if (*endptr || errno == ERANGE)
> +pg_fatal("invalid number of jobs %s\n", optarg);
> +
>  break;

This one in pg_upgrade is incomplete.  Perhaps the missing comment
should tell that negative job values are checked later on?
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-08 Thread Kyotaro Horiguchi
At Wed, 7 Jul 2021 17:40:13 +0530, Bharath Rupireddy 
 wrote in 
> On Fri, Jun 4, 2021 at 10:23 PM Alvaro Herrera  
> wrote:
> >
> > On 2021-Jun-04, Bharath Rupireddy wrote:
> >
> > > On Fri, Jun 4, 2021 at 8:58 PM Alvaro Herrera  
> > > wrote:
> >
> > > > I would suggest that the best way forward in this area is to rebase both
> > > > there patches on current master.
> > >
> > > Thanks. I will read both the threads [1], [2] and try to rebase the
> > > patches. If at all I get to rebase them, do you prefer the patches to
> > > be in this thread or in a new thread?
> >
> > Thanks, that would be helpful.  This thread is a good place.
> 
> I'm unable to spend time on this work as promised. I'd be happy if
> someone could take it forward, although it's not critical work(IMO)
> that needs immediate focus. I will try to spend time maybe later this
> year.

Looked through the three threads.

[1] is trying to expose pg_strtoint16/32 to frontend, but I don't see
much point in doing that in conjunction with [2] or this thread. Since
the integral parameter values of pg-commands are in int, which the
exising function strtoint() is sufficient to read.  So even [2] itself
doesn't need to utilize [1].

So I agree to the Bharath's point.

So the attached is a standalone patch that:

- doesn't contain [1], since that functions are not needed for this
  purpose.

- basically does the same thing with [2], but uses
  strtoint/strtol/strtod instead of pg_strtoint16/32.

- doesn't try to make error messages verbose. That results in a
  somewhat strange message like this but I'm not sure we should be so
  strict at that point.

  > reindexdb: error: number of parallel jobs must be at least 1: hoge

- is extended to cover all usages of atoi/l/f in command line
  processing, which are not fully covered by [2]. (Maybe)

- is extended to cover psql's meta command parameters.

- is extended to cover integral environment variables. (PGPORTOLD/NEW
  of pg_upgrade and COLUMNS of psql). The commands emit a warning for
  invalid values, but I'm not sure it's worthwhile. (The second attached.)

  > psql: warning: ignored invalid setting of environemt variable COLUMNS: 3x

- doesn't cover pgbench's meta command parameters (for speed).


[1] - 
https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904201223040.29102@lancre
[2] - https://www.postgresql.org/message-id/20191028012000.ga59...@begriffs.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 54ed83143012473727faff9f9e64dee613cedcfe Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 8 Jul 2021 15:08:01 +0900
Subject: [PATCH 1/2] Be strict in numeric parameters on command line

Some numeric command line parameters are tolerant of valid values
followed by garbage like "123xyz".  Be strict to reject such invalid
values. Do the same for psql meta command parameters.
---
 src/bin/pg_amcheck/pg_amcheck.c|  6 ++-
 src/bin/pg_basebackup/pg_basebackup.c  | 13 +++--
 src/bin/pg_basebackup/pg_receivewal.c  | 18 +--
 src/bin/pg_basebackup/pg_recvlogical.c | 17 +--
 src/bin/pg_checksums/pg_checksums.c|  7 ++-
 src/bin/pg_ctl/pg_ctl.c| 18 ++-
 src/bin/pg_dump/pg_dump.c  | 39 ---
 src/bin/pg_dump/pg_restore.c   | 17 ---
 src/bin/pg_upgrade/option.c| 21 ++--
 src/bin/pgbench/pgbench.c  | 66 --
 src/bin/psql/command.c | 52 ++--
 src/bin/scripts/reindexdb.c| 10 ++--
 src/bin/scripts/vacuumdb.c | 23 +
 13 files changed, 219 insertions(+), 88 deletions(-)

diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 4bde16fb4b..71a82f9b75 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_am_d.h"
 #include "catalog/pg_namespace_d.h"
 #include "common/logging.h"
+#include "common/string.h"
 #include "common/username.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/option_utils.h"
@@ -326,8 +327,9 @@ main(int argc, char *argv[])
 append_btree_pattern(, optarg, encoding);
 break;
 			case 'j':
-opts.jobs = atoi(optarg);
-if (opts.jobs < 1)
+errno = 0;
+opts.jobs = strtoint(optarg, , 10);
+if (*endptr || errno == ERANGE || opts.jobs < 1)
 {
 	pg_log_error("number of parallel jobs must be at least 1");
 	exit(1);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8bb0acf498..29be95b96a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -2287,6 +2287,8 @@ main(int argc, char **argv)
 	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
 			long_options, _index)) != -1)
 	{
+		char   *endptr;
+
 		switch (c)
 		{
 			case 'C':
@@ -2371,8 +2373,10 @@ main(int argc, char **argv)
 #endif
 break;
 			case 'Z':
-

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-07 Thread Bharath Rupireddy
On Fri, Jun 4, 2021 at 10:23 PM Alvaro Herrera  wrote:
>
> On 2021-Jun-04, Bharath Rupireddy wrote:
>
> > On Fri, Jun 4, 2021 at 8:58 PM Alvaro Herrera  
> > wrote:
>
> > > I would suggest that the best way forward in this area is to rebase both
> > > there patches on current master.
> >
> > Thanks. I will read both the threads [1], [2] and try to rebase the
> > patches. If at all I get to rebase them, do you prefer the patches to
> > be in this thread or in a new thread?
>
> Thanks, that would be helpful.  This thread is a good place.

I'm unable to spend time on this work as promised. I'd be happy if
someone could take it forward, although it's not critical work(IMO)
that needs immediate focus. I will try to spend time maybe later this
year.

Regards,
Bharath Rupireddy.




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-06-04 Thread Alvaro Herrera
On 2021-Jun-04, Bharath Rupireddy wrote:

> On Fri, Jun 4, 2021 at 8:58 PM Alvaro Herrera  wrote:

> > I would suggest that the best way forward in this area is to rebase both
> > there patches on current master.
> 
> Thanks. I will read both the threads [1], [2] and try to rebase the
> patches. If at all I get to rebase them, do you prefer the patches to
> be in this thread or in a new thread?

Thanks, that would be helpful.  This thread is a good place.

-- 
Álvaro Herrera   Valdivia, Chile




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-06-04 Thread Bharath Rupireddy
On Fri, Jun 4, 2021 at 8:58 PM Alvaro Herrera  wrote:
>
> On 2021-Jun-04, Bharath Rupireddy wrote:
>
> > On Thu, May 27, 2021 at 3:05 AM Alvaro Herrera  
> > wrote:
> > > Hi, how is this related to
> > > https://postgr.es/m/20191028012000.ga59...@begriffs.com ?
> >
> > Thanks. The proposed approach there was to implement postgres's own
> > strtol i.e. string parsing, conversion to integers and use it in the
> > places where atoi is being used. I'm not sure how far that can go.
> > What I'm proposing here is to use strtol inplace of atoi to properly
> > detect errors in case of inputs like '1211efe', '-14adc' and so on as
> > atoi can't detect such errors. Thoughts?
>
> Well, if you scroll back to Surafel's initial submission in that thread,
> it looks very similar in spirit to what you have here.
>
> Another thing I just noticed which I hadn't realized is that Joe
> Nelson's patch depends on Fabien Coelho's patch in this other thread,
> https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904201223040.29102@lancre
> which was closed as returned-with-feedback, I suppose mostly due to
> exhaustion/frustration at the lack of progress/interest.
>
> I would suggest that the best way forward in this area is to rebase both
> there patches on current master.

Thanks. I will read both the threads [1], [2] and try to rebase the
patches. If at all I get to rebase them, do you prefer the patches to
be in this thread or in a new thread?

[1] - 
https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904201223040.29102@lancre
[2] - https://www.postgresql.org/message-id/20191028012000.ga59...@begriffs.com

With Regards,
Bharath Rupireddy.




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-06-04 Thread Alvaro Herrera
On 2021-Jun-04, Bharath Rupireddy wrote:

> On Thu, May 27, 2021 at 3:05 AM Alvaro Herrera  
> wrote:
> > Hi, how is this related to
> > https://postgr.es/m/20191028012000.ga59...@begriffs.com ?
> 
> Thanks. The proposed approach there was to implement postgres's own
> strtol i.e. string parsing, conversion to integers and use it in the
> places where atoi is being used. I'm not sure how far that can go.
> What I'm proposing here is to use strtol inplace of atoi to properly
> detect errors in case of inputs like '1211efe', '-14adc' and so on as
> atoi can't detect such errors. Thoughts?

Well, if you scroll back to Surafel's initial submission in that thread,
it looks very similar in spirit to what you have here.

Another thing I just noticed which I hadn't realized is that Joe
Nelson's patch depends on Fabien Coelho's patch in this other thread,
https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904201223040.29102@lancre
which was closed as returned-with-feedback, I suppose mostly due to
exhaustion/frustration at the lack of progress/interest.

I would suggest that the best way forward in this area is to rebase both
there patches on current master.

-- 
Álvaro Herrera   Valdivia, Chile
"La virtud es el justo medio entre dos defectos" (Aristóteles)




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-06-04 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 3:05 AM Alvaro Herrera  wrote:
>
> On 2021-May-19, Bharath Rupireddy wrote:
>
> > While working on [1], I found that some parts of the code is using
> > strtol and atoi without checking for non-numeric junk input strings. I
> > found this strange. Most of the time users provide proper numeric
> > strings but there can be some scenarios where these strings are not
> > user-supplied but generated by some other code which may contain
> > non-numeric strings. Shouldn't the code use strtol or atoi
> > appropriately and error out in such cases?  One way to fix this once
> > and for all is to have a common API something like int
> > pg_strtol/pg_str_convert_to_int(char *opt_name, char *opt_value) which
> > returns a generic message upon invalid strings ("invalid value \"%s\"
> > is provided for option \"%s\", opt_name, opt_value) and returns
> > integers on successful parsing.
>
> Hi, how is this related to
> https://postgr.es/m/20191028012000.ga59...@begriffs.com ?

Thanks. The proposed approach there was to implement postgres's own
strtol i.e. string parsing, conversion to integers and use it in the
places where atoi is being used. I'm not sure how far that can go.
What I'm proposing here is to use strtol inplace of atoi to properly
detect errors in case of inputs like '1211efe', '-14adc' and so on as
atoi can't detect such errors. Thoughts?

With Regards,
Bharath Rupireddy.




Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-05-26 Thread Alvaro Herrera
On 2021-May-19, Bharath Rupireddy wrote:

> While working on [1], I found that some parts of the code is using
> strtol and atoi without checking for non-numeric junk input strings. I
> found this strange. Most of the time users provide proper numeric
> strings but there can be some scenarios where these strings are not
> user-supplied but generated by some other code which may contain
> non-numeric strings. Shouldn't the code use strtol or atoi
> appropriately and error out in such cases?  One way to fix this once
> and for all is to have a common API something like int
> pg_strtol/pg_str_convert_to_int(char *opt_name, char *opt_value) which
> returns a generic message upon invalid strings ("invalid value \"%s\"
> is provided for option \"%s\", opt_name, opt_value) and returns
> integers on successful parsing.

Hi, how is this related to
https://postgr.es/m/20191028012000.ga59...@begriffs.com ?

-- 
Álvaro Herrera   Valdivia, Chile




Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-05-19 Thread Bharath Rupireddy
Hi,

While working on [1], I found that some parts of the code is using
strtol and atoi without checking for non-numeric junk input strings. I
found this strange. Most of the time users provide proper numeric
strings but there can be some scenarios where these strings are not
user-supplied but generated by some other code which may contain
non-numeric strings. Shouldn't the code use strtol or atoi
appropriately and error out in such cases?  One way to fix this once
and for all is to have a common API something like int
pg_strtol/pg_str_convert_to_int(char *opt_name, char *opt_value) which
returns a generic message upon invalid strings ("invalid value \"%s\"
is provided for option \"%s\", opt_name, opt_value) and returns
integers on successful parsing.

Although this is not a critical issue, I would like to seek thoughts.

[1] - 
https://www.postgresql.org/message-id/CALj2ACVMO6wY5Pc4oe1OCgUOAtdjHuFsBDw8R5uoYR86eWFQDA%40mail.gmail.com
[2] strtol:
vacuumlo.c --> ./vacuumlo -l '2323adfd'  postgres -p '5432ERE'
libpq_pipeline.c --> ./libpq_pipeline -r '2232adf' tests

atoi:
pg_amcheck.c  --> ./pg_amcheck -j '1211efe'
pg_basebackup --> ./pg_basebackup -Z 'EFEF' -s 'a$##'
pg_receivewal.c --> ./pg_receivewal -p '54343GDFD' -s '54343GDFD'
pg_recvlogical.c --> ./pg_recvlogical -F '-$#$#' -p '5432$$$' -s '100$$$'
pg_checksums.c. --> ./pg_checksums -f '1212abc'
pg_ctl.c --> ./pg_ctl -t 'efc'
pg_dump.c --> ./pg_dump -j '454adc' -Z '4adc' --extra-float-digits '-14adc'
pg_upgrade/option.c
pgbench.c
reindexdb.c
vacuumdb.c
pg_regress.c

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com