Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-21 Thread Edgar Fuß
I think it depends on what you consider valid use cases of strtoi().

> This is the case when the number is out of range, and there's trailing
> garbage -- e.g., s="42z", min=3, max=7.
Is it correct to consider the first non-digit character garbage? I.e., would 
you regard it as an abuse of strtoi() to parse strings like "32k", "16M" or 
"1.5", "4 2/3?

If the answer is "don't do that" (i.e. a valid use should be entirely 
parseble), then I'd expect ENOTSUP. If that's valid use, I'd expect ERANGE.


Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-20 Thread Alejandro Colomar
Hello Taylor,

On Sat, Jan 20, 2024 at 07:07:57PM +, Taylor R Campbell wrote:
> PR lib/57828 (https://gnats.NetBSD.org/57828) proposes changing an
> edge case of the strtoi(3) function so that if ERANGE and ENOTSUP are
> both applicable then it should fail with ERANGE rather than ENOTSUP.
> 
> This is the case when the number is out of range, and there's trailing
> garbage -- e.g., s="42z", min=3, max=7.
> 
> 
> The rationale is that the caller can trivially detect the ENOTSUP
> condition (test end[0] != '\0'), but not the ERANGE condition (short
> of parsing the number again, which would defeat the purpose).  And
> applications may find the out-of-range case more significant than the
> trailing garbage case.
> 
> For example, libutil's pidfile_read(3) tries to parse a pid on a
> single line, so any file whose content matches /[0-9]+\n/ with the
> resulting number in the interval [1, INT_MAX] is accepted, and one
> might hope that anything else should be rejected:
> 
> https://nxr.netbsd.org/xref/src/lib/libutil/pidfile.c?r=1.16#132
> 
> But if the pidfile contains, e.g., `999\n',
> pidfile_read(3) as implemented will accept this -- and interpret it as
> pid 2147483647.  If it's `-999\n',
> pidfile_read(3) will accept it as pid 1.
> 
> 
> Counterarguments:
> - We've had this semantics since NetBSD 7, so changing it is risky --
>   we'd have to audit all the existing code that uses it.

I have audited all existing code in NetBSD and in Debian.  That should
cover a large amount of free software.  Small personal projects and
closed source projects are not covered, though.

From my investigation, this change would fix many bugs, and it would
only break one specific case: the implementation of strtonum(3) --which
BTW, has an easy fix, and we anticipated it--.

> - There are copies of the code in various places and they might
>   diverge.
> - I searched github for /[^a-zA-Z0-9_]strtoi[^a-zA-Z0-9_]/ and came up
>   with almost 16k results, so auditing that myself isn't going to
>   happen unless we can narrow it down a lot.

What I did on NetBSD and Debian was to search projects that contain
'strto[iu]\>' _and_ either ERANGE or ENOTSUP in the same file.

That should be good enough.  It might forget some corner case, like a
wrapper around strto[iu], but that's all I could do.

> - If you need to act on the trailing stuff, maybe it's better to just
>   use a real lexer instead of relying on edge cases of error cases.
> - If you rely on this change, your code won't work in any of the
>   deployed implementations of strtoi anyway.

How many widespread implementations of strtoi(3) are there?  I know of
NetBSD's and libbsd's.  I have a patch for libbsd already prepared.

> 
> 
> Thoughts?
> 
> The attached patch implements the proposed change in libc and the
> automatic tests.
> From a64349a5ac209d9967ba3c5d6a62aa243f44f603 Mon Sep 17 00:00:00 2001
> From: Taylor R Campbell 
> Date: Sat, 20 Jan 2024 18:17:27 +
> Subject: [PATCH 1/2] strtoi(3): Test for ERANGE before ENOTSUP.
> 
> Expect failure because that's not what we implemented.
> 
> PR lib/57828
> ---
>  tests/lib/libc/stdlib/t_strtoi.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/tests/lib/libc/stdlib/t_strtoi.c 
> b/tests/lib/libc/stdlib/t_strtoi.c
> index 3301058a5be1..093b71abe04a 100644
> --- a/tests/lib/libc/stdlib/t_strtoi.c
> +++ b/tests/lib/libc/stdlib/t_strtoi.c
> @@ -313,14 +313,9 @@ ATF_TC_BODY(strtoi_erroredges, tc)
>   { "+5z", 5, 0, "z", 3, 7, ENOTSUP },
>   { "+6z", 6, 0, "z", 3, 7, ENOTSUP },
>   { "+7z", 7, 0, "z", 3, 7, ENOTSUP },
> - /*
> -  * PR lib/57828 suggests these should yield ERANGE, not
> -  * ENOTSUP, because ENOTSUP can be discovered
> -  * separately by the caller anyway.
> -  */
> - { "-42z",3, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
> - { "2z",  3, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
> - { "42z", 7, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
> + { "-42z",3, 0, "z", 3, 7, ERANGE },
> + { "2z",  3, 0, "z", 3, 7, ERANGE },
> + { "42z", 7, 0, "z", 3, 7, ERANGE },
>   };
>  
>   intmax_t rv;
> @@ -330,6 +325,9 @@ ATF_TC_BODY(strtoi_erroredges, tc)
>  
>   for (i = 0; i < __arraycount(t); i++) {
>  
> + if (t[i].rstatus == ERANGE)
> + atf_tc_expect_fail("PR lib/57828");
> +
>   errno = 0;
>   rv = strtoi(t[i].str, &end, t[i].base, t[i].lo, t[i].hi, &e);
>  
> 
> From ee1b6083ea09f569e95ae5137b5fe870beb9d017 Mon Sep 17 00:00:00 2001
> From: Taylor R Campbell 
> Date: Sat, 20 Jan 2024 18:19:34 +
> Subject: [PATCH 2/2] strtoi(3): Return ERANGE before ENOTSUP if both are
>  applicable.
> 
> Caller of strtoi can trivially detect 

Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-20 Thread Mouse
> PR lib/57828 (https://gnats.NetBSD.org/57828) proposes changing an
> edge case of the strtoi(3) function so that if ERANGE and ENOTSUP are
> both applicable then it should fail with ERANGE rather than ENOTSUP.

Has it ever been part of strtoi's contract which error it fails with if
multiple errors are applicable?  What you wrote doesn't, to me, sound
as though there is any such spec.

If there is, that's the place to start, it seems to me.

If not, then anything this change affects is already broken and
deserves to be rendered _obviously_ broken, so there's some hope of
getting it fixed.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


strtoi(3) ERANGE vs ENOTSUP

2024-01-20 Thread Taylor R Campbell
PR lib/57828 (https://gnats.NetBSD.org/57828) proposes changing an
edge case of the strtoi(3) function so that if ERANGE and ENOTSUP are
both applicable then it should fail with ERANGE rather than ENOTSUP.

This is the case when the number is out of range, and there's trailing
garbage -- e.g., s="42z", min=3, max=7.


The rationale is that the caller can trivially detect the ENOTSUP
condition (test end[0] != '\0'), but not the ERANGE condition (short
of parsing the number again, which would defeat the purpose).  And
applications may find the out-of-range case more significant than the
trailing garbage case.

For example, libutil's pidfile_read(3) tries to parse a pid on a
single line, so any file whose content matches /[0-9]+\n/ with the
resulting number in the interval [1, INT_MAX] is accepted, and one
might hope that anything else should be rejected:

https://nxr.netbsd.org/xref/src/lib/libutil/pidfile.c?r=1.16#132

But if the pidfile contains, e.g., `999\n',
pidfile_read(3) as implemented will accept this -- and interpret it as
pid 2147483647.  If it's `-999\n',
pidfile_read(3) will accept it as pid 1.


Counterarguments:
- We've had this semantics since NetBSD 7, so changing it is risky --
  we'd have to audit all the existing code that uses it.
- There are copies of the code in various places and they might
  diverge.
- I searched github for /[^a-zA-Z0-9_]strtoi[^a-zA-Z0-9_]/ and came up
  with almost 16k results, so auditing that myself isn't going to
  happen unless we can narrow it down a lot.
- If you need to act on the trailing stuff, maybe it's better to just
  use a real lexer instead of relying on edge cases of error cases.
- If you rely on this change, your code won't work in any of the
  deployed implementations of strtoi anyway.


Thoughts?

The attached patch implements the proposed change in libc and the
automatic tests.
>From a64349a5ac209d9967ba3c5d6a62aa243f44f603 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell 
Date: Sat, 20 Jan 2024 18:17:27 +
Subject: [PATCH 1/2] strtoi(3): Test for ERANGE before ENOTSUP.

Expect failure because that's not what we implemented.

PR lib/57828
---
 tests/lib/libc/stdlib/t_strtoi.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/tests/lib/libc/stdlib/t_strtoi.c b/tests/lib/libc/stdlib/t_strtoi.c
index 3301058a5be1..093b71abe04a 100644
--- a/tests/lib/libc/stdlib/t_strtoi.c
+++ b/tests/lib/libc/stdlib/t_strtoi.c
@@ -313,14 +313,9 @@ ATF_TC_BODY(strtoi_erroredges, tc)
{ "+5z", 5, 0, "z", 3, 7, ENOTSUP },
{ "+6z", 6, 0, "z", 3, 7, ENOTSUP },
{ "+7z", 7, 0, "z", 3, 7, ENOTSUP },
-   /*
-* PR lib/57828 suggests these should yield ERANGE, not
-* ENOTSUP, because ENOTSUP can be discovered
-* separately by the caller anyway.
-*/
-   { "-42z",3, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
-   { "2z",  3, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
-   { "42z", 7, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
+   { "-42z",3, 0, "z", 3, 7, ERANGE },
+   { "2z",  3, 0, "z", 3, 7, ERANGE },
+   { "42z", 7, 0, "z", 3, 7, ERANGE },
};
 
intmax_t rv;
@@ -330,6 +325,9 @@ ATF_TC_BODY(strtoi_erroredges, tc)
 
for (i = 0; i < __arraycount(t); i++) {
 
+   if (t[i].rstatus == ERANGE)
+   atf_tc_expect_fail("PR lib/57828");
+
errno = 0;
rv = strtoi(t[i].str, &end, t[i].base, t[i].lo, t[i].hi, &e);
 

>From ee1b6083ea09f569e95ae5137b5fe870beb9d017 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell 
Date: Sat, 20 Jan 2024 18:19:34 +
Subject: [PATCH 2/2] strtoi(3): Return ERANGE before ENOTSUP if both are
 applicable.

Caller of strtoi can trivially detect the ENOTSUP case themselves by
checking whether end[0] == '\0', but they can't trivially detect the
ERANGE case (short of doing their own parsing which would defeat the
purpose of using strtoi(3)).

PR lib/57828
---
 common/lib/libc/stdlib/_strtoi.h | 9 ++---
 tests/lib/libc/stdlib/t_strtoi.c | 3 ---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/common/lib/libc/stdlib/_strtoi.h b/common/lib/libc/stdlib/_strtoi.h
index e6b0ce778b60..d48d58350050 100644
--- a/common/lib/libc/stdlib/_strtoi.h
+++ b/common/lib/libc/stdlib/_strtoi.h
@@ -104,9 +104,6 @@ INT_FUNCNAME(_int_, _FUNCNAME, _l)(const char * __restrict 
nptr,
/* No digits were found */
if (nptr == *endptr)
*rstatus = ECANCELED;
-   /* There are further characters after number */
-   else if (**endptr != '\0')
-   *rstatus = ENOTSUP;
}
 
if (im < lo) {
@@ -120,6 +117,12 @@ INT_FUNCNAME(_int_, _FUNCNAME, _l)(c