Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-21 Thread Edgar Fuß
I think it depends on what you consider valid use cases of strtoi().

> This is the case when the number is out of range, and there's trailing
> garbage -- e.g., s="42z", min=3, max=7.
Is it correct to consider the first non-digit character garbage? I.e., would 
you regard it as an abuse of strtoi() to parse strings like "32k", "16M" or 
"1.5", "4 2/3?

If the answer is "don't do that" (i.e. a valid use should be entirely 
parseble), then I'd expect ENOTSUP. If that's valid use, I'd expect ERANGE.


Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-20 Thread Alejandro Colomar
Hello Taylor,

On Sat, Jan 20, 2024 at 07:07:57PM +, Taylor R Campbell wrote:
> PR lib/57828 (https://gnats.NetBSD.org/57828) proposes changing an
> edge case of the strtoi(3) function so that if ERANGE and ENOTSUP are
> both applicable then it should fail with ERANGE rather than ENOTSUP.
> 
> This is the case when the number is out of range, and there's trailing
> garbage -- e.g., s="42z", min=3, max=7.
> 
> 
> The rationale is that the caller can trivially detect the ENOTSUP
> condition (test end[0] != '\0'), but not the ERANGE condition (short
> of parsing the number again, which would defeat the purpose).  And
> applications may find the out-of-range case more significant than the
> trailing garbage case.
> 
> For example, libutil's pidfile_read(3) tries to parse a pid on a
> single line, so any file whose content matches /[0-9]+\n/ with the
> resulting number in the interval [1, INT_MAX] is accepted, and one
> might hope that anything else should be rejected:
> 
> https://nxr.netbsd.org/xref/src/lib/libutil/pidfile.c?r=1.16#132
> 
> But if the pidfile contains, e.g., `999\n',
> pidfile_read(3) as implemented will accept this -- and interpret it as
> pid 2147483647.  If it's `-999\n',
> pidfile_read(3) will accept it as pid 1.
> 
> 
> Counterarguments:
> - We've had this semantics since NetBSD 7, so changing it is risky --
>   we'd have to audit all the existing code that uses it.

I have audited all existing code in NetBSD and in Debian.  That should
cover a large amount of free software.  Small personal projects and
closed source projects are not covered, though.

From my investigation, this change would fix many bugs, and it would
only break one specific case: the implementation of strtonum(3) --which
BTW, has an easy fix, and we anticipated it--.

> - There are copies of the code in various places and they might
>   diverge.
> - I searched github for /[^a-zA-Z0-9_]strtoi[^a-zA-Z0-9_]/ and came up
>   with almost 16k results, so auditing that myself isn't going to
>   happen unless we can narrow it down a lot.

What I did on NetBSD and Debian was to search projects that contain
'strto[iu]\>' _and_ either ERANGE or ENOTSUP in the same file.

That should be good enough.  It might forget some corner case, like a
wrapper around strto[iu], but that's all I could do.

> - If you need to act on the trailing stuff, maybe it's better to just
>   use a real lexer instead of relying on edge cases of error cases.
> - If you rely on this change, your code won't work in any of the
>   deployed implementations of strtoi anyway.

How many widespread implementations of strtoi(3) are there?  I know of
NetBSD's and libbsd's.  I have a patch for libbsd already prepared.

> 
> 
> Thoughts?
> 
> The attached patch implements the proposed change in libc and the
> automatic tests.
> From a64349a5ac209d9967ba3c5d6a62aa243f44f603 Mon Sep 17 00:00:00 2001
> From: Taylor R Campbell 
> Date: Sat, 20 Jan 2024 18:17:27 +
> Subject: [PATCH 1/2] strtoi(3): Test for ERANGE before ENOTSUP.
> 
> Expect failure because that's not what we implemented.
> 
> PR lib/57828
> ---
>  tests/lib/libc/stdlib/t_strtoi.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/tests/lib/libc/stdlib/t_strtoi.c 
> b/tests/lib/libc/stdlib/t_strtoi.c
> index 3301058a5be1..093b71abe04a 100644
> --- a/tests/lib/libc/stdlib/t_strtoi.c
> +++ b/tests/lib/libc/stdlib/t_strtoi.c
> @@ -313,14 +313,9 @@ ATF_TC_BODY(strtoi_erroredges, tc)
>   { "+5z", 5, 0, "z", 3, 7, ENOTSUP },
>   { "+6z", 6, 0, "z", 3, 7, ENOTSUP },
>   { "+7z", 7, 0, "z", 3, 7, ENOTSUP },
> - /*
> -  * PR lib/57828 suggests these should yield ERANGE, not
> -  * ENOTSUP, because ENOTSUP can be discovered
> -  * separately by the caller anyway.
> -  */
> - { "-42z",3, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
> - { "2z",  3, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
> - { "42z", 7, 0, "z", 3, 7, ENOTSUP }, /* XXX ERANGE */
> + { "-42z",3, 0, "z", 3, 7, ERANGE },
> + { "2z",  3, 0, "z", 3, 7, ERANGE },
> + { "42z", 7, 0, "z", 3, 7, ERANGE },
>   };
>  
>   intmax_t rv;
> @@ -330,6 +325,9 @@ ATF_TC_BODY(strtoi_erroredges, tc)
>  
>   for (i = 0; i < __arraycount(t); i++) {
>  
> + if (t[i].rstatus == ERANGE)
> + atf_tc_expect_fail("PR lib/57828");
> +
>   errno = 0;
>   rv = strtoi(t[i].str, , t[i].base, t[i].lo, t[i].hi, );
>  
> 
> From ee1b6083ea09f569e95ae5137b5fe870beb9d017 Mon Sep 17 00:00:00 2001
> From: Taylor R Campbell 
> Date: Sat, 20 Jan 2024 18:19:34 +
> Subject: [PATCH 2/2] strtoi(3): Return ERANGE before ENOTSUP if both are
>  applicable.
> 
> Caller of strtoi can trivially detect the 

Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-20 Thread Mouse
> PR lib/57828 (https://gnats.NetBSD.org/57828) proposes changing an
> edge case of the strtoi(3) function so that if ERANGE and ENOTSUP are
> both applicable then it should fail with ERANGE rather than ENOTSUP.

Has it ever been part of strtoi's contract which error it fails with if
multiple errors are applicable?  What you wrote doesn't, to me, sound
as though there is any such spec.

If there is, that's the place to start, it seems to me.

If not, then anything this change affects is already broken and
deserves to be rendered _obviously_ broken, so there's some hope of
getting it fixed.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B