Re: Correct parsers for bounded integral values

Andreas Klebinger via ghc-devs Mon, 21 Jul 2025 11:54:26 -0700

For base, introducing a new function `readBoundedNum :: (Bounded a, Num a) => String -> a` or similar seems very reasonable to me.
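To make the idea concrete, here is a minimal sketch of what such a function could look like. The name `readBoundedMaybe`, the `Maybe` result, and the `Integral` constraint (needed to compare the bounds as `Integer`s) are assumptions for illustration, not a proposed API:

```haskell
import Data.Int (Int8)
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Hypothetical helper: keep the value only if it lies within [lo, hi].
inRange :: Integral a => a -> a -> Integer -> Maybe a
inRange lo hi n
  | toInteger lo <= n && n <= toInteger hi = Just (fromInteger n)
  | otherwise                              = Nothing

-- Hypothetical overflow-checked reader: parse as an unbounded Integer
-- first, then range-check against the target type's bounds.
readBoundedMaybe :: (Bounded a, Integral a) => String -> Maybe a
readBoundedMaybe s = readMaybe s >>= inRange minBound maxBound

exampleOverflow :: Maybe Word8
exampleOverflow = readBoundedMaybe "298"   -- out of range for Word8

exampleMinBound :: Maybe Int8
exampleMinBound = readBoundedMaybe "-128"  -- exactly minBound
```

Going through `Integer` keeps the sketch short, but it does allocate a big number for very long inputs, so a real implementation would probably check the range digit by digit.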

Changing "read" to throw an exception or similar after decades, less so.



On 20/07/2025 22:08, Viktor Dukhovni wrote:

On Sun, Jul 20, 2025 at 09:12:20PM +0200, Stefan Klinger wrote:

I'd like to bring to your attention a discussion that I have started
over at Haskell-cafe [1].  I was complaining about the silent overflow
of parsers for bounded integers:

     > read "298" :: Word8
     42

FWIW, there haven't AFAIK been any complaints about ByteString's readInt,
readWord, readInteger, readNatural and various sized variants having
overflow checks.  But these have always been more like `reads` than
`read`, returning `Maybe (a, ByteString)`, so perhaps somewhat more
oriented towards detecting unexpected excess input, as well as for
some time now range overflow.  So there's some precedent for overflow
checking, but...
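As a rough illustration of the `reads`-like shape, the sketch below wraps `Data.ByteString.Char8.readInt` so that trailing input is rejected. The name `parseSize` is hypothetical, and the overflow case assumes a bytestring release that already has the range checks described above:

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical wrapper: accept a decimal Int only if the whole input
-- is consumed. readInt returns the parsed value plus the unconsumed
-- remainder, or Nothing if no number could be read.
parseSize :: B.ByteString -> Maybe Int
parseSize bs = case B.readInt bs of
  Just (n, rest) | B.null rest -> Just n
  _                            -> Nothing
```

With this, `parseSize (B.pack "1024")` yields `Just 1024`, while input with trailing garbage such as `"12x"` is rejected.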

It is also fair to point out that once an Int or other bounded integral
type is read, arithmetic with that type (addition, subtraction and
multiplication) silently overflows.  And so silent overflow in `read`
is not inconsistent with the type's semantics.
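A small sketch of that consistency, as observed with GHC's fixed-width types (the binding names are only for illustration):

```haskell
import Data.Word (Word8)

-- Arithmetic wraps modulo 2^8 for Word8: 300 mod 256.
wrappedSum :: Word8
wrappedSum = 200 + 100

-- `read` wraps the same way: 298 mod 256.
wrappedRead :: Word8
wrappedRead = read "298"

-- Stepping past maxBound lands on minBound for Int in GHC.
wrappedInt :: Int
wrappedInt = maxBound + 1
```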

If converting strings to numbers is in support of string-oriented
network protocols (e.g. the SIZE ESMTP extension), then one really
should make an effort to avoid silent overflow, but in that context the
various ByteString read methods are already available.

That said, if various middleware libraries hide overflows, because under
the covers they're using `read`, that could be a problem, so we do want
the ecosystem at large to make sensible choices about when silent
overflow may or may not be appropriate.  Perhaps that means having
both wrapping and overflow-checked implementations available, and
clear docs with each about its behaviour and the corresponding
alternative.

I find this unsatisfying, and I have demonstrated a solution [2] that
seems correct and performant.

A few quick observations about [2]:

     - It disallows explicit leading "+" (just like "read", but perhaps
       that should be tolerated).

     - It disallows multiple leading zeros, perhaps these should be
       tolerated.

     - It disallows "-0", perhaps these should be tolerated, as well
       as "-0000", "-000001", ...  (With lazy ByteStrings, which might
       never terminate, there is a generous, but sensible limit on
       the number of leading zeros allowed).

     - One way to avoid difficulties with handling negative minBound is
       to parse signed values via the corresponding unsigned type, which
       can accommodate `-minBound` as a positive value, and then negate
       the final result.  This makes it possible to share the low-level
       digit-by-digit code between the positive and negative cases.
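The last point can be sketched for `Int8` as follows. This is not the code from [2]; the names `digitsUpTo` and `readInt8` are hypothetical, and the sketch deliberately tolerates leading zeros and "-0":

```haskell
import Data.Char (digitToInt, isDigit)
import Data.Int (Int8)
import Data.Word (Word8)

-- Shared digit loop: accumulate a magnitude in the unsigned type,
-- failing as soon as the value would exceed `limit`.
digitsUpTo :: Word8 -> String -> Maybe Word8
digitsUpTo _     [] = Nothing
digitsUpTo limit ds = go 0 ds
  where
    go acc []       = Just acc
    go acc (c : cs)
      | not (isDigit c)            = Nothing
      -- acc * 10 + d <= limit, rearranged so nothing overflows
      | acc > (limit - d) `div` 10 = Nothing
      | otherwise                  = go (acc * 10 + d) cs
      where d = fromIntegral (digitToInt c)

-- The negative case allows a magnitude of 128 (that is, -minBound),
-- the positive case only 127.  Converting 128 to Int8 wraps to -128,
-- and negating that wraps back to -128, which is the intended result.
readInt8 :: String -> Maybe Int8
readInt8 ('-' : ds) = negate . fromIntegral <$> digitsUpTo 128 ds
readInt8 ds         = fromIntegral <$> digitsUpTo 127 ds
```

Only the limit differs between the two cases, so the digit loop itself never has to deal with signs.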

If parsing of Integer and Natural is also in scope, I would expect that
it avoids doing multi-precision arithmetic for each digit, parsing
groups of digits into ~Word sized blocks, and merging the blocks
hierarchically with only a logarithmic number of MP multiplies.
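A minimal sketch of that block-wise strategy, assuming a 64-bit `Word` (so 18 decimal digits fit in one block) and plain `String` input. All names here are hypothetical, and this is not how any existing library is claimed to implement it:

```haskell
import Data.Char (digitToInt, isDigit)
import Data.List (foldl')

-- Digits per block; 10^18 - 1 fits in a 64-bit Word (assumption).
blockDigits :: Int
blockDigits = 18

-- Parse one block using machine-word arithmetic only.
block :: String -> Word
block = foldl' (\acc c -> acc * 10 + fromIntegral (digitToInt c)) 0

-- Split into blocks, most significant first; only the first may be short.
chunks :: String -> [String]
chunks ds = case length ds `rem` blockDigits of
    0 -> every ds
    r -> let (h, t) = splitAt r ds in h : every t
  where
    every [] = []
    every xs = let (h, t) = splitAt blockDigits xs in h : every t

-- Merge little-endian blocks pairwise, squaring the base each round,
-- so the number of rounds is logarithmic in the number of blocks.
-- An unpaired leftover is always the most significant element.
mergeLE :: Integer -> [Integer] -> Integer
mergeLE _    []  = 0
mergeLE _    [x] = x
mergeLE base xs  = mergeLE (base * base) (pairUp xs)
  where
    pairUp (lo : hi : rest) = (hi * base + lo) : pairUp rest
    pairUp rest             = rest

parseNatural :: String -> Maybe Integer
parseNatural ds
  | null ds || not (all isDigit ds) = Nothing
  | otherwise =
      Just (mergeLE (10 ^ blockDigits)
                    (reverse (map (toInteger . block) (chunks ds))))
```

The blocks are merged least significant first so that an odd block out stays at the most significant end, where its width does not matter.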

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
