On Sun, Jul 20, 2025 at 09:12:20PM +0200, Stefan Klinger wrote:

> I'd like to bring to your attention a discussion that I have started
> over at Haskell-cafe [1].  I was complaining about the silent overflow
> of parsers for bounded integers:
>
>     > read "298" :: Word8
>     42
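To make the quoted behaviour concrete, here is a minimal sketch of the
wrap-around next to one possible overflow-checked alternative.  The
name `readChecked` is hypothetical (it is not in base, and it is not
the code from [2]); it simply reads an Integer first and rejects any
value that does not survive the round trip through the target type.
It is only meant for fixed-width types such as Word8 or Int.

```haskell
import Data.Word (Word8)
import Text.Read (readMaybe)

-- What `read` does today: 298 wraps modulo 256.
wrapped :: Word8
wrapped = read "298"

-- Hypothetical helper: parse as Integer, convert with the usual
-- wrapping fromInteger, and accept only if converting back yields
-- the same Integer, i.e. the value was in range for the target type.
readChecked :: Integral a => String -> Maybe a
readChecked s = do
  n <- readMaybe s :: Maybe Integer
  let r = fromInteger n
  if toInteger r == n then Just r else Nothing
```

With this, `readChecked "298" :: Maybe Word8` is `Nothing`, while
`readChecked "255" :: Maybe Word8` is `Just 255`.  Going through
Integer is simple but not the fastest route, so treat it as an
illustration rather than a proposal.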
FWIW, there haven't AFAIK been any complaints about ByteString's
readInt, readWord, readInteger, readNatural and the various sized
variants having overflow checks.  But these have always been more like
`reads` than `read`, returning `Maybe (a, ByteString)`, so they are
perhaps somewhat more oriented towards detecting unexpected excess
input, and, for some time now, range overflow.  So there's some
precedent for overflow checking, but...

It is also fair to point out that once an Int or other bounded integral
type is read, arithmetic with that type (addition, subtraction and
multiplication) silently overflows.  And so silent overflow in `read`
is not inconsistent with the type's semantics.

If converting strings to numbers is in support of string-oriented
network protocols (e.g. the SIZE ESMTP extension), then one really
should make an effort to avoid silent overflow, but in that context the
various ByteString read methods are already available.

That said, if various middleware libraries hide overflows because
under the covers they're using `read`, that could be a problem, so we
do want the ecosystem at large to make sensible choices about when
silent overflow may or may not be appropriate.  Perhaps that means
having both wrapping and overflow-checked implementations available,
with clear docs for each about its behaviour and the corresponding
alternative.

> I find this unsatisfying, and I have demonstrated a solution [2] that
> seems correct and performant.

A few quick observations about [2]:

  - It disallows an explicit leading "+" (just like `read`, but perhaps
    that should be tolerated).

  - It disallows multiple leading zeros; perhaps these should be
    tolerated.

  - It disallows "-0"; perhaps this should be tolerated, as well as
    "-0000", "-000001", ...  (With lazy ByteStrings, whose input might
    never terminate, there is a generous but sensible limit on the
    number of leading zeros allowed.)
  - One way to avoid difficulties with handling negative minBound is to
    parse signed values via the corresponding unsigned type, which can
    accommodate `-minBound` as a positive value, and then negate the
    final result.  This makes it possible to share the low-level
    digit-by-digit code between the positive and negative cases.

If parsing of Integer and Natural is also in scope, I would expect it
to avoid doing multi-precision arithmetic for each digit, instead
parsing groups of digits into ~Word-sized blocks and merging the blocks
hierarchically with only a logarithmic number of MP multiplies.

-- 
    Viktor.
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
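
P.S.  A rough sketch of the "parse signed via unsigned" idea from the
last bullet above, for Int via Word.  The names `digitsUpTo` and
`readIntChecked` are hypothetical, this is neither the code in [2] nor
what ByteString does, and it assumes the usual two's complement
relationship between Int and Word in GHC.  Unlike [2], it tolerates
leading zeros and "-0"; it still rejects an explicit "+".

```haskell
import Data.Char (isDigit, ord)

-- Shared digit loop: accumulate decimal digits into a Word and fail
-- as soon as the value would exceed `limit`.  Empty input is rejected.
digitsUpTo :: Word -> String -> Maybe Word
digitsUpTo _ [] = Nothing
digitsUpTo limit ds = go 0 ds
  where
    (q, r) = limit `quotRem` 10
    go acc [] = Just acc
    go acc (c:cs)
      | not (isDigit c)                = Nothing
      | acc > q || (acc == q && d > r) = Nothing   -- acc*10 + d > limit
      | otherwise                      = go (acc * 10 + d) cs
      where d = fromIntegral (ord c - ord '0')

-- Signed parse: the magnitude is read as a Word, so |minBound| fits,
-- and the sign is applied only at the end.
readIntChecked :: String -> Maybe Int
readIntChecked ('-':ds) =
    -- Negate in Word (modular), then reinterpret as Int.
    fromIntegral . negate <$> digitsUpTo negLimit ds
  where
    negLimit = fromIntegral (maxBound :: Int) + 1   -- == |minBound|
readIntChecked ds =
    fromIntegral <$> digitsUpTo (fromIntegral (maxBound :: Int)) ds
```

The only difference between the two cases is the limit passed to the
shared loop, which is the point of going through the unsigned type.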