aryan-212 opened a new issue, #9538:
URL: https://github.com/apache/arrow-rs/issues/9538
**Describe the bug**

The `Parser::parse` implementations for numeric types fail to parse strings
that contain leading or trailing whitespace.
This happens often when reading data from CSV files or other
text-based sources, where values may be padded with spaces, tabs, or newline
characters. Instead of parsing successfully, these inputs currently return `None`.

**To Reproduce**

```rust
use arrow_array::types::*;
use arrow_cast::parse::Parser;

fn main() {
    // Each call currently returns None instead of the parsed number.
    assert_eq!(Float32Type::parse(" 1.5 "), None); // expected Some(1.5)
    assert_eq!(Int32Type::parse(" 42 "), None); // expected Some(42)
    assert_eq!(Int64Type::parse("\t100\n"), None); // expected Some(100)
    assert_eq!(UInt64Type::parse(" 7 "), None); // expected Some(7)
}
```

**Expected behavior**

Numeric parsers should ignore leading and trailing whitespace before
parsing. For example, `" 42 "` should parse successfully to `Some(42)` rather than
returning `None`.

This behavior is consistent with how many data ingestion tools handle
text-to-number conversion.
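
Rust's standard `str::parse` is just as strict about surrounding whitespace, so the expected behavior amounts to "trim, then parse strictly". The std-only sketch below does not call arrow-cast; `std_parse_i64` is a hypothetical helper that returns the raw and trimmed results side by side for the integer inputs from the reproduction.

```rust
/// Hypothetical helper: returns `(raw, trimmed)`, i.e. the result of parsing
/// `s` as-is with the standard library, and the result after `str::trim`.
fn std_parse_i64(s: &str) -> (Option<i64>, Option<i64>) {
    (s.parse::<i64>().ok(), s.trim().parse::<i64>().ok())
}

fn main() {
    // Integer inputs mirroring the reproduction in this issue.
    for (input, expected) in [(" 42 ", 42), ("\t100\n", 100), (" 7 ", 7)] {
        let (raw, trimmed) = std_parse_i64(input);
        assert_eq!(raw, None); // the strict parse rejects the padding
        assert_eq!(trimmed, Some(expected)); // trimming first makes it succeed
    }
}
```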
**Additional context**

The issue originates in `arrow-cast/src/parse.rs`. The float parsers pass
`string.as_bytes()` directly to `lexical_core::parse`, and the `parser_primitive!`
macro (used for integers and durations) similarly operates on the input without
trimming.

A simple fix would be to call `.trim()` on the input string before attempting
to parse.
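
A minimal sketch of the proposed fix, assuming a simplified local stand-in for the trait: `Parser` below is hypothetical and only mirrors the shape of `fn parse(string: &str) -> Option<Self::Native>`; `Int32Marker`, `Float32Marker` and `parse_trimmed` are illustrative names, and `std::str::FromStr` stands in for `lexical_core` and the `parser_primitive!` macro. It is not the arrow-cast implementation.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `arrow_cast::parse::Parser` (the real trait
/// builds on `ArrowPrimitiveType`, which is not reproduced here).
trait Parser {
    type Native;
    fn parse(string: &str) -> Option<Self::Native>;
}

/// Hypothetical marker types standing in for Arrow primitive types.
struct Int32Marker;
struct Float32Marker;

/// Shared helper: strip leading/trailing whitespace, then parse strictly.
/// `str::trim` removes Unicode whitespace, including spaces, tabs and newlines.
fn parse_trimmed<T: FromStr>(string: &str) -> Option<T> {
    string.trim().parse::<T>().ok()
}

impl Parser for Int32Marker {
    type Native = i32;
    fn parse(string: &str) -> Option<Self::Native> {
        parse_trimmed(string)
    }
}

impl Parser for Float32Marker {
    type Native = f32;
    fn parse(string: &str) -> Option<Self::Native> {
        parse_trimmed(string)
    }
}

fn main() {
    assert_eq!(Int32Marker::parse(" 42 "), Some(42));
    assert_eq!(Float32Marker::parse("\t1.5\n"), Some(1.5));
    // Whitespace inside the number is still rejected.
    assert_eq!(Int32Marker::parse("4 2"), None);
    // An all-whitespace field trims to "" and still fails to parse.
    assert_eq!(Int32Marker::parse("   "), None);
}
```

Trimming only at the edges keeps inputs such as `"4 2"` invalid, and an all-whitespace field still yields `None` rather than being coerced into a number.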
--
This is an automated message from the Apache Git Service.