aryan-212 opened a new issue, #9538:
URL: https://github.com/apache/arrow-rs/issues/9538

   **Describe the bug**
   
   The Parser::parse implementations for numeric types fail to parse strings 
that contain leading or trailing whitespace.
   
   In practice, this happens quite often when reading data from CSVs or other 
text-based sources where values may be padded with spaces, tabs, or newline 
characters. Instead of parsing successfully, these inputs currently return None.
   
   **To Reproduce**
   ```rust
   use arrow_array::types::*;
   use arrow_cast::parse::Parser;
   
   // These calls currently return None instead of the parsed number.
   assert_eq!(Float32Type::parse(" 1.5 "), None);   // expected Some(1.5)
   assert_eq!(Int32Type::parse(" 42 "), None);      // expected Some(42)
   assert_eq!(Int64Type::parse("\t100\n"), None);   // expected Some(100)
   assert_eq!(UInt64Type::parse(" 7 "), None);      // expected Some(7)
   ```
   
   **Expected behavior**
   
   Numeric parsers should ignore leading and trailing whitespace before 
parsing. For example, " 42 " should parse successfully to Some(42) rather than 
returning None.
   
   This behavior is consistent with how most data ingestion systems handle 
text-to-number conversion.
   
   **Additional context**
   
   The issue originates in arrow-cast/src/parse.rs. The float parsers pass 
string.as_bytes() directly to lexical_core::parse, and the parser_primitive! 
macro (used for integers and durations) similarly operates on the input without 
trimming.
   
   A simple fix would be to call .trim() on the input string before attempting 
to parse.
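
The proposed fix can be sketched with the standard library alone. This is only an illustration, not the arrow-cast code: `parse_untrimmed` and `parse_trimmed` are hypothetical helpers, and `str::parse` stands in for `lexical_core::parse` and the `parser_primitive!` macro. The assumption is that trimming with `str::trim` (which strips leading and trailing Unicode whitespace, including spaces, tabs, and newlines) before parsing is the intended behavior.

```rust
use std::str::FromStr;

/// Stand-in for the current behavior: parse the input exactly as given,
/// so any surrounding whitespace makes the parse fail.
/// (Hypothetical helper, not part of arrow-cast.)
fn parse_untrimmed<T: FromStr>(s: &str) -> Option<T> {
    s.parse::<T>().ok()
}

/// Sketch of the proposed behavior: strip leading and trailing whitespace
/// first, then parse what remains. Interior whitespace is left untouched,
/// so inputs like "4 2" are still rejected.
/// (Hypothetical helper, not part of arrow-cast.)
fn parse_trimmed<T: FromStr>(s: &str) -> Option<T> {
    s.trim().parse::<T>().ok()
}
```

With these helpers, `parse_trimmed::<i32>(" 42 ")` yields `Some(42)` while `parse_untrimmed::<i32>(" 42 ")` yields `None`. An input that is only whitespace still yields `None`, because nothing is left to parse after trimming.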

