Ziru Niu created ARROW-10132:
--------------------------------

             Summary: Considers scientific notation when inferring schema from csv
                 Key: ARROW-10132
                 URL: https://issues.apache.org/jira/browse/ARROW-10132
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
    Affects Versions: 1.0.1
         Environment: Ubuntu
            Reporter: Ziru Niu

||col||
|1.2|
|1.3e-2|
|1.4|

Currently this column is inferred as Utf8, because csv::reader::DECIMAL_RE is defined as r"^-?(\d+\.\d+)$", which does not match "1.3e-2".

Maybe we could change it to r"^-?(\d+\.\d+)(e-?(\d+))?$" or something similar, so that real numbers written in scientific notation are inferred as floats? (The RE proposed here still doesn't handle "5e-4" correctly, since it requires a decimal point.)

I would also like "3." and ".3" to be inferred as floats. I will come up with an exact RE when I get time.
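
To make the cases above concrete, here is a minimal std-only sketch of the grammar a revised pattern might accept. The function name `looks_like_float` is hypothetical and is not part of the Arrow CSV reader. The regex in the comment is one possible candidate, not the exact RE promised in this issue. The sketch requires either a decimal point or an exponent, on the assumption that plain integers such as "5" should still be inferred as integers.

```rust
/// Hypothetical helper: returns true if `s` looks like a float literal,
/// including scientific notation ("1.3e-2", "5e-4") and the forms "3." and ".3".
///
/// Intended to accept roughly what this candidate pattern accepts
/// (an assumption, not the pattern used by Arrow):
///   ^-?((\d+\.\d*|\.\d+)([eE][-+]?\d+)?|\d+[eE][-+]?\d+)$
fn looks_like_float(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;

    // Optional leading minus sign.
    if i < bytes.len() && bytes[i] == b'-' {
        i += 1;
    }

    // Digits before the decimal point (may be empty, as in ".3").
    let int_start = i;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    let int_digits = i - int_start;

    // Optional decimal point followed by fractional digits (may be empty, as in "3.").
    let mut has_dot = false;
    let mut frac_digits = 0;
    if i < bytes.len() && bytes[i] == b'.' {
        has_dot = true;
        i += 1;
        let frac_start = i;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        frac_digits = i - frac_start;
    }

    // The mantissa needs at least one digit, so "." and "-" are rejected.
    if int_digits + frac_digits == 0 {
        return false;
    }

    // Optional exponent: 'e' or 'E', optional sign, at least one digit.
    let mut has_exp = false;
    if i < bytes.len() && (bytes[i] == b'e' || bytes[i] == b'E') {
        has_exp = true;
        i += 1;
        if i < bytes.len() && (bytes[i] == b'-' || bytes[i] == b'+') {
            i += 1;
        }
        let exp_start = i;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        if i == exp_start {
            return false;
        }
    }

    // The whole input must be consumed, and a dot or an exponent must be
    // present so that plain integers are not classified as floats.
    i == bytes.len() && (has_dot || has_exp)
}
```

With this sketch, every value in the example column ("1.2", "1.3e-2", "1.4") is classified as a float, as are "5e-4", "3." and ".3", while "5" and "abc" are not.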