[ https://issues.apache.org/jira/browse/ARROW-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nic Crane updated ARROW-13421:
------------------------------
Description:
When reading in data where commas have been used as decimal separators (e.g. 3,141 to indicate pi), the column is read in as a character string. If I try to specify a schema in R, for example:

{{tbl <- tibble::tibble(x = rnorm(5))}}
{{# write to disk with a comma decimal separator}}
{{readr::write_csv2(tbl, "tst.csv")}}
{{# read back in}}
{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}

I get the following error:

{{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value 'x'}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437 decoder_.Decode(data, size, quoted, &value)}}
{{/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441 parser.VisitColumn(col_index, visit)}}

Could we have the ability to read in data in this format? It is fairly common across a number of countries.

was:
When reading in data where commas have been used as decimal separators (e.g. 3,141 to indicate pi), the column is read in as a character string. If I try to specify a schema in R, for example:

{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}

I get the following error:

{{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value 'x'}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437 decoder_.Decode(data, size, quoted, &value)}}
{{/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441 parser.VisitColumn(col_index, visit)}}

Could we have the ability to read in data in this format? It is fairly common across a number of countries.
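Until the CSV reader can parse a comma decimal mark directly, one possible workaround is to let Arrow read the column as a string (which, per the description, is what it already does without a schema) and convert it in R afterwards. This is a minimal sketch, not the Arrow feature requested here. It assumes the values use "," as the decimal mark with no thousands separators, and that the arrow, tibble and readr packages are installed.

```r
# Workaround sketch: read the comma-decimal column as text, then convert it in R.
# Assumes "," is the decimal mark and values contain no thousands separators.
library(arrow)

tbl <- tibble::tibble(x = rnorm(5))
# write_csv2() uses ";" between fields and "," as the decimal mark
readr::write_csv2(tbl, "tst.csv")

# No schema: Arrow infers x as a string column because of the commas
df <- read_delim_arrow("tst.csv", delim = ";")

# Replace the decimal comma with a period, then convert to double
df$x <- as.numeric(sub(",", ".", df$x, fixed = TRUE))
```

Converting after the read keeps the Arrow call unchanged, at the cost of materialising the column as character first.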
> [C++] Add functionality for reading in columns as floats from delimited files where a comma has been used as a decimal separator
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13421
>                 URL: https://issues.apache.org/jira/browse/ARROW-13421
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Daniel Paierl
>            Priority: Minor
>
> When reading in data where commas have been used as decimal separators (e.g.
> 3,141 to indicate pi), the column is read in as a character string. If I try
> to specify a schema in R, for example:
> {{tbl <- tibble::tibble(x = rnorm(5))}}
> {{# write to disk with a comma decimal separator}}
> {{readr::write_csv2(tbl, "tst.csv")}}
> {{# read back in}}
> {{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}
> I get the following error:
> {{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value 'x'}}
> {{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437 decoder_.Decode(data, size, quoted, &value)}}
> {{/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status}}
> {{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441 parser.VisitColumn(col_index, visit)}}
> Could we have the ability to read in data in this format? It is fairly
> common across a number of countries.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)