[ 
https://issues.apache.org/jira/browse/ARROW-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nic Crane updated ARROW-13421:
------------------------------
    Description: 
When reading data in which commas are used as decimal separators (e.g. 
3,141 for pi), the column is read in as a character string.  If I specify a 
schema in R, e.g.:

{{tbl <- tibble::tibble(x = rnorm(5))}}
{{# write to disk with ";" as the delimiter and "," as the decimal separator}}
{{readr::write_csv2(tbl, "tst.csv")}}
{{# read back in}}
{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}

I get the following error:

{{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid 
value 'x'}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437  decoder_.Decode(data, 
size, quoted, &value)}}
{{/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84  status}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441  
parser.VisitColumn(col_index, visit) }}

Could we add the ability to read data in this format?  It is fairly common 
in a number of countries.

  was:
When reading data in which commas are used as decimal separators (e.g. 
3,141 for pi), the column is read in as a character string.  If I specify a 
schema in R, e.g.:

{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}

I get the following error:

{{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid 
value 'x'}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437  decoder_.Decode(data, 
size, quoted, &value)}}
{{/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84  status}}
{{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441  
parser.VisitColumn(col_index, visit) }}

Could we add the ability to read data in this format?  It is fairly common 
in a number of countries.


> [C++]  Add functionality for reading in columns as floats from delimited 
> files where a comma has been used as a decimal separator
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13421
>                 URL: https://issues.apache.org/jira/browse/ARROW-13421
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Daniel Paierl
>            Priority: Minor
>
> When reading data in which commas are used as decimal separators (e.g. 
> 3,141 for pi), the column is read in as a character string.  If I specify a 
> schema in R, e.g.:
> {{tbl <- tibble::tibble(x = rnorm(5))}}
> {{# write to disk with ";" as the delimiter and "," as the decimal separator}}
> {{readr::write_csv2(tbl, "tst.csv")}}
> {{# read back in}}
> {{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}
> I get the following error:
> {{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid 
> value 'x'}}
> {{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437  decoder_.Decode(data, 
> size, quoted, &value)}}
> {{/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84  status}}
> {{/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441  
> parser.VisitColumn(col_index, visit) }}
> Could we add the ability to read data in this format?  It is fairly common 
> in a number of countries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
