[ https://issues.apache.org/jira/browse/ARROW-15123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Keane resolved ARROW-15123.
------------------------------------
    Fix Version/s: 8.0.0
       Resolution: Fixed

Issue resolved by pull request 12152
[https://github.com/apache/arrow/pull/12152]

> [R] CSV dataset file header read in as data
> -------------------------------------------
>
>                 Key: ARROW-15123
>                 URL: https://issues.apache.org/jira/browse/ARROW-15123
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 6.0.0, 6.0.1
>            Reporter: N D
>            Assignee: Nicola Crane
>            Priority: Major
>              Labels: pull-request-available, schema
>             Fix For: 8.0.0
>
>         Attachments: reprex-arrow-6-read.tar.gz
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In `arrow` 6.0.0+ for R, when I read in a CSV file using a schema where the
> order of the columns in the schema doesn't match the order of the columns in
> the CSV, the data is read in incorrectly.
> The header is included as an observation in the read-in dataset. The columns
> are renamed *but not reordered* to match the schema. So I end up with the
> "quantile" column called "location", etc., as below.
> {code:java}
> [1] "last few obs in sorted order with arrow"
> # A tibble: 6 × 7
>   forecast_date target       target_end_date location type       quantile value
>   <chr>         <chr>        <chr>           <chr>    <chr>      <chr>    <chr>
> 1 2021-12-12    9 day ahead… 2021-12-21      0.99     946.43313… 06       quant…
> 2 2021-12-12    9 day ahead… 2021-12-21      0.99     956.43294… 39       quant…
> 3 2021-12-12    9 day ahead… 2021-12-21      0.99     97.948144… 41       quant…
> 4 2021-12-12    9 day ahead… 2021-12-21      0.99     98.573545… 49       quant…
> 5 2021-12-12    9 day ahead… 2021-12-21      0.99     98.978636… 33       quant…
> 6 forecast_date target       target_end_date quantile value      location type
> {code}
> The last line ("forecast_date target...") is the original header.
> The file in question
> ([https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/master/data-processed/JHUAPL-Gecko/2021-12-12-JHUAPL-Gecko.csv])
> has 45360 observations + 1 line for the header.
> But the read-in dataset has
> {code:java}
> [1] "dimensions with arrow"
> [1] 45361     7
> {code}
> Reprex attached with working (`packageVersion("arrow") == 4.0.1`; 5.0.0 also
> works) and non-working (`packageVersion("arrow") == 6.0.1`) examples. Run
> examples using `make run-broken` and `make run-works`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
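A minimal sketch of the two behaviours contrasted in the report, written with Python's standard-library `csv` module rather than Arrow, so it is only an analogy and not Arrow's implementation. The header order and the schema order are taken from the tibble above; the data rows use placeholder strings (`target_1`, `value_1`, `type_1`, ...) instead of real values, and the helpers `read_positional` and `read_by_header` are hypothetical names. Applying the schema names by position keeps the header line as a data row and mislabels the columns, the same pattern as in the report; consuming the header and selecting columns by name gives the expected result.

```python
import csv
import io

# Header order as it appears in the CSV file (from the report's last tibble row).
# The data rows are placeholders, not real values from the forecast file.
CSV_TEXT = (
    "forecast_date,target,target_end_date,quantile,value,location,type\n"
    "2021-12-12,target_1,2021-12-21,0.99,value_1,06,type_1\n"
    "2021-12-12,target_2,2021-12-21,0.99,value_2,39,type_2\n"
)

# Column order declared in the schema (from the tibble's column names).
SCHEMA = ["forecast_date", "target", "target_end_date",
          "location", "type", "quantile", "value"]


def read_positional(text, names):
    """Buggy pattern (hypothetical helper): apply names by position, keep the header line as data."""
    return [dict(zip(names, row)) for row in csv.reader(io.StringIO(text))]


def read_by_header(text, names):
    """Expected pattern (hypothetical helper): consume the header, then pick columns by name."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: row[name] for name in names} for row in reader]


bad = read_positional(CSV_TEXT, SCHEMA)
good = read_by_header(CSV_TEXT, SCHEMA)

print(len(bad), len(good))  # 3 2
print(bad[0]["location"])   # quantile  (the header row leaked in as data)
print(bad[1]["location"])   # 0.99      (a quantile value under the wrong name)
print(good[0]["location"])  # 06
```

With two data rows the positional read returns three records, the same off-by-one as 45360 + 1 = 45361 in the report, and every renamed column holds the values of a different field.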