[ https://issues.apache.org/jira/browse/ARROW-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson updated ARROW-7639:
-----------------------------------
    Fix Version/s: 0.16.0

> [R] Cannot convert Dictionary Array to R when values aren't strings
> -------------------------------------------------------------------
>
>                 Key: ARROW-7639
>                 URL: https://issues.apache.org/jira/browse/ARROW-7639
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.15.1
>         Environment: Ubuntu 16.04.5 LTS
>            Reporter: Etienne Racine
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I got an error in R when using arrow::read_feather() to read a Feather file
> that was prepared in Python.
> {code:r}
> #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
> {code}
> I could reproduce the issue with a minimal example.
> In Python:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"float": [0.1, .2, 0.5, .001]})
> df["category"] = df["float"].astype('category')
> df.dtypes
> #' float        float64
> #' A             object
> #' category    category
> #' dtype: object
> df.to_feather("series.feather")
> pa.__version__
> #' '0.15.1'
> {code}
> From R:
> {code:r}
> arrow::read_feather("series.feather")
> #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
> #' Backtrace:
> #'     █
> #'  1. └─arrow::read_feather("series.feather")
> #'  2.   ├─[ base::as.data.frame(...) ]
> #'  3.   └─arrow:::as.data.frame.Table(out)
> #'  4.     └─arrow:::Table__to_dataframe(x, use_threads = option_use_threads())
> {code}
> The Feather file is read back correctly in Python:
> {code:python}
> ft = pd.read_feather("series.feather")
> ft.dtypes
> #' float        float64
> #' A             object
> #' category    category
> #' dtype: object
> {code}
> {code:r}
> sessionInfo()
> #' R version 3.5.1 (2018-07-02)
> #' Platform: x86_64-conda_cos6-linux-gnu (64-bit)
> #' Running under: Ubuntu 16.04.5 LTS
> #'
> #' Matrix products: default
> #' BLAS/LAPACK: /misc/DLshare/home/etbellem/miniconda3/lib/R/lib/libRblas.so
> #'
> #' locale:
> #'  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> #'  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> #'  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> #'  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> #'  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> #' [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> #'
> #' attached base packages:
> #' [1] stats     graphics  grDevices utils     datasets  methods   base
> #'
> #' loaded via a namespace (and not attached):
> #'  [1] Rcpp_1.0.3       arrow_0.15.1     crayon_1.3.4     assertthat_0.2.1
> #'  [5] R6_2.4.1         magrittr_1.5     rlang_0.4.2      rstudioapi_0.10
> #'  [9] bit64_0.9-7      glue_1.3.1       purrr_0.3.3      bit_1.1-15.1
> #' [13] compiler_3.5.1   tidyselect_0.2.5
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
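
Until a release carrying the fix (0.16.0, per the Fix Version above) is available, one possible workaround on the Python side might be to relabel the categories as strings before writing. The error only mentions dictionaries whose values aren't strings. The sketch below assumes pandas is installed. `stringify_categories` is a hypothetical helper name, not part of pandas or pyarrow, and whether the resulting file then reads cleanly with arrow 0.15.1 in R has not been verified here.

```python
import pandas as pd


def stringify_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every categorical column has string categories.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            # rename_categories keeps the integer codes and only relabels
            # the category values, here from floats to their str() form.
            new_labels = [str(c) for c in col.cat.categories]
            out[name] = col.cat.rename_categories(new_labels)
    return out


# Same frame as in the report above.
df = pd.DataFrame({"float": [0.1, 0.2, 0.5, 0.001]})
df["category"] = df["float"].astype("category")

fixed = stringify_categories(df)

# Writing requires pyarrow; left commented so the sketch runs without it.
# fixed.to_feather("series.feather")
```

With this approach the column would arrive in R as a factor with character levels rather than numeric values, so something like `as.numeric(as.character(x))` would be needed there to recover the numbers.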