[ 
https://issues.apache.org/jira/browse/ARROW-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-7639:
-----------------------------------
    Fix Version/s: 0.16.0

> [R] Cannot convert Dictionary Array to R when values aren't strings
> -------------------------------------------------------------------
>
>                 Key: ARROW-7639
>                 URL: https://issues.apache.org/jira/browse/ARROW-7639
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.15.1
>         Environment: Ubuntu 16.04.5 LTS
>            Reporter: Etienne Racine
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I got an error in R when using arrow::read_feather() to read a Feather file 
> prepared in Python.
> {code:r}
> #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
> {code}
> I could reproduce the issue with a minimal example:
> In Python:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"float": [0.1, .2, 0.5, .001]})
> df["category"] = df["float"].astype('category')
> df.dtypes
> #' float        float64
> #' A             object
> #' category    category
> #' dtype: object
> df.to_feather("series.feather")
> pa.__version__
> #' '0.15.1'
> {code}
> From R:
> {code:r}
> arrow::read_feather("series.feather")
> #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
> #' Backtrace:
> #' █
> #' 1. └─arrow::read_feather("series.feather")
> #' 2. ├─[ base::as.data.frame(...) ]
> #' 3. └─arrow:::as.data.frame.Table(out)
> #' 4. └─arrow:::Table__to_dataframe(x, use_threads = option_use_threads())
> {code}
> Reading the Feather file back in Python works correctly:
> {code:python}
> ft = pd.read_feather("series.feather")
> ft.dtypes
> #' float        float64
> #' A             object
> #' category    category
> #' dtype: object
> {code}
> {code:r}
> sessionInfo()
> #' R version 3.5.1 (2018-07-02)
> #' Platform: x86_64-conda_cos6-linux-gnu (64-bit)
> #' Running under: Ubuntu 16.04.5 LTS
> #' 
> #' Matrix products: default
> #' BLAS/LAPACK: /misc/DLshare/home/etbellem/miniconda3/lib/R/lib/libRblas.so
> #' 
> #' locale:
> #' [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> #' [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> #' [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> #' [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> #' [9] LC_ADDRESS=C LC_TELEPHONE=C
> #' [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> #' 
> #' attached base packages:
> #' [1] stats graphics grDevices utils datasets methods base
> #' 
> #' loaded via a namespace (and not attached):
> #' [1] Rcpp_1.0.3 arrow_0.15.1 crayon_1.3.4 assertthat_0.2.1
> #' [5] R6_2.4.1 magrittr_1.5 rlang_0.4.2 rstudioapi_0.10
> #' [9] bit64_0.9-7 glue_1.3.1 purrr_0.3.3 bit_1.1-15.1
> #' [13] compiler_3.5.1 tidyselect_0.2.5{code}
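
Until a release with the fix is installed, one possible workaround for the report above is to change the categories on the Python side before writing the file. The sketch below is not taken from the issue or from the Arrow code base: the helper name stringify_categories is hypothetical, and it assumes (as the issue title suggests, but not verified against arrow 0.15.1) that the R converter accepts dictionary arrays whose values are strings. It uses only pandas; the to_feather() call is left commented out because it needs pyarrow.

```python
import pandas as pd


def stringify_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every categorical column has string categories.

    Hypothetical helper for illustration only; it is not part of pandas or pyarrow.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.api.types.CategoricalDtype):
            # rename_categories accepts a callable applied to each category,
            # so float categories such as 0.1 become the string "0.1".
            out[name] = col.cat.rename_categories(str)
    return out


# Same data frame as in the minimal example from the report.
df = pd.DataFrame({"float": [0.1, 0.2, 0.5, 0.001]})
df["category"] = df["float"].astype("category")

fixed = stringify_categories(df)
# fixed.to_feather("series.feather")  # requires pyarrow; then read from R as before
```

The trade-off is that the column arrives in R as a factor with character levels, so the numeric values have to be recovered there if they are needed. Casting the column back to its underlying dtype before writing (for example with astype(float)) avoids the dictionary type entirely, at the cost of losing the categorical encoding.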



--
This message was sent by Atlassian Jira
(v8.3.4#803005)