[ 
https://issues.apache.org/jira/browse/ARROW-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458823#comment-17458823
 ] 

Will Jones commented on ARROW-15072:
------------------------------------

Yes, setting {{LIBARROW_MINIMAL}} to false should mean dataset is included. I don't seem to 
have access to that jfrog repo; I get a 404 when trying to download the deb file.

I was able to build this dockerfile:

{code}
FROM rocker/r-base:4.1.2

# TO READ FROM S3
RUN apt update -qq \
    && apt install -t unstable -y --no-install-recommends \
       libcurl4-openssl-dev

ENV LIBARROW_MINIMAL false
ENV ARROW_DEV true

RUN install2.r --error \
      arrow
{code}

And here is what I got from {{arrow_info()}}:

{code}
Arrow package version: 6.0.1

Capabilities:
               
dataset    TRUE
parquet    TRUE
json       TRUE
s3        FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE
{code}

S3 shows FALSE on mine because the image was missing SSL libraries. In your 
build, there was likely some sort of error in linking to the downloaded dataset 
and parquet binaries.
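
To spot disabled features in output like the capabilities table above, one option is to filter for the rows that end in FALSE. This is only a sketch: {{missing_capabilities}} is a hypothetical helper name, and it assumes the {{arrow_info()}} output has been saved to a text file in the two-column layout shown above.

```shell
# Hypothetical helper (not part of arrow): print the names of capabilities
# that arrow_info() reported as FALSE.
# Assumes $1 is a text file containing rows of the form "name  TRUE|FALSE".
missing_capabilities() {
  awk 'NF == 2 && $2 == "FALSE" { print $1 }' "$1"
}

# Possible usage inside the container (file name is an assumption):
#   Rscript -e 'arrow::arrow_info()' > arrow_info.txt
#   missing_capabilities arrow_info.txt
```

For the output I posted, this would list s3 and lzo; in the failing image, dataset would presumably show up in the list as well.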

I recommend you build again with the flags {{--no-cache --progress=plain}} and 
with the environment variable {{ARROW_DEV true}} set, and pipe the output to a 
log file. That should look something like {{docker build --no-cache 
--progress=plain . > build.log 2>&1}} (the {{2>&1}} is there because BuildKit 
writes its progress output to stderr). We should then be able to see what went 
wrong in that log file.
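
As a rough sketch of that workflow: the first function rebuilds and keeps the full output, and the second pulls out lines that look like failures. Both function names are hypothetical, the grep patterns are only a guess at what a linking problem might look like, and redirecting stderr assumes BuildKit is writing its plain progress output there.

```shell
# Sketch: rebuild without cache and keep the complete output in build.log.
# Both stdout and stderr are redirected so the plain progress output is captured.
build_with_log() {
  docker build --no-cache --progress=plain . > build.log 2>&1
}

# Hypothetical helper: print numbered log lines that mention a likely failure.
# The patterns are assumptions, not messages arrow is known to emit.
find_build_errors() {
  grep -n -i -E 'error|cannot find|not found' "$1"
}

# Possible usage:
#   build_with_log
#   find_build_errors build.log
```

The line numbers in the output make it easier to jump to the surrounding context in {{build.log}}, which is usually where the actual cause is.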

> [R] Error: This build of the arrow package does not support Datasets
> --------------------------------------------------------------------
>
>                 Key: ARROW-15072
>                 URL: https://issues.apache.org/jira/browse/ARROW-15072
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, R
>    Affects Versions: 6.0.1
>         Environment: x86_64-pc-linux-gnu (64-bit) via rocker/docker 
> rocker/r-base:4.1.2
>            Reporter: hu geme
>            Priority: Minor
>             Fix For: 6.0.1
>
>
> Hello,
> I would like to report a possible issue (or perhaps I did not grasp the 
> documentation, in which case I apologize in advance).
> I'm trying to use R with arrow on docker in {*}order to read parquet files 
> from s3{*}:
>  
> {code:java}
> FROM rocker/r-base:4.1.2
> # TO READ FROM S3
> RUN apt update -qq \
>     && apt install -t unstable -y --no-install-recommends \
>        libcurl4-openssl-dev
> ENV LIBARROW_MINIMAL false
> RUN apt update && \
>     apt install -y -V ca-certificates lsb-release wget && \
>     wget "https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb" && \
>     apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
> RUN apt update && \
>      apt install -y -V -f \
>      libarrow-dev \
>      libarrow-dataset-dev \
>      libarrow-glib-dev \
>      libarrow-flight-dev \
>      libparquet-dev \
>      libparquet-glib-dev
> RUN install2.r --error \
>       arrow
> {code}
> This is the output of sessionInfo() from the container running R:
>  
> {code:java}
> sessionInfo()
> R version 4.1.2 (2021-11-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 11 (bullseye)
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.18.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
>  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
>  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
>  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
>  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
> [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
> [1] arrow_6.0.1 DBI_1.1.1
>
> loaded via a namespace (and not attached):
>  [1] tidyselect_1.1.1   bit_4.0.4          compiler_4.1.2     magrittr_2.0.1
>  [5] assertthat_0.2.1   R6_2.5.1           tools_4.1.2        glue_1.5.1
>  [9] bit64_4.0.5        vctrs_0.3.8        RJDBC_0.2-8        rlang_0.4.12
> [13] rJava_1.0-5        AWR.Athena_2.0.7-0 purrr_0.3.4
> {code}
> As far as I understand, all requirements for using datasets are fulfilled:
> R version 4.1.2
> Platform: x86_64-pc-linux-gnu (64-bit)
> arrow_6.0.1
>  
> {code:java}
> > .Machine$sizeof.pointer < 8
> [1] FALSE
> > getRversion() < "4.0.0"
> [1] FALSE
> > tolower(Sys.info()[["sysname"]]) == "windows"
> [1] FALSE
> >  {code}
> Nevertheless, I get
> Error: This build of the arrow package does not support Datasets
> in return when I run
> {code:java}
> arrow::open_dataset(sources = path) {code}
> Appreciate any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
