[jira] [Assigned] (ARROW-8586) [R] installation failure on CentOS 7

2020-05-14 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-8586:
--

Assignee: Neal Richardson

> [R] installation failure on CentOS 7
> 
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Assignee: Neal Richardson
>Priority: Major
>
> Hi,
> I am trying to install arrow via RStudio, but it does not seem to be working:
> after I installed the package, it kept asking me to run
> arrow::install_arrow() even after I did:
> {code}
> > install.packages("arrow")
> Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
> Content type 'application/x-gzip' length 242534 bytes (236 KB)
> ==
> downloaded 236 KB
> * installing *source* package ‘arrow’ ...
> ** package ‘arrow’ successfully unpacked and MD5 sums checked
> ** using staged installation
> *** Successfully retrieved C++ source
> *** Building C++ libraries
>  cmake
>  arrow  
> ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
> - NOTE ---
> After installation, please run arrow::install_arrow()
> for help installing required runtime libraries
> -
> ** libs
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
> array_from_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
> array_to_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
> arrowExports.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
> chunkedarray.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c compression.cpp -o 
> compression.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c compute.cpp -o compute.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include

[jira] [Assigned] (ARROW-8734) [R] autobrew script always builds from master

2020-05-14 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-8734:
--

Assignee: Neal Richardson

> [R] autobrew script always builds from master
> -
>
> Key: ARROW-8734
> URL: https://issues.apache.org/jira/browse/ARROW-8734
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Assignee: Neal Richardson
>Priority: Major
>
> I've tried to install / build from source (both from a git checkout and using
> the built-in `install_arrow()`), and when compiling I reliably get the
> following error during the autobrew process:
> {code:bash}
>  x System command 'R' failed, exit status: 1, stdout + stderr:
> E> * checking for file ‘/Users/jkeane/Dropbox/arrow/r/DESCRIPTION’ ... OK
> E> * preparing ‘arrow’:
> E> * checking DESCRIPTION meta-information ... OK
> E> * cleaning src
> E> * running ‘cleanup’
> E> * installing the package to build vignettes
> E>   ---
> E> * installing *source* package ‘arrow’ ...
> E> ** using staged installation
> E> *** Generating code with data-raw/codegen.R
> E> There were 27 warnings (use warnings() to see them)
> E> *** > 375 functions decorated with [[arrow|s3::export]]
> E> *** > generated file `src/arrowExports.cpp`
> E> *** > generated file `R/arrowExports.R`
> E> *** Downloading apache-arrow
> E>  Using local manifest for apache-arrow
> E> Thu May  7 13:13:42 CDT 2020: Auto-brewing apache-arrow in 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T//build-apache-arrow...
> E> ==> Tapping autobrew/core from https://github.com/autobrew/homebrew-core
> E> Tapped 2 commands and 4639 formulae (4,888 files, 12.7MB).
> E> lz4
> E> openssl
> E> thrift
> E> snappy
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/lz4-1.8.3.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/b4158ef68d619dbf78935df6a42a70b8339a65bc8876cbb4446355ccd40fa5de--lz4-1.8.3.mojave.bottle.tar.gz
> E> ==> Pouring lz4-1.8.3.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/lz4/1.8.3:
>  22 files, 512.7KB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/openssl-1.0.2p.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/fbb493745981c8b26c0fab115c76c2a70142bfde9e776c450277e9dfbbba0bb2--openssl-1.0.2p.mojave.bottle.tar.gz
> E> ==> Pouring openssl-1.0.2p.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> ==> Caveats
> E> openssl is keg-only, which means it was not symlinked into 
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow,
> E> because Apple has deprecated use of OpenSSL in favor of its own TLS and 
> crypto libraries.
> E> 
> E> If you need to have openssl first in your PATH run:
> E>   echo 'export 
> PATH="/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/bin:$PATH"'
>  >> ~/.zshrc
> E> 
> E> For compilers to find openssl you may need to set:
> E>   export 
> LDFLAGS="-L/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/lib"
> E>   export 
> CPPFLAGS="-I/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/include"
> E> 
> E> For pkg-config to find openssl you may need to set:
> E>   export 
> PKG_CONFIG_PATH="/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/lib/pkgconfig"
> E> 
> E> ==> Summary
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/openssl/1.0.2p:
>  1,793 files, 12MB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/thrift-0.11.0.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/7e05ea11a9f7f924dd7f8f36252ec73a24958b7f214f71e3752a355e75e589bd--thrift-0.11.0.mojave.bottle.tar.gz
> E> ==> Pouring thrift-0.11.0.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> ==> Caveats
> E> To install Ruby binding:
> E>   gem install thrift
> E> ==> Summary
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/thrift/0.11.0:
>  102 files, 7MB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/snappy-1.1.7_1.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/1f09938804055499d1dd951b13b26d80c56eae359aa051284bf4f51d109a9f73--snappy-1.1.7_1.mojave.bottle.tar.gz
> E> ==> Pouring snappy-1.1.7_1.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> 🍺  

[jira] [Created] (ARROW-8804) [R][CI] Followup to Rtools40 upgrade

2020-05-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8804:
--

 Summary: [R][CI] Followup to Rtools40 upgrade
 Key: ARROW-8804
 URL: https://issues.apache.org/jira/browse/ARROW-8804
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-8787.
--
  Assignee: Neal Richardson
Resolution: Duplicate

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Assignee: Neal Richardson
>Priority: Major
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Reopened] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reopened ARROW-8787:


> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
> Fix For: 0.17.0
>
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Updated] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8787:
---
Fix Version/s: (was: 0.17.0)

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Closed] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson closed ARROW-8787.
--
Resolution: Duplicate

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
> Fix For: 0.17.0
>
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Commented] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106876#comment-17106876
 ] 

Neal Richardson commented on ARROW-8787:


Glad to hear that works around the issue for you, though obviously it's not 
ideal. Hopefully someone will be able to fix this properly.

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
> Fix For: 0.17.0
>
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Commented] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106820#comment-17106820
 ] 

Neal Richardson commented on ARROW-8787:


Hmm, this could be a duplicate of ARROW-7288. Could you try setting the locale 
as described in that issue and see if that works?
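
A minimal sketch of that workaround, assuming the problem here really is the
same locale issue as ARROW-7288; which LC_* category matters (if any) is an
assumption, and "test.parquet" is the file from the reproducer in this issue:

```r
# Hypothetical workaround sketch: switch to the plain "C" locale before
# reading, then restore the original setting afterwards.
old_locale <- Sys.getlocale("LC_CTYPE")
Sys.setlocale("LC_CTYPE", "C")
df <- arrow::read_parquet("test.parquet")
Sys.setlocale("LC_CTYPE", old_locale)
```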

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
> Fix For: 0.17.0
>
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Commented] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106791#comment-17106791
 ] 

Neal Richardson commented on ARROW-8787:


It should be instant: it's instantaneous when we run the tests in our CI, and 
when CRAN runs them.

I'm not sure what's different about your system, so it's hard to even begin to 
make recommendations. Providing {{sessionInfo()}} would be a start, along with 
anything unusual about how your system is configured.
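
For reference, the requested diagnostics can be gathered with a few base-R
calls (a sketch; "test.parquet" is the file from the reproducer in this issue):

```r
# Gather the diagnostics requested above: session details, locale settings,
# and a timed run of the failing call.
sessionInfo()
Sys.getlocale()
system.time(arrow::read_parquet("test.parquet"))
```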

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
> Fix For: 0.17.0
>
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Resolved] (ARROW-8717) [CI][Packaging] Add build dependency on boost to homebrew

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8717.

Resolution: Fixed

Issue resolved by pull request 7173
[https://github.com/apache/arrow/pull/7173]

> [CI][Packaging] Add build dependency on boost to homebrew
> -
>
> Key: ARROW-8717
> URL: https://issues.apache.org/jira/browse/ARROW-8717
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Packaging
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> cf. https://github.com/Homebrew/homebrew-core/pull/54287
> and revise the Travis jobs to uninstall boost and thrift before checking the 
> formula





[jira] [Resolved] (ARROW-8604) [R][CI] Update CI to use R 4.0

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8604.

Resolution: Fixed

Issue resolved by pull request 7107
[https://github.com/apache/arrow/pull/7107]

> [R][CI] Update CI to use R 4.0
> --
>
> Key: ARROW-8604
> URL: https://issues.apache.org/jira/browse/ARROW-8604
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Francois Saint-Jacques
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [Master|https://github.com/apache/arrow/runs/622393526] fails to compile. 
> The C++ cmake build is not using the same 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:8:807] as 
> the R extension 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:11:141].
> {code:java}
> // Files installed here
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%)
> // Linker is using `-L`
> C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def 
> array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o 
> buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o 
> expression.o feather.o field.o filesystem.o io.o json.o memorypool.o 
> message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o 
> recordbatchwriter.o schema.o symbols.o table.o threadpool.o 
> -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 
> -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow 
> -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 
> -LC:/R/bin/i386 -lR
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lparquet
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow_dataset
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lthrift
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lsnappy
> {code}
>  
> C++ developers, rejoice: this is almost the end of gcc-4.9.
>  





[jira] [Resolved] (ARROW-8768) [R][CI] Fix nightly as-cran spurious failure

2020-05-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8768.

Resolution: Fixed

Issue resolved by pull request 7151
[https://github.com/apache/arrow/pull/7151]

> [R][CI] Fix nightly as-cran spurious failure
> 
>
> Key: ARROW-8768
> URL: https://issues.apache.org/jira/browse/ARROW-8768
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> An extra check we added to ensure that the package doesn't write anything to 
> the user's home directory started failing on one of the 5 as-cran checks. It 
> appears that a new feature of texlive2020, which is apparently invoked when 
> checking that the pdf manual can be built, adds some caching junk to the home 
> dir. It is unlikely that this is a real failure, probably just an artifact of 
> the test environment. 





[jira] [Commented] (ARROW-8782) [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set

2020-05-13 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106398#comment-17106398
 ] 

Neal Richardson commented on ARROW-8782:


[~fsaintjacques] has a Python script somewhere for downloading taxi CSVs and 
turning them into Parquet.
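
That conversion is also straightforward in R with the arrow package itself; a
hedged sketch, not the script referred to above, with a hypothetical file name:

```r
library(arrow)
# Hypothetical input file; the actual download/conversion script mentioned
# in the comment is not reproduced here.
csv_file <- "yellow_tripdata_2019-01.csv"
# Read the CSV into an Arrow Table (not a data.frame) and write it as Parquet,
# swapping the file extension for the output path.
tab <- read_csv_arrow(csv_file, as_data_frame = FALSE)
write_parquet(tab, sub("\\.csv$", ".parquet", csv_file))
```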

> [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set
> -
>
> Key: ARROW-8782
> URL: https://issues.apache.org/jira/browse/ARROW-8782
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> I plan on adding a new benchmarks folder beneath the datafusion crate, 
> containing benchmarks based on the NYC Taxi data set. The benchmark will be a 
> CLI and will support running a number of different queries against CSV and 
> Parquet.
> The README will contain instructions for downloading the data set.
> The benchmark will produce CSV files containing results.
> These benchmarks will allow us to manually verify performance before major 
> releases and on an ongoing basis as we make changes to 
> Arrow/Parquet/DataFusion.
> I will be basing this on existing benchmarks I recently built in Ballista [1] 
> (I am the only contributor to these benchmarks so far).
> A dockerfile will be provided, making it easy to restrict CPU and RAM when 
> running these benchmarks.
> [1] https://github.com/ballista-compute/ballista/tree/master/rust/benchmarks
>  





[jira] [Commented] (ARROW-8787) [R] read_parquet() don't end

2020-05-13 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106377#comment-17106377
 ] 

Neal Richardson commented on ARROW-8787:


Your example works on my machine. Are you able to run the examples from the 
docs? {{example(read_parquet)}}

> [R] read_parquet() don't end
> 
>
> Key: ARROW-8787
> URL: https://issues.apache.org/jira/browse/ARROW-8787
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Windows10
> R 3.6.3
>Reporter: Masaru
>Priority: Major
> Fix For: 0.17.0
>
>
> I have tried to use the read_parquet() function as follows:
> {code:java}
> write_parquet(data.table(matrix(1,1)),"test.parquet")
> read_parquet("test.parquet"){code}
> The data set is very small.
> However, the process never ends at read_parquet().
> Could you please show how to fix the settings or code?
> I have installed the package via the CRAN site.





[jira] [Updated] (ARROW-8779) [R] Implement conversion to List

2020-05-12 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8779:
---
Summary: [R] Implement conversion to List  (was: [R] Unable to 
write Struct Layout to file (.arrow, .parquet))

> [R] Implement conversion to List
> 
>
> Key: ARROW-8779
> URL: https://issues.apache.org/jira/browse/ARROW-8779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.16.0, 0.17.0
>Reporter: Dominic Dennenmoser
>Priority: Major
>
> It seems there is no method implemented to write a Struct Array (within an 
> Arrow Table) to file. A common case would be list columns in a dataframe. If I 
> have understood the documentation correctly, this should be realisable within 
> the current C++ library framework.
> I tested this with the following df structure:
> {code:none}
> df
> |-- id 
> |-- data 
> |   |-- a 
> |   |-- b 
> |   |-- c 
> |   |-- d {code}
> I got the following error message:
> {code:none}
> Error in Table__from_dots(dots, schema) : NotImplemented: Converting vector 
> to arrow type struct indices=int8, ordered=0>, d: double> not implemented{code}
> I have tried it with {{arrow}} 0.17.0 under {{R}} 3.6.1.





[jira] [Updated] (ARROW-8779) [R] Unable to write Struct Layout to file (.arrow, .parquet)

2020-05-12 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8779:
---
Labels:   (was: features patch)

> [R] Unable to write Struct Layout to file (.arrow, .parquet)
> 
>
> Key: ARROW-8779
> URL: https://issues.apache.org/jira/browse/ARROW-8779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.16.0, 0.17.0
>Reporter: Dominic Dennenmoser
>Priority: Major
>
> It seems there is no method implemented to write a Struct Array (within an 
> Arrow Table) to file. A common case would be list columns in a dataframe. If I 
> have understood the documentation correctly, this should be realisable within 
> the current C++ library framework.
> I tested this with the following df structure:
> {code:none}
> df
> |-- id 
> |-- data 
> |   |-- a 
> |   |-- b 
> |   |-- c 
> |   |-- d {code}
> I got the following error message:
> {code:none}
> Error in Table__from_dots(dots, schema) : NotImplemented: Converting vector 
> to arrow type struct indices=int8, ordered=0>, d: double> not implemented{code}
> I have tried it with {{arrow}} 0.17.0 under {{R}} 3.6.1.





[jira] [Commented] (ARROW-8779) [R] Unable to write Struct Layout to file (.arrow, .parquet)

2020-05-12 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105751#comment-17105751
 ] 

Neal Richardson commented on ARROW-8779:


Here's a more minimal reproducer: 

{code:r}
Array$create(list(data.frame(a = 1)))

Error in Array__from_vector(x, type) : 
  NotImplemented: Converting vector to arrow type struct not 
implemented
{code}

It seems that we support creating ListArrays and StructArrays from R, but not a 
List of Structs:

{code}
> Array$create(list(1))
ListArray
>
[
  [
1
  ]
]
> Array$create(data.frame(a = 1))
StructArray
>
-- is_valid: all not null
-- child 0 type: double
  [
1
  ]
{code}


> [R] Unable to write Struct Layout to file (.arrow, .parquet)
> 
>
> Key: ARROW-8779
> URL: https://issues.apache.org/jira/browse/ARROW-8779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.16.0, 0.17.0
>Reporter: Dominic Dennenmoser
>Priority: Major
>  Labels: features, patch
>
> It seems there is no method implemented to write a Struct Array (within an 
> Arrow Table) to file. A common case would be list columns in a dataframe. If I 
> have understood the documentation correctly, this should be realisable within 
> the current C++ library framework.
> I tested this with the following df structure:
> {code:none}
> df
> |-- id 
> |-- data 
> |   |-- a 
> |   |-- b 
> |   |-- c 
> |   |-- d {code}
> I got the following error message:
> {code:none}
> Error in Table__from_dots(dots, schema) : NotImplemented: Converting vector 
> to arrow type struct indices=int8, ordered=0>, d: double> not implemented{code}
> I have tried it with {{arrow}} 0.17.0 under {{R}} 3.6.1.





[jira] [Commented] (ARROW-8779) [R] Unable to write Struct Layout to file (.arrow, .parquet)

2020-05-12 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105733#comment-17105733
 ] 

Neal Richardson commented on ARROW-8779:


Could you please provide a minimal reproducible example?

> [R] Unable to write Struct Layout to file (.arrow, .parquet)
> 
>
> Key: ARROW-8779
> URL: https://issues.apache.org/jira/browse/ARROW-8779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.16.0, 0.17.0
>Reporter: Dominic Dennenmoser
>Priority: Major
>  Labels: features, patch
>
> It seems there is no method implemented to write a Struct Array (within an 
> Arrow Table) to file. A common case would be list columns in a dataframe. If I 
> have understood the documentation correctly, this should be realisable within 
> the current C++ library framework.
> I tested this with the following df structure:
> {code:none}
> df
> |-- id 
> |-- data 
> |   |-- a 
> |   |-- b 
> |   |-- c 
> |   |-- d {code}
> I got the following error message:
> {code:none}
> Error in Table__from_dots(dots, schema) : NotImplemented: Converting vector 
> to arrow type struct indices=int8, ordered=0>, d: double> not implemented{code}
> I have tried it with {{arrow}} 0.17.0 under {{R}} 3.6.1.





[jira] [Created] (ARROW-8768) [R][CI] Fix nightly as-cran spurious failure

2020-05-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8768:
--

 Summary: [R][CI] Fix nightly as-cran spurious failure
 Key: ARROW-8768
 URL: https://issues.apache.org/jira/browse/ARROW-8768
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


An extra check we added to ensure that the package doesn't write anything to 
the user's home directory started failing on one of the 5 as-cran checks. It 
appears that a new feature of texlive2020, which is apparently invoked when 
checking that the PDF manual can be built, adds some caching junk to the home 
directory. This is unlikely to be a real failure, probably just an artifact of 
the test environment. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8748) [R] Implementing methods for combining arrow tables using dplyr::bind_rows and dplyr::bind_cols

2020-05-11 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104730#comment-17104730
 ] 

Neal Richardson commented on ARROW-8748:


We could add methods to concatenate Tables in Arrow memory (the function 
probably exists in the C++ library). But I'm not sure that's the best solution 
to your problem. If you have several Tables and you dump them to a file, you 
don't need to concatenate them in memory first. You can use the lower-level 
{{RecordBatchStreamWriter}} that {{write_ipc_stream}} wraps. Something like:

{code:r}
file_obj <- FileOutputStream$create(file_name)
# Use the schema of the first batch; all batches must share it
writer <- RecordBatchStreamWriter$create(file_obj, batches[[1]]$schema)
for (batch in batches) {
  writer$write(batch)
}
writer$close()
file_obj$close()
{code}

See {{?RecordBatchWriter}}.
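For completeness, a hedged sketch of reading the combined data back (assuming {{file_name}} as above; which reader applies depends on which writer class was used):

{code:r}
library(arrow)
# If written with RecordBatchStreamWriter (the format write_ipc_stream produces):
tbl <- read_ipc_stream(file_name, as_data_frame = FALSE)
# If written with RecordBatchFileWriter (the random-access IPC file format):
tbl <- read_feather(file_name, as_data_frame = FALSE)
{code}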

> [R] Implementing methods for combining arrow tables using dplyr::bind_rows 
> and dplyr::bind_cols
> 
>
> Key: ARROW-8748
> URL: https://issues.apache.org/jira/browse/ARROW-8748
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Dominic Dennenmoser
>Priority: Major
>  Labels: features, performance, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> First of all, many thanks for your hard work! I was quite excited when you 
> guys implemented some basic functions of the {{dplyr}} package. Is there a 
> way to combine two or more arrow tables into one by rows or columns? At the 
> moment my workaround looks like this:
> {code:r}
> dplyr::bind_rows(
>"a" = arrow.table.1 %>% dplyr::collect(),
>"b" = arrow.table.2 %>% dplyr::collect(),
>"c" = arrow.table.3 %>% dplyr::collect(),
>"d" = arrow.table.4 %>% dplyr::collect(),
>.id = "ID"
>  ) %>% 
>  arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow")
> {code}
> But this is not really a viable approach, because it pulls the data back into 
> the R environment as dataframes/tibbles, which might exhaust RAM. Perhaps you 
> might have a better workaround on hand. 
> It would be great if you guys could implement the {{bind_rows}} and 
> {{bind_cols}} methods provided by {{dplyr}}.
> {code:java}
> dplyr::bind_rows(
>"a" = arrow.table.1,
>"b" = arrow.table.2,
>"c" = arrow.table.3,
>"d" = arrow.table.4, 
>.id = "ID"
> ) %>% 
>  arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow"){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8748) [R] Implementing methods for combining arrow tables using dplyr::bind_rows and dplyr::bind_cols

2020-05-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8748:
---
Labels: features performance  (was: features performance 
pull-request-available)

> [R] Implementing methods for combining arrow tables using dplyr::bind_rows 
> and dplyr::bind_cols
> 
>
> Key: ARROW-8748
> URL: https://issues.apache.org/jira/browse/ARROW-8748
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Dominic Dennenmoser
>Priority: Major
>  Labels: features, performance
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> First of all, many thanks for your hard work! I was quite excited when you 
> guys implemented some basic functions of the {{dplyr}} package. Is there a 
> way to combine two or more arrow tables into one by rows or columns? At the 
> moment my workaround looks like this:
> {code:r}
> dplyr::bind_rows(
>"a" = arrow.table.1 %>% dplyr::collect(),
>"b" = arrow.table.2 %>% dplyr::collect(),
>"c" = arrow.table.3 %>% dplyr::collect(),
>"d" = arrow.table.4 %>% dplyr::collect(),
>.id = "ID"
>  ) %>% 
>  arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow")
> {code}
> But this is not really a viable approach, because it pulls the data back into 
> the R environment as dataframes/tibbles, which might exhaust RAM. Perhaps you 
> might have a better workaround on hand. 
> It would be great if you guys could implement the {{bind_rows}} and 
> {{bind_cols}} methods provided by {{dplyr}}.
> {code:java}
> dplyr::bind_rows(
>"a" = arrow.table.1,
>"b" = arrow.table.2,
>"c" = arrow.table.3,
>"d" = arrow.table.4, 
>.id = "ID"
> ) %>% 
>  arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow"){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8549) [R] Assorted post-0.17 release cleanups

2020-05-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8549:
---
Fix Version/s: 0.17.1

> [R] Assorted post-0.17 release cleanups
> ---
>
> Key: ARROW-8549
> URL: https://issues.apache.org/jira/browse/ARROW-8549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.17.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8699) [R] Fix automatic r_to_py conversion

2020-05-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8699:
---
Fix Version/s: 0.17.1

> [R] Fix automatic r_to_py conversion
> 
>
> Key: ARROW-8699
> URL: https://issues.apache.org/jira/browse/ARROW-8699
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.17.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See https://github.com/rstudio/reticulate/issues/748



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8758) [R] Updates for compatibility with dplyr 1.0

2020-05-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8758.

Resolution: Fixed

Issue resolved by pull request 7147
[https://github.com/apache/arrow/pull/7147]

> [R] Updates for compatibility with dplyr 1.0
> 
>
> Key: ARROW-8758
> URL: https://issues.apache.org/jira/browse/ARROW-8758
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.1, 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8726) [R][Dataset] segfault with a mis-specified partition

2020-05-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8726:
---
Fix Version/s: (was: 0.17.1)

> [R][Dataset] segfault with a mis-specified partition
> 
>
> Key: ARROW-8726
> URL: https://issues.apache.org/jira/browse/ARROW-8726
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Assignee: Francois Saint-Jacques
>Priority: Major
> Fix For: 1.0.0
>
>
> Calling filter + collect on a dataset with a mis-specified partitioning 
> causes a segfault. Though this is clearly an input error, it would be nice if 
> there were some guidance that something was wrong with the partitioning.
> {code:r}
> library(arrow)
> library(dplyr)
> dir.create("multi_mtcars/one", recursive = TRUE)
> dir.create("multi_mtcars/two", recursive = TRUE)
> write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet")
> write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet")
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
> # the following will segfault
> ds %>%
>   filter(cyl > 8) %>% 
>   collect()
> {code}
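For reference, a sketch of the corrected call, assuming only one directory level is meant to be a partition key (the field name "level" is carried over from the repro above):

{code:r}
# Specify only as many partition fields as there are directory levels
ds <- open_dataset("multi_mtcars", partitioning = "level")
{code}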



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8741) [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-11 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8741:
---
Fix Version/s: 1.0.0

> [Python][Packaging] Keep VS2015 with bundled dependencies for the windows 
> wheels
> 
>
> Key: ARROW-8741
> URL: https://issues.apache.org/jira/browse/ARROW-8741
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.17.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The windows wheels needs to be fixed for the release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8758) [R] Updates for compatibility with dplyr 1.0

2020-05-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8758:
--

 Summary: [R] Updates for compatibility with dplyr 1.0
 Key: ARROW-8758
 URL: https://issues.apache.org/jira/browse/ARROW-8758
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0, 0.17.1






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8734) [R] autobrew script always builds from master

2020-05-07 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8734:
---
Summary: [R] autobrew script always builds from master  (was: [R] 
Compilation error on macOS)

> [R] autobrew script always builds from master
> -
>
> Key: ARROW-8734
> URL: https://issues.apache.org/jira/browse/ARROW-8734
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> I've tried to install / build from source (from a git checkout and using 
> the built-in `install_arrow()`), and when compiling I'm getting the following 
> error reliably during the autobrew process:
> {code:bash}
>  x System command 'R' failed, exit status: 1, stdout + stderr:
> E> * checking for file ‘/Users/jkeane/Dropbox/arrow/r/DESCRIPTION’ ... OK
> E> * preparing ‘arrow’:
> E> * checking DESCRIPTION meta-information ... OK
> E> * cleaning src
> E> * running ‘cleanup’
> E> * installing the package to build vignettes
> E>   ---
> E> * installing *source* package ‘arrow’ ...
> E> ** using staged installation
> E> *** Generating code with data-raw/codegen.R
> E> There were 27 warnings (use warnings() to see them)
> E> *** > 375 functions decorated with [[arrow|s3::export]]
> E> *** > generated file `src/arrowExports.cpp`
> E> *** > generated file `R/arrowExports.R`
> E> *** Downloading apache-arrow
> E>  Using local manifest for apache-arrow
> E> Thu May  7 13:13:42 CDT 2020: Auto-brewing apache-arrow in 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T//build-apache-arrow...
> E> ==> Tapping autobrew/core from https://github.com/autobrew/homebrew-core
> E> Tapped 2 commands and 4639 formulae (4,888 files, 12.7MB).
> E> lz4
> E> openssl
> E> thrift
> E> snappy
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/lz4-1.8.3.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/b4158ef68d619dbf78935df6a42a70b8339a65bc8876cbb4446355ccd40fa5de--lz4-1.8.3.mojave.bottle.tar.gz
> E> ==> Pouring lz4-1.8.3.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/lz4/1.8.3:
>  22 files, 512.7KB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/openssl-1.0.2p.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/fbb493745981c8b26c0fab115c76c2a70142bfde9e776c450277e9dfbbba0bb2--openssl-1.0.2p.mojave.bottle.tar.gz
> E> ==> Pouring openssl-1.0.2p.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> ==> Caveats
> E> openssl is keg-only, which means it was not symlinked into 
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow,
> E> because Apple has deprecated use of OpenSSL in favor of its own TLS and 
> crypto libraries.
> E> 
> E> If you need to have openssl first in your PATH run:
> E>   echo 'export 
> PATH="/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/bin:$PATH"'
>  >> ~/.zshrc
> E> 
> E> For compilers to find openssl you may need to set:
> E>   export 
> LDFLAGS="-L/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/lib"
> E>   export 
> CPPFLAGS="-I/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/include"
> E> 
> E> For pkg-config to find openssl you may need to set:
> E>   export 
> PKG_CONFIG_PATH="/private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/opt/openssl/lib/pkgconfig"
> E> 
> E> ==> Summary
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/openssl/1.0.2p:
>  1,793 files, 12MB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/thrift-0.11.0.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/7e05ea11a9f7f924dd7f8f36252ec73a24958b7f214f71e3752a355e75e589bd--thrift-0.11.0.mojave.bottle.tar.gz
> E> ==> Pouring thrift-0.11.0.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> ==> Caveats
> E> To install Ruby binding:
> E>   gem install thrift
> E> ==> Summary
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/thrift/0.11.0:
>  102 files, 7MB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/snappy-1.1.7_1.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/1f09938804055499d1dd951b13b26d80c56eae359aa051284bf4f51d109a9f73--snappy-1.1.7_1.mojave.bottle.tar.gz
> E> ==> Pouring snappy-1.1.7_1.mojave.bottle.tar.gz
> E> ==> Skipping post_install step fo

[jira] [Commented] (ARROW-8734) [R] Compilation error on macOS

2020-05-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101968#comment-17101968
 ] 

Neal Richardson commented on ARROW-8734:


I think the load error is because you had the package already loaded, and I 
think it is fixed if you restart R. 

You're right on the other point: it seems we aren't yet building nightly 
binaries for R 4.0.

> [R] Compilation error on macOS
> --
>
> Key: ARROW-8734
> URL: https://issues.apache.org/jira/browse/ARROW-8734
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> I've tried to install / build from source (from a git checkout and using 
> the built-in `install_arrow()`), and when compiling I'm getting the following 
> error reliably during the autobrew process:

[jira] [Commented] (ARROW-8734) [R] Compilation error on macOS

2020-05-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101958#comment-17101958
 ] 

Neal Richardson commented on ARROW-8734:


Aside: we don't yet have simple tooling for setting up a full development 
build (C++ and R from source), particularly on macOS because of the funky 
autobrew build system. It's on my wishlist, but understandably it's been 
farther down the list given the effort to get release packaging smooth.

I think the binary should work for your current need, but if you want to 
build from source the autobrew way, see what 
https://github.com/ursa-labs/arrow-r-nightly/blob/master/.travis.yml 
does--that's how the nightly binaries are made from a git checkout.

> [R] Compilation error on macOS
> --
>
> Key: ARROW-8734
> URL: https://issues.apache.org/jira/browse/ARROW-8734
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> I've tried to install / build from source (from a git checkout and using 
> the built-in `install_arrow()`), and when compiling I'm getting the following 
> error reliably during the autobrew process:

[jira] [Comment Edited] (ARROW-8734) [R] Compilation error on macOS

2020-05-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101953#comment-17101953
 ] 

Neal Richardson edited comment on ARROW-8734 at 5/7/20, 6:34 PM:
-

-You need to be on the exact same version of C++ library and R package. Are you 
installing from a git checkout?- Ah yes, you said that. But if you're 
installing from a checkout, don't use {{install_arrow}}.

If you want a development version on macOS, why not use our nightly binaries 
and avoid the hassle of a source build? {{arrow::install_arrow(nightly=TRUE)}} 
should do it, or you could set the {{repos}} arg to install.packages yourself.


was (Author: npr):
You need to be on the exact same version of C++ library and R package. Are you 
installing from a git checkout? 

If you want a development version on macOS, why not use our nightly binaries 
and avoid the hassle of a source build? {{arrow::install_arrow(nightly=TRUE)}} 
should do it, or you could set the {{repos}} arg to install.packages yourself.

> [R] Compilation error on macOS
> --
>
> Key: ARROW-8734
> URL: https://issues.apache.org/jira/browse/ARROW-8734
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> I've tried to install / build from source (from a git checkout and using 
> the built-in `install_arrow()`), and when compiling I'm getting the following 
> error reliably during the autobrew process:

[jira] [Commented] (ARROW-8734) [R] Compilation error on macOS

2020-05-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101953#comment-17101953
 ] 

Neal Richardson commented on ARROW-8734:


You need to be on the exact same version of C++ library and R package. Are you 
installing from a git checkout? 

If you want a development version on macOS, why not use our nightly binaries 
and avoid the hassle of a source build? {{arrow::install_arrow(nightly=TRUE)}} 
should do it, or you could set the {{repos}} arg to install.packages yourself.
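For illustration, a sketch of the {{repos}} route; the nightly repository URL below is an assumption based on the ursa-labs nightly setup and may have changed:

{code:r}
# Hypothetical nightly repo URL -- verify before use
install.packages("arrow", repos = "https://dl.bintray.com/ursalabs/arrow-r")
{code}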

> [R] Compilation error on macOS
> --
>
> Key: ARROW-8734
> URL: https://issues.apache.org/jira/browse/ARROW-8734
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Priority: Major
>
> I've tried to install / build from source (from a git checkout and using 
> the built-in `install_arrow()`), and when compiling I'm getting the following 
> error reliably during the autobrew process:
> https://homebrew.bintray.com/bottles/thrift-0.11.0.mojave.bottle.tar.gz
> E> Already downloaded: 
> /var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/downloads/7e05ea11a9f7f924dd7f8f36252ec73a24958b7f214f71e3752a355e75e589bd--thrift-0.11.0.mojave.bottle.tar.gz
> E> ==> Pouring thrift-0.11.0.mojave.bottle.tar.gz
> E> ==> Skipping post_install step for autobrew...
> E> ==> Caveats
> E> To install Ruby binding:
> E>   gem install thrift
> E> ==> Summary
> E> 🍺  
> /private/var/folders/45/n5gfjjtn05j877spnpbnhqqwgn/T/build-apache-arrow/Cellar/thrift/0.11.0:
>  102 files, 7MB
> E> ==> Downloading 
> https://homebrew.bintray.com/bottles/snappy-1.1.7_1.mojave.bottle.tar.gz

[jira] [Created] (ARROW-8718) [R] Add str() methods to objects

2020-05-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8718:
--

 Summary: [R] Add str() methods to objects
 Key: ARROW-8718
 URL: https://issues.apache.org/jira/browse/ARROW-8718
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Apparently this will make the RStudio IDE show useful things in the environment 
panel.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8717) [CI][Packaging] Add build dependency on boost to homebrew

2020-05-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8717:
--

 Summary: [CI][Packaging] Add build dependency on boost to homebrew
 Key: ARROW-8717
 URL: https://issues.apache.org/jira/browse/ARROW-8717
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


cf. https://github.com/Homebrew/homebrew-core/pull/54287

and revise the Travis jobs to uninstall boost and thrift before checking the 
formula





[jira] [Updated] (ARROW-8604) [R][CI] Update CI to use R 4.0

2020-05-05 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8604:
---
Component/s: Continuous Integration

> [R][CI] Update CI to use R 4.0
> --
>
> Key: ARROW-8604
> URL: https://issues.apache.org/jira/browse/ARROW-8604
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Francois Saint-Jacques
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> [Master|https://github.com/apache/arrow/runs/622393526] fails to compile. 
> The C++ cmake build is not using the same 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:8:807] as 
> the R extension 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:11:141].
> {code:java}
> // Files installed here
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%)
> // Linker is using `-L`
> C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def 
> array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o 
> buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o 
> expression.o feather.o field.o filesystem.o io.o json.o memorypool.o 
> message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o 
> recordbatchwriter.o schema.o symbols.o table.o threadpool.o 
> -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 
> -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow 
> -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 
> -LC:/R/bin/i386 -lR
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lparquet
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow_dataset
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lthrift
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lsnappy
> {code}
>  
> C++ developers, rejoice, this is almost the end of gcc-4.9.
>  





[jira] [Updated] (ARROW-8604) [R][CI] Update CI to use R 4.0

2020-05-05 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8604:
---
Summary: [R][CI] Update CI to use R 4.0  (was: [R] Update CI to use R 4.0)

> [R][CI] Update CI to use R 4.0
> --
>
> Key: ARROW-8604
> URL: https://issues.apache.org/jira/browse/ARROW-8604
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Francois Saint-Jacques
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> [Master|https://github.com/apache/arrow/runs/622393526] fails to compile. 
> The C++ cmake build is not using the same 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:8:807] as 
> the R extension 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:11:141].
> {code:java}
> // Files installed here
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%)
> // Linker is using `-L`
> C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def 
> array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o 
> buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o 
> expression.o feather.o field.o filesystem.o io.o json.o memorypool.o 
> message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o 
> recordbatchwriter.o schema.o symbols.o table.o threadpool.o 
> -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 
> -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow 
> -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 
> -LC:/R/bin/i386 -lR
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lparquet
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow_dataset
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lthrift
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lsnappy
> {code}
>  
> C++ developers, rejoice, this is almost the end of gcc-4.9.
>  





[jira] [Updated] (ARROW-8703) [R] schema$metadata should be properly typed

2020-05-05 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8703:
---
Priority: Major  (was: Critical)

> [R] schema$metadata should be properly typed
> 
>
> Key: ARROW-8703
> URL: https://issues.apache.org/jira/browse/ARROW-8703
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.17.0
>Reporter: René Rex
>Priority: Major
>
> Currently, I am trying to export numeric data plus some metadata from Python 
> into a Parquet file and read it in R. However, the metadata seems to be a dict 
> in Python but a string in R. I would have expected a list (which is roughly R's 
> equivalent of a Python dict). Am I missing something? Here is the code to 
> demonstrate the issue:
> {code:python}
> import sys
> import numpy as np
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> print(sys.version)
> print(pa.__version__)
>
> x = np.random.randint(0, 10, (10, 3))
> arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]
> table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],
>                              metadata={'foo': '42'})
> pq.write_table(table, 'array.parquet', compression='snappy')
>
> table = pq.read_table('array.parquet')
> metadata = table.schema.metadata
> print(metadata)
> print(type(metadata))
> {code}
> And in R:
> {code:r}
> library(arrow)
> print(R.version)
> print(packageVersion("arrow"))
>
> table <- read_parquet("array.parquet", as_data_frame = FALSE)
> metadata <- table$schema$metadata
> print(metadata)
> print(is(metadata))
> print(metadata["foo"])
> {code}
> Output Python:
> {code}
> 3.6.8 (default, Aug 7 2019, 17:28:10)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
> 0.13.0
> OrderedDict([(b'foo', b'42')])
> {code}
> Output R:
> {code}
> [1] ‘0.17.0’
> [1] "\n-- metadata --\nfoo: 42"
> [1] "character" "vector" "data.frameRowLabels"
> [4] "SuperClassMethod"
> [1] NA
> {code}
>  





[jira] [Resolved] (ARROW-8699) [R] Fix automatic r_to_py conversion

2020-05-04 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8699.

Resolution: Fixed

Issue resolved by pull request 7102
[https://github.com/apache/arrow/pull/7102]

> [R] Fix automatic r_to_py conversion
> 
>
> Key: ARROW-8699
> URL: https://issues.apache.org/jira/browse/ARROW-8699
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See https://github.com/rstudio/reticulate/issues/748





[jira] [Created] (ARROW-8699) [R] Fix automatic r_to_py conversion

2020-05-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8699:
--

 Summary: [R] Fix automatic r_to_py conversion
 Key: ARROW-8699
 URL: https://issues.apache.org/jira/browse/ARROW-8699
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See https://github.com/rstudio/reticulate/issues/748





[jira] [Commented] (ARROW-8635) [R] test-filesystem.R takes ~40 seconds to run?

2020-04-29 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095975#comment-17095975
 ] 

Neal Richardson commented on ARROW-8635:


Have you set this AWS SDK environment variable? 
https://github.com/apache/arrow/blob/master/ci/scripts/r_test.sh#L44-L46 
François found it, and it seems to help.
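For reference, a sketch of setting such a variable before running the tests. The specific variable name used here (AWS_EC2_METADATA_DISABLED) is an assumption about what the linked script lines set, so verify against the script itself:

```shell
# Assumed variable: disable the AWS SDK's EC2 instance-metadata lookup,
# which can stall S3 filesystem tests on machines that are not EC2 instances.
export AWS_EC2_METADATA_DISABLED=TRUE
echo "AWS_EC2_METADATA_DISABLED=$AWS_EC2_METADATA_DISABLED"
```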

> [R] test-filesystem.R takes ~40 seconds to run?
> ---
>
> Key: ARROW-8635
> URL: https://issues.apache.org/jira/browse/ARROW-8635
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> {code}
> ✔ |  22   | Expressions
> ✔ | 107   | Feather [0.2 s]
> ✔ |   7   | Field
> ✔ |  40   | File system [38.1 s]
> ✔ |   6   | install_arrow()
> ✔ |  26   | JsonTableReader [0.1 s]
> ✔ |  24   | MessageReader
> ✔ |  12   | Message
> ✔ |  31   | Parquet file reading/writing [0.2 s]
> ⠏ |   0   | To/from Pythonvirtualenv: arrow-test
> {code}
> Is this expected? I assume it's related to S3 but that seems like a long 
> time. 





[jira] [Updated] (ARROW-8624) [Website] Install page should mention arrow-dataset packages

2020-04-29 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8624:
---
Description: 
I've seen a few reports like [https://github.com/apache/arrow/issues/7055], 
where the user reports that they've installed the arrow system packages, we can 
see that they exist, but {{pkg-config}} reports that it doesn't have them. I 
think this is because {{-larrow_dataset}} isn't found. As the output on that 
issue shows, while arrow core headers and libraries are there, arrow_dataset is 
not.

-Searching through the packaging scripts (such as 
[https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in]),
 while there is some metadata about a dataset package, I see that 
ARROW_DATASET=ON is not set anywhere, so I don't think we're building it.- So 
apparently we are building it, but we aren't documenting how to get it.

  was:
I've seen a few reports like https://github.com/apache/arrow/issues/7055, where 
the user reports that they've installed the arrow system packages, we can see 
that they exist, but {{pkg-config}} reports that it doesn't have them. I think 
this is because {{-larrow_dataset}} isn't found. As the output on that issue 
shows, while arrow core headers and libraries are there, arrow_dataset is not.

~~Searching through the packaging scripts (such as 
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
 while there is some metadata about a dataset package, I see that 
ARROW_DATASET=ON is not set anywhere, so I don't think we're building it.~~ So 
apparently we are building it, but we aren't documenting how to get it. 


> [Website] Install page should mention arrow-dataset packages
> 
>
> Key: ARROW-8624
> URL: https://issues.apache.org/jira/browse/ARROW-8624
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Affects Versions: 0.17.0
>Reporter: Neal Richardson
>Priority: Critical
>
> I've seen a few reports like [https://github.com/apache/arrow/issues/7055], 
> where the user reports that they've installed the arrow system packages, we 
> can see that they exist, but {{pkg-config}} reports that it doesn't have 
> them. I think this is because {{-larrow_dataset}} isn't found. As the output 
> on that issue shows, while arrow core headers and libraries are there, 
> arrow_dataset is not.
> -Searching through the packaging scripts (such as 
> [https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in]),
>  while there is some metadata about a dataset package, I see that 
> ARROW_DATASET=ON is not set anywhere, so I don't think we're building it.- So 
> apparently we are building it, but we aren't documenting how to get it.





[jira] [Reopened] (ARROW-8624) [Packaging] Linux system packages aren't building with ARROW_DATASET=ON

2020-04-29 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reopened ARROW-8624:


> [Packaging] Linux system packages aren't building with ARROW_DATASET=ON
> ---
>
> Key: ARROW-8624
> URL: https://issues.apache.org/jira/browse/ARROW-8624
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Affects Versions: 0.17.0
>Reporter: Neal Richardson
>Priority: Critical
>
> I've seen a few reports like https://github.com/apache/arrow/issues/7055, 
> where the user reports that they've installed the arrow system packages, we 
> can see that they exist, but {{pkg-config}} reports that it doesn't have 
> them. I think this is because {{-larrow_dataset}} isn't found. As the output 
> on that issue shows, while arrow core headers and libraries are there, 
> arrow_dataset is not.
> Searching through the packaging scripts (such as 
> https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
>  while there is some metadata about a dataset package, I see that 
> ARROW_DATASET=ON is not set anywhere, so I don't think we're building it. 





[jira] [Updated] (ARROW-8624) [Website] Install page should mention arrow-dataset packages

2020-04-29 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8624:
---
Summary: [Website] Install page should mention arrow-dataset packages  
(was: [Packaging] Linux system packages aren't building with ARROW_DATASET=ON)

> [Website] Install page should mention arrow-dataset packages
> 
>
> Key: ARROW-8624
> URL: https://issues.apache.org/jira/browse/ARROW-8624
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Affects Versions: 0.17.0
>Reporter: Neal Richardson
>Priority: Critical
>
> I've seen a few reports like https://github.com/apache/arrow/issues/7055, 
> where the user reports that they've installed the arrow system packages, we 
> can see that they exist, but {{pkg-config}} reports that it doesn't have 
> them. I think this is because {{-larrow_dataset}} isn't found. As the output 
> on that issue shows, while arrow core headers and libraries are there, 
> arrow_dataset is not.
> Searching through the packaging scripts (such as 
> https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
>  while there is some metadata about a dataset package, I see that 
> ARROW_DATASET=ON is not set anywhere, so I don't think we're building it. 





[jira] [Updated] (ARROW-8624) [Website] Install page should mention arrow-dataset packages

2020-04-29 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8624:
---
Description: 
I've seen a few reports like https://github.com/apache/arrow/issues/7055, where 
the user reports that they've installed the arrow system packages, we can see 
that they exist, but {{pkg-config}} reports that it doesn't have them. I think 
this is because {{-larrow_dataset}} isn't found. As the output on that issue 
shows, while arrow core headers and libraries are there, arrow_dataset is not.

~~Searching through the packaging scripts (such as 
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
 while there is some metadata about a dataset package, I see that 
ARROW_DATASET=ON is not set anywhere, so I don't think we're building it.~~ So 
apparently we are building it, but we aren't documenting how to get it. 

  was:
I've seen a few reports like https://github.com/apache/arrow/issues/7055, where 
the user reports that they've installed the arrow system packages, we can see 
that they exist, but {{pkg-config}} reports that it doesn't have them. I think 
this is because {{-larrow_dataset}} isn't found. As the output on that issue 
shows, while arrow core headers and libraries are there, arrow_dataset is not.

Searching through the packaging scripts (such as 
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
 while there is some metadata about a dataset package, I see that 
ARROW_DATASET=ON is not set anywhere, so I don't think we're building it. 


> [Website] Install page should mention arrow-dataset packages
> 
>
> Key: ARROW-8624
> URL: https://issues.apache.org/jira/browse/ARROW-8624
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Affects Versions: 0.17.0
>Reporter: Neal Richardson
>Priority: Critical
>
> I've seen a few reports like https://github.com/apache/arrow/issues/7055, 
> where the user reports that they've installed the arrow system packages, we 
> can see that they exist, but {{pkg-config}} reports that it doesn't have 
> them. I think this is because {{-larrow_dataset}} isn't found. As the output 
> on that issue shows, while arrow core headers and libraries are there, 
> arrow_dataset is not.
> ~~Searching through the packaging scripts (such as 
> https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
>  while there is some metadata about a dataset package, I see that 
> ARROW_DATASET=ON is not set anywhere, so I don't think we're building it.~~ 
> So apparently we are building it, but we aren't documenting how to get it. 





[jira] [Created] (ARROW-8624) [Packaging] Linux system packages aren't building with ARROW_DATASET=ON

2020-04-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8624:
--

 Summary: [Packaging] Linux system packages aren't building with 
ARROW_DATASET=ON
 Key: ARROW-8624
 URL: https://issues.apache.org/jira/browse/ARROW-8624
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Affects Versions: 0.17.0
Reporter: Neal Richardson


I've seen a few reports like https://github.com/apache/arrow/issues/7055, where 
the user reports that they've installed the arrow system packages, we can see 
that they exist, but {{pkg-config}} reports that it doesn't have them. I think 
this is because {{-larrow_dataset}} isn't found. As the output on that issue 
shows, while arrow core headers and libraries are there, arrow_dataset is not.

Searching through the packaging scripts (such as 
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
 while there is some metadata about a dataset package, I see that 
ARROW_DATASET=ON is not set anywhere, so I don't think we're building it. 





[jira] [Resolved] (ARROW-8611) [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3

2020-04-28 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8611.

Fix Version/s: 1.0.0
 Assignee: Neal Richardson
   Resolution: Information Provided

> [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3
> 
>
> Key: ARROW-8611
> URL: https://issues.apache.org/jira/browse/ARROW-8611
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Zhuo Jia Dai
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> This is the error I get when I try to install it. How do I provide more info 
> to help you diagnose? Seems to be an issue with Thrift which I have built on 
> my machine.
>  
> How do I remove thrift and install it?
>  
> "Unable to locate package libthrift-dev " when I try `sudo apt install 
> libthrift-dev`
>  
> {quote} 
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>  
> /home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN6apache6thrift8protocol9TProtocolE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/arrow’
> * restoring previous ‘/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/arrow’
> The downloaded source packages are in
>  ‘/tmp/RtmpUF6P1q/downloaded_packages’
> Warning message:
> In install.packages("arrow") :
>  installation of package ‘arrow’ had non-zero exit status
> {quote}





[jira] [Commented] (ARROW-8611) [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3

2020-04-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094910#comment-17094910
 ] 

Neal Richardson commented on ARROW-8611:


Glad the binary worked for you. For future reference, if you don't want to use 
whatever version of thrift you have on your system when you install arrow, you 
can set {{EXTRA_CMAKE_FLAGS="-DThrift_SOURCE=BUNDLED"}} (sadly, case sensitive) 
and the Arrow C++ build will build thrift itself.
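The advice above as a shell session (a sketch; assumes you then install the R package from source in this same shell):

```shell
# Case-sensitive, as noted above: Thrift_SOURCE, not THRIFT_SOURCE.
# This tells the bundled Arrow C++ build to compile its own Thrift
# rather than picking up whatever copy is on the system.
export EXTRA_CMAKE_FLAGS="-DThrift_SOURCE=BUNDLED"
# Then install as usual, e.g.:
# R -e 'install.packages("arrow")'
echo "$EXTRA_CMAKE_FLAGS"
```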

> [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3
> 
>
> Key: ARROW-8611
> URL: https://issues.apache.org/jira/browse/ARROW-8611
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Zhuo Jia Dai
>Priority: Major
>
> This is the error I get when I try to install it. How do I provide more info 
> to help you diagnose? Seems to be an issue with Thrift which I have built on 
> my machine.
>  
> How do I remove thrift and install it?
>  
> "Unable to locate package libthrift-dev " when I try `sudo apt install 
> libthrift-dev`
>  
> {quote} 
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>  
> /home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN6apache6thrift8protocol9TProtocolE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/arrow’
> * restoring previous ‘/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/arrow’
> The downloaded source packages are in
>  ‘/tmp/RtmpUF6P1q/downloaded_packages’
> Warning message:
> In install.packages("arrow") :
>  installation of package ‘arrow’ had non-zero exit status
> {quote}





[jira] [Resolved] (ARROW-8513) [Python] Expose Take with Table input in Python

2020-04-28 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8513.

Resolution: Fixed

Issue resolved by pull request 7039
[https://github.com/apache/arrow/pull/7039]

> [Python] Expose Take with Table input in Python
> ---
>
> Key: ARROW-8513
> URL: https://issues.apache.org/jira/browse/ARROW-8513
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is implemented in C++ but not exposed in the bindings





[jira] [Resolved] (ARROW-8572) [Python] Expose UnionArray.array and other fields

2020-04-28 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8572.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7027
[https://github.com/apache/arrow/pull/7027]

> [Python] Expose UnionArray.array and other fields
> -
>
> Key: ARROW-8572
> URL: https://issues.apache.org/jira/browse/ARROW-8572
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.17.0
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently in Python, you can construct a UnionArray easily, but getting the 
> data back out (without copying) is near-impossible. We should expose the 
> getter for UnionArray.array so we can pull out the constituent arrays. We 
> should also expose fields like mode while we're at it.
> The use case is: in Flight, we'd like to write multiple distinct datasets 
> (with distinct schemas) in a single logical call; using UnionArrays lets us 
> combine these datasets into a single logical dataset.





[jira] [Commented] (ARROW-8611) [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3

2020-04-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094602#comment-17094602
 ] 

Neal Richardson commented on ARROW-8611:


You can also set {{NOT_CRAN=true}} so that you'll install with a prebuilt 
binary. See http://arrow.apache.org/docs/r/articles/install.html for more 
details.
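A sketch of that setup (per the linked install vignette; assumes a platform where a prebuilt binary is available):

```shell
# NOT_CRAN=true tells the arrow R package's configure step to download a
# prebuilt C++ binary instead of compiling the C++ library from source.
export NOT_CRAN=true
# Then install as usual, e.g.:
# R -e 'install.packages("arrow")'
echo "NOT_CRAN=$NOT_CRAN"
```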

> [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3
> 
>
> Key: ARROW-8611
> URL: https://issues.apache.org/jira/browse/ARROW-8611
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Zhuo Jia Dai
>Priority: Major
>
> This is the error I get when I try to install it. How do I provide more info 
> to help you diagnose? Seems to be an issue with Thrift which I have built on 
> my machine.
>  
> How do I remove thrift and install it?
>  
> "Unable to locate package libthrift-dev " when I try `sudo apt install 
> libthrift-dev`
>  
> {quote} 
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>  
> /home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN6apache6thrift8protocol9TProtocolE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/arrow’
> * restoring previous ‘/home/xiaodai/R/x86_64-pc-linux-gnu-library/3.6/arrow’
> The downloaded source packages are in
>  ‘/tmp/RtmpUF6P1q/downloaded_packages’
> Warning message:
> In install.packages("arrow") :
>  installation of package ‘arrow’ had non-zero exit status
> {quote}





[jira] [Commented] (ARROW-8605) [R] Add support for brotli to Windows build

2020-04-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094599#comment-17094599
 ] 

Neal Richardson commented on ARROW-8605:


The more important part would be adding brotli to the rtools-packages project 
so that it could be included. See my comment on ARROW-6960.

> [R] Add support for brotli to Windows build
> ---
>
> Key: ARROW-8605
> URL: https://issues.apache.org/jira/browse/ARROW-8605
> Project: Apache Arrow
>  Issue Type: New Feature
>Affects Versions: 0.17.0
>Reporter: Hei
>Priority: Major
>
> Hi,
> My friend installed arrow and tried to open a parquet file with brotli codec. 
>  But then, he got an error when calling read_parquet("my.parquet") on Windows:
> {code}
> Error in parquet__arrow__FileReader__ReadTable(self) :
>IOError: NotImplemented: Brotli codec support not built
> {code}
> It sounds similar to ARROW-6960.





[jira] [Commented] (ARROW-8586) [R] installation failure on CentOS 7

2020-04-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094591#comment-17094591
 ] 

Neal Richardson commented on ARROW-8586:


Sorry, I should have checked before commenting. Apparently it's 
{{EXTRA_CMAKE_FLAGS}}, not {{EXTRA_CMAKE_ARGS}}. That's why there wasn't more 
output. If you're willing to try again with that correction, I'd be curious to 
see why thrift is failing to install. 

FTR, my conclusions at this point are:

1. Binary version detection on CentOS when lsb_release is installed isn't 
behaving correctly (so you have to specify LIBARROW_BINARY=centos-7 instead of 
it being determined automatically). I'll fix that.
2. The centos-7 binary is built with {{CC=/usr/bin/gcc 
CXX=/usr/bin/g++}}, which appears to mean gcc 4.8, and that doesn't play well 
with newer compiler versions if you have them. I'll have to explore why I set 
those in the build Dockerfile and think about ways to ensure compatibility at 
install time.
3. We still don't know why thrift is failing to build. 
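To make conclusion 1 concrete, here is a hedged sketch of the selection logic (the {{detect_binary}} helper is hypothetical; only the {{LIBARROW_BINARY=centos-7}} workaround comes from this comment):

```shell
# Hypothetical sketch of what binary-name detection should return on CentOS 7,
# plus the manual override that works today.
detect_binary() {
  # $1 = distro id, $2 = release, roughly as `lsb_release -irs` reports them
  case "$1:$2" in
    centos:7*) echo "centos-7" ;;
    *) echo "" ;;   # no known prebuilt binary for this distro/version
  esac
}

# Manual workaround from the comment above:
LIBARROW_BINARY="$(detect_binary centos 7)"
echo "$LIBARROW_BINARY"   # prints: centos-7
```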

> [R] installation failure on CentOS 7
> 
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Priority: Major
>
> Hi,
> I am trying to install arrow via RStudio, but it seems like it is not working 
> that after I installed the package, it kept asking me to run 
> arrow::install_arrow() even after I did:
> {code}
> > install.packages("arrow")
> Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
> Content type 'application/x-gzip' length 242534 bytes (236 KB)
> ==
> downloaded 236 KB
> * installing *source* package ‘arrow’ ...
> ** package ‘arrow’ successfully unpacked and MD5 sums checked
> ** using staged installation
> *** Successfully retrieved C++ source
> *** Building C++ libraries
>  cmake
>  arrow  
> ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
> - NOTE ---
> After installation, please run arrow::install_arrow()
> for help installing required runtime libraries
> -
> ** libs
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
> array_from_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
> array_to_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
> arrowExports.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-s

[jira] [Updated] (ARROW-8556) [R] zstd symbol not found if there are multiple installations of zstd

2020-04-27 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8556:
---
Summary: [R] zstd symbol not found if there are multiple installations of 
zstd  (was: [R] zstd symbol not found on Ubuntu 19.10)

> [R] zstd symbol not found if there are multiple installations of zstd
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  





[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-27 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094017#comment-17094017
 ] 

Neal Richardson commented on ARROW-8556:


Thanks, that makes some sense. Googling the original undefined-symbol error 
message, all I found were issues caused by having multiple versions of zstd 
installed (e.g. https://github.com/facebook/wangle/issues/73), but since you 
said you didn't have it installed before, I didn't think that was relevant.

I wish there were a good way to keep it from failing in that case: if zstd is 
built from source in the R build, that version should be the one picked up at 
load time. Maybe someone else will have an idea on how to achieve that.
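A hedged diagnostic sketch for spotting the multiple-installation situation (the sample listing below is fabricated for illustration; in practice you would pipe the real linker cache through the same filter):

```shell
# Count libzstd entries in linker-cache-style output to spot multiple installs.
count_zstd() { printf '%s\n' "$1" | grep -c 'libzstd'; }

# Fabricated sample resembling `ldconfig -p` output with two installations:
sample='libzstd.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libzstd.so.1
libzstd.so (libc6,x86-64) => /usr/local/lib/libzstd.so'

count_zstd "$sample"   # prints: 2
# Real usage (not run here): ldconfig -p | grep libzstd
```

Two entries from different prefixes (e.g. /usr/lib and /usr/local/lib) are a strong hint of the multiple-installation problem described above.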

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>





[jira] [Commented] (ARROW-8586) [R] installation failure on CentOS 7

2020-04-27 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094013#comment-17094013
 ] 

Neal Richardson commented on ARROW-8586:


Thanks. A few thoughts. Apologies if this is confusing; we're going deep in 
some different directions:

* {{ARROW_R_DEV=true}} is for installation verbosity only, not for crash 
reporting, and from the install logs you shared, I can see that thrift 
apparently failed to build/install. I don't think I've seen it fail in that 
specific way before. If you want to go deeper into the Matrix with me, try 
reinstalling with {{ARROW_R_DEV=true}} and 
{{EXTRA_CMAKE_ARGS="-DARROW_VERBOSE_THIRDPARTY_BUILD=ON"}} (but unset 
{{LIBARROW_BINARY}} so that we build from source) and maybe we'll see what's 
going on there.
* Alternatively, you could try installing {{thrift}} from {{yum}}, though I'm 
not sure that they have a new enough version (0.11 is the minimum).
* Odd that you got a segfault when reading a parquet file. Is there anything 
special about how your system is configured (compilers, toolchains, etc.) 
beyond a vanilla CentOS 7 environment? The centos-7 binary is built on a base 
centos image with this Dockerfile: 
https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/yum.Dockerfile 
So maybe see if setting {{CC=/usr/bin/gcc CXX=/usr/bin/g++}} before installing 
the R package (with {{LIBARROW_BINARY=centos-7}}) helps.
* If that makes a difference, I wonder if 
https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/yum.Dockerfile#L18-L20
 is what's needed to get thrift to compile when building everything from 
source.
* Thanks for the {{lsb_release}} output. That confirms my suspicion about why 
it did not try to download the centos-7 binary to begin with (though obviously 
that's not desirable unless we get it not to segfault for you).
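The debugging recipe from the first bullet, collected into one sketch. Note it uses the spelling {{EXTRA_CMAKE_FLAGS}}, which a 2020-04-28 follow-up in this thread gives as the correct name (this comment writes {{EXTRA_CMAKE_ARGS}}); the {{Rscript}} line is illustrative:

```shell
# Verbose source-build debugging environment suggested above.
export ARROW_R_DEV=true                                   # verbose install output
export EXTRA_CMAKE_FLAGS="-DARROW_VERBOSE_THIRDPARTY_BUILD=ON"
unset LIBARROW_BINARY                                     # force a source build
# Rscript -e 'install.packages("arrow")'   # illustrative; not run here
echo "$ARROW_R_DEV"   # prints: true
```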

> [R] installation failure on CentOS 7
> 
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Priority: Major
>

[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-27 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094002#comment-17094002
 ] 

Neal Richardson commented on ARROW-8556:


Any ideas [~fsaintjacques] [~bkietz]?

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>





[jira] [Resolved] (ARROW-8607) [R][CI] Unbreak builds following R 4.0 release

2020-04-27 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8607.

Resolution: Fixed

Issue resolved by pull request 7047
[https://github.com/apache/arrow/pull/7047]

> [R][CI] Unbreak builds following R 4.0 release
> --
>
> Key: ARROW-8607
> URL: https://issues.apache.org/jira/browse/ARROW-8607
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Just a tourniquet to get master passing again while I work on ARROW-8604.





[jira] [Created] (ARROW-8607) [R][CI] Unbreak builds following R 4.0 release

2020-04-27 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8607:
--

 Summary: [R][CI] Unbreak builds following R 4.0 release
 Key: ARROW-8607
 URL: https://issues.apache.org/jira/browse/ARROW-8607
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Just a tourniquet to get master passing again while I work on ARROW-8604.





[jira] [Created] (ARROW-8606) [CI] Don't trigger all builds on a change to any file in ci/

2020-04-27 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8606:
--

 Summary: [CI] Don't trigger all builds on a change to any file in 
ci/
 Key: ARROW-8606
 URL: https://issues.apache.org/jira/browse/ARROW-8606
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Commented] (ARROW-8605) [R] Add support for brotli to Windows build

2020-04-27 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093766#comment-17093766
 ] 

Neal Richardson commented on ARROW-8605:


You are correct. We do not build the windows package with brotli. Here is what 
we do build with: 
https://github.com/apache/arrow/blob/master/ci/scripts/PKGBUILD#L28-L31

If you were interested in adding it, ARROW-6960 is the right model to follow.

> [R] Add support for brotli to Windows build
> ---
>
> Key: ARROW-8605
> URL: https://issues.apache.org/jira/browse/ARROW-8605
> Project: Apache Arrow
>  Issue Type: New Feature
>Affects Versions: 0.17.0
>Reporter: Hei
>Priority: Major
>





[jira] [Updated] (ARROW-8605) [R] Add support for brotli to Windows build

2020-04-27 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8605:
---
Summary: [R] Add support for brotli to Windows build  (was: Missing brotli 
Support in R Package?)

> [R] Add support for brotli to Windows build
> ---
>
> Key: ARROW-8605
> URL: https://issues.apache.org/jira/browse/ARROW-8605
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Hei
>Priority: Major
>





[jira] [Updated] (ARROW-8605) [R] Add support for brotli to Windows build

2020-04-27 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8605:
---
Issue Type: New Feature  (was: Bug)

> [R] Add support for brotli to Windows build
> ---
>
> Key: ARROW-8605
> URL: https://issues.apache.org/jira/browse/ARROW-8605
> Project: Apache Arrow
>  Issue Type: New Feature
>Affects Versions: 0.17.0
>Reporter: Hei
>Priority: Major
>





[jira] [Assigned] (ARROW-8604) [R] Windows compilation failure

2020-04-27 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-8604:
--

Assignee: Neal Richardson

> [R] Windows compilation failure
> ---
>
> Key: ARROW-8604
> URL: https://issues.apache.org/jira/browse/ARROW-8604
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Francois Saint-Jacques
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> [Master|https://github.com/apache/arrow/runs/622393526] fails to compile. 
> The C++ cmake build is not using the same 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:8:807] as 
> the R extension 
> [compiler|https://github.com/apache/arrow/runs/622393526#step:11:141].
> {code:java}
> // Files installed here
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow.a (deflated 85%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libarrow_dataset.a (deflated 82%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libparquet.a (deflated 84%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libsnappy.a (deflated 61%)
>   adding: arrow-0.17.0.9000/lib-4.9.3/i386/libthrift.a (deflated 81%)
> // Linker is using `-L`
> C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o arrow.dll tmp.def 
> array.o array_from_vector.o array_to_vector.o arraydata.o arrowExports.o 
> buffer.o chunkedarray.o compression.o compute.o csv.o dataset.o datatype.o 
> expression.o feather.o field.o filesystem.o io.o json.o memorypool.o 
> message.o parquet.o py-to-r.o recordbatch.o recordbatchreader.o 
> recordbatchwriter.o schema.o symbols.o table.o threadpool.o 
> -L../windows/arrow-0.17.0.9000/lib-8.3.0/i386 
> -L../windows/arrow-0.17.0.9000/lib/i386 -lparquet -larrow_dataset -larrow 
> -lthrift -lsnappy -lz -lzstd -llz4 -lcrypto -lcrypt32 -lws2_32 
> -LC:/R/bin/i386 -lR
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lparquet
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow_dataset
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -larrow
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lthrift
> C:/Rtools/mingw_32/bin/../lib/gcc/i686-w64-mingw32/4.9.3/../../../../i686-w64-mingw32/bin/ld.exe:
>  cannot find -lsnappy
> {code}
>  
> C++ developers, rejoice, this is almost the end of gcc-4.9.
>  
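The mismatch in the quoted log can be made concrete with a hedged sketch (the helper and path construction are hypothetical; the directory names come from the log above). The static libraries were installed under a lib directory suffixed with one gcc version (4.9.3, Rtools' toolchain), while the link step searched a directory suffixed with another (8.3.0):

```shell
# Hypothetical sketch: a version-suffixed lib dir, derived from the compiler
# actually used, must match between the install step and the -L link flag.
GCC_VERSION=$(g++ -dumpversion 2>/dev/null || echo 4.9.3)
LIB_DIR="../windows/arrow-0.17.0.9000/lib-${GCC_VERSION}/i386"
echo "link flag would be: -L${LIB_DIR}"
# In the failing build, libs landed in lib-4.9.3 but the linker searched
# lib-8.3.0, hence the chain of "cannot find -lparquet" errors.
```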





[jira] [Updated] (ARROW-8604) [R] Update CI to use R 4.0

2020-04-27 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8604:
---
Summary: [R] Update CI to use R 4.0  (was: [R] Windows compilation failure)

> [R] Update CI to use R 4.0
> --
>
> Key: ARROW-8604
> URL: https://issues.apache.org/jira/browse/ARROW-8604
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Francois Saint-Jacques
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>





[jira] [Updated] (ARROW-8593) [C++] Parquet file_serialize_test.cc fails to build with musl libc

2020-04-25 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8593:
---
Summary: [C++] Parquet file_serialize_test.cc fails to build with musl libc 
 (was: Parquet file_serialize_test.cc fails to build with musl libc)

> [C++] Parquet file_serialize_test.cc fails to build with musl libc
> --
>
> Key: ARROW-8593
> URL: https://issues.apache.org/jira/browse/ARROW-8593
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.17.0
>Reporter: Tobias Mayer
>Assignee: Tobias Mayer
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{TestBufferedRowGroupWriter}} declares a variable named {{PAGE_SIZE}}. This 
> clashes with a macro constant by the same name defined in musl's {{limits.h}}.
> I don't think using ALLCAPS for a local name adds value here, so I'm going to 
> change it to {{page_size}}.





[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091937#comment-17091937
 ] 

Neal Richardson commented on ARROW-8556:


Thanks, that's helpful. What I see is that when the C++ library builds, 
`cmake` finds the system `zstd`, so it opts to use that instead of building it 
from source too. But then when the R package's shared library tries to load, it 
can't find it. 

This is beyond my level of C++ competence to debug further, so I'll solicit 
help from someone else.

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>





[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091890#comment-17091890
 ] 

Neal Richardson commented on ARROW-8556:


Maybe it's something about 19.10, maybe it's something about your particular 
setup, or maybe it's a more general issue. To debug, I'd recommend setting 
`ARROW_R_DEV=true` (for verbosity), `LIBARROW_BINARY=false` (to ensure that we 
build from source), and `LIBARROW_MINIMAL=false` (so that zstd is turned on), 
then reinstalling. Attach the full installation logs here and I can try to 
sift through them, after which I may have some other ideas of things to try. 
Thanks for your help!

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>





[jira] [Resolved] (ARROW-8575) [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8575.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7028
[https://github.com/apache/arrow/pull/7028]

> [Developer] Add issue_comment workflow to rebase a PR
> -
>
> Key: ARROW-8575
> URL: https://issues.apache.org/jira/browse/ARROW-8575
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091678#comment-17091678
 ] 

Neal Richardson commented on ARROW-8556:


Thanks. I've mapped Ubuntu 19.10 to ubuntu-18.04 
[here|https://github.com/ursa-labs/arrow-r-nightly/blob/master/linux/distro-map.csv#L13]
 so installation with a prebuilt binary should Just Work now. I'm still curious 
why zstd wasn't linked correctly before (note that there is no {{-lzstd}} in the 
{{PKG_LIBS}} line), but if you'd rather let it lie and move on, that's fine with 
me; we can wait and see whether anyone else hits it.
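For anyone curious what such a distro mapping does, here is a toy illustration (not the actual arrow-r-nightly code; the CSV contents below are invented): an unrecognized newer release falls back to a binary built on an older, compatible release.

```shell
# Toy distro-map lookup: first column is the reported distro, second is
# the distro whose prebuilt binary should be served instead.
cat > distro-map.csv <<'EOF'
from,to
ubuntu-19.10,ubuntu-18.04
ubuntu-19.04,ubuntu-18.04
EOF

map_distro() {
  awk -F, -v d="$1" '$1 == d { print $2 }' distro-map.csv
}

map_distro ubuntu-19.10   # prints: ubuntu-18.04
```

A distro absent from the map simply yields no output, which a caller would treat as "no binary available, build from source".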

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  





[jira] [Updated] (ARROW-8586) [R] installation failure on CentOS 7

2020-04-24 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8586:
---
Summary: [R] installation failure on CentOS 7  (was: Failed to Install 
arrow From CRAN)

> [R] installation failure on CentOS 7
> 
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Priority: Major
>
> Hi,
> I am trying to install arrow via RStudio, but it seems like it is not working 
> that after I installed the package, it kept asking me to run 
> arrow::install_arrow() even after I did:
> {code}
> > install.packages("arrow")
> Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
> Content type 'application/x-gzip' length 242534 bytes (236 KB)
> ==
> downloaded 236 KB
> * installing *source* package ‘arrow’ ...
> ** package ‘arrow’ successfully unpacked and MD5 sums checked
> ** using staged installation
> *** Successfully retrieved C++ source
> *** Building C++ libraries
>  cmake
>  arrow  
> ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
> - NOTE ---
> After installation, please run arrow::install_arrow()
> for help installing required runtime libraries
> -
> ** libs
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
> array_from_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
> array_to_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
> arrowExports.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
> chunkedarray.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c compression.cpp -o 
> compression.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c compute.cpp -o compute.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library

[jira] [Commented] (ARROW-8586) Failed to Install arrow From CRAN

2020-04-24 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091670#comment-17091670
 ] 

Neal Richardson commented on ARROW-8586:


Thanks for the report. There seem to be two issues: (1) the C++ build from 
source is failing, and (2) when {{install_arrow}} tries to download a prebuilt 
binary, it's not correctly identifying your OS version. 

To debug the first issue, could you please set the environment variable 
{{ARROW_R_DEV=true}} and retry, and share with me the (much more verbose) 
installation logs?

To debug the second, could you please tell me what {{lsb_release -rs}} says at 
the command line?

A workaround is to set {{LIBARROW_BINARY=centos-7}} and reinstall (or, 
equivalently, call {{arrow::install_arrow(binary="centos-7")}} from R, since 
you have the package installed). But I'd appreciate your help debugging the 
issue so that we can make it work correctly going forward.
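That workaround can be sketched as a shell session (a sketch under the assumption of a Unix shell with {{Rscript}} on PATH; the log filename is illustrative):

```shell
# Pin the prebuilt binary flavor named above, then reinstall,
# keeping a log of the attempt.
export LIBARROW_BINARY=centos-7
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")' 2>&1 |
    tee centos7-install.log
fi
# Equivalent, from inside an R session:
#   arrow::install_arrow(binary = "centos-7")
```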

> Failed to Install arrow From CRAN
> -
>
> Key: ARROW-8586
> URL: https://issues.apache.org/jira/browse/ARROW-8586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: CentOS 7
>Reporter: Hei
>Priority: Major
>
> Hi,
> I am trying to install arrow via RStudio, but it seems like it is not working 
> that after I installed the package, it kept asking me to run 
> arrow::install_arrow() even after I did:
> {code}
> > install.packages("arrow")
> Installing package into ‘/home/hc/R/x86_64-redhat-linux-gnu-library/3.6’
> (as ‘lib’ is unspecified)
> trying URL 'https://cran.rstudio.com/src/contrib/arrow_0.17.0.tar.gz'
> Content type 'application/x-gzip' length 242534 bytes (236 KB)
> ==
> downloaded 236 KB
> * installing *source* package ‘arrow’ ...
> ** package ‘arrow’ successfully unpacked and MD5 sums checked
> ** using staged installation
> *** Successfully retrieved C++ source
> *** Building C++ libraries
>  cmake
>  arrow  
> ./configure: line 132: cd: libarrow/arrow-0.17.0/lib: Not a directory
> - NOTE ---
> After installation, please run arrow::install_arrow()
> for help installing required runtime libraries
> -
> ** libs
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array.cpp -o array.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_from_vector.cpp -o 
> array_from_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c array_to_vector.cpp -o 
> array_to_vector.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arraydata.cpp -o arraydata.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c arrowExports.cpp -o 
> arrowExports.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c buffer.cpp -o buffer.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/hc/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include" 
> -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
> -grecord-gcc-switches   -m64 -mtune=generic  -c chunkedarray.cpp -o 
> chunkedarray.o
> g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  
> -I"/home/

[jira] [Created] (ARROW-8575) [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8575:
--

 Summary: [Developer] Add issue_comment workflow to rebase a PR
 Key: ARROW-8575
 URL: https://issues.apache.org/jira/browse/ARROW-8575
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Commented] (ARROW-8566) [R] error when writing POSIXct to spark

2020-04-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090916#comment-17090916
 ] 

Neal Richardson commented on ARROW-8566:


Great, thanks for debugging with me. I created 
https://github.com/sparklyr/sparklyr/issues/2439 because I think the current 
{{arrow}} behavior is correct (certainly the 0.16 behavior was not, unless 
you happen to live in UTC), so this may need to be worked around in 
{{sparklyr}}. 

> [R] error when writing POSIXct to spark
> ---
>
> Key: ARROW-8566
> URL: https://issues.apache.org/jira/browse/ARROW-8566
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: #> R version 3.6.3 (2020-02-29)
> #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> #> Running under: macOS Mojave 10.14.6
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
>Reporter: Curt Bergmann
>Priority: Major
>
> {code:r}
> library(DBI)
> library(sparklyr)
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #> timestamp
> sc <- spark_connect(master = "local")
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
> x <- data.frame(y = Sys.time())
> dbWriteTable(sc, "test_posixct", x)
> #> Error: org.apache.spark.SparkException: Job aborted.
> #> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
> #> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
> #> at 
> org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:503)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> #> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> #> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> #> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
> #> at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:474)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:453)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
> #> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> #> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> #> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> #> at java.lang.reflect.Method.invoke(Method.java:498)
> #> at sparklyr.Invoke.invoke(invoke.scala:147)
> #> at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
> #> at sparklyr.StreamHandler.read(stream.scala:61)
> #> at 
> sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
> #> at scala.util.control.Breaks.breakable(Breaks.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:14)
> #> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> #> at 
> io.netty.channel.AbstractChannelHandl

[jira] [Updated] (ARROW-8566) [R] error when writing POSIXct to spark

2020-04-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8566:
---
Summary: [R] error when writing POSIXct to spark  (was: Upgraded from r 
package arrow 16 to r package arrow 17 and now get an error when writing 
posixct to spark)

> [R] error when writing POSIXct to spark
> ---
>
> Key: ARROW-8566
> URL: https://issues.apache.org/jira/browse/ARROW-8566
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: #> R version 3.6.3 (2020-02-29)
> #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> #> Running under: macOS Mojave 10.14.6
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
>Reporter: Curt Bergmann
>Priority: Major
>
> {code:r}
> library(DBI)
> library(sparklyr)
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #> timestamp
> sc <- spark_connect(master = "local")
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
> x <- data.frame(y = Sys.time())
> dbWriteTable(sc, "test_posixct", x)
> #> Error: org.apache.spark.SparkException: Job aborted.
> #> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
> #> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
> #> at 
> org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:503)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> #> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> #> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> #> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
> #> at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:474)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:453)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
> #> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> #> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> #> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> #> at java.lang.reflect.Method.invoke(Method.java:498)
> #> at sparklyr.Invoke.invoke(invoke.scala:147)
> #> at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
> #> at sparklyr.StreamHandler.read(stream.scala:61)
> #> at 
> sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
> #> at scala.util.control.Breaks.breakable(Breaks.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:14)
> #> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> #> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
> #> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)

[jira] [Updated] (ARROW-8566) Upgraded from r package arrow 16 to r package arrow 17 and now get an error when writing posixct to spark

2020-04-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8566:
---
Priority: Major  (was: Blocker)

> Upgraded from r package arrow 16 to r package arrow 17 and now get an error 
> when writing posixct to spark
> -
>
> Key: ARROW-8566
> URL: https://issues.apache.org/jira/browse/ARROW-8566
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: #> R version 3.6.3 (2020-02-29)
> #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> #> Running under: macOS Mojave 10.14.6
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
>Reporter: Curt Bergmann
>Priority: Major
>
> {code:r}
> library(DBI)
> library(sparklyr)
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #> timestamp
> sc <- spark_connect(master = "local")
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
> x <- data.frame(y = Sys.time())
> dbWriteTable(sc, "test_posixct", x)
> #> Error: org.apache.spark.SparkException: Job aborted.
> #> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
> #> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
> #> at 
> org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:503)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> #> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> #> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> #> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
> #> at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:474)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:453)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
> #> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> #> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> #> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> #> at java.lang.reflect.Method.invoke(Method.java:498)
> #> at sparklyr.Invoke.invoke(invoke.scala:147)
> #> at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
> #> at sparklyr.StreamHandler.read(stream.scala:61)
> #> at 
> sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
> #> at scala.util.control.Breaks.breakable(Breaks.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:14)
> #> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> #> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
> #> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360

[jira] [Resolved] (ARROW-8569) [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8569.

Resolution: Fixed

Issue resolved by pull request 7019
[https://github.com/apache/arrow/pull/7019]

> [CI] Upgrade xcode version for testing homebrew formulae
> 
>
> Key: ARROW-8569
> URL: https://issues.apache.org/jira/browse/ARROW-8569
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Packaging
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To prevent as many bottles from being built from source.





[jira] [Commented] (ARROW-8566) Upgraded from r package arrow 16 to r package arrow 17 and now get an error when writing posixct to spark

2020-04-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090772#comment-17090772
 ] 

Neal Richardson commented on ARROW-8566:


Hmm. Unfortunately, {{java.lang.UnsupportedOperationException}} doesn't tell me 
anything about what is unsupported.

The only thing about POSIXt types that changed in the last {{arrow}} release 
was a fix for ARROW-3543, specifically 
https://github.com/apache/arrow/commit/507762fa51d17e61f08d36d3626ab8b8df716198.
 I wonder: does it work if you explicitly set {{tz="GMT"}} on a POSIXct and 
send that?
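One explicit way to try that experiment (a sketch; setting the {{tzone}} attribute directly is just one way to stamp a POSIXct with GMT, and {{Rscript}}-on-PATH is an assumption):

```shell
# Build a GMT-stamped POSIXct in R and confirm the attribute stuck,
# before handing the data frame to dbWriteTable/sparklyr.
r_expr='
x <- data.frame(y = Sys.time())
attr(x$y, "tzone") <- "GMT"
cat(attr(x$y, "tzone"), sep = "\n")
'
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e "$r_expr"   # prints: GMT
else
  echo "Rscript not found; paste the snippet into an R session instead"
fi
```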



> Upgraded from r package arrow 16 to r package arrow 17 and now get an error 
> when writing posixct to spark
> -
>
> Key: ARROW-8566

[jira] [Created] (ARROW-8569) [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8569:
--

 Summary: [CI] Upgrade xcode version for testing homebrew formulae
 Key: ARROW-8569
 URL: https://issues.apache.org/jira/browse/ARROW-8569
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


To prevent as many bottles from being built from source.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8566) Upgraded from r package arrow 16 to r package arrow 17 and now get an error when writing posixct to spark

2020-04-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090673#comment-17090673
 ] 

Neal Richardson commented on ARROW-8566:


Is this consistently reproducible? Do any other data types cause issues? I 
can't tell from the spark traceback what is failing exactly.

> Upgraded from r package arrow 16 to r package arrow 17 and now get an error 
> when writing posixct to spark
> -
>
> Key: ARROW-8566
> URL: https://issues.apache.org/jira/browse/ARROW-8566
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: #> R version 3.6.3 (2020-02-29)
> #> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> #> Running under: macOS Mojave 10.14.6
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
>Reporter: Curt Bergmann
>Priority: Blocker
>
> {code:r}
> library(DBI)
> library(sparklyr)
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #> timestamp
> sc <- spark_connect(master = "local")
> sparklyr::spark_version(sc)
> #> [1] '2.4.5'
> x <- data.frame(y = Sys.time())
> dbWriteTable(sc, "test_posixct", x)
> #> Error: org.apache.spark.SparkException: Job aborted.
> #> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
> #> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
> #> at 
> org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:503)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
> #> at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:176)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
> #> at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> #> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> #> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> #> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
> #> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
> #> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
> #> at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
> #> at 
> org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:474)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:453)
> #> at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
> #> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> #> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> #> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> #> at java.lang.reflect.Method.invoke(Method.java:498)
> #> at sparklyr.Invoke.invoke(invoke.scala:147)
> #> at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
> #> at sparklyr.StreamHandler.read(stream.scala:61)
> #> at 
> sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
> #> at scala.util.control.Breaks.breakable(Breaks.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:38)
> #> at sparklyr.BackendHandler.channelRead0(handler.scala:14)
> #> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> #> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChann

[jira] [Resolved] (ARROW-8549) [R] Assorted post-0.17 release cleanups

2020-04-22 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-8549.

Resolution: Fixed

Issue resolved by pull request 6995
[https://github.com/apache/arrow/pull/6995]

> [R] Assorted post-0.17 release cleanups
> ---
>
> Key: ARROW-8549
> URL: https://issues.apache.org/jira/browse/ARROW-8549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8556) [R] zstd symbol not found on Ubuntu 19.10

2020-04-22 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8556:
---
Summary: [R] zstd symbol not found on Ubuntu 19.10  (was: [R] Installation 
fails with `LIBARROW_MINIMAL=false`)

> [R] zstd symbol not found on Ubuntu 19.10
> -
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8556) [R] Installation fails with `LIBARROW_MINIMAL=false`

2020-04-22 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089913#comment-17089913
 ] 

Neal Richardson commented on ARROW-8556:


Thanks for the report. Several ideas:

* Could you please share the install logs from above that point, where it's compiling?
* You could retry with {{LIBARROW_BINARY=ubuntu-18.04}} and see if that works
* Do you have zstd installed on the system already? If so, what version? (Maybe 
there's a minimum version we require and that's not set right)
* If not, you could {{apt-get install zstd}} and then retry
* You could retry with {{ARROW_R_DEV=true}} for more verbosity in the build 
step (but let's go through the other options first)
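
The retry suggested above can be scripted. A minimal sketch (assuming `Rscript` is on the PATH; the install call itself is left commented out so the environment setup is the only thing executed):

```python
import os
import subprocess

# Environment for the suggested retry: force the prebuilt Ubuntu 18.04
# C++ binary and turn on verbose configure/build logging.
env = dict(
    os.environ,
    LIBARROW_BINARY="ubuntu-18.04",  # use the prebuilt 18.04 libarrow
    ARROW_R_DEV="true",              # extra verbosity in the build step
)

# Uncomment to actually reinstall the R package with this environment:
# subprocess.run(["Rscript", "-e", 'install.packages("arrow")'],
#                env=env, check=True)

print(env["LIBARROW_BINARY"], env["ARROW_R_DEV"])
```

The same two variables can of course be exported in the shell before running `R CMD INSTALL` directly.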

> [R] Installation fails with `LIBARROW_MINIMAL=false`
> 
>
> Key: ARROW-8556
> URL: https://issues.apache.org/jira/browse/ARROW-8556
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.17.0
> Environment: Ubuntu 19.10
> R 3.6.1
>Reporter: Karl Dunkle Werner
>Priority: Major
>
> I would like to install the `arrow` R package on my Ubuntu 19.10 system. 
> Prebuilt binaries are unavailable, and I want to enable compression, so I set 
> the {{LIBARROW_MINIMAL=false}} environment variable. When I do so, it looks 
> like the package is able to compile, but can't be loaded. I'm able to install 
> correctly if I don't set the {{LIBARROW_MINIMAL}} variable.
> Here's the error I get:
> {code:java}
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   ~/.R/3.6/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: 
> ZSTD_initCStream
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘~/.R/3.6/arrow’
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8550) [CI] Don't run cron GHA jobs on forks

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8550:
--

 Summary: [CI] Don't run cron GHA jobs on forks
 Key: ARROW-8550
 URL: https://issues.apache.org/jira/browse/ARROW-8550
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson


It's wasteful, and I'm tired of seeing them clogging up my Actions tab and 
notifications. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8549) [R] Assorted post-0.17 release cleanups

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8549:
--

 Summary: [R] Assorted post-0.17 release cleanups
 Key: ARROW-8549
 URL: https://issues.apache.org/jira/browse/ARROW-8549
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8548) [Website] 0.17 release post

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8548:
--

 Summary: [Website] 0.17 release post
 Key: ARROW-8548
 URL: https://issues.apache.org/jira/browse/ARROW-8548
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Neal Richardson
Assignee: Neal Richardson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8545) [Python] Allow fast writing of Decimal column to parquet

2020-04-21 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8545:
---
Summary: [Python] Allow fast writing of Decimal column to parquet  (was: 
Allow fast writing of Decimal column to parquet)

> [Python] Allow fast writing of Decimal column to parquet
> 
>
> Key: ARROW-8545
> URL: https://issues.apache.org/jira/browse/ARROW-8545
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.17.0
>Reporter: Fons de Leeuw
>Priority: Minor
>
> Currently, when one wants to use a decimal datatype in Pandas, the only 
> possibility is to use the `decimal.Decimal` standard-library type. This is 
> then an "object" column in the DataFrame.
> Arrow can write a column of decimal type to Parquet, which is quite 
> impressive given that fastparquet does not write decimals at 
> all. However, the writing is *very* slow: in the code snippet below, by a factor 
> of 4.
> *Improvements*
> Of course the best outcome would be if the conversion of a decimal column can 
> be made faster, but I am not familiar enough with pandas internals to know if 
> that's possible. (This same behavior also applies to `.to_pickle` etc.)
> It would be nice, if a warning is shown that object-typed columns are being 
> converted which is very slow. That would at least make this behavior more 
> explicit.
> Now, if fast parsing of a decimal.Decimal object column is not possible, it 
> would be nice if a workaround is possible. For example, pass an int and then 
> shift the dot "x" places to the left. (It is already possible to pass an int 
> column and specify "decimal" dtype in the Arrow schema during 
> `pa.Table.from_pandas()` but then it simply becomes a decimal without 
> decimals.) Also, it might be nice if it can be encoded as a 128-bit byte 
> string in the pandas column and then directly interpreted by Arrow.
> *Usecase*
> I need to save large dataframes (~10GB) of geospatial data with 
> latitude/longitude. I can't use float as comparisons need to be exact, and 
> the BigQuery "clustering" feature needs either an integer or a decimal but 
> not a float. In the meantime, I have to do a workaround where I use only ints 
> (the original number multiplied by 1000.)
> *Snippet*
> {code:java}
> import decimal
> from time import time
> import numpy as np
> import pandas as pd
> d = dict()
> for col in "abcdefghijklmnopqrstuvwxyz":
> d[col] = np.random.rand(int(1E7)) * 100
> df = pd.DataFrame(d)
> t0 = time()
> df.to_parquet("/tmp/testabc.pq", engine="pyarrow")
> t1 = time()
> df["a"] = df["a"].round(decimals=3).astype(str).map(decimal.Decimal)
> t2 = time()
> df.to_parquet("/tmp/testabc_dec.pq", engine="pyarrow")
> t3 = time()
> print(f"Saving the normal dataframe took {t1-t0:.3f}s, with one decimal 
> column {t3-t2:.3f}s")
> # Saving the normal dataframe took 4.430s, with one decimal column 
> 17.673s{code}
>  
>  
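
The integer workaround from the use case above (multiply by a fixed scale, store ints, divide back on read) can be sketched without pandas at all. `SCALE` and the helper names below are made up for illustration:

```python
import decimal

SCALE = 1000  # fixed-point scale: three decimal places survive the round trip

def encode_fixed_point(values, scale=SCALE):
    """Turn floats into scaled integers, which write to Parquet quickly."""
    return [int(round(v * scale)) for v in values]

def decode_fixed_point(ints, scale=SCALE):
    """Recover exact decimal.Decimal values from the scaled integers."""
    return [decimal.Decimal(i) / scale for i in ints]

lats = [40.713, -33.868]
encoded = encode_fixed_point(lats)     # [40713, -33868]
decoded = decode_fixed_point(encoded)  # [Decimal('40.713'), Decimal('-33.868')]
```

Comparisons on the encoded integers are exact, which is the property the BigQuery clustering use case needs.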



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8538) [Packaging] Remove boost from homebrew formula

2020-04-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8538:
--

 Summary: [Packaging] Remove boost from homebrew formula
 Key: ARROW-8538
 URL: https://issues.apache.org/jira/browse/ARROW-8538
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-8488) [R] Replace VALUE_OR_STOP with ValueOrStop

2020-04-17 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-8488:
--

Assignee: Francois Saint-Jacques  (was: Neal Richardson)

> [R] Replace VALUE_OR_STOP with ValueOrStop
> --
>
> Key: ARROW-8488
> URL: https://issues.apache.org/jira/browse/ARROW-8488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We should avoid macros as much as possible, as per the style guide.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-8488) [R] Replace VALUE_OR_STOP with ValueOrStop

2020-04-17 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-8488:
--

Assignee: Neal Richardson

> [R] Replace VALUE_OR_STOP with ValueOrStop
> --
>
> Key: ARROW-8488
> URL: https://issues.apache.org/jira/browse/ARROW-8488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Francois Saint-Jacques
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We should avoid macros as much as possible, as per the style guide.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8498) [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

2020-04-17 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8498:
---
Summary: [Python] Schema.from_pandas fails on extension type, while 
Table.from_pandas works  (was: Schema.from_pandas fails on extension type, 
while Table.from_pandas works)

> [Python] Schema.from_pandas fails on extension type, while Table.from_pandas 
> works
> --
>
> Key: ARROW-8498
> URL: https://issues.apache.org/jira/browse/ARROW-8498
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.16.0
>Reporter: Thomas Buhrmann
>Priority: Major
>
> While Table.from_pandas() seems to work as expected with extension types,
> Schema.from_pandas() raises an ArrowTypeError:
> {code:python}
> df = pd.DataFrame({
>"x": pd.Series([1, 2, None], dtype="Int8"),
>"y": pd.Series(["a", "b", None], dtype="category"),
>"z": pd.Series(["ab", "bc", None], dtype="string"),
> })
> print(pa.Table.from_pandas(df).schema)
> print(pa.Schema.from_pandas(df))
> {code}
>  
> Results in:
> {noformat}
> x: int8
> y: dictionary
> z: string
> metadata
> 
> {b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, 
> "'
> b'stop": 3, "step": 1}], "column_indexes": [{"name": null, 
> "field_'
> b'name": null, "pandas_type": "unicode", "numpy_type": "object", 
> "'
> b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", 
> "f'
> b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", 
> "m'
> b'etadata": null}, {"name": "y", "field_name": "y", 
> "pandas_type":'
> b' "categorical", "numpy_type": "int8", "metadata": 
> {"num_categori'
> b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", 
> "pa'
> b'ndas_type": "unicode", "numpy_type": "string", "metadata": 
> null}'
> b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, 
> "pand'
> b'as_version": "1.0.3"}'}
> ---
> ArrowTypeErrorTraceback (most recent call last)
> ...
> ArrowTypeError: Did not pass numpy.dtype object
> {noformat}
> I'd imagine Table.from_pandas(df).schema and Schema.from_pandas(df) should 
> result in the exact same object?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8473) [Rust] "Statistics support" in rust/parquet readme is incorrect

2020-04-17 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-8473:
---
Summary: [Rust] "Statistics support" in rust/parquet readme is incorrect  
(was: "Statistics support" in rust/parquet readme is incorrect)

> [Rust] "Statistics support" in rust/parquet readme is incorrect
> ---
>
> Key: ARROW-8473
> URL: https://issues.apache.org/jira/browse/ARROW-8473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Krzysztof Stanisławek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Statistics are not actually supported in the Rust implementation of Parquet. See 
> [https://github.com/apache/arrow/blob/3e3712a14a3242d70145fb9d3d6f0f4b8c374e68/rust/parquet/src/column/writer.rs#L522]
>  or similar lines in this file, or writer.rs.
> https://github.com/apache/arrow/pull/6951



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-7801) [Developer] Add issue_comment workflow to fix lint/style/codegen

2020-04-16 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-7801.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6932
[https://github.com/apache/arrow/pull/6932]

> [Developer] Add issue_comment workflow to fix lint/style/codegen
> 
>
> Key: ARROW-7801
> URL: https://issues.apache.org/jira/browse/ARROW-7801
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Like https://github.com/r-lib/actions/tree/master/examples#render-readme. 
> * If changes to r/README.Rmd, render readme
> * If changes to r/R, render docs
> * If changes to r/src, lint.sh --fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7801) [Developer] Add issue_comment workflow to fix lint/style/codegen

2020-04-16 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-7801:
---
Component/s: (was: R)
 (was: Continuous Integration)
 Developer Tools

> [Developer] Add issue_comment workflow to fix lint/style/codegen
> 
>
> Key: ARROW-7801
> URL: https://issues.apache.org/jira/browse/ARROW-7801
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Like https://github.com/r-lib/actions/tree/master/examples#render-readme. 
> * If changes to r/README.Rmd, render readme
> * If changes to r/R, render docs
> * If changes to r/src, lint.sh --fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7801) [Developer] Add issue_comment workflow to fix lint/style/codegen

2020-04-16 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-7801:
---
Summary: [Developer] Add issue_comment workflow to fix lint/style/codegen  
(was: [R][CI] Add lint and doc GitHub Action workflows)

> [Developer] Add issue_comment workflow to fix lint/style/codegen
> 
>
> Key: ARROW-7801
> URL: https://issues.apache.org/jira/browse/ARROW-7801
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Like https://github.com/r-lib/actions/tree/master/examples#render-readme. 
> * If changes to r/README.Rmd, render readme
> * If changes to r/R, render docs
> * If changes to r/src, lint.sh --fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6439) [R] Implement S3 file-system interface in R

2020-04-16 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6439.

Resolution: Fixed

Issue resolved by pull request 6901
[https://github.com/apache/arrow/pull/6901]

> [R] Implement S3 file-system interface in R
> ---
>
> Key: ARROW-6439
> URL: https://issues.apache.org/jira/browse/ARROW-6439
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8489) [Developer] Autotune more things

2020-04-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8489:
--

 Summary: [Developer] Autotune more things
 Key: ARROW-8489
 URL: https://issues.apache.org/jira/browse/ARROW-8489
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Python
Reporter: Neal Richardson


ARROW-7801 added the "autotune" comment bot to fix linting errors and rebuild 
some generated files. cmake-format was left off because of Python problems (see 
description on https://github.com/apache/arrow/pull/6932). And there are probably 
other things we want to add (autopep8 for python, and similar for other 
languages?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8482) [Python][R][Parquet] Possible time zone handling inconsistencies

2020-04-16 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085119#comment-17085119
 ] 

Neal Richardson commented on ARROW-8482:


read_parquet() doesn't alter types. It reads what is in the file.

> [Python][R][Parquet] Possible time zone handling inconsistencies 
> -
>
> Key: ARROW-8482
> URL: https://issues.apache.org/jira/browse/ARROW-8482
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python, R
>Reporter: Olaf
>Priority: Critical
>
> Hello there!
>  
> First of all, thanks for making parquet files a reality in *R* and *Python*. 
> This is really great.
> I found a very nasty bug when exchanging parquet files between the two 
> platforms. Consider this.
>  
>  
> {code:java}
> import pandas as pd
> import pyarrow.parquet as pq
> import numpy as np
> df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 
> 14:00:00.531'), 
>  pd.to_datetime('2018-02-01 14:01:00.456'),
>  pd.to_datetime('2018-03-05 14:01:02.200')]})
> df['timestamp_est'] = 
> pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
> Out[5]: 
>  string_time_utc timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
>  
> Now I simply write to disk
>  
> {code:java}
> df.to_parquet('myparquet.pq')
> {code}
>  
> And the use *R* to load it.
>  
> {code:java}
> test <- read_parquet('myparquet.pq')
> > test
> # A tibble: 3 x 2
>  string_time_utc timestamp_est 
>
> 1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
> 2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
> 3 2018-03-05 09:01:02.20 2018-03-05 04:01:02.20
> {code}
>  
>  
> As you can see, the timestamps have been converted in the process. I first 
> referenced this bug in feather, but it is still there. This is a very 
> dangerous, silent bug.
>  
> What do you think?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8482) [Python][R][Parquet] Possible time zone handling inconsistencies

2020-04-16 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085008#comment-17085008
 ] 

Neal Richardson commented on ARROW-8482:


This shows the fix wrt 0.16 
https://github.com/apache/arrow/commit/507762fa51d17e61f08d36d3626ab8b8df716198

But that doesn't affect how R prints datetime data with no timezone specified.

> [Python][R][Parquet] Possible time zone handling inconsistencies 
> -
>
> Key: ARROW-8482
> URL: https://issues.apache.org/jira/browse/ARROW-8482
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python, R
>Reporter: Olaf
>Assignee: Wes McKinney
>Priority: Critical
>
> Hello there!
>  
> First of all, thanks for making parquet files a reality in *R* and *Python*. 
> This is really great.
> I found a very nasty bug when exchanging parquet files between the two 
> platforms. Consider this.
>  
>  
> {code:java}
> import pandas as pd
> import pyarrow.parquet as pq
> import numpy as np
> df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 
> 14:00:00.531'), 
>  pd.to_datetime('2018-02-01 14:01:00.456'),
>  pd.to_datetime('2018-03-05 14:01:02.200')]})
> df['timestamp_est'] = 
> pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
> Out[5]: 
>  string_time_utc timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
>  
> Now I simply write to disk
>  
> {code:java}
> df.to_parquet('myparquet.pq')
> {code}
>  
> And the use *R* to load it.
>  
> {code:java}
> test <- read_parquet('myparquet.pq')
> > test
> # A tibble: 3 x 2
>  string_time_utc timestamp_est 
>
> 1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
> 2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
> 3 2018-03-05 09:01:02.20 2018-03-05 04:01:02.20
> {code}
>  
>  
> As you can see, the timestamps have been converted in the process. I first 
> referenced this bug in feather, but it is still there. This is a very 
> dangerous, silent bug.
>  
> What do you think?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8482) [Python][R][Parquet] Possible time zone handling inconsistencies

2020-04-16 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085002#comment-17085002
 ] 

Neal Richardson commented on ARROW-8482:


> R apparently treats naive timestamps as localtime

Yes, in the print method, but as you say, it doesn't alter the data itself. See 
https://issues.apache.org/jira/browse/ARROW-3543?focusedCommentId=16929592&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16929592
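
This point can be checked from the Python side as well: attaching a UTC label to a naive timestamp does not shift the stored nanosecond value, so the difference the reporter saw is purely in how each front end prints naive values. A hedged sketch, not part of the original thread:

```python
import pandas as pd

# A naive timestamp's underlying nanosecond count is fixed; labeling it
# UTC changes only how it is displayed, not what is stored.
naive = pd.Timestamp("2018-02-01 09:00:00.531")
labeled = naive.tz_localize("UTC")
assert labeled.value == naive.value  # same stored nanoseconds
print(labeled)  # 2018-02-01 09:00:00.531000+00:00
```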

> [Python][R][Parquet] Possible time zone handling inconsistencies 
> -
>
> Key: ARROW-8482
> URL: https://issues.apache.org/jira/browse/ARROW-8482
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python, R
>Reporter: Olaf
>Assignee: Wes McKinney
>Priority: Critical
>
> Hello there!
>  
> First of all, thanks for making parquet files a reality in *R* and *Python*. 
> This is really great.
> I found a very nasty bug when exchanging parquet files between the two 
> platforms. Consider this.
>  
>  
> {code:python}
> import pandas as pd
> import pyarrow.parquet as pq
> import numpy as np
> df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'),
>  pd.to_datetime('2018-02-01 14:01:00.456'),
>  pd.to_datetime('2018-03-05 14:01:02.200')]})
> df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
> Out[5]: 
>  string_time_utc timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
>  
> Now I simply write to disk
>  
> {code:python}
> df.to_parquet('myparquet.pq')
> {code}
>  
> And then use *R* to load it.
>  
> {code}
> test <- read_parquet('myparquet.pq')
> > test
> # A tibble: 3 x 2
>  string_time_utc timestamp_est 
>
> 1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
> 2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
> 3 2018-03-05 09:01:02.20 2018-03-05 04:01:02.20
> {code}
>  
>  
> As you can see, the timestamps have been shifted in the process. I first 
> reported this bug against feather, but it is still there. This is a very 
> dangerous, silent bug.
>  
> What do you think?
> Thanks




