[jira] [Commented] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-03 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854674#comment-16854674
 ] 

Uwe L. Korn commented on ARROW-5488:


Would this involve compiling the C++ lib from source in that case?

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Priority: Major
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is 
> not available. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?

2019-06-03 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854687#comment-16854687
 ] 

Uwe L. Korn commented on ARROW-5474:


For adoption reasons, it would be nice to use Ubuntu 16.04 as a baseline. This 
has Boost 1.58.
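
For illustration, the Boost version a system ships can be checked via the 
version header (the header path may differ per distribution):
{code}
# Print the Boost version macro; on Ubuntu 16.04 this reports "1_58".
grep BOOST_LIB_VERSION /usr/include/boost/version.hpp
{code}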

> [C++] What version of Boost do we require now?
> --
>
> Key: ARROW-5474
> URL: https://issues.apache.org/jira/browse/ARROW-5474
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One 
> possible cause for that error is that the local filesystem patch increased 
> the version of boost that we actually require. The boost version (1.54 vs 
> 1.58) was one difference between failure and success. 
> Another point of confusion was that CMake reported two different versions of 
> boost at different times. 
> If we require a minimum version of boost, can we document that better, check 
> for it more accurately in the build scripts, and fail with a useful message 
> if that minimum isn't met? Or something else helpful.
> If the actual cause of the failure was something else (e.g. compiler 
> version), we should figure that out too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-05 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857095#comment-16857095
 ] 

Uwe L. Korn commented on ARROW-5497:


I'm not sure whether the JS and Java docs currently get built at all. The 
{{gen_apidocs}} setup broke at some point and the agreed solution was to 
migrate everything to the main {{docker-compose.yml}}, but that just hasn't 
happened yet.
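
As a sketch, the docker-compose workflow would then be (the {{docs}} service 
name is an assumption, not an existing target):
{code}
# Build the docs image and run the docs build inside it:
docker-compose build docs
docker-compose run docs
{code}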

> [R][Release] Build and publish R package docs
> -
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5449) [C++] Local filesystem implementation: investigate Windows UNC paths

2019-06-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-5449.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4487
[https://github.com/apache/arrow/pull/4487]

> [C++] Local filesystem implementation: investigate Windows UNC paths
> 
>
> Key: ARROW-5449
> URL: https://issues.apache.org/jira/browse/ARROW-5449
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Followup to ARROW-5378: Windows paths to networked files (e.g. 
> "\\server\share\path\file.txt") and extended-length paths (e.g. 
> "\\?\c:\some\absolute\path.txt") should be checked for compatibility with the 
> LocalFileSystem implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5521) [Packaging] License check fails with Apache RAT 0.13

2019-06-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-5521.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4486
[https://github.com/apache/arrow/pull/4486]

> [Packaging] License check fails with Apache RAT 0.13
> 
>
> Key: ARROW-5521
> URL: https://issues.apache.org/jira/browse/ARROW-5521
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Antoine Pitrou
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently use version 0.12. With 0.13 I get:
> {code:java}
> NOT APPROVED: js/src/fb/File.ts (xx/js/src/fb/File.ts): false
> NOT APPROVED: js/src/fb/Message.ts (xx/js/src/fb/Message.ts): false
> NOT APPROVED: js/src/fb/Schema.ts (xx/js/src/fb/Schema.ts): false
> NOT APPROVED: js/test/inference/column.ts (xx/js/test/inference/column.ts): 
> false
> NOT APPROVED: js/test/inference/nested.ts (xx/js/test/inference/nested.ts): 
> false
> NOT APPROVED: js/test/inference/visitor/get.ts 
> (xx/js/test/inference/visitor/get.ts): false
> 6 unapproved licences. Check rat report: rat.txt
> {code}
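> As a sketch, the check is invoked against a source tarball roughly like this 
> (the script lives under dev/release/ in the Arrow repository; the tarball 
> name is an example):
> {code}
> dev/release/run-rat.sh apache-arrow-0.14.0.tar.gz
> {code}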



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5521) [Packaging] License check fails with Apache RAT 0.13

2019-06-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-5521:
--

Assignee: Antoine Pitrou

> [Packaging] License check fails with Apache RAT 0.13
> 
>
> Key: ARROW-5521
> URL: https://issues.apache.org/jira/browse/ARROW-5521
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We currently use version 0.12. With 0.13 I get:
> {code:java}
> NOT APPROVED: js/src/fb/File.ts (xx/js/src/fb/File.ts): false
> NOT APPROVED: js/src/fb/Message.ts (xx/js/src/fb/Message.ts): false
> NOT APPROVED: js/src/fb/Schema.ts (xx/js/src/fb/Schema.ts): false
> NOT APPROVED: js/test/inference/column.ts (xx/js/test/inference/column.ts): 
> false
> NOT APPROVED: js/test/inference/nested.ts (xx/js/test/inference/nested.ts): 
> false
> NOT APPROVED: js/test/inference/visitor/get.ts 
> (xx/js/test/inference/visitor/get.ts): false
> 6 unapproved licences. Check rat report: rat.txt
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5436) [Python] expose filters argument in parquet.read_table

2019-06-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-5436.

Resolution: Fixed

Issue resolved by pull request 4409
[https://github.com/apache/arrow/pull/4409]

> [Python] expose filters argument in parquet.read_table
> --
>
> Key: ARROW-5436
> URL: https://issues.apache.org/jira/browse/ARROW-5436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, the {{parquet.read_table}} function can be used both for reading a 
> single file (interface to ParquetFile) and a directory (interface to 
> ParquetDataset). 
> ParquetDataset has some extra keywords such as {{filters}} that would be nice 
> to expose through {{read_table}} as well.
> Of course one can always use {{ParquetDataset}} if you need its full power, 
> but for pandas, which wraps pyarrow, it is easier to pass keywords through to 
> {{parquet.read_table}} than to choose between calling {{read_table}} or 
> {{ParquetDataset}}. Context: https://github.com/pandas-dev/pandas/issues/26551
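> As a sketch of the intended usage once exposed (the {{filters}} syntax 
> follows {{ParquetDataset}}; path and column names are examples):
> {code:java}
> python -c "import pyarrow.parquet as pq; pq.read_table('data_dir', filters=[('year', '=', 2019)])"
> {code}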



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5436) [Python] expose filters argument in parquet.read_table

2019-06-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-5436:
--

Assignee: Joris Van den Bossche

> [Python] expose filters argument in parquet.read_table
> --
>
> Key: ARROW-5436
> URL: https://issues.apache.org/jira/browse/ARROW-5436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, the {{parquet.read_table}} function can be used both for reading a 
> single file (interface to ParquetFile) and a directory (interface to 
> ParquetDataset). 
> ParquetDataset has some extra keywords such as {{filters}} that would be nice 
> to expose through {{read_table}} as well.
> Of course one can always use {{ParquetDataset}} if you need its full power, 
> but for pandas, which wraps pyarrow, it is easier to pass keywords through to 
> {{parquet.read_table}} than to choose between calling {{read_table}} or 
> {{ParquetDataset}}. Context: https://github.com/pandas-dev/pandas/issues/26551



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5509) [R] write_parquet()

2019-06-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-5509:
--

Assignee: Uwe L. Korn

> [R] write_parquet()
> ---
>
> Key: ARROW-5509
> URL: https://issues.apache.org/jira/browse/ARROW-5509
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> We can read but not yet write. The C++ library supports this and pyarrow does 
> it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5509) [R] write_parquet()

2019-06-11 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860771#comment-16860771
 ] 

Uwe L. Korn commented on ARROW-5509:


[~romainfrancois] see my PR; I'm already working on this and will continue 
today.

> [R] write_parquet()
> ---
>
> Key: ARROW-5509
> URL: https://issues.apache.org/jira/browse/ARROW-5509
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We can read but not yet write. The C++ library supports this and pyarrow does 
> it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5565) [Python] Document how to use gdb when working on pyarrow

2019-06-14 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-5565.

Resolution: Fixed

Issue resolved by pull request 4560
[https://github.com/apache/arrow/pull/4560]

> [Python] Document how to use gdb when working on pyarrow
> 
>
> Key: ARROW-5565
> URL: https://issues.apache.org/jira/browse/ARROW-5565
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It may not be obvious to new developers how to set breakpoints in the C++ 
> libraries when driven from Python. The incantation is slightly abstruse, for 
> example
> {code}
> $ gdb --args env py.test pyarrow/tests/test_array.py -k scalars_mixed_type
> {code}
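> Breakpoints are then set from the gdb prompt as usual (the symbol here is 
> just an example):
> {code}
> (gdb) break arrow::py::ConvertPySequence
> (gdb) run
> {code}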



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5614) [R] Error: 'install_arrow' is not an exported object from 'namespace:arrow'

2019-06-14 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864463#comment-16864463
 ] 

Uwe L. Korn commented on ARROW-5614:


Building the R package using conda-forge-based packages is quite 
straightforward. All packages that are also on CRAN are on conda-forge with an 
{{r-}} prefix. After installing them, you can use the same {{R}} commands as 
you would with any other toolchain.
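
A minimal sketch (the exact package set depends on what you need):
{code}
# CRAN packages appear on conda-forge with an "r-" prefix:
conda install -c conda-forge r-remotes r-devtools arrow-cpp
# then build and install the R package with the usual commands:
R CMD INSTALL .
{code}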

> [R] Error: 'install_arrow' is not an exported object from 'namespace:arrow'
> ---
>
> Key: ARROW-5614
> URL: https://issues.apache.org/jira/browse/ARROW-5614
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Thomas Buhrmann
>Assignee: Neal Richardson
>Priority: Major
>
> I'm trying to get the R package installed in a Debian docker image that 
> already contains R and RStudio (via rocker/rstudio from dockerhub), as well 
> as arrow-cpp, parquet-cpp and pyarrow installed via conda. I.e. I should have 
> all required arrow dependencies in my conda environment's /lib and /include 
> folders.
> I then tried to install the R package in two ways (as stated in the README, 
> having devtools, and after managing to get git2r installed)
> 1/ via remotes
> {code:java}
> remotes::install_github("apache/arrow/r", 
> ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7"){code}
> 2/ from source
> {code:java}
> git clone https://github.com/apache/arrow.git
> cd arrow/r
> R -e 'remotes::install_deps()'
> R CMD INSTALL 
> --configure-vars='INCLUDE_DIR=/root/miniconda/envs/my_env/include
> LIB_DIR=/root/miniconda/envs/my_env/lib' .{code}
> In both cases the install seems to work fine:
> {code:java}
> ** building package indices
> ** testing if installed package can be loaded from temporary location
> ** checking absolute paths in shared objects and dynamic libraries
> ** testing if installed package can be loaded from final location
> ** testing if installed package keeps a record of temporary installation path
> * DONE (arrow)
> {code}
>  But when I then do the following as prompted:
> {code:java}
> library(arrow)
> arrow::install_arrow()
> {code}
> The result is
> {code:java}
> Error: 'install_arrow' is not an exported object from 'namespace:arrow'
> {code}
> And running the example without calling that non-existing function I get the 
> error
> {code:java}
> Error in Table__from_dots(dots, schema) : 
>   Cannot call Table__from_dots(). Please use arrow::install_arrow() to 
> install required runtime libraries. 
> {code}
> So I don't know if I'm doing something wrong or if the documentation isn't up 
> to date? Specifically, what is the arrow::install_arrow() function supposed 
> to install, given that I already have the arrow and parquet libs and headers 
> installed, and supposedly they've been used (linked to) when I installed the 
> R package?
> In general, is there any way to get this package installed in the above 
> context (arrow-cpp etc. installed via conda)?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5691) [C++] Relocate src/parquet/arrow code to src/arrow/dataset/parquet

2019-06-23 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870498#comment-16870498
 ] 

Uwe L. Korn commented on ARROW-5691:


I would be 100% fine with moving it into {{src/arrow/parquet}}, but I question 
making the Parquet adaptor a full subset of the dataset project. For me these 
are two different entities: the adaptor provides access to the Parquet file 
format, either standalone low-level access or high-level reads into Arrow, 
whereas the dataset project builds on top of various adaptors but is not 
required for simple interactions with the file formats it supports.
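
A rough sketch of that layering (paths are illustrative):
{code}
src/arrow/parquet/   # Parquet <-> Arrow adaptor, usable standalone
src/arrow/dataset/   # dataset layer, builds on top of the adaptors
{code}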

> [C++] Relocate src/parquet/arrow code to src/arrow/dataset/parquet
> --
>
> Key: ARROW-5691
> URL: https://issues.apache.org/jira/browse/ARROW-5691
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> I think it may make sense to continue developing and maintaining this code in 
> the same place as other file format <-> Arrow serialization code and dataset 
> handling routines (e.g. schema normalization). Under this scheme, libparquet 
> becomes a link time dependency of libarrow_dataset



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5735) [C++] Appveyor builds failing persistently in thrift_ep build

2019-06-26 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873092#comment-16873092
 ] 

Uwe L. Korn commented on ARROW-5735:


The problem here is that the build is picking up Boost's new CMake package 
config, which we cannot ingest. We need to disable it using 
{{-DBoost_NO_BOOST_CMAKE=ON}}.
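
Roughly (how the flag is plumbed through to the {{thrift_ep}} configure step 
is left out):
{code}
# Fall back to CMake's own FindBoost instead of Boost's package config:
cmake .. -DBoost_NO_BOOST_CMAKE=ON
{code}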

> [C++] Appveyor builds failing persistently in thrift_ep build
> -
>
> Key: ARROW-5735
> URL: https://issues.apache.org/jira/browse/ARROW-5735
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See
> {code}
> 72/541] Performing configure step for 'thrift_ep'
> FAILED: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure 
> cmd.exe /C "cd /D 
> C:\projects\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep-build && 
> "C:\Program Files (x86)\CMake\bin\cmake.exe" 
> -DFLEX_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_flex.exe
>  
> -DBISON_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_bison.exe
>  -DZLIB_INCLUDE_DIR= -DWITH_SHARED_LIB=OFF -DWITH_PLUGIN=OFF -DZLIB_LIBRARY= 
> "-DCMAKE_C_COMPILER=C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe" 
> -DCMAKE_CXX_COMPILER=C:/Miniconda36-x64/Scripts/clcache.exe 
> -DCMAKE_BUILD_TYPE=RELEASE "-DCMAKE_C_FLAGS=/DWIN32 /D_WINDOWS /W3  /MD /O2 
> /Ob2 /DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=/DWIN32 /D_WINDOWS /W3  /MD /O2 /Ob2 
> /DNDEBUG" "-DCMAKE_CXX_FLAGS=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> "-DCMAKE_CXX_FLAGS_RELEASE=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> -DCMAKE_INSTALL_PREFIX=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install
>  
> -DCMAKE_INSTALL_RPATH=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install/lib
>  -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF 
> -DBUILD_TUTORIALS=OFF -DWITH_QT4=OFF -DWITH_C_GLIB=OFF -DWITH_JAVA=OFF 
> -DWITH_PYTHON=OFF -DWITH_HASKELL=OFF -DWITH_CPP=ON -DWITH_STATIC_LIB=ON 
> -DWITH_LIBEVENT=OFF -DWITH_MT=OFF -GNinja 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep && "C:\Program 
> Files (x86)\CMake\bin\cmake.exe" -E touch 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure"
> -- The C compiler identification is MSVC 19.16.27030.1
> -- The CXX compiler identification is MSVC 19.16.27030.1
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- 
> works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe -- 
> works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Parsed Thrift package version: 0.12.0
> -- Parsed Thrift version: 0.12.0 (0.2.0)
> -- Setting C++11 as the default language level.
> -- To specify a different C++ language level, set CMAKE_CXX_STANDARD
> CMake Warning (dev) at build/cmake/DefineOptions.cmake:63 (find_package):
>   Policy CMP0074 is not set: find_package uses _ROOT variables.
>   Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
>   command to set the policy and suppress this warning.
>   Environment variable Boost_ROOT is set to:
> C:\Miniconda36-x64\envs\arrow\Library
>   For compatibility, CMake is ignoring the variable.
> Call Stack (most recent call first):
>   CMakeLists.txt:52 (include)
> This warning is for project developers.  Use -Wno-dev to suppress it.
> -- Found Boost 1.70.0 at 
> C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/Boost-1.70.0
> --   Requested configuration: QUIET
> -- Found boost_headers 1.70.0 at 
> C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_headers-1.70.0
> -- Boost 1.53 found.
> -- libevent NOT found.
> -- Could NOT find RUN_HASKELL (missing: RUN_HASKELL) 
> -- Could NOT find CABAL (missing: CABAL) 
> -- Looking for arpa/inet.h
> -- Looking for arpa/inet.h - not found
> -- Looking for fcntl.h
> -- Looking for fcntl.h - found
> -- Looking for getopt.h
> -- Looking for

[jira] [Assigned] (ARROW-5731) [CI] Turbodbc integration tests are failing

2019-06-29 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-5731:
--

Assignee: Uwe L. Korn

> [CI] Turbodbc integration tests are failing 
> 
>
> Key: ARROW-5731
> URL: https://issues.apache.org/jira/browse/ARROW-5731
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 1.0.0
>
>
> Have not investigated yet, build: 
> https://circleci.com/gh/ursa-labs/crossbow/383
> cc [~xhochy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5609) [C++] Set CMP0068 CMake policy to avoid macOS warnings

2019-06-29 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-5609:
--

Assignee: Uwe L. Korn

> [C++] Set CMP0068 CMake policy to avoid macOS warnings
> --
>
> Key: ARROW-5609
> URL: https://issues.apache.org/jira/browse/ARROW-5609
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>
> These warnings are appearing in the build on macOS
> {code}
> CMake Warning (dev):
>   Policy CMP0068 is not set: RPATH settings on macOS do not affect
>   install_name.  Run "cmake --help-policy CMP0068" for policy details.  Use
>   the cmake_policy command to set the policy and suppress this warning.
>   For compatibility with older versions of CMake, the install_name fields for
>   the following targets are still affected by RPATH settings:
>arrow_dataset_shared
>arrow_python_shared
>arrow_shared
>arrow_testing_shared
>parquet_shared
>plasma_shared
> This warning is for project developers.  Use -Wno-dev to suppress it.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5133) [Integration] Update turbodbc integration test to install a pinned version in the Dockerfile

2019-06-29 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875551#comment-16875551
 ] 

Uwe L. Korn commented on ARROW-5133:


[~kszucs] I don't think that this would work well given how often we break 
things in Arrow. I would rather keep building against master.

> [Integration] Update turbodbc integration test to install a pinned version in 
> the Dockerfile
> 
>
> Key: ARROW-5133
> URL: https://issues.apache.org/jira/browse/ARROW-5133
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Integration
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: turbodbc
>
> integration/turbodbc/runtest.sh currently installs and tests the integration 
> with
> a fork's branch.
> We should test against the official turbodbc release once Uwe's PR gets 
> merged.
> The turbodbc install step should be run during the docker image build (in 
> the Dockerfile) instead of in the runtest.sh script.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5874) [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under /usr/local/opt

2019-07-08 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880457#comment-16880457
 ] 

Uwe L. Korn commented on ARROW-5874:


We should bundle OpenSSL with the wheel and declare it as unsafe to use in 
production. Either compile from source when using {{pip}} in your production 
environment or use {{conda}}. This is roughly the same way {{psycopg2}} went. 
You cannot manage this type of binary dependency with {{pip}}; this is why 
{{conda}} was created.

I'm aware of {{delocate}} but we are explicitly not using it, as we rely on 
CMake and {{setup.py}} to bundle all required libraries. In our case it might 
be better to statically link OpenSSL so as not to pollute the global namespace 
with our shipped version of OpenSSL.
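
A sketch of the static-linking variant ({{OPENSSL_USE_STATIC_LIBS}} comes from 
CMake's FindOpenSSL module; the wiring into our build is not shown):
{code}
# Prefer the static OpenSSL libraries at configure time:
cmake .. -DOPENSSL_USE_STATIC_LIBS=ON
{code}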

> [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under 
> /usr/local/opt
> ---
>
> Key: ARROW-5874
> URL: https://issues.apache.org/jira/browse/ARROW-5874
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
> Environment: macOS 10.14.5
> Anaconda Python 3.7.3
>Reporter: Michael Anselmi
>Priority: Critical
>  Labels: pyarrow, wheel
>
> Hello, and congrats on the recent release of Apache Arrow 0.14.0!
> This morning I installed pyarrow 0.14.0 on my macOS 10.14.5 system like so:
> {code:java}
> python3.7 -m venv ~/virtualenv/pyarrow-0.14.0
> source ~/virtualenv/pyarrow-0.14.0/bin/activate
> pip install --upgrade pip setuptools
> pip install pyarrow  # installs 
> pyarrow-0.14.0-cp37-cp37m-macosx_10_6_intel.whl
> pip freeze --all
> # numpy==1.16.4
> # pip==19.1.1
> # pyarrow==0.14.0
> # setuptools==41.0.1
> # six==1.12.0
> {code}
> However I am unable to import pyarrow:
> {code:java}
> python -c 'import pyarrow'
> # Traceback (most recent call last):
> #   File "", line 1, in 
> #   File 
> "/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/__init__.py",
>  line 49, in 
> # from pyarrow.lib import cpu_count, set_cpu_count
> # ImportError: 
> dlopen(/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so,
>  2): Library not loaded: /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib
> #   Referenced from: 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> #   Reason: image not found
> {code}
> pyarrow is trying to load a shared library (OpenSSL in this case) from a path 
> under {{/usr/local/opt}} that doesn't exist; perhaps that OpenSSL had been 
> provided by Homebrew as part of your build process?  Unfortunately this makes 
> the pyarrow 0.14.0 wheel completely unusable on my system or any system that 
> doesn't have OpenSSL installed in that location.  This is a regression from 
> pyarrow 0.13.0 as those wheels "just worked".
> Additional diagnostic output below.  I ran {{otool -L}} on each {{.dylib}} 
> and {{.so}} file in 
> {{/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow}}
>  and included the output for those with dependencies under {{/usr/local/opt}}:
> {code:java}
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib:
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 
> 1.2.8)
> # @rpath/libarrow_boost_system.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # @rpath/libarrow_boost_filesystem.dylib (compatibility version 
> 0.0.0, current version 0.0.0)
> # @rpath/libarrow_boost_regex.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current 
> version 307.5.0)
> # /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1238.50.2)
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_flight.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_flight.14.dylib:
> # @rpath/libarrow_flight.14.dylib (compatibility version 14.0.0, 
> current version 14.0.0)
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (c

[jira] [Comment Edited] (ARROW-5874) [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under /usr/local/opt

2019-07-08 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880457#comment-16880457
 ] 

Uwe L. Korn edited comment on ARROW-5874 at 7/8/19 3:17 PM:


We should bundle OpenSSL with the wheel and declare it as unsafe to use in 
production. Either compile from source when using {{pip}} in your production 
environment or use {{conda}}. This is roughly the same way {{psycopg2}} went. 
You cannot manage this type of binary dependency with {{pip}}; this is why 
{{conda}} was created.

I'm aware of {{delocate}} but we are explicitly not using it, as we rely on 
CMake and {{setup.py}} to bundle all required libraries. In our case it might 
be better to statically link OpenSSL so as not to pollute the global namespace 
with our shipped version of OpenSSL.


was (Author: xhochy):
We should bundle OpenSSL with the wheel and declare it as unsafe to use in 
production. Either compile from source when using {{pip}} in your production 
environment or use {{conda}}. This is roughly the same way {{psycopg2}} went. 
You cannot 

I'm aware of {{delocate}} but we are explicitly not using it as we rely on CMake 
and {{setup.py}} to bundle all required libraries. In our case it might be 
better to statically link OpenSSL to not pollute the global namespace with our 
shipped version of OpenSSL.

> [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under 
> /usr/local/opt
> ---
>
> Key: ARROW-5874
> URL: https://issues.apache.org/jira/browse/ARROW-5874
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
> Environment: macOS 10.14.5
> Anaconda Python 3.7.3
>Reporter: Michael Anselmi
>Priority: Critical
>  Labels: pyarrow, wheel
>
> Hello, and congrats on the recent release of Apache Arrow 0.14.0!
> This morning I installed pyarrow 0.14.0 on my macOS 10.14.5 system like so:
> {code:java}
> python3.7 -m venv ~/virtualenv/pyarrow-0.14.0
> source ~/virtualenv/pyarrow-0.14.0/bin/activate
> pip install --upgrade pip setuptools
> pip install pyarrow  # installs 
> pyarrow-0.14.0-cp37-cp37m-macosx_10_6_intel.whl
> pip freeze --all
> # numpy==1.16.4
> # pip==19.1.1
> # pyarrow==0.14.0
> # setuptools==41.0.1
> # six==1.12.0
> {code}
> However I am unable to import pyarrow:
> {code:java}
> python -c 'import pyarrow'
> # Traceback (most recent call last):
> #   File "", line 1, in 
> #   File 
> "/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/__init__.py",
>  line 49, in 
> # from pyarrow.lib import cpu_count, set_cpu_count
> # ImportError: 
> dlopen(/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so,
>  2): Library not loaded: /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib
> #   Referenced from: 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> #   Reason: image not found
> {code}
> pyarrow is trying to load a shared library (OpenSSL in this case) from a path 
> under {{/usr/local/opt}} that doesn't exist; perhaps that OpenSSL had been 
> provided by Homebrew as part of your build process?  Unfortunately this makes 
> the pyarrow 0.14.0 wheel completely unusable on my system or any system that 
> doesn't have OpenSSL installed in that location.  This is a regression from 
> pyarrow 0.13.0 as those wheels "just worked".
> Additional diagnostic output below.  I ran {{otool -L}} on each {{.dylib}} 
> and {{.so}} file in 
> {{/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow}}
>  and included the output for those with dependencies under {{/usr/local/opt}}:
> {code:java}
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib:
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 
> 1.2.8)
> # @rpath/libarrow_boost_system.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # @rpath/libarrow_boost_filesystem.dylib (compatibility version 
> 0.0.0, current version 0.0.0)
> # @rpath/libarrow_boost_regex.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current 
> version 307.5.0)
> # /usr/lib/libSystem

[jira] [Assigned] (ARROW-5874) [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under /usr/local/opt

2019-07-08 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-5874:
--

Assignee: Krisztian Szucs

> [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under 
> /usr/local/opt
> ---
>
> Key: ARROW-5874
> URL: https://issues.apache.org/jira/browse/ARROW-5874
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
> Environment: macOS 10.14.5
> Anaconda Python 3.7.3
>Reporter: Michael Anselmi
>Assignee: Krisztian Szucs
>Priority: Critical
>  Labels: pull-request-available, pyarrow, wheel
> Fix For: 0.14.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hello, and congrats on the recent release of Apache Arrow 0.14.0!
> This morning I installed pyarrow 0.14.0 on my macOS 10.14.5 system like so:
> {code:java}
> python3.7 -m venv ~/virtualenv/pyarrow-0.14.0
> source ~/virtualenv/pyarrow-0.14.0/bin/activate
> pip install --upgrade pip setuptools
> pip install pyarrow  # installs 
> pyarrow-0.14.0-cp37-cp37m-macosx_10_6_intel.whl
> pip freeze --all
> # numpy==1.16.4
> # pip==19.1.1
> # pyarrow==0.14.0
> # setuptools==41.0.1
> # six==1.12.0
> {code}
> However I am unable to import pyarrow:
> {code:java}
> python -c 'import pyarrow'
> # Traceback (most recent call last):
> #   File "", line 1, in 
> #   File 
> "/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/__init__.py",
>  line 49, in 
> # from pyarrow.lib import cpu_count, set_cpu_count
> # ImportError: 
> dlopen(/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so,
>  2): Library not loaded: /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib
> #   Referenced from: 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> #   Reason: image not found
> {code}
> pyarrow is trying to load a shared library (OpenSSL in this case) from a path 
> under {{/usr/local/opt}} that doesn't exist; perhaps that OpenSSL had been 
> provided by Homebrew as part of your build process?  Unfortunately this makes 
> the pyarrow 0.14.0 wheel completely unusable on my system or any system that 
> doesn't have OpenSSL installed in that location.  This is a regression from 
> pyarrow 0.13.0 as those wheels "just worked".
> Additional diagnostic output below.  I ran {{otool -L}} on each {{.dylib}} 
> and {{.so}} file in 
> {{/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow}}
>  and included the output for those with dependencies under {{/usr/local/opt}}:
> {code:java}
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib:
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 
> 1.2.8)
> # @rpath/libarrow_boost_system.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # @rpath/libarrow_boost_filesystem.dylib (compatibility version 
> 0.0.0, current version 0.0.0)
> # @rpath/libarrow_boost_regex.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current 
> version 307.5.0)
> # /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1238.50.2)
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_flight.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_flight.14.dylib:
> # @rpath/libarrow_flight.14.dylib (compatibility version 14.0.0, 
> current version 14.0.0)
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current 
> version 307.5.0)
> # /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1238.50.2)
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_python.14.dylib
> # 
> /Users/manse

[jira] [Resolved] (ARROW-5874) [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under /usr/local/opt

2019-07-08 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-5874.

Resolution: Fixed

Issue resolved by pull request 4823
[https://github.com/apache/arrow/pull/4823]

> [Python] pyarrow 0.14.0 macOS wheels depend on shared libs under 
> /usr/local/opt
> ---
>
> Key: ARROW-5874
> URL: https://issues.apache.org/jira/browse/ARROW-5874
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
> Environment: macOS 10.14.5
> Anaconda Python 3.7.3
>Reporter: Michael Anselmi
>Priority: Critical
>  Labels: pull-request-available, pyarrow, wheel
> Fix For: 0.14.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hello, and congrats on the recent release of Apache Arrow 0.14.0!
> This morning I installed pyarrow 0.14.0 on my macOS 10.14.5 system like so:
> {code:java}
> python3.7 -m venv ~/virtualenv/pyarrow-0.14.0
> source ~/virtualenv/pyarrow-0.14.0/bin/activate
> pip install --upgrade pip setuptools
> pip install pyarrow  # installs 
> pyarrow-0.14.0-cp37-cp37m-macosx_10_6_intel.whl
> pip freeze --all
> # numpy==1.16.4
> # pip==19.1.1
> # pyarrow==0.14.0
> # setuptools==41.0.1
> # six==1.12.0
> {code}
> However I am unable to import pyarrow:
> {code:java}
> python -c 'import pyarrow'
> # Traceback (most recent call last):
> #   File "", line 1, in 
> #   File 
> "/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/__init__.py",
>  line 49, in 
> # from pyarrow.lib import cpu_count, set_cpu_count
> # ImportError: 
> dlopen(/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so,
>  2): Library not loaded: /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib
> #   Referenced from: 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> #   Reason: image not found
> {code}
> pyarrow is trying to load a shared library (OpenSSL in this case) from a path 
> under {{/usr/local/opt}} that doesn't exist; perhaps that OpenSSL had been 
> provided by Homebrew as part of your build process?  Unfortunately this makes 
> the pyarrow 0.14.0 wheel completely unusable on my system or any system that 
> doesn't have OpenSSL installed in that location.  This is a regression from 
> pyarrow 0.13.0 as those wheels "just worked".
> Additional diagnostic output below.  I ran {{otool -L}} on each {{.dylib}} 
> and {{.so}} file in 
> {{/Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow}}
>  and included the output for those with dependencies under {{/usr/local/opt}}:
> {code:java}
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib:
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 
> 1.2.8)
> # @rpath/libarrow_boost_system.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # @rpath/libarrow_boost_filesystem.dylib (compatibility version 
> 0.0.0, current version 0.0.0)
> # @rpath/libarrow_boost_regex.dylib (compatibility version 0.0.0, 
> current version 0.0.0)
> # /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current 
> version 307.5.0)
> # /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1238.50.2)
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_flight.14.dylib
> # 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_flight.14.dylib:
> # @rpath/libarrow_flight.14.dylib (compatibility version 14.0.0, 
> current version 14.0.0)
> # @rpath/libarrow.14.dylib (compatibility version 14.0.0, current 
> version 14.0.0)
> # /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility 
> version 1.0.0, current version 1.0.0)
> # /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current 
> version 307.5.0)
> # /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1238.50.2)
> otool -L 
> /Users/manselmi/virtualenv/pyarrow-0.14.0/lib/python3.7/site-packages/pyarrow/libarrow_python.

[jira] [Commented] (ARROW-5886) [Python][Packaging] Manylinux1/2010 compliance issue with libz

2019-07-09 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881282#comment-16881282
 ] 

Uwe L. Korn commented on ARROW-5886:


Actually, the job of {{auditwheel repair}} should be to rename these libs and 
run {{patchelf}} on all binaries so that they link to the renamed ones. If 
there is a binary that still links to the old name, then there is a bug in 
{{auditwheel repair}}.
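
For illustration, this can be verified with {{patchelf}} (library names taken 
from this report):
{code}
# The DT_NEEDED entries should reference the hashed names, e.g.
# libz-7f57503f.so.1.2.11 rather than libz.so.1:
patchelf --print-needed libarrow.so.14
{code}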

> [Python][Packaging] Manylinux1/2010 compliance issue with libz
> --
>
> Key: ARROW-5886
> URL: https://issues.apache.org/jira/browse/ARROW-5886
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.14.0
>Reporter: Krisztian Szucs
>Priority: Major
>
> So we statically link liblz4 in the manylinux1 wheels
> {code}
> # ldd pyarrow-manylinux1/libarrow.so.14 | grep z
> libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7fc28cef4000)
> {code}
> but dynamically in the manylinux2010 wheels
> {code}
> # ldd pyarrow-manylinux2010/libarrow.so.14 | grep z
> liblz4.so.1 => not found  (already deleted to reproduce the issue)
> libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f56f744)
> {code}
> This is what this PR resolves.
> What I find strange is that auditwheel seems to bundle libz for manylinux1:
> {code}
> # ls -lah pyarrow-manylinux1/*z*so.*
> -rwxr-xr-x 1 root root 115K Jun 29 00:14 
> pyarrow-manylinux1/libz-7f57503f.so.1.2.11
> {code}
> while ldd still uses the system libz:
> {code}
> # ldd pyarrow-manylinux1/libarrow.so.14 | grep z
> libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f91fcf3f000)
> {code}
> For manylinux2010 we also have liblz4:
> {code}
> #  ls -lah pyarrow-manylinux2010/*z*so.*
> -rwxr-xr-x 1 root root 191K Jun 28 23:38 
> pyarrow-manylinux2010/liblz4-8cb8bdde.so.1.8.3
> -rwxr-xr-x 1 root root 115K Jun 28 23:38 
> pyarrow-manylinux2010/libz-c69b9943.so.1.2.11
> {code}
> and ldd similarly tries to load the system libs:
> {code}
> # ldd pyarrow-manylinux2010/libarrow.so.14 | grep z
> liblz4.so.1 => not found
> libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7fd72764e000)
> {code}
> Inspecting manylinux1 with `LD_DEBUG=files,libs ldd libarrow.so.14`, it seems 
> to search the right path, but cannot find the hashed version of libz 
> `libz-7f57503f.so.1.2.11`:
> {code}
>463: file=libz.so.1 [0];  needed by ./libarrow.so.14 [0]
>463: find library=libz.so.1 [0]; searching
>463:  search path=/tmp/pyarrow-manylinux1/.  (RPATH from 
> file ./libarrow.so.14)
>463:   trying file=/tmp/pyarrow-manylinux1/./libz.so.1
>463:  search cache=/etc/ld.so.cache
>463:   trying file=/lib/x86_64-linux-gnu/libz.so.1
> {code}
> There is no `libz.so.1` just `libz-7f57503f.so.1.2.11`.
> Similarly for manylinux2010 and libz:
> {code}
>470: file=libz.so.1 [0];  needed by ./libarrow.so.14 [0]
>470: find library=libz.so.1 [0]; searching
>470:  search path=/tmp/pyarrow-manylinux2010/.   
> (RPATH from file ./libarrow.so.14)
>470:   trying file=/tmp/pyarrow-manylinux2010/./libz.so.1
>470:  search cache=/etc/ld.so.cache
>470:   trying file=/lib/x86_64-linux-gnu/libz.so.1
> {code}
> for liblz4 (again, I've deleted the system one):
> {code}
>470: file=liblz4.so.1 [0];  needed by ./libarrow.so.14 [0]
>470: find library=liblz4.so.1 [0]; searching
>470:  search path=/tmp/pyarrow-manylinux2010/.   
> (RPATH from file ./libarrow.so.14)
>470:   trying file=/tmp/pyarrow-manylinux2010/./liblz4.so.1
>470:  search cache=/etc/ld.so.cache
>470:  search 
> path=/lib/x86_64-linux-gnu/tls/x86_64:/lib/x86_64-linux-gnu/tls:/lib/x86_64-linux-gnu/x86_64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/tls/x86_64:/usr/lib/x86_64-linux-gnu/tls:/usr/lib/x86_64-linux-gnu/x86_6$
> :/usr/lib/x86_64-linux-gnu:/lib/tls/x86_64:/lib/tls:/lib/x86_64:/lib:/usr/lib/tls/x86_64:/usr/lib/tls:/usr/lib/x86_64:/usr/lib
>   (system search path)
> {code}
> There are no `libz.so.1` nor `liblz4.so.1`, just `libz-c69b9943.so.1.2.11` 
> and `liblz4-8cb8bdde.so.1.8.3`
> According to https://www.python.org/dev/peps/pep-0571/, neither `liblz4` nor 
> `libz` is part of the whitelist, and while these are bundled with the wheel, 
> they seemingly cannot be found - perhaps because of the hash in the library 
> name?
> I've tried to inspect the wheels with `auditwheel show` with version `2` and 
> `1.10`, both says the following:
> {code}
> # auditwheel show pyarrow-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl
> pyarrow-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl is consistent with
> the followin

[jira] [Updated] (ARROW-5885) [Python] Support optional arrow components via extras_require

2019-07-09 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-5885:
---
Summary: [Python] Support optional arrow components via extras_require  
(was: Support optional arrow components via extras_require)

> [Python] Support optional arrow components via extras_require
> -
>
> Key: ARROW-5885
> URL: https://issues.apache.org/jira/browse/ARROW-5885
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Reporter: George Sakkis
>Priority: Minor
>
> Since Arrow (and pyarrow) have several independent optional components, 
> instead of installing all of them it would be convenient if these could be 
> opted into from pip like 
> {{pip install pyarrow[gandiva,flight,plasma]}}
> or opt-out like
> {{pip install pyarrow[no-gandiva,no-flight,no-plasma]}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5885) [Python] Support optional arrow components via extras_require

2019-07-09 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881284#comment-16881284
 ] 

Uwe L. Korn commented on ARROW-5885:


This will only work if you split the {{pyarrow}} package into multiple 
packages: extras operate on dependencies, not on parts of a single wheel. At 
the current stage this would require a big refactoring of the Python package.
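
To illustrate (the package and extra names are hypothetical):
{code}
# Extras only add dependencies; they cannot subset one wheel, so a
# split would be needed for something like:
pip install pyarrow             # core-only wheel
pip install 'pyarrow[flight]'   # pulls in a separate pyarrow-flight wheel
{code}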

> [Python] Support optional arrow components via extras_require
> -
>
> Key: ARROW-5885
> URL: https://issues.apache.org/jira/browse/ARROW-5885
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Reporter: George Sakkis
>Priority: Minor
>
> Since Arrow (and pyarrow) have several independent optional components, 
> instead of installing all of them it would be convenient if these could be 
> opted into from pip like 
> {{pip install pyarrow[gandiva,flight,plasma]}}
> or opt-out like
> {{pip install pyarrow[no-gandiva,no-flight,no-plasma]}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5914) [CI] Build bundled dependencies in docker build step

2019-07-12 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883681#comment-16883681
 ] 

Uwe L. Korn commented on ARROW-5914:


[~fsaintjacques][~kszucs][~wesmckinn] This is why we have used conda in these 
builds. I have a great fear that we rely more and more on manual building of 
third-party dependencies in our build scripts, which just adds more 
maintenance overhead. I was so frustrated with the manual scripts in the 
manylinux1 case that I was considering making a manylinux1 conda channel to 
build the dependencies. This would have greatly reduced my pain in maintaining 
the manylinux1 container.

We need to test against system dependencies, but then we should do this as we 
did previously in the nightlies.

> [CI] Build bundled dependencies in docker build step
> 
>
> Key: ARROW-5914
> URL: https://issues.apache.org/jira/browse/ARROW-5914
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Fix For: 1.0.0
>
>
> In the recently introduced ARROW-5803, some heavy dependencies (thrift, 
> protobuf, flatbuffers, grpc) are built at each invocation of docker-compose 
> build (thus each Travis test).
> We should aim to build the third-party dependencies in the docker build phase 
> instead, to exploit caching and docker-compose pull, so that the CI step 
> doesn't need to build said dependencies each time.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-5919) [R] Add nightly tests for building r-arrow with dependencies from conda-forge

2019-07-12 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-5919:
--

 Summary: [R] Add nightly tests for building r-arrow with 
dependencies from conda-forge
 Key: ARROW-5919
 URL: https://issues.apache.org/jira/browse/ARROW-5919
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5914) [CI] Build bundled dependencies in docker build step

2019-07-12 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883823#comment-16883823
 ] 

Uwe L. Korn commented on ARROW-5914:


{quote}Yeah, so an alternative is that we use conda only for the dependencies 
that don't work from the system package manager. I guess that's about as good 
as building the dependency in the image
{quote}
No, it is either all-from-conda or none; mixing does not work due to the 
different toolchains.

> [CI] Build bundled dependencies in docker build step
> 
>
> Key: ARROW-5914
> URL: https://issues.apache.org/jira/browse/ARROW-5914
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Fix For: 1.0.0
>
>
> In the recently introduced ARROW-5803, some heavy dependencies (thrift, 
> protobuf, flatbuffers, grpc) are built at each invocation of docker-compose 
> build (thus each Travis test).
> We should aim to build the third-party dependencies in the docker build phase 
> instead, to exploit caching and docker-compose pull, so that the CI step 
> doesn't need to build said dependencies each time.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (ARROW-5919) [R] Add nightly tests for building r-arrow with dependencies from conda-forge

2019-07-15 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-5919.

Resolution: Fixed

Issue resolved by pull request 4855
[https://github.com/apache/arrow/pull/4855]

> [R] Add nightly tests for building r-arrow with dependencies from conda-forge
> -
>
> Key: ARROW-5919
> URL: https://issues.apache.org/jira/browse/ARROW-5919
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (ARROW-5956) [R] Ability for R to link to C++ libraries from pyarrow Wheel

2019-07-16 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-5956:
---
Summary: [R] Ability for R to link to C++ libraries from pyarrow Wheel  
(was: [R] Ability for R to link to C++ libraries from pyarrow)

> [R] Ability for R to link to C++ libraries from pyarrow Wheel
> -
>
> Key: ARROW-5956
> URL: https://issues.apache.org/jira/browse/ARROW-5956
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
> Environment: Ubuntu 16.04, R 3.4.4, python 3.6.5
>Reporter: Jeffrey Wong
>Priority: Major
>
> I have installed pyarrow 0.14.0 and want to be able to also use R arrow. In 
> my work I use rpy2 a lot to exchange python data structures with R data 
> structures, so would like R arrow to link against the exact same .so files 
> found in pyarrow
>  
>  
> When I pass in include_dir and lib_dir to R's configure, pointing to 
> pyarrow's include and pyarrow's root directories, I am able to compile R's 
> arrow.so file. However, I am unable to load it in an R session, getting the 
> error:
>  
> {code:java}
> > dyn.load('arrow.so')
> Error in dyn.load("arrow.so") :
>  unable to load shared object '/tmp/arrow2/r/src/arrow.so':
>  /tmp/arrow2/r/src/arrow.so: undefined symbol: 
> _ZNK5arrow11StructArray14GetFieldByNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE{code}
>  
>  
> Steps to reproduce:
>  
> Install pyarrow, which also ships libarrow.so and libparquet.so
>  
> {code:java}
> pip3 install pyarrow --upgrade --user
> PY_ARROW_PATH=$(python3 -c "import pyarrow, os; 
> print(os.path.dirname(pyarrow.__file__))")
> PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
> ln -s $PY_ARROW_PATH/libarrow.so.14 $PY_ARROW_PATH/libarrow.so
> ln -s $PY_ARROW_PATH/libparquet.so.14 $PY_ARROW_PATH/libparquet.so
> {code}
>  
>  
> Add to LD_LIBRARY_PATH
>  
> {code:java}
> sudo tee -a /usr/lib/R/etc/ldpaths <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> sudo tee -a /usr/lib/rstudio-server/bin/r-ldpath <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> {code}
>  
>  
> Install r arrow from source
> {code:java}
> git clone https://github.com/apache/arrow.git /tmp/arrow2
> cd /tmp/arrow2/r
> git checkout tags/apache-arrow-0.14.0
> R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include 
> LIB_DIR=$PY_ARROW_PATH"{code}
>  
> I have noticed that the R package for arrow no longer has an RcppExports, but 
> instead an arrowExports. Could it be that the lack of RcppExports has made it 
> difficult to find GetFieldByName?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5956) [R] Ability for R to link to C++ libraries from pyarrow Wheel

2019-07-16 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886285#comment-16886285
 ] 

Uwe L. Korn commented on ARROW-5956:


{quote}[https://twitter.com/xhochy/status/114029791272079] describes a 
similar wish. 
{quote}
 

In that case I actually had R use the same lib as pyarrow; this works fine 
in a conda-provided environment. I amended the title to clarify that this 
issue is about using the libraries from the {{pyarrow}} *wheel*.
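
For reference, the locations shipped inside the wheel can be queried from 
Python itself; a minimal sketch (output paths will differ per installation):

{code:python}
import pyarrow as pa

# Header and library locations as bundled with the installed pyarrow package:
print(pa.get_include())       # C++ header directory
print(pa.get_libraries())     # library names, e.g. ['arrow', 'arrow_python']
print(pa.get_library_dirs())  # directories containing libarrow.so.*
{code}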

> [R] Ability for R to link to C++ libraries from pyarrow Wheel
> -
>
> Key: ARROW-5956
> URL: https://issues.apache.org/jira/browse/ARROW-5956
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
> Environment: Ubuntu 16.04, R 3.4.4, python 3.6.5
>Reporter: Jeffrey Wong
>Priority: Major
>
> I have installed pyarrow 0.14.0 and want to be able to also use R arrow. In 
> my work I use rpy2 a lot to exchange python data structures with R data 
> structures, so would like R arrow to link against the exact same .so files 
> found in pyarrow
>  
>  
> When I pass in include_dir and lib_dir to R's configure, pointing to 
> pyarrow's include and pyarrow's root directories, I am able to compile R's 
> arrow.so file. However, I am unable to load it in an R session, getting the 
> error:
>  
> {code:java}
> > dyn.load('arrow.so')
> Error in dyn.load("arrow.so") :
>  unable to load shared object '/tmp/arrow2/r/src/arrow.so':
>  /tmp/arrow2/r/src/arrow.so: undefined symbol: 
> _ZNK5arrow11StructArray14GetFieldByNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE{code}
>  
>  
> Steps to reproduce:
>  
> Install pyarrow, which also ships libarrow.so and libparquet.so
>  
> {code:java}
> pip3 install pyarrow --upgrade --user
> PY_ARROW_PATH=$(python3 -c "import pyarrow, os; 
> print(os.path.dirname(pyarrow.__file__))")
> PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
> ln -s $PY_ARROW_PATH/libarrow.so.14 $PY_ARROW_PATH/libarrow.so
> ln -s $PY_ARROW_PATH/libparquet.so.14 $PY_ARROW_PATH/libparquet.so
> {code}
>  
>  
> Add to LD_LIBRARY_PATH
>  
> {code:java}
> sudo tee -a /usr/lib/R/etc/ldpaths <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> sudo tee -a /usr/lib/rstudio-server/bin/r-ldpath <<LINES
> LD_LIBRARY_PATH="\${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> export LD_LIBRARY_PATH
> LINES
> export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$PY_ARROW_PATH"
> {code}
>  
>  
> Install r arrow from source
> {code:java}
> git clone https://github.com/apache/arrow.git /tmp/arrow2
> cd /tmp/arrow2/r
> git checkout tags/apache-arrow-0.14.0
> R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include 
> LIB_DIR=$PY_ARROW_PATH"{code}
>  
> I have noticed that the R package for arrow no longer has an RcppExports, but 
> instead an arrowExports. Could it be that the lack of RcppExports has made it 
> difficult to find GetFieldByName?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5994) [CI] [Rust] Create nightly releases of the Rust implementation

2019-07-22 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890017#comment-16890017
 ] 

Uwe L. Korn commented on ARROW-5994:


[~andygrove] Is there a place where you could upload these nightlies where it 
is clearly visible that they are not meant for public consumption?

> [CI] [Rust] Create nightly releases of the Rust implementation
> --
>
> Key: ARROW-5994
> URL: https://issues.apache.org/jira/browse/ARROW-5994
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> I would like to work on this but I'm not currently sure where to start. I 
> will follow up on the mailing list.
> I am interested in this so I can use Arrow in my new PoC and I know of 
> another project that is now using Arrow and will likely benefit from nightly 
> releases.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5994) [CI] [Rust] Create nightly releases of the Rust implementation

2019-07-23 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890743#comment-16890743
 ] 

Uwe L. Korn commented on ARROW-5994:


{quote} having published nightly releases
{quote}
 

No, we cannot have them. All releases need to be voted on, so there won't be an 
apache-arrow-nightly on crates.io. You can, however, have a CI job that uploads 
andys-private-arrow-nightlies; just make sure that it is not in any way 
official. Also be careful about depending on this private fork in released 
artifacts; this may lead to very complicated situations when you want to 
integrate with other libraries that also use Arrow. Rather, invest some effort 
in making Arrow releases more frequent in general.

> [CI] [Rust] Create nightly releases of the Rust implementation
> --
>
> Key: ARROW-5994
> URL: https://issues.apache.org/jira/browse/ARROW-5994
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> I would like to work on this but I'm not currently sure where to start. I 
> will follow up on the mailing list.
> I am interested in this so I can use Arrow in my new PoC and I know of 
> another project that is now using Arrow and will likely benefit from nightly 
> releases.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5757) [Python] Stop supporting Python 2.7

2019-07-31 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897332#comment-16897332
 ] 

Uwe L. Korn commented on ARROW-5757:


Release 1.0 with Python 2 support and then drop immediately?

> [Python] Stop supporting Python 2.7
> ---
>
> Key: ARROW-5757
> URL: https://issues.apache.org/jira/browse/ARROW-5757
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
>
> By the end of 2019 many scientific Python projects will stop supporting 
> Python 2 altogether:
> https://python3statement.org/
> We'll certainly support Python 2 in Arrow 1.0 but we could perhaps drop 
> support in 1.1.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6096) [C++] Remove dependency on boost regex library

2019-08-01 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898110#comment-16898110
 ] 

Uwe L. Korn commented on ARROW-6096:


[~hatem] In the past this used the C++ regex library, but we had some issues 
with it. [~wesmckinn] [~mdeepak] Do you remember the problems with that?

> [C++] Remove dependency on boost regex library
> --
>
> Key: ARROW-6096
> URL: https://issues.apache.org/jira/browse/ARROW-6096
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Hatem Helal
>Assignee: Hatem Helal
>Priority: Minor
>
> There appears to be only one place where the boost regex library is used:
> [cpp/src/parquet/metadata.cc|https://github.com/apache/arrow/blob/eb73b962e42b5ae6983bf026ebf825f1f707e245/cpp/src/parquet/metadata.cc#L32]
> I think this can be replaced by the C++11 regex library.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6119) [Python] PyArrow import fails on Windows Python 3.7

2019-08-02 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899091#comment-16899091
 ] 

Uwe L. Korn commented on ARROW-6119:


How did you install this? Did you use conda (preferred) or pip or did you 
compile it yourself?

> [Python] PyArrow import fails on Windows Python 3.7
> ---
>
> Key: ARROW-6119
> URL: https://issues.apache.org/jira/browse/ARROW-6119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
> Environment: Windows, Python 3.7
>Reporter: Paul Suganthan
>Priority: Major
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python37\lib\site-packages\pyarrow\__init__.py", line 49, in 
> <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: DLL load failed: The specified procedure could not be found.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-3054) [Packaging] Tooling to enable nightly conda packages to be updated to some anaconda.org channel

2019-08-05 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900208#comment-16900208
 ] 

Uwe L. Korn commented on ARROW-3054:


We should be able to build nightly packages by taking the recipes that are 
checked in into the crossbow tasks and do a {{conda smithy rerender}} before 
running the build. {{conda-smithy}} should also be able to upload to a 
different channel nowadays through a simple config option.

> [Packaging] Tooling to enable nightly conda packages to be updated to some 
> anaconda.org channel
> ---
>
> Key: ARROW-3054
> URL: https://issues.apache.org/jira/browse/ARROW-3054
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.10.0
>Reporter: Phillip Cloud
>Assignee: Krisztian Szucs
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6132) [Python] ListArray.from_arrays does not check validity of input arrays

2019-08-05 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900260#comment-16900260
 ] 

Uwe L. Korn commented on ARROW-6132:


+1, not getting segfaults or delayed errors on Python APIs is essential.
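
As a minimal sketch of what the Python layer could do (the wrapper name is 
hypothetical; {{validate()}} is the existing method shown in the description):

{code:python}
import numpy as np
import pyarrow as pa

def list_array_from_arrays_checked(offsets, values):
    # Hypothetical safe wrapper: construct, then validate before returning.
    arr = pa.ListArray.from_arrays(offsets, values)
    arr.validate()  # raises ArrowInvalid on inconsistent offsets
    return arr

# Raises "Final offset invariant not equal to values length: 10!=5"
# instead of producing garbage on access:
# list_array_from_arrays_checked([1, 3, 10], np.arange(5))
{code}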

> [Python] ListArray.from_arrays does not check validity of input arrays
> --
>
> Key: ARROW-6132
> URL: https://issues.apache.org/jira/browse/ARROW-6132
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Minor
>
> From https://github.com/apache/arrow/pull/4979#issuecomment-517593918.
> When creating a ListArray from offsets and values in Python, there is no 
> validation that the offsets start with 0 and end with the length of the 
> values array (but is that required? the docs seem to indicate that: 
> https://github.com/apache/arrow/blob/master/docs/source/format/Layout.rst#list-type
>  ("The first value in the offsets array is 0, and the last element is the 
> length of the values array.").
> The array you get "seems" ok (the repr), but on conversion to python or 
> flattened arrays, things go wrong:
> {code}
> In [61]: a = pa.ListArray.from_arrays([1,3,10], np.arange(5)) 
> In [62]: a
> Out[62]: 
> 
> [
>   [
> 1,
> 2
>   ],
>   [
> 3,
> 4
>   ]
> ]
> In [63]: a.flatten()
> Out[63]: 
> 
> [
>   0,   # <--- includes the 0
>   1,
>   2,
>   3,
>   4
> ]
> In [64]: a.to_pylist()
> Out[64]: [[1, 2], [3, 4, 1121, 1, 64, 93969433636432, 13]]  # <--includes 
> more elements as garbage
> {code}
> Calling {{validate}} manually correctly raises:
> {code}
> In [65]: a.validate()
> ...
> ArrowInvalid: Final offset invariant not equal to values length: 10!=5
> {code}
> In C++ the main constructors are not safe, and as the caller you need to 
> ensure that the data is correct or call a safe (slower) constructor. But do 
> we want to use the unsafe / fast constructors without validation in Python as 
> default as well? Or should we do a call to {{validate}} here?
> A quick search seems to indicate that `pa.Array.from_buffers` does 
> validation, but other `from_arrays` methods don't seem to explicitly do this. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (ARROW-6403) [Python] Expose FileReader::ReadRowGroups() to Python

2019-09-01 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-6403.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5241
[https://github.com/apache/arrow/pull/5241]
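
For reference, usage from Python then looks roughly like this (a sketch; see 
the linked PR for the exact signature, and the file name is hypothetical):

{code:python}
import pyarrow.parquet as pq

pf = pq.ParquetFile("example.parquet")
# Read only the first and third row groups in one (threaded) call:
table = pf.read_row_groups([0, 2])
{code}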

> [Python] Expose FileReader::ReadRowGroups() to Python
> -
>
> Key: ARROW-6403
> URL: https://issues.apache.org/jira/browse/ARROW-6403
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Arik Funke
>Assignee: Arik Funke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Expose ReadRowGroups to Python to allow efficient filtered reading 
> implementations, as suggested by @xhochy in 
> https://github.com/apache/arrow/issues/2491#issuecomment-416958663
> Without this PR users would have to re-implement threaded reads in python.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (ARROW-6403) [Python] Expose FileReader::ReadRowGroups() to Python

2019-09-01 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-6403:
--

Assignee: Arik Funke

> [Python] Expose FileReader::ReadRowGroups() to Python
> -
>
> Key: ARROW-6403
> URL: https://issues.apache.org/jira/browse/ARROW-6403
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Arik Funke
>Assignee: Arik Funke
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Expose ReadRowGroups to Python to allow efficient filtered reading 
> implementations, as suggested by @xhochy in 
> https://github.com/apache/arrow/issues/2491#issuecomment-416958663
> Without this PR users would have to re-implement threaded reads in python.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6277) [C++][Parquet] Support reading/writing other Parquet primitive types to DictionaryArray

2019-09-05 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923401#comment-16923401
 ] 

Uwe L. Korn commented on ARROW-6277:


This could be interesting for date columns when working together with pandas. 
To correctly round-trip date columns in the cycle Parquet -> Arrow -> pandas -> 
Arrow -> Parquet you need to use object columns in pandas with datetime.date 
objects. These can be quite repetitive, and thus dictionary encoding helps 
a lot here. Otherwise I would see the same use case for float columns, but that 
isn't something I have used yet, mostly due to pandas not really working 
well with float categories.
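
As a sketch of the desired reading API: {{read_dictionary}} already works for 
string (BYTE_ARRAY) columns, and extending it to date32 and other primitives 
is exactly what this issue asks for, so treat the date example as aspirational 
(file and column names are hypothetical):

{code:python}
import pyarrow.parquet as pq

# Works today for string columns; the goal is to allow the same for a
# date32 column "d", keeping it dictionary-encoded end to end:
table = pq.read_table("example.parquet", read_dictionary=["d"])
{code}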

> [C++][Parquet] Support reading/writing other Parquet primitive types to 
> DictionaryArray
> ---
>
> Key: ARROW-6277
> URL: https://issues.apache.org/jira/browse/ARROW-6277
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> As follow up to ARROW-3246, we should support direct read/write of the other 
> Parquet primitive types. Currently only BYTE_ARRAY is implemented as it 
> provides the most performance benefit.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6456) [C++] Possible to reduce object code generated in compute/kernels/take.cc?

2019-09-05 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923407#comment-16923407
 ] 

Uwe L. Korn commented on ARROW-6456:


We should investigate building with link-time optimization. Besides improving 
performance, another great benefit is that it reduces binary size, especially 
in cases like this one where there is a lot of similar code.

The drawback is that link times will increase, so we should only add it to a 
single CI job but use it in release builds.

> [C++] Possible to reduce object code generated in compute/kernels/take.cc?
> --
>
> Key: ARROW-6456
> URL: https://issues.apache.org/jira/browse/ARROW-6456
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> According to 
> https://gist.github.com/wesm/90f73d050a81cbff6772aea2203cdf93
> take.cc is our largest piece of object code in the codebase. This is a pretty 
> important function but I wonder if it's possible to make the implementation 
> "leaner" than it is currently to reduce generated code, without sacrificing 
> performance. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (ARROW-6326) [C++] Nullable fields when converting std::tuple to Table

2019-09-08 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-6326.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5171
[https://github.com/apache/arrow/pull/5171]

> [C++] Nullable fields when converting std::tuple to Table
> -
>
> Key: ARROW-6326
> URL: https://issues.apache.org/jira/browse/ARROW-6326
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Omer Ozarslan
>Assignee: Omer Ozarslan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {{std::optional}} isn't used for representing nullable fields in Arrow's 
> current STL conversion API since it requires C++17. Also there are other ways 
> to represent an optional field other than {{std::optional}} such as using 
> pointers or external implementations of optional ({{boost::optional}}, 
> {{type_safe::optional}} and alike). 
> Since it is hard to maintain so many different kinds of specializations, 
> introducing an {{Optional}} concept covering these classes could solve this 
> issue and allow implementing nullable fields consistently.
> So, the gist of proposed change will be something along the lines of:
> {code:cpp}
> template <typename T>
> constexpr bool is_optional_like_v = ...;
> template <typename T>
> struct CTypeTraits<T, std::enable_if_t<is_optional_like_v<T>>> {
>    //...
> };
> template <typename T>
> struct ConversionTraits<T, std::enable_if_t<is_optional_like_v<T>>> 
>     : public CTypeTraits<...> {
>    //...
> };
> {code}
> For a type {{T}} to be considered as an {{Optional}}:
> 1) It should be convertible (implicitly or explicitly) to {{bool}}, i.e. it 
> implements {{[explicit] operator bool()}},
> 2) It should be dereferenceable, i.e. it implements {{operator*()}}.
> These two requirements provide a generalized way of templating nullable 
> fields based on pointers, {{std::optional}}, {{boost::optional}} etc. 
> However, it would be better (necessary?) if this implementation acted as 
> a default while not breaking users' existing specializations (e.g. an 
> existing implementation in which {{std::optional}} is specialized by the user).
> Are there any issues this approach may cause that I may have missed?
> I will open a draft PR for working on that meanwhile.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (ARROW-6326) [C++] Nullable fields when converting std::tuple to Table

2019-09-08 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-6326:
--

Assignee: Omer Ozarslan

> [C++] Nullable fields when converting std::tuple to Table
> -
>
> Key: ARROW-6326
> URL: https://issues.apache.org/jira/browse/ARROW-6326
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Omer Ozarslan
>Assignee: Omer Ozarslan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> {{std::optional}} isn't used for representing nullable fields in Arrow's 
> current STL conversion API since it requires C++17. Also there are other ways 
> to represent an optional field other than {{std::optional}} such as using 
> pointers or external implementations of optional ({{boost::optional}}, 
> {{type_safe::optional}} and alike). 
> Since it is hard to maintain so many different kinds of specializations, 
> introducing an {{Optional}} concept covering these classes could solve this 
> issue and allow implementing nullable fields consistently.
> So, the gist of proposed change will be something along the lines of:
> {code:cpp}
> template <typename T>
> constexpr bool is_optional_like_v = ...;
> template <typename T>
> struct CTypeTraits<T, std::enable_if_t<is_optional_like_v<T>>> {
>    //...
> };
> template <typename T>
> struct ConversionTraits<T, std::enable_if_t<is_optional_like_v<T>>> 
>     : public CTypeTraits<...> {
>    //...
> };
> {code}
> For a type {{T}} to be considered as an {{Optional}}:
> 1) It should be convertible (implicitly or explicitly) to {{bool}}, i.e. it 
> implements {{[explicit] operator bool()}},
> 2) It should be dereferenceable, i.e. it implements {{operator*()}}.
> These two requirements provide a generalized way of templating nullable 
> fields based on pointers, {{std::optional}}, {{boost::optional}} etc. 
> However, it would be better (necessary?) if this implementation acted as 
> a default while not breaking users' existing specializations (e.g. an 
> existing implementation in which {{std::optional}} is specialized by the user).
> Are there any issues this approach may cause that I may have missed?
> I will open a draft PR for working on that meanwhile.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6504) [Python][Packaging] Add mimalloc to conda packages for better performance

2019-09-12 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928501#comment-16928501
 ] 

Uwe L. Korn commented on ARROW-6504:


In the case of jemalloc, one of the main reasons is that, for older glibc 
versions, we need the latest commit on the stable-4 branch. This was never 
released, and as conda-forge only builds releases, we couldn't have a build 
for it there.

> [Python][Packaging] Add mimalloc to conda packages for better performance
> -
>
> Key: ARROW-6504
> URL: https://issues.apache.org/jira/browse/ARROW-6504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6228) [C++] Add context lines to Diff formatting

2019-09-12 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928547#comment-16928547
 ] 

Uwe L. Korn commented on ARROW-6228:


I would prefer the hunk headers as this gives at least some information about 
the position.

> [C++] Add context lines to Diff formatting
> --
>
> Key: ARROW-6228
> URL: https://issues.apache.org/jira/browse/ARROW-6228
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Trivial
>
> Diff currently renders only inserted or deleted elements, but context lines 
> can be helpful to viewers of the diff. Add an option for configurable context 
> line count to Diff and EqualOptions



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6509) [CI] Java test failures on Travis

2019-09-12 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928548#comment-16928548
 ] 

Uwe L. Korn commented on ARROW-6509:


We should simply skip tests in the Java build that is done in the Python job. 
They consume precious runtime for something that is tested in another job 
already.

> [CI] Java test failures on Travis
> -
>
> Key: ARROW-6509
> URL: https://issues.apache.org/jira/browse/ARROW-6509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Antoine Pitrou
>Priority: Critical
> Fix For: 0.15.0
>
>
> This seems to happen more or less frequently on the Python - Java build (with 
> jpype enabled).
> See warnings and errors starting from 
> https://travis-ci.org/apache/arrow/jobs/583069089#L6662



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6509) [CI] Java test failures on Travis

2019-09-12 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928555#comment-16928555
 ] 

Uwe L. Korn commented on ARROW-6509:


Oh, I wasn't aware of that :( I only thought of pyarrow.jvm

> [CI] Java test failures on Travis
> -
>
> Key: ARROW-6509
> URL: https://issues.apache.org/jira/browse/ARROW-6509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Antoine Pitrou
>Priority: Critical
> Fix For: 0.15.0
>
>
> This seems to happen more or less frequently on the Python - Java build (with 
> jpype enabled).
> See warnings and errors starting from 
> https://travis-ci.org/apache/arrow/jobs/583069089#L6662



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6577) Dependency conflict in conda packages

2019-09-17 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931172#comment-16931172
 ] 

Uwe L. Korn commented on ARROW-6577:


I cannot replicate this locally. Can you share some details:

 
 * What is your conda version?
 * What is in your .condarc?

> Dependency conflict in conda packages
> -
>
> Key: ARROW-6577
> URL: https://issues.apache.org/jira/browse/ARROW-6577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.14.1
> Environment: kernel: 5.2.11-200.fc30.x86_64
> conda 4.6.13
> Python 3.7.3
>Reporter: Suvayu Ali
>Priority: Major
> Attachments: pa-conda.txt
>
>
> When I install pyarrow on a fresh environment, the latest version (0.14.1) is 
> picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 
> 0.12.1. I think a common dependency is causing the downgrade, my guess is 
> boost or protobuf. This is based on several instances of this issue I 
> encountered over the last few weeks. It took me a while to find a somewhat 
> reproducible recipe.
> {code:java}
> $ conda create -n test pyarrow pandas numpy
> ...
> Proceed ([y]/n)? y
> ...
> $ conda install -n test ipython
> ...
> Proceed ([y]/n)? n
> CondaSystemExit: Exiting.
> {code}
> I have attached a mildly edited (to remove progress bars, and control 
> characters) transcript of this session. Here {{ipython}} triggers the 
> problem, and downgrades {{pyarrow}} to 0.12.1, but I think there are other 
> common packages who also conflict in this way. Please let me know if I can 
> provide more info.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6339) [Python][C++] Rowgroup statistics for pd.NaT array ill defined

2019-09-17 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931175#comment-16931175
 ] 

Uwe L. Korn commented on ARROW-6339:


The problem here is that 
{{parquet_file.metadata.row_group(0).column(0).statistics.has_min_max}} is 
{{False}} and thus {{.max}} should never be accessed. Instead of returning 
undefined data, we should raise an exception.
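
A guarded access then looks like this (a sketch built from the reproducer in 
the issue description):

{code:python}
import io
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
buf = pa.BufferOutputStream()
pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
parquet_file = pq.ParquetFile(io.BytesIO(buf.getvalue().to_pybytes()))

stats = parquet_file.metadata.row_group(0).column(0).statistics
if stats.has_min_max:   # guard before touching .min / .max
    print(stats.max)
else:
    print("column chunk carries no min/max statistics")
{code}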

> [Python][C++] Rowgroup statistics for pd.NaT array ill defined
> --
>
> Key: ARROW-6339
> URL: https://issues.apache.org/jira/browse/ARROW-6339
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
>Reporter: Florian Jetter
>Assignee: Florian Jetter
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When initialising an array with NaT only values the row group statistic is 
> corrupt returning either random values or raises integer out of bound 
> exceptions.
> {code:python}
> import io
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
> buf = pa.BufferOutputStream()
> pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
> buf = io.BytesIO(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(buf)
> # Asserting behaviour is difficult since it is random and the state is ill 
> defined. 
> # After a few iterations an exception is raised.
> while True:
>     parquet_file.metadata.row_group(0).column(0).statistics.max
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (ARROW-6339) [Python][C++] Rowgroup statistics for pd.NaT array ill defined

2019-09-17 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-6339:
--

Assignee: Uwe L. Korn  (was: Florian Jetter)

> [Python][C++] Rowgroup statistics for pd.NaT array ill defined
> --
>
> Key: ARROW-6339
> URL: https://issues.apache.org/jira/browse/ARROW-6339
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When initialising an array with NaT only values the row group statistic is 
> corrupt returning either random values or raises integer out of bound 
> exceptions.
> {code:python}
> import io
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
> buf = pa.BufferOutputStream()
> pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
> buf = io.BytesIO(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(buf)
> # Asserting behaviour is difficult since it is random and the state is ill 
> defined. 
> # After a few iterations an exception is raised.
> while True:
>     parquet_file.metadata.row_group(0).column(0).statistics.max
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6577) Dependency conflict in conda packages

2019-09-17 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931350#comment-16931350
 ] 

Uwe L. Korn commented on ARROW-6577:


The problem seems to be that package resolution in conda 4.6 is buggy. The 
issue is fixed with conda 4.7; please upgrade. 

> Dependency conflict in conda packages
> -
>
> Key: ARROW-6577
> URL: https://issues.apache.org/jira/browse/ARROW-6577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.14.1
> Environment: kernel: 5.2.11-200.fc30.x86_64
> conda 4.6.13
> Python 3.7.3
>Reporter: Suvayu Ali
>Priority: Major
> Attachments: pa-conda.txt
>
>
> When I install pyarrow on a fresh environment, the latest version (0.14.1) is 
> picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 
> 0.12.1. I think a common dependency is causing the downgrade, my guess is 
> boost or protobuf. This is based on several instances of this issue I 
> encountered over the last few weeks. It took me a while to find a somewhat 
> reproducible recipe.
> {code:java}
> $ conda create -n test pyarrow pandas numpy
> ...
> Proceed ([y]/n)? y
> ...
> $ conda install -n test ipython
> ...
> Proceed ([y]/n)? n
> CondaSystemExit: Exiting.
> {code}
> I have attached a mildly edited (to remove progress bars, and control 
> characters) transcript of this session. Here {{ipython}} triggers the 
> problem, and downgrades {{pyarrow}} to 0.12.1, but I think there are other 
> common packages who also conflict in this way. Please let me know if I can 
> provide more info.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6577) Dependency conflict in conda packages

2019-09-17 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931360#comment-16931360
 ] 

Uwe L. Korn commented on ARROW-6577:


[~suvayu] Otherwise this should be solved by using {{conda install ipython 
"pyarrow>=0.14"}} (quoted so the shell does not treat {{>=}} as a redirection). 
As this is a conda issue, it's probably better to ask on the conda tracker, but 
I guess there you will get the same answer: update conda.

> Dependency conflict in conda packages
> -
>
> Key: ARROW-6577
> URL: https://issues.apache.org/jira/browse/ARROW-6577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.14.1
> Environment: kernel: 5.2.11-200.fc30.x86_64
> conda 4.6.13
> Python 3.7.3
>Reporter: Suvayu Ali
>Priority: Major
> Attachments: pa-conda.txt
>
>
> When I install pyarrow on a fresh environment, the latest version (0.14.1) is 
> picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 
> 0.12.1. I think a common dependency is causing the downgrade, my guess is 
> boost or protobuf. This is based on several instances of this issue I 
> encountered over the last few weeks. It took me a while to find a somewhat 
> reproducible recipe.
> {code:java}
> $ conda create -n test pyarrow pandas numpy
> ...
> Proceed ([y]/n)? y
> ...
> $ conda install -n test ipython
> ...
> Proceed ([y]/n)? n
> CondaSystemExit: Exiting.
> {code}
> I have attached a mildly edited (to remove progress bars, and control 
> characters) transcript of this session. Here {{ipython}} triggers the 
> problem, and downgrades {{pyarrow}} to 0.12.1, but I think there are other 
> common packages who also conflict in this way. Please let me know if I can 
> provide more info.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (ARROW-6577) Dependency conflict in conda packages

2019-09-17 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-6577:
--

Assignee: Uwe L. Korn

> Dependency conflict in conda packages
> -
>
> Key: ARROW-6577
> URL: https://issues.apache.org/jira/browse/ARROW-6577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.14.1
> Environment: kernel: 5.2.11-200.fc30.x86_64
> conda 4.6.13
> Python 3.7.3
>Reporter: Suvayu Ali
>Assignee: Uwe L. Korn
>Priority: Major
> Attachments: pa-conda.txt
>
>
> When I install pyarrow on a fresh environment, the latest version (0.14.1) is 
> picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 
> 0.12.1. I think a common dependency is causing the downgrade, my guess is 
> boost or protobuf. This is based on several instances of this issue I 
> encountered over the last few weeks. It took me a while to find a somewhat 
> reproducible recipe.
> {code:java}
> $ conda create -n test pyarrow pandas numpy
> ...
> Proceed ([y]/n)? y
> ...
> $ conda install -n test ipython
> ...
> Proceed ([y]/n)? n
> CondaSystemExit: Exiting.
> {code}
> I have attached a mildly edited (to remove progress bars, and control 
> characters) transcript of this session. Here {{ipython}} triggers the 
> problem, and downgrades {{pyarrow}} to 0.12.1, but I think there are other 
> common packages who also conflict in this way. Please let me know if I can 
> provide more info.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (ARROW-6577) Dependency conflict in conda packages

2019-09-17 Thread Uwe L. Korn (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-6577.

Resolution: Not A Bug

Closing as this is not an Arrow bug.

> Dependency conflict in conda packages
> -
>
> Key: ARROW-6577
> URL: https://issues.apache.org/jira/browse/ARROW-6577
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.14.1
> Environment: kernel: 5.2.11-200.fc30.x86_64
> conda 4.6.13
> Python 3.7.3
>Reporter: Suvayu Ali
>Priority: Major
> Attachments: pa-conda.txt
>
>
> When I install pyarrow on a fresh environment, the latest version (0.14.1) is 
> picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 
> 0.12.1. I think a common dependency is causing the downgrade, my guess is 
> boost or protobuf. This is based on several instances of this issue I 
> encountered over the last few weeks. It took me a while to find a somewhat 
> reproducible recipe.
> {code:java}
> $ conda create -n test pyarrow pandas numpy
> ...
> Proceed ([y]/n)? y
> ...
> $ conda install -n test ipython
> ...
> Proceed ([y]/n)? n
> CondaSystemExit: Exiting.
> {code}
> I have attached a mildly edited (to remove progress bars, and control 
> characters) transcript of this session. Here {{ipython}} triggers the 
> problem, and downgrades {{pyarrow}} to 0.12.1, but I think there are other 
> common packages who also conflict in this way. Please let me know if I can 
> provide more info.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6585) [C++] Create "ARROW_LIBRARIES" argument to pass list of desired components to build

2019-09-17 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932110#comment-16932110
 ] 

Uwe L. Korn commented on ARROW-6585:


FTR: there is a related ML discussion about this: "[DISCUSS] Changing C++ build 
system default options to produce more barebones builds"

> [C++] Create "ARROW_LIBRARIES"  argument to pass list of desired components 
> to build
> 
>
> Key: ARROW-6585
> URL: https://issues.apache.org/jira/browse/ARROW-6585
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Our current {{-DARROW_*}} flag system strikes me as a little bit tedious. 
> When invoking Boost's build system, you can pass the argument 
> {{--with-libraries=filesystem,regex,system}} to indicate which components you 
> want to see built. 
> I think we should do a couple of things: declare all component dependencies in 
> a central place. Presently we have many "if" statements toggling on 
> dependencies on an ad hoc basis. The code looks like this:
> {code}
> if(ARROW_FLIGHT OR ARROW_PARQUET OR ARROW_BUILD_TESTS)
>   set(ARROW_IPC ON)
> endif()
> if(ARROW_IPC AND NOT ARROW_JSON)
>   message(FATAL_ERROR "JSON support is required for Arrow IPC")
> endif()
> {code}
> I don't think this is going to be scalable. 
> Secondly, I think we should make it easier to ask for a comprehensive build. 
> E.g. {{-DARROW_LIBRARIES=everything}} or similar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-09-18 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932315#comment-16932315
 ] 

Uwe L. Korn commented on ARROW-4917:


[~mdeepak] [~owen.omalley] might care. I guess ORC is not testing on Alpine.

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-3891) [Java] Remove Long.bitCount with simple bitmap operations

2018-11-27 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3891.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3039
[https://github.com/apache/arrow/pull/3039]

> [Java] Remove Long.bitCount with simple bitmap operations
> -
>
> Key: ARROW-3891
> URL: https://issues.apache.org/jira/browse/ARROW-3891
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Animesh Trivedi
>Assignee: Animesh Trivedi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> the `public int isSet(int index)` routine checks if the bit is set by calling 
> the Long.bitCount function. This is unnecessary and creates performance 
> degradation. It can simply be replaced by a bit shift and a bitwise & 
> operation, i.e. changing
> `return Long.bitCount(b & (1L << bitIndex));`
> to 
> `return (b >> bitIndex) & 0x01;` 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3884) [Python] Add LLVM6 to manylinux1 base image

2018-11-27 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701068#comment-16701068
 ] 

Uwe L. Korn commented on ARROW-3884:


[~pitrou] Where does the requirement to link statically come from? My 
{{manylinux1}} experience tells me that even there, shared linking with the 
correct RPATHs and library names works much better.

> [Python] Add LLVM6 to manylinux1 base image
> ---
>
> Key: ARROW-3884
> URL: https://issues.apache.org/jira/browse/ARROW-3884
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> This is necessary to be able to build and bundle libgandiva with the 0.12 
> release
> This (epic!) build definition in Apache Kudu may be useful for building only 
> the pieces that we need for linking the Gandiva libraries, which may help 
> keep the image size minimal
> https://github.com/apache/kudu/blob/master/thirdparty/build-definitions.sh#L175



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3899) [Python] Table.to_pandas converts Arrow date32[day] to pandas datetime64[ns]

2018-11-29 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703768#comment-16703768
 ] 

Uwe L. Korn commented on ARROW-3899:


We currently have the parameter {{date_as_object}} on the conversion to pandas, 
which defaults to false. Although I would like true as the default, changing it 
would be a heavy breaking change. We should add a DeprecationWarning announcing 
the change in the next release and then make it a release later.
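
For illustration, the difference as a small sketch:

{code:python}
import datetime
import pyarrow as pa

arr = pa.array([datetime.date(2018, 11, 29)], type=pa.date32())
table = pa.Table.from_arrays([arr], names=["d"])

table.to_pandas()                     # current default: datetime64[ns] column
table.to_pandas(date_as_object=True)  # datetime.date objects, round-trips
{code}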

> [Python] Table.to_pandas converts Arrow date32[day] to pandas datetime64[ns]
> 
>
> Key: ARROW-3899
> URL: https://issues.apache.org/jira/browse/ARROW-3899
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This issue was raised here:
> https://github.com/wesm/feather/issues/359
> I explored this minimally against Arrow master:
> https://gist.github.com/wesm/2ebe0ca2461d1ecfba6185777238ad1f
> While it's pretty memory-wasteful, it might be better to preserve the intent 
> of the data type when converting to pandas data structures. It also allows 
> the data to round trip successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3926) [Python] Add Gandiva bindings to Python wheels

2018-12-02 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-3926:
--

Assignee: Uwe L. Korn

> [Python] Add Gandiva bindings to Python wheels
> --
>
> Key: ARROW-3926
> URL: https://issues.apache.org/jira/browse/ARROW-3926
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Depends on adding LLVM6 to the build toolchain



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3884) [Python] Add LLVM6 to manylinux1 base image

2018-12-02 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-3884:
--

Assignee: Uwe L. Korn

> [Python] Add LLVM6 to manylinux1 base image
> ---
>
> Key: ARROW-3884
> URL: https://issues.apache.org/jira/browse/ARROW-3884
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> This is necessary to be able to build and bundle libgandiva with the 0.12 
> release
> This (epic!) build definition in Apache Kudu may be useful for building only 
> the pieces that we need for linking the Gandiva libraries, which may help 
> keep the image size minimal
> https://github.com/apache/kudu/blob/master/thirdparty/build-definitions.sh#L175



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3928) [Python] Add option to deduplicate PyBytes / PyString / PyUnicode objects in Table.to_pandas conversion path

2018-12-02 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16706445#comment-16706445
 ] 

Uwe L. Korn commented on ARROW-3928:


We should also do this for {{PyDate}} objects.
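
A sketch of the proposed option (the parameter name {{deduplicate_objects}} is 
an assumption until the implementation is merged; sizes scaled down):

{code:python}
import pyarrow as pa

arr = pa.array(["spam", "ham", "spam", "spam"] * 1000)
table = pa.Table.from_arrays([arr], names=["s"])

# Equal strings become one shared PyUnicode object, trading some hashing
# time for large memory savings on repetitive data:
df = table.to_pandas(deduplicate_objects=True)
{code}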

> [Python] Add option to deduplicate PyBytes / PyString / PyUnicode objects in 
> Table.to_pandas conversion path
> 
>
> Key: ARROW-3928
> URL: https://issues.apache.org/jira/browse/ARROW-3928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> While hashing carries a performance penalty, the memory savings can be huge. 
> See also ARROW-3911 -- we should develop some reusable machinery for 
> conversions that yield Python objects



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3932) [Python/Documentation] Include Benchmarks.md in Sphinx docs

2018-12-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3932:
--

 Summary: [Python/Documentation] Include Benchmarks.md in Sphinx 
docs
 Key: ARROW-3932
 URL: https://issues.apache.org/jira/browse/ARROW-3932
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


https://github.com/apache/arrow/pull/2856#issuecomment-443711136



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3681) [Go] add benchmarks for CSV reader

2018-12-05 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3681.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3071
[https://github.com/apache/arrow/pull/3071]

> [Go] add benchmarks for CSV reader
> --
>
> Key: ARROW-3681
> URL: https://issues.apache.org/jira/browse/ARROW-3681
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3929) [Go] improve memory usage of CSV reader to improve runtime performances

2018-12-05 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3929.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3073
[https://github.com/apache/arrow/pull/3073]

> [Go] improve memory usage of CSV reader to improve runtime performances
> ---
>
> Key: ARROW-3929
> URL: https://issues.apache.org/jira/browse/ARROW-3929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3944) [Python] Build manylinux1 docker image directly in the CI

2018-12-05 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3944:
--

 Summary: [Python] Build manylinux1 docker image directly in the CI
 Key: ARROW-3944
 URL: https://issues.apache.org/jira/browse/ARROW-3944
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.12.0


Instead of always waiting for {{quay.io}} on PRs, we should build the docker 
image in the Travis job. We will still pull the current image from {{quay.io}} 
to profit from the layer caching.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-3944) [Python] Build manylinux1 docker image directly in the CI

2018-12-09 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn closed ARROW-3944.
--
   Resolution: Won't Fix
Fix Version/s: (was: 0.12.0)

Sadly, it's not feasible to build all dependencies in the CI; they take too 
much time.

> [Python] Build manylinux1 docker image directly in the CI
> -
>
> Key: ARROW-3944
> URL: https://issues.apache.org/jira/browse/ARROW-3944
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Instead of always waiting for {{quay.io}} on PRs, we should build the docker 
> image in the Travis job. We will still pull the current image from 
> {{quay.io}} to profit from the layer caching.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3972) [Gandiva] Update to LLVM 7

2018-12-09 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3972:
--

 Summary: [Gandiva] Update to LLVM 7 
 Key: ARROW-3972
 URL: https://issues.apache.org/jira/browse/ARROW-3972
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Gandiva
Reporter: Uwe L. Korn


As {{llvmlite}}, the other LLVM-using package in the Python ecosystem, has 
moved to LLVM 7, we should follow along to avoid problems when it is used in 
the same Python environment as Gandiva.

Reference: https://github.com/numba/llvmlite/pull/412



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3963) [Packaging/Docker] Nightly test for building sphinx documentations

2018-12-09 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3963.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3130
[https://github.com/apache/arrow/pull/3130]

> [Packaging/Docker] Nightly test for building sphinx documentations
> --
>
> Key: ARROW-3963
> URL: https://issues.apache.org/jira/browse/ARROW-3963
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: docker, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2266) [CI] Improve runtime of integration tests in Travis CI

2018-12-10 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714922#comment-16714922
 ] 

Uwe L. Korn commented on ARROW-2266:


Is this really the part taking a lot of time? Download speeds are not that fast 
on Travis, so a docker image might not always help (but it is probably worth 
it).

> [CI] Improve runtime of integration tests in Travis CI
> --
>
> Key: ARROW-2266
> URL: https://issues.apache.org/jira/browse/ARROW-2266
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Integration
>Reporter: Wes McKinney
>Priority: Major
>
> I was surprised to see that travis_script_integration.sh is taking over 25 
> minutes to run (https://travis-ci.org/apache/arrow/jobs/349493491). My only 
> real guess about what's going on is that JVM startup time on these hosts is 
> super slow.
> I can think of some things we could do to make things better:
> * Add debugging output so we can see what's slow
> * Write a Java integration test handler that validates multiple files at once
> * Generate a single set of binary files for each producer rather than 
> regenerating them each time (so Java would only need to produce binary files 
> once instead of 3 times like now)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3995) [CI] Use understandable names in Travis Matrix

2018-12-11 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3995:
--

 Summary: [CI] Use understandable names in Travis Matrix 
 Key: ARROW-3995
 URL: https://issues.apache.org/jira/browse/ARROW-3995
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.12.0


Travis has a new feature to assign labels to the matrix entries, making the 
matrix much easier to navigate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3

2018-12-16 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722460#comment-16722460
 ] 

Uwe L. Korn commented on ARROW-4024:


[~pcmoritz] Is there anything that keeps you on this Cython version? As Cython 
is a build-time but not a runtime dependency, I would close this bug as won't 
fix (maybe first ensure that we pin correctly).

> [Python] Cython compilation error on cython==0.27.3
> ---
>
> Key: ARROW-4024
> URL: https://issues.apache.org/jira/browse/ARROW-4024
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> On the latest master, I'm getting the following error:
> {code:java}
> [ 11%] Compiling Cython CXX source for lib...
> Error compiling Cython file:
> 
> ...
>     out.init(type)
>     return out
> cdef object pyarrow_wrap_metadata(
>     ^
> 
> pyarrow/public-api.pxi:95:5: Function signature does not match previous 
> declaration
> CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' 
> failed{code}
> With 0.29.0 it is working. This might have been introduced in 
> [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b]
>  but I'm not sure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4045) [Packaging/Python] Add hypothesis test dependency to wheel crossbow tests

2018-12-17 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4045.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3188
[https://github.com/apache/arrow/pull/3188]

> [Packaging/Python] Add hypothesis test dependency to wheel crossbow tests
> -
>
> Key: ARROW-4045
> URL: https://issues.apache.org/jira/browse/ARROW-4045
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4054) [Python] Update gtest, flatbuffers and OpenSSL in manylinux1 base image

2018-12-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4054:
--

 Summary: [Python] Update gtest, flatbuffers and OpenSSL in 
manylinux1 base image
 Key: ARROW-4054
 URL: https://issues.apache.org/jira/browse/ARROW-4054
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.12.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4076) [Python] schema validation and filters

2018-12-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724956#comment-16724956
 ] 

Uwe L. Korn commented on ARROW-4076:


I'm OK with this, but be aware that having incompatible schemas in a single 
dataset is prone to error out much more often.

My main motivation for supporting this is that it probably also brings a small 
performance improvement, as we check far fewer schemas.
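
For illustration, a minimal usage sketch (assuming the {{filters}} and 
{{validate_schema}} keywords of {{ParquetDataset}}); with validation moved 
after filtering, pieces excluded by the filter no longer take part in schema 
validation:

{code:python}
import pyarrow.parquet as pq

# Hypothetical layout: dataset/year=2017/... carries an older schema than
# dataset/year=2018/... Validating after filtering drops the incompatible
# 2017 pieces before their schemas are ever compared.
dataset = pq.ParquetDataset(
    'dataset',
    filters=[('year', '=', 2018)],
    validate_schema=True,
)
table = dataset.read()
{code}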

> [Python] schema validation and filters
> --
>
> Key: ARROW-4076
> URL: https://issues.apache.org/jira/browse/ARROW-4076
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: George Sakkis
>Priority: Minor
>
> Currently [schema 
> validation|https://github.com/apache/arrow/blob/758bd557584107cb336cbc3422744dacd93978af/python/pyarrow/parquet.py#L900]
>  of {{ParquetDataset}} takes place before filtering. This may raise a 
> {{ValueError}} if the schema is different in some dataset pieces, even if 
> these pieces would be subsequently filtered out. I think validation should 
> happen after filtering to prevent such spurious errors:
> {noformat}
> --- a/pyarrow/parquet.py  
> +++ b/pyarrow/parquet.py  
> @@ -878,13 +878,13 @@
>  if split_row_groups:
>  raise NotImplementedError("split_row_groups not yet implemented")
>  
> -if validate_schema:
> -self.validate_schemas()
> -
>  if filters is not None:
>  filters = _check_filters(filters)
>  self._filter(filters)
>  
> +if validate_schema:
> +self.validate_schemas()
> +
>  def validate_schemas(self):
>  open_file = self._get_open_file_func()
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4068) [Gandiva] Support building with Xcode 6.4

2018-12-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724960#comment-16724960
 ] 

Uwe L. Korn commented on ARROW-4068:


Do we want to have Gandiva working with this Xcode or simply with conda-forge 
on OSX?

> [Gandiva] Support building with Xcode 6.4
> -
>
> Key: ARROW-4068
> URL: https://issues.apache.org/jira/browse/ARROW-4068
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> In order to package Gandiva with Python wheels and conda packages on macOS, 
> it would be useful to build and run on Xcode 6.4 if it is not too difficult. 
> I am not sure what are the plans for upgrading past Xcode 6.4 in conda-forge



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3989) [Rust] CSV reader should handle case sensitivity for boolean values

2018-12-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724990#comment-16724990
 ] 

Uwe L. Korn commented on ARROW-3989:


New contributors first need to be added to the contributors role in JIRA; then 
you can assign issues to them.

> [Rust] CSV reader should handle case sensitivity for boolean values
> ---
>
> Key: ARROW-3989
> URL: https://issues.apache.org/jira/browse/ARROW-3989
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: nevi_me
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Excel saves booleans in CSV in upper case, Pandas uses Proper case.
> Our CSV reader doesn't recognise (True, False, TRUE, FALSE). I noticed this 
> when making boolean schema inference case insensitive.
>  
> I would propose that we convert Boolean strings to lower-case before casting 
> them to Rust's bool type. [~andygrove], what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3989) [Rust] CSV reader should handle case sensitivity for boolean values

2018-12-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-3989:
--

Assignee: nevi_me

> [Rust] CSV reader should handle case sensitivity for boolean values
> ---
>
> Key: ARROW-3989
> URL: https://issues.apache.org/jira/browse/ARROW-3989
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: nevi_me
>Assignee: nevi_me
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Excel saves booleans in CSV in upper case, Pandas uses Proper case.
> Our CSV reader doesn't recognise (True, False, TRUE, FALSE). I noticed this 
> when making boolean schema inference case insensitive.
>  
> I would propose that we convert Boolean strings to lower-case before casting 
> them to Rust's bool type. [~andygrove], what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3989) [Rust] CSV reader should handle case sensitivity for boolean values

2018-12-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724994#comment-16724994
 ] 

Uwe L. Korn commented on ARROW-3989:


Done.

> [Rust] CSV reader should handle case sensitivity for boolean values
> ---
>
> Key: ARROW-3989
> URL: https://issues.apache.org/jira/browse/ARROW-3989
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: nevi_me
>Assignee: nevi_me
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Excel saves booleans in CSV in upper case, Pandas uses Proper case.
> Our CSV reader doesn't recognise (True, False, TRUE, FALSE). I noticed this 
> when making boolean schema inference case insensitive.
>  
> I would propose that we convert Boolean strings to lower-case before casting 
> them to Rust's bool type. [~andygrove], what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4078) [CI] Need separate doc building job

2018-12-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725087#comment-16725087
 ] 

Uwe L. Korn commented on ARROW-4078:


Changes in the docs should simply trigger PYTHON_AFFECTED?

> [CI] Need separate doc building job
> ---
>
> Key: ARROW-4078
> URL: https://issues.apache.org/jira/browse/ARROW-4078
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> When only changes to the {{docs}} directory are made, most Travis jobs are 
> skipped, even the Python job which (presumably) builds the documentation to 
> check for errors etc.
> We should probably have a separate doc building job, or perhaps make it part 
> of the linting job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4079) [C++] Add machine benchmarks

2018-12-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725093#comment-16725093
 ] 

Uwe L. Korn commented on ARROW-4079:


These tests just exercise the general performance of the system and are not 
Arrow-specific? That would still be a nice addition, as it would provide a good 
baseline for comparing benchmark numbers between two systems.

> [C++] Add machine benchmarks
> 
>
> Key: ARROW-4079
> URL: https://issues.apache.org/jira/browse/ARROW-4079
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Priority: Minor
>
> I wonder if it may be useful to add machine benchmarks. I have a cache/memory 
> latency benchmark lying around, we could also add e.g. memory bandwidth 
> benchmarks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4107) [Python] Use ninja in pyarrow manylinux1 build

2018-12-23 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4107:
--

 Summary: [Python] Use ninja in pyarrow manylinux1 build
 Key: ARROW-4107
 URL: https://issues.apache.org/jira/browse/ARROW-4107
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


This should speed up the built slightly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4102) [C++] FixedSizeBinary identity cast not implemented

2018-12-27 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4102.

Resolution: Fixed

Resolved by https://github.com/apache/arrow/pull/3265

> [C++] FixedSizeBinary identity cast not implemented
> ---
>
> Key: ARROW-4102
> URL: https://issues.apache.org/jira/browse/ARROW-4102
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4088) [Python] Table.from_batches() fails when passed a schema with metadata

2018-12-27 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4088.

Resolution: Fixed

Issue resolved by pull request 3256
[https://github.com/apache/arrow/pull/3256]

> [Python] Table.from_batches() fails when passed a schema with metadata
> --
>
> Key: ARROW-4088
> URL: https://issues.apache.org/jira/browse/ARROW-4088
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.11.0
>Reporter: Thomas Buhrmann
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available, regression
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This seems to be a regression. In 0.10 I used to have this function to set 
> column-level and table-level metadata on an existing Table:
>   
> {code:python}
> def set_metadata(tbl, col_meta={}, tbl_meta={}):
> # Create updated column fields with new metadata
> if col_meta or tbl_meta:
> fields = []
> for col in tbl.itercolumns():
> if col.name in col_meta:
> # Get updated column metadata
> metadata = col.field.metadata or {}
> for k, v in col_meta[col.name].items():
> metadata[k] = json.dumps(v).encode('utf-8')
> # Update field with updated metadata
> fields.append(col.field.add_metadata(metadata))
> else:
> fields.append(col.field)
> # Get updated table metadata
> tbl_metadata = tbl.schema.metadata
> for k, v in tbl_meta.items():
> tbl_metadata[k] = json.dumps(v).encode('utf-8')
> # Create new schema with updated metadata
> schema = pa.schema(fields, metadata=tbl_metadata)
> # With updated schema build new table (shouldn't copy data?)
> tbl = pa.Table.from_batches(tbl.to_batches(), schema=schema)
> return tbl
> {code}
> However, in 0.11 this fails with error:
> {noformat}
> ArrowInvalid: Schema at index 0 was different: 
> x: int64
> vs
> x: int64
> ...
> {noformat}
> It works however if I replace from_batches() with from_arrays(), like this:
> {code}
> tbl = pa.Table.from_arrays(list(tbl.itercolumns()), schema=schema)
> {code}
> It seems that from_batches() compares the existing batch's schema with the 
> new schema, and upon encountering a difference (in metadata only) fails.
> A short test would be this:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'x': [0,1,2]})
> tbl = pa.Table.from_pandas(df, preserve_index=False)
> field = tbl.schema[0].add_metadata({'test': 'data'})
> schema = pa.schema([field])
> # tbl2 = pa.Table.from_arrays(list(tbl.itercolumns()), schema=schema)
> tbl2 = pa.Table.from_batches(tbl.to_batches(), schema)
> tbl2.schema[0].metadata
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3932) [Python/Documentation] Include Benchmarks.md in Sphinx docs

2018-12-27 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3932.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3249
[https://github.com/apache/arrow/pull/3249]

> [Python/Documentation] Include Benchmarks.md in Sphinx docs
> ---
>
> Key: ARROW-3932
> URL: https://issues.apache.org/jira/browse/ARROW-3932
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/2856#issuecomment-443711136



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4129) [Python] Fix syntax problem in benchmark docs

2018-12-28 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4129:
--

 Summary: [Python] Fix syntax problem in benchmark docs
 Key: ARROW-4129
 URL: https://issues.apache.org/jira/browse/ARROW-4129
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.12.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3020) [Python] Addition of option to allow empty Parquet row groups

2018-12-28 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3020.

Resolution: Fixed

Issue resolved by pull request 3269
[https://github.com/apache/arrow/pull/3269]

> [Python] Addition of option to allow empty Parquet row groups
> -
>
> Key: ARROW-3020
> URL: https://issues.apache.org/jira/browse/ARROW-3020
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Alex Mendelson
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While our use case is not common, I was able to find one related request from 
> roughly a year ago. Could this be added as a feature?
> https://issues.apache.org/jira/browse/PARQUET-1047
> *Motivation*
> We have an application where each row is associated with one of N contexts, 
> though a minority of contexts may have no associated rows. When encountering 
> the Nth context, we will wish to retrieve all the associated rows. Row groups 
> would provide a natural way to index the data, as the nth context could 
> naturally relate to the nth row group.
> Unfortunately, this is not possible at the present time, as pyarrow does not 
> support writing empty row groups. If one writes a pyarrow.Table containing 
> zero rows using pyarrow.parquet.ParquetWriter, it is omitted from the final 
> file, and this distorts the indexing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3535) [Python] pip install tensorflow install too new numpy in manylinux1 build

2018-12-28 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730310#comment-16730310
 ] 

Uwe L. Korn commented on ARROW-3535:


We have pinned tensorflow to 1.1.0; that should suffice to keep numpy stable.

> [Python] pip install tensorflow install too new numpy in manylinux1 build
> -
>
> Key: ARROW-3535
> URL: https://issues.apache.org/jira/browse/ARROW-3535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Blocker
> Fix For: 0.12.0
>
>
> This blocks us from doing a release again. We definitely need to get this 
> split apart before we do another release.
> [~pcmoritz] [~robertnishihara]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3535) [Python] pip install tensorflow install too new numpy in manylinux1 build

2018-12-28 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3535.

Resolution: Fixed
  Assignee: Uwe L. Korn

> [Python] pip install tensorflow install too new numpy in manylinux1 build
> -
>
> Key: ARROW-3535
> URL: https://issues.apache.org/jira/browse/ARROW-3535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
> Fix For: 0.12.0
>
>
> This blocks us from doing a release again. We definitely need to get this 
> split apart before we do another release.
> [~pcmoritz] [~robertnishihara]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4129) [Python] Fix syntax problem in benchmark docs

2018-12-28 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4129.

Resolution: Fixed

Issue resolved by pull request 3282
[https://github.com/apache/arrow/pull/3282]

> [Python] Fix syntax problem in benchmark docs
> -
>
> Key: ARROW-4129
> URL: https://issues.apache.org/jira/browse/ARROW-4129
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set

2019-01-02 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732236#comment-16732236
 ] 

Uwe L. Korn edited comment on ARROW-4139 at 1/2/19 5:20 PM:


We currently always return the physical types for the statistics. We should 
add {{logical_(min,max)}} as additional accessors.


was (Author: xhochy):
We currently always return the physical types in all cases for the statistics. 
We should add {{logical_{min,max}}} as additional accessors.

> [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is 
> set
> ---
>
> Key: ARROW-4139
> URL: https://issues.apache.org/jira/browse/ARROW-4139
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Matthew Rocklin
>Priority: Minor
>  Labels: parquet, python
> Fix For: 0.13.0
>
>
> When writing Pandas data to Parquet format and reading it back again I find 
> that that statistics of text columns are stored as byte arrays rather than as 
> unicode text. 
> I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding 
> of how best to manage statistics.  (I'd be quite happy to learn that it was 
> the latter).
> Here is a minimal example
> {code:python}
> import pandas as pd
> df = pd.DataFrame({'x': ['a']})
> df.to_parquet('df.parquet')
> import pyarrow.parquet as pq
> pf = pq.ParquetDataset('df.parquet')
> piece = pf.pieces[0]
> rg = piece.row_group(0)
> md = piece.get_metadata(pq.ParquetFile)
> rg = md.row_group(0)
> c = rg.column(0)
> >>> c
> 
>   file_offset: 63
>   file_path: 
>   physical_type: BYTE_ARRAY
>   num_values: 1
>   path_in_schema: x
>   is_stats_set: True
>   statistics:
> 
>   has_min_max: True
>   min: b'a'
>   max: b'a'
>   null_count: 0
>   distinct_count: 0
>   num_values: 1
>   physical_type: BYTE_ARRAY
>   compression: SNAPPY
>   encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
>   has_dictionary_page: True
>   dictionary_page_offset: 4
>   data_page_offset: 25
>   total_compressed_size: 59
>   total_uncompressed_size: 55
> >>> type(c.statistics.min)
> bytes
> {code}
> My guess is that we would want to store a logical type in the statistics like 
> UNICODE, though I don't have enough experience with Parquet data types to 
> know if this is a good idea or possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set

2019-01-02 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732236#comment-16732236
 ] 

Uwe L. Korn commented on ARROW-4139:


We currently always return the physical types in all cases for the statistics. 
We should add {{logical_{min,max}}} as additional accessors.
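
In the meantime, a minimal workaround sketch (assuming the column is known to 
carry UTF8 data, so the physical {{BYTE_ARRAY}} min/max can simply be decoded):

{code:python}
import pyarrow.parquet as pq

md = pq.ParquetFile('df.parquet').metadata
stats = md.row_group(0).column(0).statistics

# physical_type is BYTE_ARRAY, so min/max come back as bytes;
# decode them manually until logical accessors exist.
logical_min = stats.min.decode('utf-8')
logical_max = stats.max.decode('utf-8')
print(logical_min, logical_max)
{code}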

> [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is 
> set
> ---
>
> Key: ARROW-4139
> URL: https://issues.apache.org/jira/browse/ARROW-4139
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Matthew Rocklin
>Priority: Minor
>  Labels: parquet, python
> Fix For: 0.13.0
>
>
> When writing Pandas data to Parquet format and reading it back again I find 
> that that statistics of text columns are stored as byte arrays rather than as 
> unicode text. 
> I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding 
> of how best to manage statistics.  (I'd be quite happy to learn that it was 
> the latter).
> Here is a minimal example
> {code:python}
> import pandas as pd
> df = pd.DataFrame({'x': ['a']})
> df.to_parquet('df.parquet')
> import pyarrow.parquet as pq
> pf = pq.ParquetDataset('df.parquet')
> piece = pf.pieces[0]
> rg = piece.row_group(0)
> md = piece.get_metadata(pq.ParquetFile)
> rg = md.row_group(0)
> c = rg.column(0)
> >>> c
> 
>   file_offset: 63
>   file_path: 
>   physical_type: BYTE_ARRAY
>   num_values: 1
>   path_in_schema: x
>   is_stats_set: True
>   statistics:
> 
>   has_min_max: True
>   min: b'a'
>   max: b'a'
>   null_count: 0
>   distinct_count: 0
>   num_values: 1
>   physical_type: BYTE_ARRAY
>   compression: SNAPPY
>   encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
>   has_dictionary_page: True
>   dictionary_page_offset: 4
>   data_page_offset: 25
>   total_compressed_size: 59
>   total_uncompressed_size: 55
> >>> type(c.statistics.min)
> bytes
> {code}
> My guess is that we would want to store a logical type in the statistics like 
> UNICODE, though I don't have enough experience with Parquet data types to 
> know if this is a good idea or possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4138) [Python] setuptools_scm customization does not work for versions above 0.9.0 on Windows

2019-01-02 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732246#comment-16732246
 ] 

Uwe L. Korn commented on ARROW-4138:


[~wesmckinn] Which version of git are you using on Windows? I suspect this is 
either a too-old git version or that the command is not escaped correctly, 
e.g. the brackets need better escaping.
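
One way to take shell quoting out of the picture is to invoke git without a 
shell, passing the pattern as its own argv element. A minimal sketch, assuming 
a checkout that actually has a matching tag (note that cmd.exe passes single 
quotes through literally, which would make the quoted pattern match nothing):

{code:python}
import subprocess

# No shell involved: the --match pattern reaches git unmodified,
# so the command behaves the same on Windows and Unix.
out = subprocess.check_output([
    'git', 'describe', '--dirty', '--tags', '--long',
    '--match', 'apache-arrow-[0-9].*',
])
print(out.decode().strip())
{code}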

> [Python] setuptools_scm customization does not work for versions above 0.9.0 
> on Windows
> ---
>
> Key: ARROW-4138
> URL: https://issues.apache.org/jira/browse/ARROW-4138
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> {code}
> C:\Users\wesm\code\arrow\python (ARROW-3910 -> origin)
> (pyarrow-dev) λ git describe --dirty --tags --long --match 
> 'apache-arrow-[0-9].*'
> fatal: No names found, cannot describe anything.
> C:\Users\wesm\code\arrow\python (ARROW-3910 -> origin)
> (pyarrow-dev) λ git describe --dirty --tags --long
> apache-arrow-0.11.0-499-gf77c2967-dirty
> {code}
> It's possible this is a Windows-specific issue



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4140) [C++][Gandiva] Compiled LLVM bitcode file path may result in libraries being non-relocatable

2019-01-02 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732248#comment-16732248
 ] 

Uwe L. Korn commented on ARROW-4140:


[~wesmckinn] Is this about relocatability in the {{-fPIC}} sense, or about 
hard-coded filesystem paths?

> [C++][Gandiva] Compiled LLVM bitcode file path may result in libraries being 
> non-relocatable
> 
>
> Key: ARROW-4140
> URL: https://issues.apache.org/jira/browse/ARROW-4140
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> We rely on relocatable binaries in our Python wheel and conda packages. We 
> should investigate a solution that permits a relative path to the bitcode file



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4148) [CI/Python] Disable ORC on nightly Alpine builds

2019-01-03 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4148.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3297
[https://github.com/apache/arrow/pull/3297]

> [CI/Python] Disable ORC on nightly Alpine builds
> 
>
> Key: ARROW-4148
> URL: https://issues.apache.org/jira/browse/ARROW-4148
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See CI failure https://travis-ci.org/kszucs/crossbow/builds/474545437



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4156) [C++] xcodebuild failure for cmake generated project

2019-01-04 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734100#comment-16734100
 ] 

Uwe L. Korn commented on ARROW-4156:


After I upgraded to the latest Xcode version, I can now reproduce this. Various 
links in [https://github.com/IntelVCL/Open3D/issues/401#issue-334282264] 
mention that Xcode and object libraries don't work well together. I think the 
simplest solution would be to not use object libraries when building with 
Xcode. This would sadly double the build time when one builds both shared and 
static libraries. I will play around with some options and update later.

 

> [C++] xcodebuild failure for cmake generated project
> 
>
> Key: ARROW-4156
> URL: https://issues.apache.org/jira/browse/ARROW-4156
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Hatem Helal
>Assignee: Uwe L. Korn
>Priority: Minor
> Attachments: cmakeoutput.txt
>
>
> Using the cmake xcode project generator fails to build using xcodebuild as 
> follows:
> {code:java}
> $ cmake .. -G Xcode -DARROW_PARQUET=ON  -DPARQUET_BUILD_EXECUTABLES=ON 
> -DPARQUET_BUILD_EXAMPLES=ON  
> -DFLATBUFFERS_HOME=/usr/local/Cellar/flatbuffers/1.10.0 
> -DCMAKE_BUILD_TYPE=Debug  -DTHRIFT_HOME=/usr/local/Cellar/thrift/0.11.0 
> -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_BUILD_TESTS=ON 
> -DClangTools_PATH=/usr/local/Cellar/llvm@6/6.0.1_1
> 
> Libtool 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
>  normal x86_64
> cd /Users/hhelal/Documents/code/arrow/cpp
> export MACOSX_DEPLOYMENT_TARGET=10.14
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool
>  -static -arch_only x86_64 -syslibroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
>  
> -L/Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal
>  -filelist 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/x86_64/arrow_objlib.LinkFileList
>  -o 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> echo "Depend check for xcode"
> Depend check for xcode
> cd /Users/hhelal/Documents/code/arrow/cpp/xcode-build && make -C 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/CMakeScripts/XCODE_DEPEND_HELPER.make
>  PostBuild.arrow_objlib.Debug
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.a
> === BUILD TARGET arrow_shared OF PROJECT arrow WITH THE DEFAULT CONFIGURATION 
> (Debug) ===
> Check dependencies
> Write auxiliary files
> write-file 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> chmod 0755 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> echo "Creating symlinks"
> Creating symlinks
> /usr/local/Cellar/cmake/3.12.4/bin/cmake -E cmake_symlink_library 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.0.0.dylib
>  
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.dylib
>  /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> make: *** [arrow_shared_buildpart_0] Error 1
> ** BUILD FAILED **
> The following build commands failed:
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> (1 failure)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4156) [C++] xcodebuild failure for cmake generated project

2019-01-04 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734147#comment-16734147
 ] 

Uwe L. Korn commented on ARROW-4156:


[~hatem] Can you have a look at the linked PR? That fixes the compilation for 
me locally.

> [C++] xcodebuild failure for cmake generated project
> 
>
> Key: ARROW-4156
> URL: https://issues.apache.org/jira/browse/ARROW-4156
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Hatem Helal
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Attachments: cmakeoutput.txt, xcodebuildOutput.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Using the cmake xcode project generator fails to build using xcodebuild as 
> follows:
> {code:java}
> $ cmake .. -G Xcode -DARROW_PARQUET=ON  -DPARQUET_BUILD_EXECUTABLES=ON 
> -DPARQUET_BUILD_EXAMPLES=ON  
> -DFLATBUFFERS_HOME=/usr/local/Cellar/flatbuffers/1.10.0 
> -DCMAKE_BUILD_TYPE=Debug  -DTHRIFT_HOME=/usr/local/Cellar/thrift/0.11.0 
> -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_BUILD_TESTS=ON 
> -DClangTools_PATH=/usr/local/Cellar/llvm@6/6.0.1_1
> 
> Libtool 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
>  normal x86_64
> cd /Users/hhelal/Documents/code/arrow/cpp
> export MACOSX_DEPLOYMENT_TARGET=10.14
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool
>  -static -arch_only x86_64 -syslibroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
>  
> -L/Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal
>  -filelist 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/x86_64/arrow_objlib.LinkFileList
>  -o 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> echo "Depend check for xcode"
> Depend check for xcode
> cd /Users/hhelal/Documents/code/arrow/cpp/xcode-build && make -C 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/CMakeScripts/XCODE_DEPEND_HELPER.make
>  PostBuild.arrow_objlib.Debug
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.a
> === BUILD TARGET arrow_shared OF PROJECT arrow WITH THE DEFAULT CONFIGURATION 
> (Debug) ===
> Check dependencies
> Write auxiliary files
> write-file 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> chmod 0755 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> echo "Creating symlinks"
> Creating symlinks
> /usr/local/Cellar/cmake/3.12.4/bin/cmake -E cmake_symlink_library 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.0.0.dylib
>  
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.dylib
>  /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> make: *** [arrow_shared_buildpart_0] Error 1
> ** BUILD FAILED **
> The following build commands failed:
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> (1 failure)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4156) [C++] xcodebuild failure for cmake generated project

2019-01-04 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4156.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3308
[https://github.com/apache/arrow/pull/3308]

> [C++] xcodebuild failure for cmake generated project
> 
>
> Key: ARROW-4156
> URL: https://issues.apache.org/jira/browse/ARROW-4156
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Hatem Helal
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: cmakeoutput.txt, xcodebuildOutput.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Using the cmake xcode project generator fails to build using xcodebuild as 
> follows:
> {code:java}
> $ cmake .. -G Xcode -DARROW_PARQUET=ON  -DPARQUET_BUILD_EXECUTABLES=ON 
> -DPARQUET_BUILD_EXAMPLES=ON  
> -DFLATBUFFERS_HOME=/usr/local/Cellar/flatbuffers/1.10.0 
> -DCMAKE_BUILD_TYPE=Debug  -DTHRIFT_HOME=/usr/local/Cellar/thrift/0.11.0 
> -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_BUILD_TESTS=ON 
> -DClangTools_PATH=/usr/local/Cellar/llvm@6/6.0.1_1
> 
> Libtool 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
>  normal x86_64
> cd /Users/hhelal/Documents/code/arrow/cpp
> export MACOSX_DEPLOYMENT_TARGET=10.14
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool
>  -static -arch_only x86_64 -syslibroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
>  
> -L/Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal
>  -filelist 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/x86_64/arrow_objlib.LinkFileList
>  -o 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> echo "Depend check for xcode"
> Depend check for xcode
> cd /Users/hhelal/Documents/code/arrow/cpp/xcode-build && make -C 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/CMakeScripts/XCODE_DEPEND_HELPER.make
>  PostBuild.arrow_objlib.Debug
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.a
> === BUILD TARGET arrow_shared OF PROJECT arrow WITH THE DEFAULT CONFIGURATION 
> (Debug) ===
> Check dependencies
> Write auxiliary files
> write-file 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> chmod 0755 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> echo "Creating symlinks"
> Creating symlinks
> /usr/local/Cellar/cmake/3.12.4/bin/cmake -E cmake_symlink_library 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.0.0.dylib
>  
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.dylib
>  /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> make: *** [arrow_shared_buildpart_0] Error 1
> ** BUILD FAILED **
> The following build commands failed:
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> (1 failure)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4083) [C++] Allowing ChunkedArrays to contain a mix of DictionaryArray and dense Array (of the dictionary type)

2019-01-04 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734757#comment-16734757
 ] 

Uwe L. Korn commented on ARROW-4083:


I think this would really confuse users and lead to problems. I would rather 
stick to DictionaryType being a real type. For the described use case, I would 
instead expect the user to emit a set of RecordBatches in their API. These 
batches would not all have to share the same schema, but the schemas should be 
evolvable into one another. This way, we keep the ChunkedArray as simple as it 
currently is and use the RecordBatch as the vehicle for passing around data 
that doesn't have exactly the same schema.
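
To illustrate the consumer side, a rough sketch (assuming a pyarrow version 
where a dictionary array can be cast back to its dense value type): the reader 
emits one dictionary-encoded batch and one dense batch, and the consumer 
normalizes them before combining:

{code:python}
import pyarrow as pa

# A stream as a Parquet reader might emit it: dictionary-encoded at
# first, then dense once the dictionary grows past its size threshold.
batch1 = pa.RecordBatch.from_arrays(
    [pa.array(['a', 'b', 'a']).dictionary_encode()], ['x'])
batch2 = pa.RecordBatch.from_arrays(
    [pa.array(['c', 'd', 'e'])], ['x'])

def densify(batch):
    # Cast any dictionary column back to its dense value type so all
    # batches end up with one common schema.
    cols = [col.cast(pa.string()) if pa.types.is_dictionary(col.type)
            else col for col in batch.columns]
    return pa.RecordBatch.from_arrays(cols, batch.schema.names)

table = pa.Table.from_batches([densify(b) for b in (batch1, batch2)])
{code}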

 

> [C++] Allowing ChunkedArrays to contain a mix of DictionaryArray and dense 
> Array (of the dictionary type)
> -
>
> Key: ARROW-4083
> URL: https://issues.apache.org/jira/browse/ARROW-4083
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> In some applications we may receive a stream of some dictionary encoded data 
> followed by some non-dictionary encoded data. For example this happens in 
> Parquet files when the dictionary reaches a certain configurable size 
> threshold.
> We should think about how we can model this in our in-memory data structures, 
> and how it can flow through to relevant computational components (i.e. 
> certain data flow observers -- like an Aggregation -- might need to be able 
> to process either a dense or dictionary encoded version of a particular array 
> in the same stream)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4173) JIRA library name is wrong in error message of dev/merge_arrow_pr.py

2019-01-07 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4173.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3326
[https://github.com/apache/arrow/pull/3326]

> JIRA library name is wrong in error message of dev/merge_arrow_pr.py
> 
>
> Key: ARROW-4173
> URL: https://issues.apache.org/jira/browse/ARROW-4173
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

