[jira] [Commented] (ARROW-4611) [C++] Rework CMake third-party logic

2019-02-19 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771674#comment-16771674
 ] 

Uwe L. Korn commented on ARROW-4611:


[~kou] [~wesmckinn] Please take a look at 
[https://github.com/apache/arrow/pull/3688/commits/a1cd7f8121a1fdc4fc2d1f865573f26fea83801c]
This is roughly what I imagine, with the change done just for 
{{double-conversion}}.

> [C++] Rework CMake third-party logic
> 
>
> Key: ARROW-4611
> URL: https://issues.apache.org/jira/browse/ARROW-4611
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> Instead of the current approach we are taking with the {{*_HOME}} variables, 
> we should use more CMake features and also give users of Arrow a more 
> high-level control. This is going to be a rather lengthy issue with a lot of 
> subtasks.
>  * Let the user decide on the top-level how dependencies should be handled. 
> At the moment I can think of the following modes:
>  ** AUTO: Guess the packaging system we're running in, use this where 
> possible, otherwise build the dependencies through the {{ExternalProject}} 
> logic.
>  ** BUNDLED: Don't use any dependencies, build them all through 
> {{ExternalProject}}
>  ** SYSTEM: Use CMake's {{find_package}} and {{find_library}} without any 
> custom paths. If packages are on non-default locations, let the user indicate 
> it from the outside using the {{*_ROOT}} variables.
>  ** CONDA: Same as SYSTEM but set all {{*_ROOT}} variables to 
> {{ENV\{CONDA_PREFIX\}}}.
>  ** BREW: This uses SYSTEM but asks {{brew}} for some dependencies for their 
> installation prefix.
>  * prefer dynamic linkage where possible
>  * Use {{pkg-config}} and {{*Targets.cmake}} files in projects that publish 
> these
>  * Ensure that the necessary integration tests are in place (Fedora, Debian, 
> Ubuntu, Alpine)
>  * Integration tests that Arrow's {{*Targets.cmake}} and {{arrow.pc}} work.
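The mode semantics in the list above can be sketched as a small decision function. This is an illustrative Python sketch of the proposed behavior, not Arrow's actual CMake logic; `resolve_dependency` and its arguments are hypothetical names.

```python
def resolve_dependency(name, mode, system_packages):
    """Return the source a third-party dependency should come from.

    Mirrors the proposed modes: BUNDLED always builds via
    ExternalProject; AUTO prefers the system copy but falls back;
    SYSTEM/CONDA/BREW require a system copy (CONDA and BREW differ
    only in how the *_ROOT hints are pre-seeded).
    """
    if mode == "BUNDLED":
        return "external_project"
    if mode == "AUTO":
        return "system" if name in system_packages else "external_project"
    if mode in ("SYSTEM", "CONDA", "BREW"):
        if name in system_packages:
            return "system"
        raise LookupError(name + " not found; set " + name.upper() + "_ROOT")
    raise ValueError("unknown dependency mode: " + mode)
```

Note that only AUTO has a fallback path; the other modes fail loudly so the user knows to point `*_ROOT` at the right prefix.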



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3058) [Python] Feather reads fail with unintuitive error when conversion from pandas yields ChunkedArray

2019-02-19 Thread Georg Heiler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771738#comment-16771738
 ] 

Georg Heiler commented on ARROW-3058:
-

Perhaps this does not yet fix everything: 
[https://stackoverflow.com/questions/54760447/pandas-read-parquet-with-struct-not-array]
 still fails to read a simple Spark struct.

> [Python] Feather reads fail with unintuitive error when conversion from 
> pandas yields ChunkedArray
> --
>
> Key: ARROW-3058
> URL: https://issues.apache.org/jira/browse/ARROW-3058
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See report in 
> https://github.com/wesm/feather/issues/321#issuecomment-412884084
> Individual string columns with more than 2GB are currently unsupported in the 
> Feather format 
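The 2 GB ceiling follows from the columnar layout: non-"large" string arrays address their character data with signed 32-bit value offsets, so a bigger column must be split into chunks (hence the `ChunkedArray` from pandas conversion). A back-of-envelope sketch, independent of pyarrow:

```python
# Signed 32-bit offsets cap one string array's character data
# at 2**31 - 1 bytes (just under 2 GiB).
INT32_MAX = 2**31 - 1

def chunks_needed(total_bytes):
    """Minimum number of string-array chunks for `total_bytes` of
    character data, assuming int32 offsets per chunk."""
    return -(-total_bytes // INT32_MAX)  # ceiling division
```

A 3 GiB text column therefore needs at least two chunks, which is exactly the shape the Feather writer of that era could not represent.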





[jira] [Created] (ARROW-4622) MakeDense and MakeSparse in UnionArray should accept a vector of Field

2019-02-19 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-4622:
---

 Summary: MakeDense and MakeSparse in UnionArray should accept a 
vector of Field
 Key: ARROW-4622
 URL: https://issues.apache.org/jira/browse/ARROW-4622
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, GLib, Python
Reporter: Kenta Murata
Assignee: Kenta Murata


Currently MakeDense and MakeSparse of UnionArray cannot create a UnionArray 
with user-specified field names.  This is a bug in these functions.

To fix them, an optional {{std::vector}} of {{Field}} argument should be added.

The GLib and Python bindings should be fixed together.
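The requested signature change can be illustrated with a small sketch: callers may pass field names (in C++, a vector of `Field`), with auto-generated names as the default. The function and the default naming scheme here are hypothetical stand-ins, not Arrow's actual API.

```python
def make_union_fields(children, field_names=None):
    """Pair each child array with a field name.

    If no names are given, fall back to auto-generated "0", "1", ...
    (a stand-in for the current hard-coded behavior the issue
    wants to make overridable).
    """
    if field_names is None:
        field_names = [str(i) for i in range(len(children))]
    if len(field_names) != len(children):
        raise ValueError("field_names must match the number of children")
    return list(zip(field_names, children))
```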





[jira] [Updated] (ARROW-4617) [C++] Support double-conversion<3.1

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4617:
---
Component/s: R
 Packaging

> [C++] Support double-conversion<3.1
> ---
>
> Key: ARROW-4617
> URL: https://issues.apache.org/jira/browse/ARROW-4617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging, R
>Reporter: Uwe L. Korn
>Priority: Major
>
> At the moment, we require {{double-conversion>=3.1}} but distributions are 
> mostly still on 2.1 or 3.0. We should support these versions as well.
> A complicating side-effect is that {{double-conversion}} is not good at 
> updating its version number: 3.0 reports itself as 2.0.1 and 3.1.x as 3.0 in 
> the CMake definitions. Thus we cannot rely on its version number reporting 
> for detecting the version.
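Given the misreported versions the issue describes, one workaround is a small normalization layer mapping the version CMake reports to the real release series. A hypothetical sketch, not actual Arrow build logic:

```python
# Per the issue: release 3.0 reports itself as 2.0.1, and the
# 3.1.x series reports itself as 3.0 in the CMake definitions.
REPORTED_TO_ACTUAL = {
    "2.0.1": "3.0",  # release 3.0 forgot to bump its version
    "3.0": "3.1",    # the 3.1.x series still reports 3.0
}

def actual_version(reported):
    """Map a reported package version to the real release series;
    pass through versions that are known to be reported correctly."""
    return REPORTED_TO_ACTUAL.get(reported, reported)
```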





[jira] [Updated] (ARROW-4291) [Dev] Support selecting features in release scripts

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4291:
---
Fix Version/s: 0.12.1

> [Dev] Support selecting features in release scripts
> ---
>
> Key: ARROW-4291
> URL: https://issues.apache.org/jira/browse/ARROW-4291
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Developer Tools, Packaging
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes not all components can be verified on a system. We should provide 
> some environment variables to exclude them to proceed to the next step.





[jira] [Updated] (ARROW-4367) [C++] StringDictionaryBuilder segfaults on Finish with only null entries

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4367:
---
Fix Version/s: 0.12.1

> [C++] StringDictionaryBuilder segfaults on Finish with only null entries
> 
>
> Key: ARROW-4367
> URL: https://issues.apache.org/jira/browse/ARROW-4367
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Sadly a regression from 0.11, detected during the turbodbc integration.





[jira] [Updated] (ARROW-4374) [C++] DictionaryBuilder does not correctly report length and null_count

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4374:
---
Fix Version/s: 0.12.1

> [C++] DictionaryBuilder does not correctly report length and null_count
> ---
>
> Key: ARROW-4374
> URL: https://issues.apache.org/jira/browse/ARROW-4374
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unlike in most other builders, length and null_count stay constantly at 0.
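The bookkeeping a dictionary builder is expected to perform (and which the bug omitted) can be sketched as follows. This is an illustrative Python model, not Arrow's C++ `DictionaryBuilder`:

```python
class DictBuilder:
    """Toy dictionary-encoding builder that tracks length/null_count."""

    def __init__(self):
        self._memo = {}        # value -> dictionary index
        self.indices = []      # encoded column (None marks a null)
        self.length = 0
        self.null_count = 0

    def append(self, value):
        idx = self._memo.setdefault(value, len(self._memo))
        self.indices.append(idx)
        self.length += 1       # must be updated on every append ...

    def append_null(self):
        self.indices.append(None)
        self.length += 1
        self.null_count += 1   # ... including null appends
```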





[jira] [Updated] (ARROW-4255) [C++] Schema::GetFieldIndex is not thread-safe

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4255:
---
Fix Version/s: 0.12.1

> [C++] Schema::GetFieldIndex is not thread-safe
> --
>
> Key: ARROW-4255
> URL: https://issues.apache.org/jira/browse/ARROW-4255
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See discussion on mailing list 
> https://lists.apache.org/thread.html/9a3ba43b60e0f2e706840595d3a610989ddac7017201cadda7c553d6@%3Cdev.arrow.apache.org%3E
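The race is the classic lazily-built lookup map: concurrent `GetFieldIndex` calls may construct the name-to-index map at the same time. A sketch of the locking fix in Python (Arrow's actual C++ fix may use a different synchronization primitive):

```python
import threading

class Schema:
    """Toy schema with a lazily built, lock-guarded name->index map."""

    def __init__(self, field_names):
        self._field_names = list(field_names)
        self._index = None
        self._lock = threading.Lock()

    def get_field_index(self, name):
        # Serialize the one-time construction so concurrent callers
        # never observe (or double-build) a half-initialized map.
        with self._lock:
            if self._index is None:
                self._index = {n: i for i, n in enumerate(self._field_names)}
        return self._index.get(name, -1)
```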





[jira] [Updated] (ARROW-4582) [C++/Python] Memory corruption on Pandas->Arrow conversion

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4582:
---
Fix Version/s: 0.12.1

> [C++/Python] Memory corruption on Pandas->Arrow conversion
> --
>
> Key: ARROW-4582
> URL: https://issues.apache.org/jira/browse/ARROW-4582
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.11.0, 0.11.1, 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When converting DataFrames with numerical columns to Arrow tables we were 
> seeing random segfaults in core Python code. This only happened in 
> environments where we had a high level of parallelisation or slow code 
> execution (e.g. in AddressSanitizer builds).
> The reason for these segfaults was that we were incrementing the reference 
> count of the underlying NumPy buffer but were not holding the GIL while 
> changing the reference count.





[jira] [Updated] (ARROW-4298) [Java] Building Flight fails with OpenJDK 11

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4298:
---
Fix Version/s: 0.12.1

> [Java] Building Flight fails with OpenJDK 11
> 
>
> Key: ARROW-4298
> URL: https://issues.apache.org/jira/browse/ARROW-4298
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: FlightRPC, Java
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Laurent Goujon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Building flight fails with
> {code:java}
> [INFO] --- maven-compiler-plugin:3.6.2:compile (default-compile) @ 
> arrow-flight ---
> [INFO] Compiling 39 source files to 
> /Users/uwe/Development/arrow-repos-1/arrow/java/flight/target/classes
> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> /Users/uwe/Development/arrow-repos-1/arrow/java/flight/target/generated-sources/protobuf/org/apache/arrow/flight/impl/FlightServiceGrpc.java:[26,17]
>  error: cannot find symbol
> symbol: class Generated
> location: package javax.annotation
> [INFO] 1 error{code}
> To fix this, I added the following dependency to {{flight/pom.xml}}:
> {code:xml}
> <dependency>
>   <groupId>javax.annotation</groupId>
>   <artifactId>javax.annotation-api</artifactId>
>   <version>1.3.2</version>
> </dependency>
> {code}
> This then passed the compile step but failed later with:
> {code:java}
> [INFO] --- maven-dependency-plugin:3.0.1:analyze-only (analyze) @ 
> arrow-flight ---
> [WARNING] Unused declared dependencies found:
> [WARNING] javax.annotation:javax.annotation-api:jar:1.3.2:compile
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-dependency-plugin:3.0.1:analyze-only (analyze) 
> on project arrow-flight: Dependency problems found -> [Help 1]{code}





[jira] [Commented] (ARROW-4611) [C++] Rework CMake third-party logic

2019-02-19 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771771#comment-16771771
 ] 

Kouhei Sutou commented on ARROW-4611:
-

I've commented on it.
I like this approach.

> [C++] Rework CMake third-party logic
> 
>
> Key: ARROW-4611
> URL: https://issues.apache.org/jira/browse/ARROW-4611
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> Instead of the current approach we are taking with the {{*_HOME}} variables, 
> we should use more CMake features and also give users of Arrow a more 
> high-level control. This is going to be a rather lengthy issue with a lot of 
> subtasks.
>  * Let the user decide on the top-level how dependencies should be handled. 
> At the moment I can think of the following modes:
>  ** AUTO: Guess the packaging system we're running in, use this where 
> possible, otherwise build the dependencies through the {{ExternalProject}} 
> logic.
>  ** BUNDLED: Don't use any dependencies, build them all through 
> {{ExternalProject}}
>  ** SYSTEM: Use CMake's {{find_package}} and {{find_library}} without any 
> custom paths. If packages are on non-default locations, let the user indicate 
> it from the outside using the {{*_ROOT}} variables.
>  ** CONDA: Same as SYSTEM but set all {{*_ROOT}} variables to 
> {{ENV\{CONDA_PREFIX\}}}.
>  ** BREW: This uses SYSTEM but asks {{brew}} for some dependencies for their 
> installation prefix.
>  * prefer dynamic linkage where possible
>  * Use {{pkg-config}} and {{*Targets.cmake}} files in projects that publish 
> these
>  * Ensure that the necessary integration tests are in place (Fedora, Debian, 
> Ubuntu, Alpine)
>  * Integration tests that Arrow's {{*Targets.cmake}} and {{arrow.pc}} work.





[jira] [Assigned] (ARROW-4608) cmake script assumes that double-conversion installs static libs, when it can install shared libs

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4608:
--

Assignee: Uwe L. Korn

> cmake script assumes that double-conversion installs static libs, when it can 
> install shared libs 
> --
>
> Key: ARROW-4608
> URL: https://issues.apache.org/jira/browse/ARROW-4608
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Yuri
>Assignee: Uwe L. Korn
>Priority: Major
>
> This line: 
> [https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L580]
> The {{double-conversion}} project can alternatively build shared libraries, 
> when {{BUILD_SHARED_LIBS=ON}} is used.
> You should only use libraries that {{double-conversion}} cmake script 
> provides, which is in the {{double-conversion_LIBRARIES}} variable.





[jira] [Updated] (ARROW-4608) cmake script assumes that double-conversion installs static libs, when it can install shared libs

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4608:
---
Component/s: Packaging
 C++

> cmake script assumes that double-conversion installs static libs, when it can 
> install shared libs 
> --
>
> Key: ARROW-4608
> URL: https://issues.apache.org/jira/browse/ARROW-4608
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging
>Reporter: Yuri
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> This line: 
> [https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L580]
> The {{double-conversion}} project can alternatively build shared libraries, 
> when {{BUILD_SHARED_LIBS=ON}} is used.
> You should only use libraries that {{double-conversion}} cmake script 
> provides, which is in the {{double-conversion_LIBRARIES}} variable.





[jira] [Updated] (ARROW-4608) [C++] cmake script assumes that double-conversion installs static libs

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4608:
---
Summary: [C++] cmake script assumes that double-conversion installs static 
libs  (was: cmake script assumes that double-conversion installs static libs, 
when it can install shared libs )

> [C++] cmake script assumes that double-conversion installs static libs
> --
>
> Key: ARROW-4608
> URL: https://issues.apache.org/jira/browse/ARROW-4608
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging
>Reporter: Yuri
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> This line: 
> [https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L580]
> The {{double-conversion}} project can alternatively build shared libraries, 
> when {{BUILD_SHARED_LIBS=ON}} is used.
> You should only use libraries that {{double-conversion}} cmake script 
> provides, which is in the {{double-conversion_LIBRARIES}} variable.





[jira] [Updated] (ARROW-4608) cmake script assumes that double-conversion installs static libs, when it can install shared libs

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4608:
---
Fix Version/s: 0.13.0

> cmake script assumes that double-conversion installs static libs, when it can 
> install shared libs 
> --
>
> Key: ARROW-4608
> URL: https://issues.apache.org/jira/browse/ARROW-4608
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Yuri
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> This line: 
> [https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L580]
> The {{double-conversion}} project can alternatively build shared libraries, 
> when {{BUILD_SHARED_LIBS=ON}} is used.
> You should only use libraries that {{double-conversion}} cmake script 
> provides, which is in the {{double-conversion_LIBRARIES}} variable.





[jira] [Assigned] (ARROW-4142) [Java] JDBC-to-Arrow: JDBC Arrays

2019-02-19 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra reassigned ARROW-4142:
-

Assignee: Michael Pigott

> [Java] JDBC-to-Arrow: JDBC Arrays
> -
>
> Key: ARROW-4142
> URL: https://issues.apache.org/jira/browse/ARROW-4142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> The JDBC Adapter does not support JDBC Arrays.
> JDBC Arrays can be walked using an internal ResultSet object, but its 
> ResultSetMetaData may not contain the proper type information.  The sub-type 
> information can be stored in the JdbcToArrowConfig object, because there may 
> not be any other way to get it.
>  





[jira] [Closed] (ARROW-4142) [Java] JDBC-to-Arrow: JDBC Arrays

2019-02-19 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra closed ARROW-4142.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

> [Java] JDBC-to-Arrow: JDBC Arrays
> -
>
> Key: ARROW-4142
> URL: https://issues.apache.org/jira/browse/ARROW-4142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> The JDBC Adapter does not support JDBC Arrays.
> JDBC Arrays can be walked using an internal ResultSet object, but its 
> ResultSetMetaData may not contain the proper type information.  The sub-type 
> information can be stored in the JdbcToArrowConfig object, because there may 
> not be any other way to get it.
>  





[jira] [Created] (ARROW-4623) [R] update Rcpp dependency

2019-02-19 Thread JIRA
Romain François created ARROW-4623:
--

 Summary: [R] update Rcpp dependency
 Key: ARROW-4623
 URL: https://issues.apache.org/jira/browse/ARROW-4623
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain François
Assignee: Romain François


Rcpp no longer needs to be a remote.





[jira] [Created] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4624:
-

 Summary: [C++] Linker errors when building benchmarks
 Key: ARROW-4624
 URL: https://issues.apache.org/jira/browse/ARROW-4624
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, C++
Reporter: Antoine Pitrou


All C++ benchmarks now fail linking here:
{code}
[10/162] Linking CXX executable release/arrow-io-file-benchmark
FAILED: release/arrow-io-file-benchmark 
: && /usr/bin/ccache /usr/bin/g++-7  -Wno-noexcept-type  -O3 -DNDEBUG  -Wall 
-msse4.2 -fdiagnostics-color=always -Wextra -Wunused-result 
-Wno-unused-parameter -Wno-implicit-fallthrough -Wconversion 
-D_GLIBCXX_USE_CXX11_ABI=1 -fno-omit-frame-pointer -g -O3 -DNDEBUG  -rdynamic 
src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o  -o 
release/arrow-io-file-benchmark  release/libarrow_benchmark_main.a 
gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark.a -lpthread && :
src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o: In 
function `arrow::BenchmarkStreamingWrites(benchmark::State&, 
std::valarray, arrow::io::OutputStream*, arrow::BackgroundReader*)':
/home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:139: undefined 
reference to `arrow::Status::ToString[abi:cxx11]() const'
/home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:63: undefined 
reference to `arrow::internal::FileWrite(int, unsigned char const*, long)'

[ snip tons of similar errors ]
{code}

My build script:
{code}
ARROW_CXXFLAGS="$ARROW_CXXFLAGS -fno-omit-frame-pointer -g"

mkdir -p build
pushd build

cmake .. -GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_MESSAGE=LAZY \
-DARROW_CXXFLAGS="$ARROW_CXXFLAGS" \
-DARROW_BUILD_TESTS=off \
-DARROW_BUILD_BENCHMARKS=on \
-DARROW_CUDA=on \
-DARROW_FLIGHT=on \
-DARROW_PARQUET=on \
-DARROW_PLASMA=off \
-DARROW_PYTHON=on \
-DARROW_USE_GLOG=off \

nice cmake --build . --target install

popd
{code}





[jira] [Updated] (ARROW-4623) [R] update Rcpp dependency

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4623:
--
Labels: pull-request-available  (was: )

> [R] update Rcpp dependency
> --
>
> Key: ARROW-4623
> URL: https://issues.apache.org/jira/browse/ARROW-4623
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Minor
>  Labels: pull-request-available
>
> Rcpp no longer needs to be a remote.





[jira] [Updated] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4624:
--
Labels: pull-request-available  (was: )

> [C++] Linker errors when building benchmarks
> 
>
> Key: ARROW-4624
> URL: https://issues.apache.org/jira/browse/ARROW-4624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
>
> All C++ benchmarks now fail linking here:
> {code}
> [10/162] Linking CXX executable release/arrow-io-file-benchmark
> FAILED: release/arrow-io-file-benchmark 
> : && /usr/bin/ccache /usr/bin/g++-7  -Wno-noexcept-type  -O3 -DNDEBUG  -Wall 
> -msse4.2 -fdiagnostics-color=always -Wextra -Wunused-result 
> -Wno-unused-parameter -Wno-implicit-fallthrough -Wconversion 
> -D_GLIBCXX_USE_CXX11_ABI=1 -fno-omit-frame-pointer -g -O3 -DNDEBUG  -rdynamic 
> src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o  -o 
> release/arrow-io-file-benchmark  release/libarrow_benchmark_main.a 
> gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark.a -lpthread && :
> src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o: In 
> function `arrow::BenchmarkStreamingWrites(benchmark::State&, 
> std::valarray, arrow::io::OutputStream*, arrow::BackgroundReader*)':
> /home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:139: 
> undefined reference to `arrow::Status::ToString[abi:cxx11]() const'
> /home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:63: undefined 
> reference to `arrow::internal::FileWrite(int, unsigned char const*, long)'
> [ snip tons of similar errors ]
> {code}
> My build script:
> {code}
> ARROW_CXXFLAGS="$ARROW_CXXFLAGS -fno-omit-frame-pointer -g"
> mkdir -p build
> pushd build
> cmake .. -GNinja \
> -DCMAKE_BUILD_TYPE=Release \
> -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
> -DCMAKE_INSTALL_MESSAGE=LAZY \
> -DARROW_CXXFLAGS="$ARROW_CXXFLAGS" \
> -DARROW_BUILD_TESTS=off \
> -DARROW_BUILD_BENCHMARKS=on \
> -DARROW_CUDA=on \
> -DARROW_FLIGHT=on \
> -DARROW_PARQUET=on \
> -DARROW_PLASMA=off \
> -DARROW_PYTHON=on \
> -DARROW_USE_GLOG=off \
> nice cmake --build . --target install
> popd
> {code}





[jira] [Resolved] (ARROW-4583) [Plasma] There are bugs reported by code scan tool

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-4583.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3656
[https://github.com/apache/arrow/pull/3656]

> [Plasma] There are bugs reported by code scan tool
> --
>
> Key: ARROW-4583
> URL: https://issues.apache.org/jira/browse/ARROW-4583
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Plasma, Java
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-4624:
-

Assignee: Antoine Pitrou

> [C++] Linker errors when building benchmarks
> 
>
> Key: ARROW-4624
> URL: https://issues.apache.org/jira/browse/ARROW-4624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
>
> All C++ benchmarks now fail linking here:
> {code}
> [10/162] Linking CXX executable release/arrow-io-file-benchmark
> FAILED: release/arrow-io-file-benchmark 
> : && /usr/bin/ccache /usr/bin/g++-7  -Wno-noexcept-type  -O3 -DNDEBUG  -Wall 
> -msse4.2 -fdiagnostics-color=always -Wextra -Wunused-result 
> -Wno-unused-parameter -Wno-implicit-fallthrough -Wconversion 
> -D_GLIBCXX_USE_CXX11_ABI=1 -fno-omit-frame-pointer -g -O3 -DNDEBUG  -rdynamic 
> src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o  -o 
> release/arrow-io-file-benchmark  release/libarrow_benchmark_main.a 
> gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark.a -lpthread && :
> src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o: In 
> function `arrow::BenchmarkStreamingWrites(benchmark::State&, 
> std::valarray, arrow::io::OutputStream*, arrow::BackgroundReader*)':
> /home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:139: 
> undefined reference to `arrow::Status::ToString[abi:cxx11]() const'
> /home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:63: undefined 
> reference to `arrow::internal::FileWrite(int, unsigned char const*, long)'
> [ snip tons of similar errors ]
> {code}
> My build script:
> {code}
> ARROW_CXXFLAGS="$ARROW_CXXFLAGS -fno-omit-frame-pointer -g"
> mkdir -p build
> pushd build
> cmake .. -GNinja \
> -DCMAKE_BUILD_TYPE=Release \
> -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
> -DCMAKE_INSTALL_MESSAGE=LAZY \
> -DARROW_CXXFLAGS="$ARROW_CXXFLAGS" \
> -DARROW_BUILD_TESTS=off \
> -DARROW_BUILD_BENCHMARKS=on \
> -DARROW_CUDA=on \
> -DARROW_FLIGHT=on \
> -DARROW_PARQUET=on \
> -DARROW_PLASMA=off \
> -DARROW_PYTHON=on \
> -DARROW_USE_GLOG=off \
> nice cmake --build . --target install
> popd
> {code}





[jira] [Commented] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771814#comment-16771814
 ] 

Antoine Pitrou commented on ARROW-4624:
---

Bisecting points to 240c46959ac631d646d98ac57aaa719a63a9dc97 aka ARROW-4265.

> [C++] Linker errors when building benchmarks
> 
>
> Key: ARROW-4624
> URL: https://issues.apache.org/jira/browse/ARROW-4624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++
>Reporter: Antoine Pitrou
>Priority: Critical
>
> All C++ benchmarks now fail linking here:
> {code}
> [10/162] Linking CXX executable release/arrow-io-file-benchmark
> FAILED: release/arrow-io-file-benchmark 
> : && /usr/bin/ccache /usr/bin/g++-7  -Wno-noexcept-type  -O3 -DNDEBUG  -Wall 
> -msse4.2 -fdiagnostics-color=always -Wextra -Wunused-result 
> -Wno-unused-parameter -Wno-implicit-fallthrough -Wconversion 
> -D_GLIBCXX_USE_CXX11_ABI=1 -fno-omit-frame-pointer -g -O3 -DNDEBUG  -rdynamic 
> src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o  -o 
> release/arrow-io-file-benchmark  release/libarrow_benchmark_main.a 
> gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark.a -lpthread && :
> src/arrow/io/CMakeFiles/arrow-io-file-benchmark.dir/file-benchmark.cc.o: In 
> function `arrow::BenchmarkStreamingWrites(benchmark::State&, 
> std::valarray, arrow::io::OutputStream*, arrow::BackgroundReader*)':
> /home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:139: 
> undefined reference to `arrow::Status::ToString[abi:cxx11]() const'
> /home/antoine/arrow/cpp/build/../src/arrow/io/file-benchmark.cc:63: undefined 
> reference to `arrow::internal::FileWrite(int, unsigned char const*, long)'
> [ snip tons of similar errors ]
> {code}
> My build script:
> {code}
> ARROW_CXXFLAGS="$ARROW_CXXFLAGS -fno-omit-frame-pointer -g"
> mkdir -p build
> pushd build
> cmake .. -GNinja \
> -DCMAKE_BUILD_TYPE=Release \
> -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
> -DCMAKE_INSTALL_MESSAGE=LAZY \
> -DARROW_CXXFLAGS="$ARROW_CXXFLAGS" \
> -DARROW_BUILD_TESTS=off \
> -DARROW_BUILD_BENCHMARKS=on \
> -DARROW_CUDA=on \
> -DARROW_FLIGHT=on \
> -DARROW_PARQUET=on \
> -DARROW_PLASMA=off \
> -DARROW_PYTHON=on \
> -DARROW_USE_GLOG=off \
> nice cmake --build . --target install
> popd
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4486) [Python][CUDA] pyarrow.cuda.Context.foreign_buffer should have a `base=None` argument

2019-02-19 Thread Pearu Peterson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771901#comment-16771901
 ] 

Pearu Peterson commented on ARROW-4486:
---

Sure.

> [Python][CUDA] pyarrow.cuda.Context.foreign_buffer should have a `base=None` 
> argument
> -
>
> Key: ARROW-4486
> URL: https://issues.apache.org/jira/browse/ARROW-4486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Pearu Peterson
>Priority: Major
> Fix For: 0.13.0
>
>
> Similar to `pyarrow.foreign_buffer`, we need to keep the owner of CUDA memory 
> alive.
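The keep-alive pattern in question can be sketched generically in Python: the buffer wrapper holds a reference to `base`, so the owner of the device memory cannot be garbage-collected while the buffer is alive. `ForeignBuffer` and `Owner` below are hypothetical stand-ins for illustration, not the pyarrow classes.

```python
import weakref

class ForeignBuffer:
    """Wraps externally owned memory; `base` keeps the owner alive."""
    def __init__(self, address, size, base=None):
        self.address = address
        self.size = size
        self._base = base  # holding this reference prevents collection

class Owner:
    """Stands in for the object that owns the device memory."""

owner = Owner()
ref = weakref.ref(owner)

buf = ForeignBuffer(address=0xDEAD, size=64, base=owner)
del owner                 # only the buffer keeps the owner alive now
assert ref() is not None  # still reachable through buf._base

del buf                   # dropping the buffer releases the owner
assert ref() is None
```

Without the `base` argument the owner would be collected as soon as the caller dropped its own reference, leaving the buffer pointing at freed memory.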





[jira] [Created] (ARROW-4625) [Flight] Wrap server busy-wait methods

2019-02-19 Thread David Li (JIRA)
David Li created ARROW-4625:
---

 Summary: [Flight] Wrap server busy-wait methods
 Key: ARROW-4625
 URL: https://issues.apache.org/jira/browse/ARROW-4625
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li
Assignee: David Li


Right now in Java, you must manually busy-wait in a loop as the gRPC server's 
awaitTermination method isn't exposed. Conversely, in C++, you have no choice 
but to busy-wait as starting the server calls awaitTermination for you. Either 
Java should also wait on the server, or both Java and C++ should expose an 
explicit operation to wait on the server.

I would prefer the latter as then the Python bindings could choose to manually 
busy-wait, which would let Ctrl-C work as normal.
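The preferred design — an explicit wait operation that still lets Ctrl-C through — can be sketched in Python: poll a shutdown flag in short slices so a signal such as SIGINT is handled between polls rather than blocking indefinitely in native code. This is a generic sketch of the pattern, not the actual Flight API.

```python
import threading

class Server:
    """Stand-in for a Flight server with an explicit wait operation."""
    def __init__(self):
        self._stopped = threading.Event()

    def shutdown(self):
        self._stopped.set()

    def wait(self, poll_interval=0.1):
        # Busy-wait in short slices so signals (e.g. Ctrl-C) can be
        # delivered between polls instead of blocking forever.
        while not self._stopped.wait(poll_interval):
            pass

server = Server()
threading.Timer(0.2, server.shutdown).start()
server.wait()  # returns once shutdown() is called
```

Exposing `wait` separately from server startup gives both Java and C++ callers the same choice the Python bindings would want.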





[jira] [Resolved] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-4624.

   Resolution: Fixed
Fix Version/s: 0.12.1

Issue resolved by pull request 3701
[https://github.com/apache/arrow/pull/3701]

> [C++] Linker errors when building benchmarks
> 
>
> Key: ARROW-4624
> URL: https://issues.apache.org/jira/browse/ARROW-4624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-4624:
---
Fix Version/s: 0.13.0

> [C++] Linker errors when building benchmarks
> 
>
> Key: ARROW-4624
> URL: https://issues.apache.org/jira/browse/ARROW-4624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (ARROW-4347) [Python] Run Python Travis CI unit tests on Linux when Java codebase changed

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-4347.

Resolution: Fixed

Issue resolved by pull request 3699
[https://github.com/apache/arrow/pull/3699]

> [Python] Run Python Travis CI unit tests on Linux when Java codebase changed
> 
>
> Key: ARROW-4347
> URL: https://issues.apache.org/jira/browse/ARROW-4347
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Java library is also a dependency of the Python tests, but the tests aren't 
> triggered if there is a change to the {{java/}} subtree. This blind spot was 
> introduced when the CI jobs were split apart:
> https://github.com/apache/arrow/blob/master/.travis.yml#L133
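The fix amounts to widening the path filter that decides when the Python job runs. A minimal sketch of such a filter — the prefixes here are illustrative, not Arrow's exact Travis configuration:

```python
# Prefixes that should trigger the Python test job; java/ is included
# because the Java library is a dependency of some Python tests.
PYTHON_JOB_PREFIXES = ("python/", "cpp/", "java/", "format/")

def should_run_python_job(changed_files):
    """Return True if any changed file falls under a watched subtree."""
    return any(f.startswith(PYTHON_JOB_PREFIXES) for f in changed_files)

assert should_run_python_job(["java/pom.xml"])
assert not should_run_python_job(["rust/src/lib.rs", "README.md"])
```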





[jira] [Updated] (ARROW-4624) [C++] Linker errors when building benchmarks

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-4624:
---
Fix Version/s: (was: 0.12.1)

> [C++] Linker errors when building benchmarks
> 
>
> Key: ARROW-4624
> URL: https://issues.apache.org/jira/browse/ARROW-4624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Benchmarking, C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Created] (ARROW-4626) [Flight] Add application metadata field to DoGet

2019-02-19 Thread David Li (JIRA)
David Li created ARROW-4626:
---

 Summary: [Flight] Add application metadata field to DoGet
 Key: ARROW-4626
 URL: https://issues.apache.org/jira/browse/ARROW-4626
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


As [proposed on the mailing 
list|https://lists.apache.org/thread.html/c550264cd60e000d77e10d9d7ac81ea8c49efc37ad447177fa8ee4ee@%3Cdev.arrow.apache.org%3E],
 we should add a field for application-specific metadata in DoGet payloads and 
expose this in the APIs. The current APIs are rather RecordBatch-oriented, 
though.





[jira] [Created] (ARROW-4627) [Flight] Add application metadata field to DoPut

2019-02-19 Thread David Li (JIRA)
David Li created ARROW-4627:
---

 Summary: [Flight] Add application metadata field to DoPut
 Key: ARROW-4627
 URL: https://issues.apache.org/jira/browse/ARROW-4627
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC
Reporter: David Li


As [proposed on the mailing 
list|https://lists.apache.org/thread.html/c550264cd60e000d77e10d9d7ac81ea8c49efc37ad447177fa8ee4ee@%3Cdev.arrow.apache.org%3E],
 we should add a field for application-specific metadata in DoPut payloads and 
expose this in the APIs. This also requires changing the client-streaming call 
into a bidirectional streaming call.





[jira] [Resolved] (ARROW-4623) [R] update Rcpp dependency

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-4623.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3700
[https://github.com/apache/arrow/pull/3700]

> [R] update Rcpp dependency
> --
>
> Key: ARROW-4623
> URL: https://issues.apache.org/jira/browse/ARROW-4623
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rcpp no longer needs to be a remote.





[jira] [Updated] (ARROW-3200) [C++] Add support for reading Flight streams with dictionaries

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3200:
--
Component/s: FlightRPC

> [C++] Add support for reading Flight streams with dictionaries
> --
>
> Key: ARROW-3200
> URL: https://issues.apache.org/jira/browse/ARROW-3200
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> Some work is needed to handle schemas sent separately from their 
> dictionaries, i.e. ARROW-3144. I'm going to punt on implementing support for 
> this in the initial C++ Flight client





[jira] [Updated] (ARROW-3150) [Python] Ship Flight-enabled Python wheels

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3150:
--
Component/s: FlightRPC

> [Python] Ship Flight-enabled Python wheels
> --
>
> Key: ARROW-3150
> URL: https://issues.apache.org/jira/browse/ARROW-3150
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> This may involve statically linking (or bundling where shared libs make 
> sense) the various required dependencies with {{libarrow_flight.so}} in the 
> manylinux1 wheel build





[jira] [Updated] (ARROW-4562) [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of allocating contiguous slice and copying IpcPayload into it

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4562:
--
Component/s: FlightRPC

> [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of 
> allocating contiguous slice and copying IpcPayload into it
> --
>
> Key: ARROW-4562
> URL: https://issues.apache.org/jira/browse/ARROW-4562
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> See discussion in https://github.com/apache/arrow/pull/3633





[jira] [Updated] (ARROW-4556) [Rust] Preserve order of JSON inferred schema

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4556:
--
Labels: pull-request-available  (was: )

> [Rust] Preserve order of JSON inferred schema
> -
>
> Key: ARROW-4556
> URL: https://issues.apache.org/jira/browse/ARROW-4556
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Minor
>  Labels: pull-request-available
>
> serde_json has the ability to preserve order of JSON records read. This 
> feature might be necessary to ensure that schema inference returns a 
> consistent order of fields each time.
> I'd like to add it separately as I'd also need to update JSON tests in 
> datatypes.
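The effect of serde_json's order preservation can be illustrated with an order-preserving map: fields enter the inferred schema in first-seen order, so repeated runs over the same records yield the same field order. A Python sketch of the idea (Python dicts preserve insertion order), not the Rust implementation:

```python
def infer_schema(records):
    """Map each field name to an inferred type, in first-seen order."""
    fields = {}  # insertion-ordered, like serde_json with preserve_order
    for record in records:
        for name, value in record.items():
            fields.setdefault(name, type(value).__name__)
    return fields

records = [{"b": 1, "a": "x"}, {"c": 2.5, "a": "y"}]
schema = infer_schema(records)
assert list(schema) == ["b", "a", "c"]  # stable, first-seen field order
assert schema == {"b": "int", "a": "str", "c": "float"}
```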





[jira] [Updated] (ARROW-3201) [C++] Utilize zero-copy protobuf parsing from upstream whenever it becomes available

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3201:
--
Component/s: FlightRPC

> [C++] Utilize zero-copy protobuf parsing from upstream whenever it becomes 
> available
> 
>
> Key: ARROW-3201
> URL: https://issues.apache.org/jira/browse/ARROW-3201
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
>
> This has been discussed for a couple of years now; perhaps with Abseil this 
> could happen at some point:
> https://github.com/protocolbuffers/protobuf/issues/1896
> Using zero-copy proto parsing (which is standard practice inside Google, but 
> not available in open source protocol buffers) would obviate the need for the 
> zero-copy workaround that I'm going to implement for C++ Flight RPCs





[jira] [Updated] (ARROW-3330) [C++] Spawn multiple Flight performance servers in flight-benchmark to test parallel get performance

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3330:
--
Component/s: FlightRPC

> [C++] Spawn multiple Flight performance servers in flight-benchmark to test 
> parallel get performance
> 
>
> Key: ARROW-3330
> URL: https://issues.apache.org/jira/browse/ARROW-3330
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently, the benchmark spawns a single server 





[jira] [Updated] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3294:
--
Component/s: FlightRPC

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3162) [Python] Enable Flight servers to be implemented in pure Python

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3162:
--
Component/s: FlightRPC

> [Python] Enable Flight servers to be implemented in pure Python
> ---
>
> Key: ARROW-3162
> URL: https://issues.apache.org/jira/browse/ARROW-3162
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: FlightRPC, Python
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
>  Labels: flight, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> While it will be straightforward to offer a Flight client to Python users, 
> enabling _servers_ to be written _in Python_ will require a glue class to 
> invoke methods on a provided server implementation, coercing to and from 
> various Python objects and Arrow wrapper classes





[jira] [Resolved] (ARROW-4556) [Rust] Preserve order of JSON inferred schema

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-4556.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3702
[https://github.com/apache/arrow/pull/3702]

> [Rust] Preserve order of JSON inferred schema
> -
>
> Key: ARROW-4556
> URL: https://issues.apache.org/jira/browse/ARROW-4556
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-4421) [Flight][C++] Handle large Flight data messages

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4421:
--
Component/s: FlightRPC

> [Flight][C++] Handle large Flight data messages
> ---
>
> Key: ARROW-4421
> URL: https://issues.apache.org/jira/browse/ARROW-4421
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I believe the message payloads are currently limited to 4MB by default, see 
> one developer's discussion here:
> https://nanxiao.me/en/message-length-setting-in-grpc/
> While it is a good idea to break large messages into smaller ones, we will 
> need to address how to gracefully send larger payloads that may be provided 
> by a user's server implementation. Either we can increase the limit or break 
> up the record batches into smaller chunks in the Flight server base (or both, 
> of course)





[jira] [Commented] (ARROW-3058) [Python] Feather reads fail with unintuitive error when conversion from pandas yields ChunkedArray

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772000#comment-16772000
 ] 

Wes McKinney commented on ARROW-3058:
-

[~georg.kf.hei...@gmail.com] that is a different issue; we need to open a 
separate JIRA about handling chunked binary inside nested fields.

> [Python] Feather reads fail with unintuitive error when conversion from 
> pandas yields ChunkedArray
> --
>
> Key: ARROW-3058
> URL: https://issues.apache.org/jira/browse/ARROW-3058
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See report in 
> https://github.com/wesm/feather/issues/321#issuecomment-412884084
> Individual string columns with more than 2GB are currently unsupported in the 
> Feather format 





[jira] [Resolved] (ARROW-4142) [Java] JDBC-to-Arrow: JDBC Arrays

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4142.
-
Resolution: Fixed

> [Java] JDBC-to-Arrow: JDBC Arrays
> -
>
> Key: ARROW-4142
> URL: https://issues.apache.org/jira/browse/ARROW-4142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> The JDBC Adapter does not support JDBC Arrays.
> JDBC Arrays can be walked using an internal ResultSet object, but its 
> ResultSetMetaData may not contain the proper type information.  The sub-type 
> information can be stored in the JdbcToArrowConfig object, because there may 
> not be any other way to get it.
>  





[jira] [Commented] (ARROW-4142) [Java] JDBC-to-Arrow: JDBC Arrays

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772001#comment-16772001
 ] 

Wes McKinney commented on ARROW-4142:
-

I changed the issue state from "Closed" to "Resolved"

> [Java] JDBC-to-Arrow: JDBC Arrays
> -
>
> Key: ARROW-4142
> URL: https://issues.apache.org/jira/browse/ARROW-4142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>





[jira] [Created] (ARROW-4628) [Rust] [DataFusion] Implement type coercion query optimizer rule

2019-02-19 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4628:
-

 Summary: [Rust] [DataFusion] Implement type coercion query 
optimizer rule
 Key: ARROW-4628
 URL: https://issues.apache.org/jira/browse/ARROW-4628
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.12.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.13.0


Now that we have a query optimizer, we should re-implement type coercion as an 
optimizer rule that rewrites expressions with explicit casts where required, so 
that at runtime we are only comparing like types.

For example, the expression {{float_column < int_column}} would be rewritten as 
{{float_column < CAST(int_column AS float)}}.

DataFusion already has this logic but the current implementation is somewhat 
hacky and incomplete. Moving it to the optimizer will allow us to implement 
this correctly.
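The rewrite rule can be sketched on a toy expression tree: when a comparison mixes numeric types, the narrower operand is wrapped in an explicit cast to the wider type. This is a Python sketch of the idea, not DataFusion's actual optimizer API.

```python
# Widening order for a toy numeric type system.
WIDTH = {"int": 0, "float": 1}

def coerce(op, left, right):
    """Rewrite (type, expr) operands so both sides share the wider type."""
    lt, rt = left[0], right[0]
    if lt == rt:
        return (op, left, right)
    wider = lt if WIDTH[lt] > WIDTH[rt] else rt
    def cast(col):
        # Leave the already-wide side alone; cast the narrower one.
        return col if col[0] == wider else (wider, f"CAST({col[1]} AS {wider})")
    return (op, cast(left), cast(right))

expr = coerce("<", ("float", "float_column"), ("int", "int_column"))
assert expr == ("<", ("float", "float_column"),
                ("float", "CAST(int_column AS float)"))
```

Running the rule once during optimization means the execution engine only ever compares like types.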





[jira] [Reopened] (ARROW-4142) [Java] JDBC-to-Arrow: JDBC Arrays

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-4142:
-

> [Java] JDBC-to-Arrow: JDBC Arrays
> -
>
> Key: ARROW-4142
> URL: https://issues.apache.org/jira/browse/ARROW-4142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (ARROW-4587) Flight C++ DoPut segfaults

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-4587.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3660
[https://github.com/apache/arrow/pull/3660]

> Flight C++ DoPut segfaults
> --
>
> Key: ARROW-4587
> URL: https://issues.apache.org/jira/browse/ARROW-4587
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: flight, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> After Wes fixed the undefined behavior, it turns out the implementation of 
> DoPut on the client side is now wrong. It should construct an IpcPayload 
> instead of going through the underlying Protobuf.
> Additionally, a previous patch accidentally exposed 
> arrow::ipc::DictionaryMemo under arrow::DictionaryMemo.





[jira] [Updated] (ARROW-3564) [Python] writing version 2.0 parquet format with dictionary encoding enabled

2019-02-19 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-3564:
---
Fix Version/s: 0.12.1

> [Python] writing version 2.0 parquet format with dictionary encoding enabled
> 
>
> Key: ARROW-3564
> URL: https://issues.apache.org/jira/browse/ARROW-3564
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.11.0
>Reporter: Hatem Helal
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.13.0, 0.12.1
>
> Attachments: example_v1.0_dict_False.parquet, 
> example_v1.0_dict_True.parquet, example_v2.0_dict_False.parquet, 
> example_v2.0_dict_True.parquet, pyarrow_repro.py
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Using pyarrow v0.11.0, the attached script writes a simple table (lifted from 
> the [pyarrow doc|https://arrow.apache.org/docs/python/parquet.html]) to both 
> parquet format versions 1.0 and 2.0, with and without dictionary encoding 
> enabled.
> Inspecting the written files using 
> [parquet-tools|https://github.com/apache/parquet-mr/tree/master/parquet-tools]
>  appears to show that dictionary encoding is not used in either of the 
> version 2.0 files.  Both files report that the columns are encoded using 
> {{PLAIN,RLE}} and that the dictionary page offset is zero.  I was expecting 
> that the column encoding would include {{RLE_DICTIONARY}}. Attached are the 
> script with repro steps and the files that were generated by it.
> Below is the output of using {{parquet-tools meta}} on the version 2.0 files
> {panel:title=version='2.0', use_dictionary = True}
> {code}
> % parquet-tools meta example_v2.0_dict_True.parquet
> file:              file:.../example_v2.0_dict_True.parquet
> creator:           parquet-cpp version 1.5.1-SNAPSHOT
>
> file schema:       schema
>
> one:               OPTIONAL DOUBLE R:0 D:1
> three:             OPTIONAL BOOLEAN R:0 D:1
> two:               OPTIONAL BINARY R:0 D:1
> __index_level_0__: OPTIONAL BINARY R:0 D:1
>
> row group 1:       RC:3 TS:211 OFFSET:4
>
> one:               DOUBLE SNAPPY DO:0 FPO:4 SZ:65/63/0.97 VC:3 ENC:PLAIN,RLE ST:[min: -1.0, max: 2.5, num_nulls: 1]
> three:             BOOLEAN SNAPPY DO:0 FPO:142 SZ:36/34/0.94 VC:3 ENC:PLAIN,RLE ST:[min: false, max: true, num_nulls: 0]
> two:               BINARY SNAPPY DO:0 FPO:225 SZ:60/58/0.97 VC:3 ENC:PLAIN,RLE ST:[min: 0x626172, max: 0x666F6F, num_nulls: 0]
> __index_level_0__: BINARY SNAPPY DO:0 FPO:328 SZ:50/48/0.96 VC:3 ENC:PLAIN,RLE ST:[min: 0x61, max: 0x63, num_nulls: 0]
> {code}
> {panel}
> {panel:title=version='2.0', use_dictionary = False}
> {code}
> % parquet-tools meta example_v2.0_dict_False.parquet
> file:              file:.../example_v2.0_dict_False.parquet
> creator:           parquet-cpp version 1.5.1-SNAPSHOT
>
> file schema:       schema
>
> one:               OPTIONAL DOUBLE R:0 D:1
> three:             OPTIONAL BOOLEAN R:0 D:1
> two:               OPTIONAL BINARY R:0 D:1
> __index_level_0__: OPTIONAL BINARY R:0 D:1
>
> row group 1:       RC:3 TS:211 OFFSET:4
>
> one:               DOUBLE SNAPPY DO:0 FPO:4 SZ:65/63/0.97 VC:3 ENC:PLAIN,RLE ST:[min: -1.0, max: 2.5, num_nulls: 1]
> three:             BOOLEAN SNAPPY DO:0 FPO:142 SZ:36/34/0.94 VC:3 ENC:PLAIN,RLE ST:[min: false, max: true, num_nulls: 0]
> two:               BINARY SNAPPY DO:0 FPO:225 SZ:60/58/0.97 VC:3 ENC:PLAIN,RLE ST:[min: 0x626172, max: 0x666F6F, num_nulls: 0]
> __index_level_0__: BINARY SNAPPY DO:0 FPO:328 SZ:50/48/0.96 VC:3 ENC:PLAIN,RLE ST:[min: 0x61, max: 0x63, num_nulls: 0]
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4562) [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of allocating contiguous slice and copying IpcPayload into it

2019-02-19 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-4562:
-

Assignee: Antoine Pitrou

> [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of 
> allocating contiguous slice and copying IpcPayload into it
> --
>
> Key: ARROW-4562
> URL: https://issues.apache.org/jira/browse/ARROW-4562
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> See discussion in https://github.com/apache/arrow/pull/3633





[jira] [Created] (ARROW-4629) [Python] Pandas to arrow conversion slowed down by local imports

2019-02-19 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-4629:
-

 Summary: [Python] Pandas to arrow conversion slowed down by local 
imports
 Key: ARROW-4629
 URL: https://issues.apache.org/jira/browse/ARROW-4629
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Florian Jetter
Assignee: Florian Jetter
 Attachments: image-2019-02-19-19-10-46-330.png

The pandas to arrow conversion is currently slowed down significantly by 
various local import statements.
{code}
import pandas as pd
import pyarrow as pa
import cProfile
ser = pd.Series(range(1))
df = pd.DataFrame({col: ser.copy(deep=True) for col in range(50)})
# Simulate a real dataset, i.e. force copy of data
df = df.astype({col: str for col in range(25)})
prof = cProfile.Profile()

prof.enable()
# a few times to collect statistics
for _ in range(100):
    pa.Table.from_pandas(df, nthreads=1)
prof.disable()
prof.dump_stats("array_conversion.prof")
{code}
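The effect the profile shows can be reproduced in isolation: a function-local import re-enters the import machinery on every call, even though the module is already cached in {{sys.modules}}, while a module-level import pays that cost once. A minimal sketch (using {{json}} merely as a stand-in for whatever modules are imported locally; the names here are illustrative, not from pyarrow):

```python
import timeit


def dumps_local_import(payload):
    # Anti-pattern: the import statement executes on every call; the
    # sys.modules lookup is cheap but not free, and it adds up in hot loops.
    import json
    return json.dumps(payload)


import json  # hoisted: the lookup cost is paid once at module load


def dumps_hoisted(payload):
    return json.dumps(payload)


# Both produce identical output; only the per-call overhead differs.
assert dumps_local_import([1, 2]) == dumps_hoisted([1, 2])
t_local = timeit.timeit(lambda: dumps_local_import([1, 2]), number=50_000)
t_hoisted = timeit.timeit(lambda: dumps_hoisted([1, 2]), number=50_000)
```

The fix proposed for pyarrow is the same pattern: move the imports out of the per-call path.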
!image-2019-02-19-19-10-46-330.png!





[jira] [Updated] (ARROW-4629) [Python] Pandas to arrow conversion slowed down by local imports

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4629:
--
Labels: pull-request-available  (was: )

> [Python] Pandas to arrow conversion slowed down by local imports
> 
>
> Key: ARROW-4629
> URL: https://issues.apache.org/jira/browse/ARROW-4629
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Florian Jetter
>Assignee: Florian Jetter
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2019-02-19-19-10-46-330.png
>
>
> The pandas to arrow conversion is currently slowed down significantly by 
> various local import statements.
> {code}
> import pandas as pd
> import pyarrow as pa
> import cProfile
> ser = pd.Series(range(1))
> df = pd.DataFrame({col: ser.copy(deep=True) for col in range(50)})
> # Simulate a real dataset, i.e. force copy of data
> df = df.astype({col: str for col in range(25)})
> prof = cProfile.Profile()
> prof.enable()
> # a few times to collect statistics
> for _ in range(100):
>     pa.Table.from_pandas(df, nthreads=1)
> prof.disable()
> prof.dump_stats("array_conversion.prof")
> {code}
> !image-2019-02-19-19-10-46-330.png!





[jira] [Commented] (ARROW-4313) Define general benchmark database schema

2019-02-19 Thread Areg Melik-Adamyan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772241#comment-16772241
 ] 

Areg Melik-Adamyan commented on ARROW-4313:
---

[~wesmckinn] and [~pitrou] - need your input.

> My understanding of [this conversation] 
> (https://lists.apache.org/thread.html/dcc08ab10507a5139178d7f816c0f5177ff0657546a4ade3ed71ffd5@%3Cdev.arrow.apache.org%3E)
>  was that a data model not tied to any ORM tool was the desired path to take.
> 
I think we need to take a step back and sync with @wesm and @pitrou on the 
goals for this little project: 
* for me the goal is to continuously track the performance of the core C++ 
library and help everybody doing performance work to catch regressions and 
contribute improvements. 
* do that in a validated form, so we can rely on the numbers.
* there is no goal to provide infrastructure for contributing third-party 
numbers, as they cannot be validated quickly.
* there is no goal to benchmark other languages, as they rely on C++ library 
calls, so you would mostly be benchmarking wrapper conversion overhead.
* there is no goal, for now, to anticipate and satisfy every possible future 
need.

The ability of the Arrow test library (practically GTest) to provide 
performance numbers on a given platform is more than enough. I would not want 
to limit users to a particular kind of database, performance monitor, or 
dashboard. I am duplicating this in ARROW-4313 to move the discussion out of 
code review.

> Define general benchmark database schema
> 
>
> Key: ARROW-4313
> URL: https://issues.apache.org/jira/browse/ARROW-4313
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Benchmarking
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: benchmark-data-model.erdplus, benchmark-data-model.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Some possible attributes that the benchmark database should track, to permit 
> heterogeneity of hardware and programming languages
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of 
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
> see discussion on mailing list 
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E





[jira] [Resolved] (ARROW-4618) [Docker] Makefile to build dependent docker images

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-4618.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3696
[https://github.com/apache/arrow/pull/3696]

> [Docker] Makefile to build dependent docker images
> --
>
> Key: ARROW-4618
> URL: https://issues.apache.org/jira/browse/ARROW-4618
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Docker compose cannot be used to build image hierarchies:
> - https://github.com/docker/compose/issues/6093
> - https://github.com/docker/compose/issues/6264#issuecomment-429268195





[jira] [Assigned] (ARROW-4618) [Docker] Makefile to build dependent docker images

2019-02-19 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-4618:
--

Assignee: Krisztian Szucs

> [Docker] Makefile to build dependent docker images
> --
>
> Key: ARROW-4618
> URL: https://issues.apache.org/jira/browse/ARROW-4618
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Docker compose cannot be used to build image hierarchies:
> - https://github.com/docker/compose/issues/6093
> - https://github.com/docker/compose/issues/6264#issuecomment-429268195





[jira] [Updated] (ARROW-4629) [Python] Pandas to arrow conversion slowed down by local imports

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4629:

Component/s: Python

> [Python] Pandas to arrow conversion slowed down by local imports
> 
>
> Key: ARROW-4629
> URL: https://issues.apache.org/jira/browse/ARROW-4629
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Florian Jetter
>Assignee: Florian Jetter
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2019-02-19-19-10-46-330.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The pandas to arrow conversion is currently slowed down significantly by 
> various local import statements.
> {code}
> import pandas as pd
> import pyarrow as pa
> import cProfile
> ser = pd.Series(range(1))
> df = pd.DataFrame({col: ser.copy(deep=True) for col in range(50)})
> # Simulate a real dataset, i.e. force copy of data
> df = df.astype({col: str for col in range(25)})
> prof = cProfile.Profile()
> prof.enable()
> # a few times to collect statistics
> for _ in range(100):
>     pa.Table.from_pandas(df, nthreads=1)
> prof.disable()
> prof.dump_stats("array_conversion.prof")
> {code}
> !image-2019-02-19-19-10-46-330.png!





[jira] [Updated] (ARROW-4562) [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of allocating contiguous slice and copying IpcPayload into it

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4562:
--
Labels: pull-request-available  (was: )

> [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of 
> allocating contiguous slice and copying IpcPayload into it
> --
>
> Key: ARROW-4562
> URL: https://issues.apache.org/jira/browse/ARROW-4562
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> See discussion in https://github.com/apache/arrow/pull/3633





[jira] [Updated] (ARROW-4629) [Python] Pandas to arrow conversion slowed down by local imports

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4629:

Fix Version/s: 0.13.0

> [Python] Pandas to arrow conversion slowed down by local imports
> 
>
> Key: ARROW-4629
> URL: https://issues.apache.org/jira/browse/ARROW-4629
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Florian Jetter
>Assignee: Florian Jetter
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: image-2019-02-19-19-10-46-330.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The pandas to arrow conversion is currently slowed down significantly by 
> various local import statements.
> {code}
> import pandas as pd
> import pyarrow as pa
> import cProfile
> ser = pd.Series(range(1))
> df = pd.DataFrame({col: ser.copy(deep=True) for col in range(50)})
> # Simulate a real dataset, i.e. force copy of data
> df = df.astype({col: str for col in range(25)})
> prof = cProfile.Profile()
> prof.enable()
> # a few times to collect statistics
> for _ in range(100):
>     pa.Table.from_pandas(df, nthreads=1)
> prof.disable()
> prof.dump_stats("array_conversion.prof")
> {code}
> !image-2019-02-19-19-10-46-330.png!





[jira] [Commented] (ARROW-4313) Define general benchmark database schema

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772315#comment-16772315
 ] 

Wes McKinney commented on ARROW-4313:
-

I'm involved in many projects so I haven't been able to follow the discussion 
to see where there is disagreement or conflict. 

From my perspective I want the following in the short term:

* A general purpose database schema, preferably for PostgreSQL, which can be 
used to easily provision a new benchmark database
* A script for running the C++ benchmarks and inserting the results into the 
database. This script should capture hardware information as well as any 
additional information that is known about the environment (OS, third-party 
library versions -- e.g. so we can see whether upgrading a dependency, like 
gRPC, causes a performance problem)

I think we should work as quickly as possible to have a working version of 
both of these to validate that we are on the right track. If we try to come 
up with the "perfect database schema" and punt the benchmark collector script 
until later, we could be waiting a long time. 

Ideally the database schema can accommodate results from multiple benchmark 
execution frameworks other than Google benchmark for C++. So we could write an 
adapter script to export data from ASV (for Python) into this database.

[~aregm] this does not seem to be out of line with the requirements you listed 
unless I am misunderstanding. I would rather not be too involved with the 
details right now unless the project stalls out for some reason and needs me to 
help push it through to completion. 
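A minimal sketch of what such a schema could look like, using Python's built-in sqlite3 for a self-contained illustration (PostgreSQL is the stated target, and every table and column name here is an assumption rather than an agreed design):

```python
import sqlite3

# Hypothetical two-table layout: one row per machine, one row per benchmark run,
# covering the attributes listed in ARROW-4313.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machine (
    machine_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE,   -- machine unique name (sort of the "user id")
    cpu_model  TEXT,
    cpu_mhz    INTEGER,       -- clock frequency (in case of overclocking)
    l1_kb      INTEGER, l2_kb INTEGER, l3_kb INTEGER,
    ram_gb     INTEGER,
    gpu_model  TEXT           -- NULL if none
);
CREATE TABLE benchmark_run (
    run_id     INTEGER PRIMARY KEY,
    machine_id INTEGER REFERENCES machine(machine_id),
    run_ts     TEXT,          -- timestamp of benchmark run
    git_hash   TEXT,          -- commit hash of codebase
    benchmark  TEXT,          -- benchmark unique name
    languages  TEXT,          -- e.g. 'C++' or 'C++,Python'
    mean_ns    REAL,
    stddev_ns  REAL           -- NULL if unavailable
);
""")
conn.execute("INSERT INTO machine (name, cpu_model) VALUES (?, ?)",
             ("ci-worker-1", "example-cpu"))
conn.execute(
    "INSERT INTO benchmark_run (machine_id, git_hash, benchmark, languages, mean_ns)"
    " VALUES (1, 'abc123', 'BM_Example', 'C++', 1520.0)")
rows = conn.execute("SELECT benchmark, mean_ns FROM benchmark_run").fetchall()
```

A collector script would only need the machine-registration and run-insertion statements, which keeps it decoupled from any particular database instance.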

> Define general benchmark database schema
> 
>
> Key: ARROW-4313
> URL: https://issues.apache.org/jira/browse/ARROW-4313
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Benchmarking
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: benchmark-data-model.erdplus, benchmark-data-model.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Some possible attributes that the benchmark database should track, to permit 
> heterogeneity of hardware and programming languages
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of 
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
> see discussion on mailing list 
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E





[jira] [Comment Edited] (ARROW-4313) Define general benchmark database schema

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772315#comment-16772315
 ] 

Wes McKinney edited comment on ARROW-4313 at 2/19/19 8:43 PM:
--

I'm involved in many projects so I haven't been able to follow the discussion 
to see where there is disagreement or conflict. 

From my perspective I want the following in the short term:

* A general purpose database schema, preferably for PostgreSQL, which can be 
used to easily provision a new benchmark database
* A script for running the C++ benchmarks and inserting the results into _any 
instance_ of that database. This script should capture hardware information 
as well as any additional information that is known about the environment 
(OS, third-party library versions -- e.g. so we can see whether upgrading a 
dependency, like gRPC, causes a performance problem). The script should not 
be coupled to a particular instance of the database, and it should work in an 
air-gapped environment.

I think we should work as quickly as possible to have a working version of 
both of these to validate that we are on the right track. If we try to come 
up with the "perfect database schema" and punt the benchmark collector script 
until later, we could be waiting a long time. 

Ideally the database schema can accommodate results from multiple benchmark 
execution frameworks other than Google benchmark for C++. So we could write an 
adapter script to export data from ASV (for Python) into this database.

[~aregm] this does not seem to be out of line with the requirements you listed 
unless I am misunderstanding. I would rather not be too involved with the 
details right now unless the project stalls out for some reason and needs me to 
help push it through to completion. 


was (Author: wesmckinn):
I'm involved in many projects so I haven't been able to follow the discussion 
to see where there is disagreement or conflict. 

From my perspective I want the following in the short term:

* A general purpose database schema, preferably for PostgreSQL, which can be 
used to easily provision a new benchmark database
* A script for running the C++ benchmarks and inserting the results into the 
database. This script should capture hardware information as well as any 
additional information that is known about the environment (OS, third-party 
library versions -- e.g. so we can see whether upgrading a dependency, like 
gRPC, causes a performance problem)

I think we should work as quickly as possible to have a working version of 
both of these to validate that we are on the right track. If we try to come 
up with the "perfect database schema" and punt the benchmark collector script 
until later, we could be waiting a long time. 

Ideally the database schema can accommodate results from multiple benchmark 
execution frameworks other than Google benchmark for C++. So we could write an 
adapter script to export data from ASV (for Python) into this database.

[~aregm] this does not seem to be out of line with the requirements you listed 
unless I am misunderstanding. I would rather not be too involved with the 
details right now unless the project stalls out for some reason and needs me to 
help push it through to completion. 

> Define general benchmark database schema
> 
>
> Key: ARROW-4313
> URL: https://issues.apache.org/jira/browse/ARROW-4313
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Benchmarking
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: benchmark-data-model.erdplus, benchmark-data-model.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Some possible attributes that the benchmark database should track, to permit 
> heterogeneity of hardware and programming languages
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of 
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
> see discussion on mailing list 
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E





[jira] [Commented] (ARROW-4605) [Rust] Move filter and limit code from DataFusion into compute module

2019-02-19 Thread Nicolas Trinquier (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772352#comment-16772352
 ] 

Nicolas Trinquier commented on ARROW-4605:
--

[filter|https://github.com/apache/arrow/blob/a65798a0ed3a96cfcd765e84615321da25ae17c8/rust/datafusion/src/execution/filter.rs#L89]
 already has the signature you suggest, [~nevi_me].

However, limit takes only a usize, not an array of bools, so I initially 
suggested a common signature:
filter(a: &Array, predicate: Fn(usize) -> bool) -> Result
but on second thought I am not sure this is worth it because of the performance 
implications. We might be better off using macros. What do you think?
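For intuition, the unified shape described above can be sketched outside Rust; here is a Python analogue (illustrative only — the actual proposal concerns the Arrow Rust crate's compute module, and these function names are made up):

```python
def filter_array(values, predicate):
    """Keep values[i] for which the index-based predicate(i) is true."""
    return [v for i, v in enumerate(values) if predicate(i)]


def limit(values, n):
    # limit falls out as the predicate "index < n" under the same signature
    return filter_array(values, lambda i: i < n)


data = [10, 20, 30, 40]
big = filter_array(data, lambda i: data[i] > 15)  # value-based predicate
first_two = limit(data, 2)
```

The trade-off mentioned above is that routing limit through a per-index closure costs an indirect call per element, whereas a dedicated limit can just slice.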

> [Rust] Move filter and limit code from DataFusion into compute module
> -
>
> Key: ARROW-4605
> URL: https://issues.apache.org/jira/browse/ARROW-4605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.13.0
>
>
> FilterRelation and the new LimitRelation (in ARROW-4464) contain code for 
> filtering and limiting arrays that could now be pushed down into the compute 
> module.





[jira] [Comment Edited] (ARROW-4605) [Rust] Move filter and limit code from DataFusion into compute module

2019-02-19 Thread Nicolas Trinquier (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772352#comment-16772352
 ] 

Nicolas Trinquier edited comment on ARROW-4605 at 2/19/19 9:33 PM:
---

[filter|https://github.com/apache/arrow/blob/a65798a0ed3a96cfcd765e84615321da25ae17c8/rust/datafusion/src/execution/filter.rs#L89]
 already has the signature you suggest, [~nevi_me].

However, limit takes only a usize, not an array of bools, so I initially 
suggested having one function to do both filter and limit, with the signature:
 filter(a: &Array, predicate: Fn(usize) -> bool) -> Result
 but on second thought I am not sure this is worth it because of the performance 
implications. We might be better off using macros. What do you think?


was (Author: ntrinquier):
[filter|https://github.com/apache/arrow/blob/a65798a0ed3a96cfcd765e84615321da25ae17c8/rust/datafusion/src/execution/filter.rs#L89]
 already has the signature you suggest, [~nevi_me].

However, limit takes only a usize, not an array of bools, so I initially 
suggested a common signature:
filter(a: &Array, predicate: Fn(usize) -> bool) -> Result
but on second thought I am not sure this is worth it because of the performance 
implications. We might be better off using macros. What do you think?

> [Rust] Move filter and limit code from DataFusion into compute module
> -
>
> Key: ARROW-4605
> URL: https://issues.apache.org/jira/browse/ARROW-4605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.13.0
>
>
> FilterRelation and the new LimitRelation (in ARROW-4464) contain code for 
> filtering and limiting arrays that could now be pushed down into the compute 
> module.





[jira] [Comment Edited] (ARROW-4605) [Rust] Move filter and limit code from DataFusion into compute module

2019-02-19 Thread Nicolas Trinquier (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772352#comment-16772352
 ] 

Nicolas Trinquier edited comment on ARROW-4605 at 2/19/19 9:34 PM:
---

[filter|https://github.com/apache/arrow/blob/a65798a0ed3a96cfcd765e84615321da25ae17c8/rust/datafusion/src/execution/filter.rs#L89]
 already has the signature you suggest, [~nevi_me].

However, limit takes only a usize, not an array of bools, so I initially 
suggested having one function to do both filter and limit, with the signature:
 filter(a: &Array, predicate: Fn(usize) -> bool) -> Result
 but on second thought I am not sure this is worth it because of the performance 
implications. We might be better off using macros. What do you think?


was (Author: ntrinquier):
[filter|https://github.com/apache/arrow/blob/a65798a0ed3a96cfcd765e84615321da25ae17c8/rust/datafusion/src/execution/filter.rs#L89]
 already has the signature you suggest, [~nevi_me].

However, limit takes only a usize, not an array of bools, so I initially 
suggested having one function to do both filter and limit, with the signature:
 filter(a: &Array, predicate: Fn(usize) -> bool) -> Result
 but on second thought I am not sure this is worth it because of the performance 
implications. We might be better off using macros. What do you think?

> [Rust] Move filter and limit code from DataFusion into compute module
> -
>
> Key: ARROW-4605
> URL: https://issues.apache.org/jira/browse/ARROW-4605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.13.0
>
>
> FilterRelation and the new LimitRelation (in ARROW-4464) contain code for 
> filtering and limiting arrays that could now be pushed down into the compute 
> module.





[jira] [Assigned] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3294:
---

Assignee: Wes McKinney

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Created] (ARROW-4631) Implement sort computational kernel

2019-02-19 Thread Areg Melik-Adamyan (JIRA)
Areg Melik-Adamyan created ARROW-4631:
-

 Summary: Implement sort computational kernel
 Key: ARROW-4631
 URL: https://issues.apache.org/jira/browse/ARROW-4631
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Affects Versions: 0.12.0
Reporter: Areg Melik-Adamyan
Assignee: Areg Melik-Adamyan


Implement serial version of sort computational kernel.





[jira] [Updated] (ARROW-4631) Implement sort computational kernel

2019-02-19 Thread Areg Melik-Adamyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areg Melik-Adamyan updated ARROW-4631:
--
Description: Implement serial version of sort computational kernel.  (was: 
Implement serial version of sortcomputational kernel.)

> Implement sort computational kernel
> ---
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>
> Implement serial version of sort computational kernel.





[jira] [Updated] (ARROW-4631) [C++] Implement sort computational kernel

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4631:

Summary: [C++] Implement sort computational kernel  (was: Implement sort 
computational kernel)

> [C++] Implement sort computational kernel
> -
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>
> Implement serial version of sort computational kernel.





[jira] [Updated] (ARROW-4631) [C++] Implement serial version of sort computational kernel

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4631:

Summary: [C++] Implement serial version of sort computational kernel  (was: 
[C++] Implement sort computational kernel)

> [C++] Implement serial version of sort computational kernel
> ---
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>
> Implement serial version of sort computational kernel.





[jira] [Created] (ARROW-4630) [CPP] Implement serial version of join

2019-02-19 Thread Areg Melik-Adamyan (JIRA)
Areg Melik-Adamyan created ARROW-4630:
-

 Summary: [CPP] Implement serial version of join
 Key: ARROW-4630
 URL: https://issues.apache.org/jira/browse/ARROW-4630
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.12.0
Reporter: Areg Melik-Adamyan
Assignee: Areg Melik-Adamyan


Implement the serial version of join operator.





[jira] [Commented] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772407#comment-16772407
 ] 

Wes McKinney commented on ARROW-3294:
-

Started working on this. The first issue I ran into is that there are no 
libprotobuf static libs in the conda-forge Windows package

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Commented] (ARROW-4631) [C++] Implement serial version of sort computational kernel

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772397#comment-16772397
 ] 

Wes McKinney commented on ARROW-4631:
-

I linked a couple of related issues

> [C++] Implement serial version of sort computational kernel
> ---
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>
> Implement serial version of sort computational kernel.





[jira] [Commented] (ARROW-1560) [C++] Kernel implementations for "match" function

2019-02-19 Thread Preeti Suman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772409#comment-16772409
 ] 

Preeti Suman commented on ARROW-1560:
-

Can someone assign this to me? 

> [C++] Kernel implementations for "match" function
> -
>
> Key: ARROW-1560
> URL: https://issues.apache.org/jira/browse/ARROW-1560
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of 
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}





[jira] [Commented] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772435#comment-16772435
 ] 

Wes McKinney commented on ARROW-3294:
-

I tried using the shared libraries but it creates a sea of warnings. It seems 
that the protobuf developers want Windows to use static linking:

https://github.com/protocolbuffers/protobuf/blob/436139803ffbe0ca3dc0e563b63f29b2fd729d4f/cmake/README.md#dlls-vs-static-linking

"Due to issues with Win32's use of a separate heap for each DLL, as well as 
binary compatibility issues between different versions of MSVC's STL library, 
it is recommended that you use static linkage only"

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Assigned] (ARROW-1560) [C++] Kernel implementations for "match" function

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1560:
---

Assignee: Preeti Suman

> [C++] Kernel implementations for "match" function
> -
>
> Key: ARROW-1560
> URL: https://issues.apache.org/jira/browse/ARROW-1560
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Preeti Suman
>Priority: Major
>  Labels: Analytics
> Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of 
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}





[jira] [Commented] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772438#comment-16772438
 ] 

Wes McKinney commented on ARROW-3294:
-

WIP branch: https://github.com/wesm/arrow/tree/ARROW-3294

I'll return to this after getting static libs into the {{libprotobuf}} 
conda-forge package

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>






[jira] [Commented] (ARROW-1560) [C++] Kernel implementations for "match" function

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772439#comment-16772439
 ] 

Wes McKinney commented on ARROW-1560:
-

Just made you a Contributor and assigned the issue to you.

Could you describe your implementation approach before you go too far down the 
rabbit hole? We want to make use of the existing hashing machinery that we are 
using for the {{Unique}} and {{DictionaryEncode}} functions

> [C++] Kernel implementations for "match" function
> -
>
> Key: ARROW-1560
> URL: https://issues.apache.org/jira/browse/ARROW-1560
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Preeti Suman
>Priority: Major
>  Labels: Analytics
> Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of 
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}





[jira] [Commented] (ARROW-1560) [C++] Kernel implementations for "match" function

2019-02-19 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772447#comment-16772447
 ] 

Micah Kornfield commented on ARROW-1560:


Should there be two implementations? One for small lists (linear scan) and 
one with a hash table?
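The two strategies could be sketched like this in pure Python (illustrative only; the function names and the eventual Arrow kernel API are assumptions, not the actual implementation):

```python
# Illustrative sketch of two "match" strategies: a linear scan for small
# category lists vs. a hash-table lookup. Not Arrow's actual kernel API.

def match_linear(values, categories):
    # O(len(categories)) per value; fine when the category list is tiny.
    return [categories.index(v) if v in categories else None for v in values]

def match_hashed(values, categories):
    # Build a hash table once, then O(1) lookup per value.
    table = {c: i for i, c in enumerate(categories)}
    return [table.get(v) for v in values]

values = ['a', 'b', 'a', None, 'b', 'a', 'b']
assert match_linear(values, ['b', 'a']) == match_hashed(values, ['b', 'a'])
print(match_hashed(values, ['b', 'a']))  # [1, 0, 1, None, 0, 1, 0]
```

Both return the position-index array from the issue description; the choice between them is purely a performance trade-off on the size of the category list.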




> [C++] Kernel implementations for "match" function
> -
>
> Key: ARROW-1560
> URL: https://issues.apache.org/jira/browse/ARROW-1560
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Preeti Suman
>Priority: Major
>  Labels: Analytics
> Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of 
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}





[jira] [Updated] (ARROW-3121) [C++] Incremental Mean aggregator

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3121:
--
Labels: analytics pull-request-available  (was: analytics)

> [C++] Incremental Mean aggregator
> -
>
> Key: ARROW-3121
> URL: https://issues.apache.org/jira/browse/ARROW-3121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: analytics, pull-request-available
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3121) [C++] Mean kernel aggregate

2019-02-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-3121:
--
Summary: [C++]  Mean kernel aggregate  (was: [C++] Incremental Mean 
aggregator)

> [C++]  Mean kernel aggregate
> 
>
> Key: ARROW-3121
> URL: https://issues.apache.org/jira/browse/ARROW-3121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: analytics, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-1566) [C++] Implement non-materializing sort kernels

2019-02-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-1566:
--
Description: The output of such an operator would be a permutation vector 
that, if applied to a column, yields the data sorted as requested. This is 
similar to numpy's argsort functionality.
Summary: [C++] Implement non-materializing sort kernels  (was: [C++] 
Implement "argsort" kernels that use mergesort to compute sorting indices)
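The non-materializing behaviour can be sketched in a few lines of Python (numpy-style argsort semantics only; this is not the Arrow implementation):

```python
# Sketch of "argsort" semantics: return a permutation vector instead of
# materializing a sorted copy of the column. Not Arrow's implementation.

def argsort(column):
    # sorted() is a stable mergesort-style sort in CPython (Timsort).
    return sorted(range(len(column)), key=lambda i: column[i])

col = [30, 10, 20]
perm = argsort(col)
print(perm)                    # [1, 2, 0]
print([col[i] for i in perm])  # [10, 20, 30]
```

Applying the permutation vector to the original column yields the sorted data, without the kernel ever allocating a sorted copy itself.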

> [C++] Implement non-materializing sort kernels
> --
>
> Key: ARROW-1566
> URL: https://issues.apache.org/jira/browse/ARROW-1566
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
>
> The output of such an operator would be a permutation vector that, if applied 
> to a column, yields the data sorted as requested. This is similar to numpy's 
> argsort functionality.





[jira] [Updated] (ARROW-4630) [C++] Implement serial version of join

2019-02-19 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-4630:

Summary: [C++] Implement serial version of join  (was: [CPP] Implement 
serial version of join)

> [C++] Implement serial version of join
> --
>
> Key: ARROW-4630
> URL: https://issues.apache.org/jira/browse/ARROW-4630
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>
> Implement the serial version of join operator.





[jira] [Created] (ARROW-4632) [Ruby] Add BigDecimal#to_arrow

2019-02-19 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-4632:
---

 Summary: [Ruby] Add BigDecimal#to_arrow
 Key: ARROW-4632
 URL: https://issues.apache.org/jira/browse/ARROW-4632
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Ruby
Reporter: Kenta Murata
Assignee: Kenta Murata


It may be better for BigDecimal to have a to_arrow instance method that 
converts it to Arrow::Decimal128.





[jira] [Updated] (ARROW-4632) [Ruby] Add BigDecimal#to_arrow

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4632:
--
Labels: pull-request-available  (was: )

> [Ruby] Add BigDecimal#to_arrow
> --
>
> Key: ARROW-4632
> URL: https://issues.apache.org/jira/browse/ARROW-4632
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>
> It may be better for BigDecimal to have a to_arrow instance method that 
> converts it to Arrow::Decimal128.





[jira] [Created] (ARROW-4633) ParquetFile.read(use_threads=False) creates ThreadPool anyway

2019-02-19 Thread Taylor Johnson (JIRA)
Taylor Johnson created ARROW-4633:
-

 Summary: ParquetFile.read(use_threads=False) creates ThreadPool 
anyway
 Key: ARROW-4633
 URL: https://issues.apache.org/jira/browse/ARROW-4633
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.12.0, 0.11.1
 Environment: Linux, Python 3.7.1, pyarrow.__version__ = 0.12.0
Reporter: Taylor Johnson


The following code seems to suggest that ParquetFile.read(use_threads=False) 
still creates a ThreadPool.  This is observed in 
ParquetFile.read_row_group(use_threads=False) as well. 

This does not appear to be a problem in 
pyarrow.Table.to_pandas(use_threads=False).

I've tried tracing the error.  Starting in python/pyarrow/parquet.py, both 
ParquetReader.read_all() and ParquetReader.read_row_group() pass the 
use_threads input along to self.reader which is a ParquetReader imported from 
_parquet.pyx

Following the calls into python/pyarrow/_parquet.pyx, we see that 
ParquetReader.read_all() and ParquetReader.read_row_group() have the following 
code which seems a bit suspicious
{quote}if use_threads:

    self.set_use_threads(use_threads)
{quote}
Why not just always call self.set_use_threads(use_threads)?
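A minimal sketch of why the guard matters (the reader class and its threaded default below are hypothetical, not pyarrow's actual internals): if the underlying reader starts in a threaded state, guarding the call means `use_threads=False` is never forwarded.

```python
# Hypothetical model of the guard: if the underlying reader defaults to
# threaded reads, `if use_threads:` silently drops a False value.
class FakeReader:
    def __init__(self):
        self.use_threads = True  # assumed threaded default

    def set_use_threads(self, flag):
        self.use_threads = flag

def read_guarded(reader, use_threads):
    if use_threads:  # the suspicious pattern
        reader.set_use_threads(use_threads)

def read_always(reader, use_threads):
    reader.set_use_threads(use_threads)  # always forward the flag

guarded, always = FakeReader(), FakeReader()
read_guarded(guarded, False)
read_always(always, False)
print(guarded.use_threads, always.use_threads)  # True False
```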

The ParquetReader.set_use_threads simply calls 
self.reader.get().set_use_threads(use_threads).  This self.reader is assigned 
as unique_ptr[FileReader].  I think this points to 
cpp/src/parquet/arrow/reader.cc, but I'm not sure about that.  The 
FileReader::Impl::ReadRowGroup logic looks ok, as a call to 
::arrow::internal::GetCpuThreadPool() is only called if use_threads is True.  
The same is true for ReadTable.

So when is the ThreadPool getting created?

Example code:

--
{quote}import pandas as pd
import psutil
import pyarrow as pa
import pyarrow.parquet as pq

use_threads=False
p=psutil.Process()
print('Starting with {} threads'.format(p.num_threads()))


df = pd.DataFrame({'x': [0]})
table = pa.Table.from_pandas(df)
print('After table creation, {} threads'.format(p.num_threads()))

df = table.to_pandas(use_threads=use_threads)
print('table.to_pandas(use_threads={}), {} threads'.format(use_threads, 
p.num_threads()))

writer = pq.ParquetWriter('tmp.parquet', table.schema)
writer.write_table(table)
writer.close()
print('After writing parquet file, {} threads'.format(p.num_threads()))

pf = pq.ParquetFile('tmp.parquet')
print('After ParquetFile, {} threads'.format(p.num_threads()))

df = pf.read(use_threads=use_threads).to_pandas()
print('After pf.read(use_threads={}), {} threads'.format(use_threads, 
p.num_threads()))
{quote}
---
$ python pyarrow_test.py

Starting with 1 threads
After table creation, 1 threads
table.to_pandas(use_threads=False), 1 threads
After writing parquet file, 1 threads
After ParquetFile, 1 threads
After pf.read(use_threads=False), 5 threads





[jira] [Updated] (ARROW-4632) [Ruby] Add BigDecimal#to_arrow

2019-02-19 Thread Kenta Murata (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-4632:

External issue URL: https://github.com/apache/arrow/pull/3709

> [Ruby] Add BigDecimal#to_arrow
> --
>
> Key: ARROW-4632
> URL: https://issues.apache.org/jira/browse/ARROW-4632
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It may be better for BigDecimal to have a to_arrow instance method that 
> converts it to Arrow::Decimal128.





[jira] [Created] (ARROW-4634) [Rust] [Parquet] Reorganize test_common mod to allow more test util codes.

2019-02-19 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4634:
-

 Summary: [Rust] [Parquet] Reorganize test_common mod to allow more 
test util codes.
 Key: ARROW-4634
 URL: https://issues.apache.org/jira/browse/ARROW-4634
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Currently the test_common mod is just one file; as we add more test utils to 
it, things may get messy, so I propose making test_common a directory with 
multiple sub-mods.





[jira] [Updated] (ARROW-4634) [Rust] [Parquet] Reorganize test_common mod to allow more test util codes.

2019-02-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4634:
--
Labels: pull-request-available  (was: )

> [Rust] [Parquet] Reorganize test_common mod to allow more test util codes.
> --
>
> Key: ARROW-4634
> URL: https://issues.apache.org/jira/browse/ARROW-4634
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the test_common mod is just one file; as we add more test utils to 
> it, things may get messy, so I propose making test_common a directory with 
> multiple sub-mods.





[jira] [Updated] (ARROW-4631) [C++] Implement serial version of sort computational kernel

2019-02-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-4631:
--
Labels: ana  (was: )

> [C++] Implement serial version of sort computational kernel
> ---
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>  Labels: ana
>
> Implement serial version of sort computational kernel.





[jira] [Updated] (ARROW-4631) [C++] Implement serial version of sort computational kernel

2019-02-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-4631:
--
Labels: analytics  (was: ana)

> [C++] Implement serial version of sort computational kernel
> ---
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Major
>  Labels: analytics
>
> Implement serial version of sort computational kernel.





[jira] [Updated] (ARROW-2685) [C++] Implement kernels for in-place sorting of fixed-width contiguous arrays

2019-02-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-2685:
--
Labels: analytics  (was: )

> [C++] Implement kernels for in-place sorting of fixed-width contiguous arrays
> -
>
> Key: ARROW-2685
> URL: https://issues.apache.org/jira/browse/ARROW-2685
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics
>
> See discussion in https://github.com/apache/arrow/issues/2112. A kernel may 
> want to throw if the memory being sorted is shared (in which case the user 
> should copy, then sort)
> Sorting of chunked data is a more complex topic so that's out of scope for 
> this task





[jira] [Resolved] (ARROW-4632) [Ruby] Add BigDecimal#to_arrow

2019-02-19 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-4632.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3709
[https://github.com/apache/arrow/pull/3709]

> [Ruby] Add BigDecimal#to_arrow
> --
>
> Key: ARROW-4632
> URL: https://issues.apache.org/jira/browse/ARROW-4632
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It may be better for BigDecimal to have a to_arrow instance method that 
> converts it to Arrow::Decimal128.





[jira] [Commented] (ARROW-3135) [C++] Add helper functions for validity bitmap propagation in kernel context

2019-02-19 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772574#comment-16772574
 ] 

Francois Saint-Jacques commented on ARROW-3135:
---

I don't quite follow the description of this task. Is this about passing the 
selection vector between chained kernels?

> [C++] Add helper functions for validity bitmap propagation in kernel context
> 
>
> Key: ARROW-3135
> URL: https://issues.apache.org/jira/browse/ARROW-3135
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-4633) [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4633:

Summary: [Python] ParquetFile.read(use_threads=False) creates ThreadPool 
anyway  (was: ParquetFile.read(use_threads=False) creates ThreadPool anyway)

> [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway
> --
>
> Key: ARROW-4633
> URL: https://issues.apache.org/jira/browse/ARROW-4633
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.12.0
> Environment: Linux, Python 3.7.1, pyarrow.__version__ = 0.12.0
>Reporter: Taylor Johnson
>Priority: Minor
>  Labels: newbie
>
> The following code seems to suggest that ParquetFile.read(use_threads=False) 
> still creates a ThreadPool.  This is observed in 
> ParquetFile.read_row_group(use_threads=False) as well. 
> This does not appear to be a problem in 
> pyarrow.Table.to_pandas(use_threads=False).
> I've tried tracing the error.  Starting in python/pyarrow/parquet.py, both 
> ParquetReader.read_all() and ParquetReader.read_row_group() pass the 
> use_threads input along to self.reader which is a ParquetReader imported from 
> _parquet.pyx
> Following the calls into python/pyarrow/_parquet.pyx, we see that 
> ParquetReader.read_all() and ParquetReader.read_row_group() have the 
> following code which seems a bit suspicious
> {quote}if use_threads:
>     self.set_use_threads(use_threads)
> {quote}
> Why not just always call self.set_use_threads(use_threads)?
> The ParquetReader.set_use_threads simply calls 
> self.reader.get().set_use_threads(use_threads).  This self.reader is assigned 
> as unique_ptr[FileReader].  I think this points to 
> cpp/src/parquet/arrow/reader.cc, but I'm not sure about that.  The 
> FileReader::Impl::ReadRowGroup logic looks ok, as a call to 
> ::arrow::internal::GetCpuThreadPool() is only called if use_threads is True.  
> The same is true for ReadTable.
> So when is the ThreadPool getting created?
> Example code:
> --
> {quote}import pandas as pd
> import psutil
> import pyarrow as pa
> import pyarrow.parquet as pq
> use_threads=False
> p=psutil.Process()
> print('Starting with {} threads'.format(p.num_threads()))
> df = pd.DataFrame({'x': [0]})
> table = pa.Table.from_pandas(df)
> print('After table creation, {} threads'.format(p.num_threads()))
> df = table.to_pandas(use_threads=use_threads)
> print('table.to_pandas(use_threads={}), {} threads'.format(use_threads, 
> p.num_threads()))
> writer = pq.ParquetWriter('tmp.parquet', table.schema)
> writer.write_table(table)
> writer.close()
> print('After writing parquet file, {} threads'.format(p.num_threads()))
> pf = pq.ParquetFile('tmp.parquet')
> print('After ParquetFile, {} threads'.format(p.num_threads()))
> df = pf.read(use_threads=use_threads).to_pandas()
> print('After pf.read(use_threads={}), {} threads'.format(use_threads, 
> p.num_threads()))
> {quote}
> ---
> $ python pyarrow_test.py
> Starting with 1 threads
> After table creation, 1 threads
> table.to_pandas(use_threads=False), 1 threads
> After writing parquet file, 1 threads
> After ParquetFile, 1 threads
> After pf.read(use_threads=False), 5 threads





[jira] [Commented] (ARROW-3135) [C++] Add helper functions for validity bitmap propagation in kernel context

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772619#comment-16772619
 ] 

Wes McKinney commented on ARROW-3135:
-

I just recently added 
https://github.com/apache/arrow/blob/47ebb1af1f6e1bcac95cf99f8258257f471f043b/cpp/src/arrow/compute/kernels/util-internal.h#L73

This needs to be generalized to kernels with multiple input arguments where 
bitmaps need to be AND-ed

> [C++] Add helper functions for validity bitmap propagation in kernel context
> 
>
> Key: ARROW-3135
> URL: https://issues.apache.org/jira/browse/ARROW-3135
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics
> Fix For: 0.14.0
>
>






[jira] [Commented] (ARROW-3135) [C++] Add helper functions for validity bitmap propagation in kernel context

2019-02-19 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772621#comment-16772621
 ] 

Wes McKinney commented on ARROW-3135:
-

For example, in the Add kernel, if either bit in a pair of bitmaps is not 
set, then the corresponding result bit must not be set.
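That propagation amounts to AND-ing the input validity bitmaps. A byte-wise Python sketch (1 bit per value, 1 = valid; this is not Arrow's C++ helper):

```python
# Sketch of validity-bitmap propagation for a binary kernel like Add:
# the output bitmap is the byte-wise AND of the input bitmaps, so the
# result is valid only where both inputs are valid. Not Arrow's C++ code.

def and_bitmaps(left: bytes, right: bytes) -> bytes:
    return bytes(a & b for a, b in zip(left, right))

left = bytes([0b10110101])   # validity of the first input's 8 values
right = bytes([0b11100110])  # validity of the second input's 8 values
out = and_bitmaps(left, right)
print(bin(out[0]))  # 0b10100100
```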

> [C++] Add helper functions for validity bitmap propagation in kernel context
> 
>
> Key: ARROW-3135
> URL: https://issues.apache.org/jira/browse/ARROW-3135
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-4633) [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4633:

Fix Version/s: 0.14.0

> [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway
> --
>
> Key: ARROW-4633
> URL: https://issues.apache.org/jira/browse/ARROW-4633
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.12.0
> Environment: Linux, Python 3.7.1, pyarrow.__version__ = 0.12.0
>Reporter: Taylor Johnson
>Priority: Minor
>  Labels: newbie
> Fix For: 0.14.0
>
>
> The following code seems to suggest that ParquetFile.read(use_threads=False) 
> still creates a ThreadPool.  This is observed in 
> ParquetFile.read_row_group(use_threads=False) as well. 
> This does not appear to be a problem in 
> pyarrow.Table.to_pandas(use_threads=False).
> I've tried tracing the error.  Starting in python/pyarrow/parquet.py, both 
> ParquetReader.read_all() and ParquetReader.read_row_group() pass the 
> use_threads input along to self.reader which is a ParquetReader imported from 
> _parquet.pyx
> Following the calls into python/pyarrow/_parquet.pyx, we see that 
> ParquetReader.read_all() and ParquetReader.read_row_group() have the 
> following code which seems a bit suspicious
> {quote}if use_threads:
>     self.set_use_threads(use_threads)
> {quote}
> Why not just always call self.set_use_threads(use_threads)?
> The ParquetReader.set_use_threads simply calls 
> self.reader.get().set_use_threads(use_threads).  This self.reader is assigned 
> as unique_ptr[FileReader].  I think this points to 
> cpp/src/parquet/arrow/reader.cc, but I'm not sure about that.  The 
> FileReader::Impl::ReadRowGroup logic looks ok, as a call to 
> ::arrow::internal::GetCpuThreadPool() is only called if use_threads is True.  
> The same is true for ReadTable.
> So when is the ThreadPool getting created?
> Example code:
> --
> {quote}import pandas as pd
> import psutil
> import pyarrow as pa
> import pyarrow.parquet as pq
> use_threads=False
> p=psutil.Process()
> print('Starting with {} threads'.format(p.num_threads()))
> df = pd.DataFrame({'x': [0]})
> table = pa.Table.from_pandas(df)
> print('After table creation, {} threads'.format(p.num_threads()))
> df = table.to_pandas(use_threads=use_threads)
> print('table.to_pandas(use_threads={}), {} threads'.format(use_threads, 
> p.num_threads()))
> writer = pq.ParquetWriter('tmp.parquet', table.schema)
> writer.write_table(table)
> writer.close()
> print('After writing parquet file, {} threads'.format(p.num_threads()))
> pf = pq.ParquetFile('tmp.parquet')
> print('After ParquetFile, {} threads'.format(p.num_threads()))
> df = pf.read(use_threads=use_threads).to_pandas()
> print('After pf.read(use_threads={}), {} threads'.format(use_threads, 
> p.num_threads()))
> {quote}
> ---
> $ python pyarrow_test.py
> Starting with 1 threads
> After table creation, 1 threads
> table.to_pandas(use_threads=False), 1 threads
> After writing parquet file, 1 threads
> After ParquetFile, 1 threads
> After pf.read(use_threads=False), 5 threads





[jira] [Updated] (ARROW-4581) [C++] gbenchmark_ep is a dependency of unit tests when ARROW_BUILD_BENCHMARKS=ON

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4581:

Fix Version/s: (was: 0.12.1)
   0.13.0

> [C++] gbenchmark_ep is a dependency of unit tests when 
> ARROW_BUILD_BENCHMARKS=ON
> 
>
> Key: ARROW-4581
> URL: https://issues.apache.org/jira/browse/ARROW-4581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I hit this issue when trying to use clang-7 from conda-forge, and wasn't sure 
> why gbenchmark_ep is getting built when I'm building only a single unit test 
> executable like arrow-array-test
> https://github.com/google/benchmark/issues/351





[jira] [Resolved] (ARROW-4581) [C++] gbenchmark_ep is a dependency of unit tests when ARROW_BUILD_BENCHMARKS=ON

2019-02-19 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4581.
-
   Resolution: Fixed
Fix Version/s: (was: 0.13.0)
   0.12.1

Issue resolved by pull request 3698
[https://github.com/apache/arrow/pull/3698]

> [C++] gbenchmark_ep is a dependency of unit tests when 
> ARROW_BUILD_BENCHMARKS=ON
> 
>
> Key: ARROW-4581
> URL: https://issues.apache.org/jira/browse/ARROW-4581
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I hit this issue when trying to use clang-7 from conda-forge, and wasn't sure 
> why gbenchmark_ep is getting built when I'm building only a single unit test 
> executable like arrow-array-test
> https://github.com/google/benchmark/issues/351


