[jira] [Created] (ARROW-5195) read_csv ignores null_values on string types

2019-04-22 Thread Scott Burns (JIRA)
Scott Burns created ARROW-5195: -- Summary: read_csv ignores null_values on string types Key: ARROW-5195 URL: https://issues.apache.org/jira/browse/ARROW-5195 Project: Apache Arrow Issue Type:

RE: [Rust] [DataFusion] Parallel query execution PoC

2019-04-22 Thread Melik-Adamyan, Areg
I would encourage you to familiarize yourself with the proposal https://cwiki.apache.org/confluence/display/ARROW/Parallel+Execution+Engine and join the forces for more rapid development of the engine. -Original Message- From: Andy Grove [mailto:andygrov...@gmail.com] Sent: Saturday,

[jira] [Created] (ARROW-5196) [CPP] Uniform usage of Google cpu_features library across the codebase

2019-04-22 Thread Areg Melik-Adamyan (JIRA)
Areg Melik-Adamyan created ARROW-5196: - Summary: [CPP] Uniform usage of Google cpu_features library across the codebase Key: ARROW-5196 URL: https://issues.apache.org/jira/browse/ARROW-5196

Re: [pyarrow] Parquet page header size limit

2019-04-22 Thread Wes McKinney
hi Shyam, Well "Invalid data. Deserializing page header failed." is not a very good error message. Can you open a JIRA issue and provide a way to reproduce the problem (e.g. code to generate a file, or a sample file)? From what you say it seems to be an atypical usage of Parquet, but there might

[jira] [Created] (ARROW-5198) [Java] Add hasNull flag to Vectors

2019-04-22 Thread Yurui Zhou (JIRA)
Yurui Zhou created ARROW-5198: - Summary: [Java] Add hasNull flag to Vectors Key: ARROW-5198 URL: https://issues.apache.org/jira/browse/ARROW-5198 Project: Apache Arrow Issue Type: Sub-task

[jira] [Created] (ARROW-5199) [Java] Add unsafe access method to ArrowBuf

2019-04-22 Thread Yurui Zhou (JIRA)
Yurui Zhou created ARROW-5199: - Summary: [Java] Add unsafe access method to ArrowBuf Key: ARROW-5199 URL: https://issues.apache.org/jira/browse/ARROW-5199 Project: Apache Arrow Issue Type:

Re: Permission Request

2019-04-22 Thread Wes McKinney
hi Fan, Just added you as a contributor on JIRA. Note this isn't necessary to contribute or open issues, only to assign yourself issues; we often add users to the Contributor role after they've already submitted a pull request Thanks On Mon, Apr 22, 2019 at 11:25 PM Fan Liya wrote: > >

Re: Permission Request

2019-04-22 Thread Fan Liya
Hi Wes, Thanks a lot for your kind help. Best, Liya Fan On Tue, Apr 23, 2019 at 12:51 PM Wes McKinney wrote: > hi Fan, > > Just added you as a contributor on JIRA. Note this isn't necessary to > contribute or open issues, only to assign yourself issues; we often > add users to the Contributor

[jira] [Created] (ARROW-5197) [Java] Improving Arrow Vector Reading performance

2019-04-22 Thread Yurui Zhou (JIRA)
Yurui Zhou created ARROW-5197: - Summary: [Java] Improving Arrow Vector Reading performance Key: ARROW-5197 URL: https://issues.apache.org/jira/browse/ARROW-5197 Project: Apache Arrow Issue Type:

Re: [Rust] [DataFusion] Parallel query execution PoC

2019-04-22 Thread Wes McKinney
hi Areg -- there has been discussion about introducing dependency between Rust and C++ libraries and it's unclear to me that it would be a good idea to have a hard dependency from Rust on C++. I am not doing Rust development any time soon but if it's possible for the Rust folks to avoid the build

Permission Request

2019-04-22 Thread Fan Liya
Hi Guys, I want to contribute to Apache Arrow. Would you please give me the permission as a contributor? My JIRA ID is fan_li_ya. Thanks a lot. Best, Liya Fan

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Reynold Xin
"if others think it would be helpful, we can cancel this vote, update the SPIP to clarify exactly what I am proposing, and then restart the vote after we have gotten more agreement on what APIs should be exposed" That'd be very useful. At least I was confused by what the SPIP was about. No

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
Ok, I'm cancelling the vote for now then and we will make some updates to the SPIP to try to clarify. Tom On Monday, April 22, 2019, 1:07:25 PM CDT, Reynold Xin wrote: "if others think it would be helpful, we can cancel this vote, update the SPIP to clarify exactly what I am

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
Agreed. Tom, could you cancel the vote? On Mon, Apr 22, 2019 at 1:07 PM Reynold Xin wrote: > "if others think it would be helpful, we can cancel this vote, update the > SPIP to clarify exactly what I am proposing, and then restart the vote > after we have gotten more agreement on what APIs

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
Yes, it is technically possible for the layout to change. No, it is not going to happen. It is already baked into several different official libraries which are widely used, not just for holding and processing the data, but also for transfer of the data between the various implementations.

[jira] [Created] (ARROW-5193) [C++] Linker error with bundled zlib

2019-04-22 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5193: - Summary: [C++] Linker error with bundled zlib Key: ARROW-5193 URL: https://issues.apache.org/jira/browse/ARROW-5193 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5194) TEST(PlasmaSerialization, GetReply) is failing

2019-04-22 Thread Guillaume Horel (JIRA)
Guillaume Horel created ARROW-5194: -- Summary: TEST(PlasmaSerialization, GetReply) is failing Key: ARROW-5194 URL: https://issues.apache.org/jira/browse/ARROW-5194 Project: Apache Arrow

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
Xiangrui Meng, I provided some examples in the original discussion thread. https://lists.apache.org/thread.html/f7cdc2cbfb1dafa001422031ff6a3a6dc7b51efc175327b0bbfe620e@%3Cdev.spark.apache.org%3E But the concrete use case that we have is GPU accelerated ETL on Spark. Primarily as data

[jira] [Created] (ARROW-5192) [C++] Bundled gRPC fails building (cannot find c-ares)

2019-04-22 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5192: - Summary: [C++] Bundled gRPC fails building (cannot find c-ares) Key: ARROW-5192 URL: https://issues.apache.org/jira/browse/ARROW-5192 Project: Apache Arrow

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
Based on there is still discussion and Spark Summit is this week, I'm going to extend the vote til Friday the 26th. Tom On Monday, April 22, 2019, 8:44:00 AM CDT, Bobby Evans wrote: Yes, it is technically possible for the layout to change. No, it is not going to happen. It is

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Xiangrui Meng
Per Robert's comment on the JIRA, ETL is the main use case for the SPIP. I think the SPIP should list a concrete ETL use case (from POC?) that can benefit from this *public Java/Scala API*, does *vectorization*, and significantly *boosts the performance* even with data conversion overhead. The