Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-12 Thread Jakub Wozniak
Hello, Any more thoughts on this one? Will it make it into 2.4.1 or not? Thanks in advance, Jakub On 8 Mar 2019, at 11:26, Jakub Wozniak <jakub.wozn...@cern.ch> wrote: Hi, To me it is backwards compatible with older HBase versions. The code actually only falls back to the

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-08 Thread Jakub Wozniak
akub On 8 Mar 2019, at 11:15, Jakub Wozniak <jakub.wozn...@cern.ch> wrote: I guess it is that one: https://github.com/apache/spark/commit/dfed439e33b7bf224dd412b0960402068d961c7b#diff-9ebb59b7b008c694a8f583b94bd24e1d Cheers, Jakub On 7 Mar 2019, at 17:25, Sean Owen mailto:sro...

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-08 Thread Jakub Wozniak
:57 AM Jakub Wozniak <jakub.wozn...@cern.ch> wrote: Hello, I have a question regarding the 2.4.1 release. It looks like Spark 2.4 (and 2.4.1-rc) is not fully compatible with HBase 2.x+ in YARN mode. The problem is in the org.apache.spark.deploy.security.HBaseDelegationTokenPr

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-07 Thread Jakub Wozniak
Hello, I have a question regarding the 2.4.1 release. It looks like Spark 2.4 (and 2.4.1-rc) is not fully compatible with HBase 2.x+ in YARN mode. The problem is in the org.apache.spark.deploy.security.HBaseDelegationTokenProvider class that expects a specific version of TokenUtil

Re: Very slow complex type column reads from parquet

2018-06-15 Thread Jakub Wozniak
you have any recommendation / experience with that? Thanks a lot for your help, Jakub On 14 Jun 2018, at 12:07, Jakub Wozniak <jakub.wozn...@cern.ch> wrote: Dear Ryan, Thanks a lot for your answer. After having sent the e-mail we have investigated a bit more the data

Re: Very slow complex type column reads from parquet

2018-06-14 Thread Jakub Wozniak
'd be happy to see vectorization for nested Parquet data move forward, but I think you might want to get an idea of how much it will help before you move forward with it. Can you use Impala to test whether vectorization would help here? rb On Mon, Jun 11, 2018 at 6:16 AM, Jakub Wozniak mailto:jak

Very slow complex type column reads from parquet

2018-06-11 Thread Jakub Wozniak
.2.1. Best regards, Jakub Wozniak

Re: Custom datasource as a wrapper for existing ones?

2018-05-03 Thread Jakub Wozniak
take a look at `FileFormat`, which is the API for Spark's built-in file-based data sources such as Parquet. It's an internal API but has not been changed for a long time. In the future, data source v2 would be the best solution. Thanks, Wenchen On Thu, May 3, 2018 at 4:17 AM, Jakub Wozniak <jakub.

Re: Custom datasource as a wrapper for existing ones?

2018-05-02 Thread Jakub Wozniak
n top of Spark but the datasource approach looked like a more elegant solution. Only the performance is still far from the desired one. Any help or direction in that matte