Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-18 Thread Hyukjin Kwon
I struggled hard with this issue multiple times over the past year, and thankfully we finally decided to use the official version of Hive 2.3.x too (thank you, Yuming, Alan, and guys). I think it is already huge progress that we started to use the official version of Hive. I think we should at

Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-18 Thread Dongjoon Hyun
Hi, All. First of all, I want to frame this as a policy issue instead of a technical issue. Also, this is orthogonal to the `hadoop` version discussion. The Apache Spark community has kept (not maintained) the forked Apache Hive 1.2.1 because there was no other option before. As we see at SPARK-20202,

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-18 Thread Dongjoon Hyun
I also agree with Steve and Felix. Let's have another thread to discuss the Hive issue, because this thread was originally about the `hadoop` version. And now we can have a `hive-2.3` profile for both the `hadoop-2.7` and `hadoop-3.0` versions; we don't need to mix the two. Bests, Dongjoon. On Mon, Nov 18,

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-18 Thread Felix Cheung
1000% with Steve: the org.spark-project hive 1.2 will need a solution. It is old and rather buggy, and it’s been *years*. I think we should decouple the Hive change from everything else, if people are concerned? From: Steve Loughran Sent: Sunday, November 17, 2019

Re: Adding JIRA ID as the prefix for the test case name

2019-11-18 Thread Hyukjin Kwon
Let me document it as below in a few days: 1. For Python and Java, write a single comment that starts with the JIRA ID and a short description, e.g. (SPARK-X: test blah blah). 2. For R, use the JIRA ID as a prefix for the test name. Assuming everybody is happy. On Mon, Nov 18, 2019 at 11:36 AM, Hyukjin Kwon
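As a rough illustration of the convention proposed above, a Python test would carry the JIRA ID in a leading comment (the JIRA number and test body here are hypothetical placeholders, not from an actual Spark suite):

```python
# Hedged sketch of the proposed convention: for Python (and Java), the test
# keeps its normal name, and a single comment starting with the JIRA ID plus
# a short description identifies the originating issue.

def test_timestamp_parsing():
    # SPARK-XXXXX: test blah blah (JIRA ID + short description as the
    # first comment inside the test, per the convention above)
    assert int("42") == 42
```

For R, by contrast, the JIRA ID itself would prefix the test name passed to `test_that()`, e.g. `test_that("SPARK-XXXXX: blah blah", { ... })`.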

CR for adding bucket join support to V2 Datasources

2019-11-18 Thread Long, Andrew
Hey Friends, I recently created a pull request to add optional support for bucket joins to V2 Datasources, via a concrete class representing the Spark-style Hash Partitioning. If anyone has some free time, I'd appreciate a code review. This also adds a concrete implementation of V2