Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Xiao Li
I think we just need to provide two options and let end users choose the one they need: Hadoop 3.2 or Hadoop 2.7. Thus, SPARK-32017 (Make Pyspark Hadoop 3.2+ Variant available in PyPI) is a high-priority task for the Spark 3.1 release to me. I do not know how to track the popularity of Hadoop 2 vs
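One possible shape for what SPARK-32017 asks for is selecting the Hadoop variant at install time via an environment variable. This is a hedged sketch only: the variable name and values below are assumptions about a not-yet-released interface, not a documented PyPI feature at the time of this thread.

```shell
# Hypothetical: choose the Hadoop variant when installing PySpark from PyPI.
# The variable name PYSPARK_HADOOP_VERSION is an assumption here; the default
# (no variable set) would keep whatever profile the release was built with.
PYSPARK_HADOOP_VERSION=2.7 pip install pyspark   # Hadoop 2.7 variant
PYSPARK_HADOOP_VERSION=3.2 pip install pyspark   # Hadoop 3.2 variant
```

A mechanism like this would let the default profile change without forcing existing PyPI users off Hadoop 2.7.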

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Dongjoon Hyun
I fully understand your concern, but we cannot live with Hadoop 2.7.4 forever, Xiao. Like Hadoop 2.6, we should let it go. So, are you saying that CRAN/PyPI should have every combination of Apache Spark, including the Hive 1.2 distribution? What is your suggestion as a PMC on the Hadoop 3.2 migration path?

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Sean Owen
So, we also release Spark binary distros with Hadoop 2.7, 3.2, and no Hadoop -- all of the options. Picking one profile or the other to release to PyPI etc. isn't more or less consistent with those releases, as all exist. Is this change only about the source code default, with no effect on
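The three binary variants Sean mentions all come from the same source tree, differing only in build profiles. A minimal sketch of how they might be produced with Spark's distribution script; the exact flags can differ by branch, so treat these as illustrative rather than the release team's actual invocations:

```shell
# Sketch: building the three published binary variants from one checkout.
# Profile ids (-Phadoop-2.7, -Phadoop-3.2, -Phadoop-provided) follow the
# Spark pom's naming; other flags (e.g. --pip, -Phive) are assumptions.
./dev/make-distribution.sh --name hadoop2.7       --pip -Phadoop-2.7 -Phive
./dev/make-distribution.sh --name hadoop3.2       --pip -Phadoop-3.2 -Phive
./dev/make-distribution.sh --name without-hadoop  --pip -Phadoop-provided
```

Since all variants are published anyway, the "default" mainly determines which one an unqualified build or install resolves to.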

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Hyukjin Kwon
+1. Just as a note: - SPARK-31918 is fixed now, and there's no blocker. - When we build SparkR, we should use the latest R version, at least 4.0.0+. On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun wrote: > +1 > > Bests, > Dongjoon. > > On Tue, Jun 23,

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Xiao Li
Then, it will be a little complex after this PR. It might confuse the community further. In PyPI and CRAN, we are using Hadoop 2.7 as the default profile; however, in the other distributions, we are using Hadoop 3.2 as the default? How do we explain this to the community? I would not change the

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim wrote: > +1 on a 3.0.1 soon. > > Probably it would be nice if some Scala experts can take a look at > https://issues.apache.org/jira/browse/SPARK-32051 and include the fix > into 3.0.1 if possible. > Looks like APIs designed to

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Dongjoon Hyun
Thanks. Uploading PySpark to PyPI is a simple manual step, and our release script can still build PySpark with Hadoop 2.7 if we want. So, `No` for the following question. I updated my PR according to your comment. > If we change the default, will it impact them? If YES,... From the

m2 cache issues in Jenkins?

2020-06-23 Thread Holden Karau
Hi Folks, I've been seeing some weird failures on Jenkins and it looks like they might be from the m2 cache. Would it be OK to clean it out? Or is it important? Cheers, Holden -- Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
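For context, the m2 cache is just Maven's local artifact repository, so clearing it is safe in the sense that Maven re-downloads anything missing on the next build; the cost is bandwidth and build time. A minimal sketch, assuming the default Maven layout:

```shell
# Sketch, assuming Maven's default local repository location; M2_REPO can
# be pointed elsewhere for a non-standard Jenkins workspace.
M2_REPO="${M2_REPO:-$HOME/.m2/repository}"

# Removing only Spark's own artifacts is less disruptive than a full wipe,
# and is often enough when stale snapshots are the suspected culprit:
rm -rf "$M2_REPO/org/apache/spark"

# A full wipe (commented out) forces every dependency to be re-fetched:
# rm -rf "$M2_REPO"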

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Jungtaek Lim
+1 on a 3.0.1 soon. Probably it would be nice if some Scala experts can take a look at https://issues.apache.org/jira/browse/SPARK-32051 and include the fix into 3.0.1 if possible. Looks like APIs designed to work with Scala 2.11 & Java bring ambiguity in Scala 2.12 & Java. On Wed, Jun 24, 2020

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Jules Damji
+1 (non-binding) Sent from my iPhone Pardon the dumb thumb typos :) > On Jun 23, 2020, at 11:36 AM, Holden Karau wrote: > >  > +1 on a patch release soon > >> On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin wrote: >> +1 on doing a new patch release soon. I saw some of these issues when >>

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Holden Karau
+1 on a patch release soon On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin wrote: > +1 on doing a new patch release soon. I saw some of these issues when > preparing the 3.0 release, and some of them are very serious. > > > On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman < >

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Reynold Xin
+1 on doing a new patch release soon. I saw some of these issues when preparing the 3.0 release, and some of them are very serious. On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu > wrote: > > > > +1 Thanks Yuanjian -- I think it'll be great to have a

Unsubscribe

2020-06-23 Thread Ankit Sinha
Unsubscribe -- Ankit

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Shivaram Venkataraman
+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon. Shivaram On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro wrote: > > Thanks for the heads-up, Yuanjian! > > > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. > wow, the updates are so quick. Anyway,

Re: Unsubscribe

2020-06-23 Thread Rohit Mishra
I think you have sent this request to the wrong email address. I don't want to unsubscribe from the dev mailing list. I don't remember sending any such request. Regards, Rohit Mishra On Tue, 23 Jun 2020 at 7:39 PM, Jeff Evans wrote: > That is not how you unsubscribe. See here: >

Re: Unsubscribe

2020-06-23 Thread Jeff Evans
That is not how you unsubscribe. See here: https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e On Tue, Jun 23, 2020 at 5:02 AM Kiran Kumar Dusi wrote: > Unsubscribe > > On Tue, 23 Jun 2020 at 15:18 Akhil Anil wrote: > >> -- >> Sent from Gmail Mobile >> >

Re: Unsubscribe

2020-06-23 Thread Kiran Kumar Dusi
Unsubscribe On Tue, 23 Jun 2020 at 15:18 Akhil Anil wrote: > -- > Sent from Gmail Mobile >

Unsubscribe

2020-06-23 Thread Akhil Anil
-- Sent from Gmail Mobile

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Takeshi Yamamuro
Thanks for the heads-up, Yuanjian! > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. wow, the updates are so quick. Anyway, +1 for the release. Bests, Takeshi On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li wrote: > Hi dev-list, > > I’m writing this to raise the discussion

[DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Yuanjian Li
Hi dev-list, I'm writing this to raise the discussion about Spark 3.0.1 feasibility, since 4 blocker issues were found after Spark 3.0.0: 1. [SPARK-31990] The broken state store compatibility will cause a correctness issue when

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Xiao Li
Our monthly PyPI downloads of PySpark have reached 5.4 million. We should avoid forcing the current PySpark users to upgrade their Hadoop versions. If we change the default, will it impact them? If YES, I think we should not do it until it is ready and they have a workaround. So far, our PyPI

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Dongjoon Hyun
Hi, All. I am bumping up this thread again with the title "Use Hadoop-3.2 as a default Hadoop profile in 3.1.0?" There has been some recent discussion on the following PR. Please let us know your thoughts. https://github.com/apache/spark/pull/28897 Bests, Dongjoon. On Fri, Nov 1, 2019 at 9:41 AM
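Whichever profile ends up as the default, both remain available at build time, which is why the debate is about defaults rather than capability. A hedged sketch of explicit profile selection, using the profile ids as they appear in the Spark pom (other flags are illustrative):

```shell
# Either Hadoop profile can be selected explicitly, overriding the default.
# -DskipTests is illustrative, to keep the example to a plain package build.
./build/mvn -Phadoop-2.7 -DskipTests package   # build against Hadoop 2.7
./build/mvn -Phadoop-3.2 -DskipTests package   # build against Hadoop 3.2
```

Under this view, changing the default only affects builds and releases that do not name a profile explicitly.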