Re: Apache Spark 3.5.0 Expectations (?)

2023-05-31 Thread Dongjoon Hyun
Thank you all for your replies.
1. Thank you, Jia, for those JIRAs.
2. Sounds great for "Scala 2.13 for Spark 4.0". I'll initiate a new thread for that.
   - "I wonder if it’s safer to do it in Spark 4 (which I believe will be discussed soon)."
   - "I would make it the default at 4.0, myself."

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-31 Thread Bjørn Jørgensen
@Cheng Pan https://issues.apache.org/jira/browse/HIVE-22126
On Wed, 31 May 2023 at 03:58, Cheng Pan wrote:
> @Bjørn Jørgensen
>
> I did some investigation on upgrading Guava after Spark drop Hadoop2
> support, but unfortunately, the Hive still depends on it, the worse thing
> is, that Guava’s

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-30 Thread Cheng Pan
@Bjørn Jørgensen I investigated upgrading Guava after Spark dropped Hadoop 2 support, but unfortunately Hive still depends on it. Worse, Guava’s classes are marked as shared in IsolatedClientLoader[1], which means Spark cannot upgrade Guava even after upgrading

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-30 Thread Bjørn Jørgensen
@Dongjoon Hyun Thank you. I have two points to discuss. First, we are currently conducting tests with Python versions 3.8 and 3.9. Should we consider replacing 3.9 with 3.11? Secondly, I'd like to know the status of Google Guava. With Hadoop version 2 no longer being utilized, is there any

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-30 Thread Mich Talebzadeh
I don't know whether it is related, but Scala 2.12.17 is fine for the Spark 3 family (compile and run). I spent a day compiling the Spark 3.4.0 code against Scala 2.13.8 with Maven and was getting all sorts of weird and wonderful errors at runtime. HTH Mich Talebzadeh, Lead Solutions

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Jungtaek Lim
Shall we initiate a new discussion thread for Scala 2.13 by default? While I'm not an expert in this area, it sounds like the change is major and (probably) breaking. It seems worth having a separate discussion thread rather than just treating it like one of 25 items. On Tue, May 30, 2023 at

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Sean Owen
It does seem risky; there are still likely libs out there that don't cross compile for 2.13. I would make it the default at 4.0, myself.
On Mon, May 29, 2023 at 7:16 PM Hyukjin Kwon wrote:
> While I support going forward with a higher version, actually using Scala
> 2.13 by default is a big
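As a rough illustration of the cross-compilation concern raised in this message, the sketch below flags dependencies built for a Scala binary version other than 2.13. It relies on the common sbt/Maven convention of suffixing Scala artifact IDs with the binary version (for example `spark-core_2.12`). The helper names and the dependency list are hypothetical and exist only for this example; this is not a Spark tool, and it only recognizes Scala 2.x suffixes.

```python
import re

# Scala 2.x libraries conventionally carry the Scala binary version as an
# artifact-ID suffix, e.g. "spark-core_2.12". Artifacts without such a
# suffix are treated here as plain Java artifacts (an assumption).
_SCALA_SUFFIX = re.compile(r"_(2\.\d+)$")


def scala_binary_version(artifact_id):
    """Return the Scala 2.x binary version suffix, or None if there is none."""
    match = _SCALA_SUFFIX.search(artifact_id)
    return match.group(1) if match else None


def incompatible_artifacts(artifact_ids, target="2.13"):
    """List artifacts built for a Scala binary version other than `target`."""
    return [
        artifact
        for artifact in artifact_ids
        if scala_binary_version(artifact) not in (None, target)
    ]


# Hypothetical dependency list, for illustration only.
deps = ["spark-core_2.12", "guava", "example-lib_2.13", "legacy-lib_2.11"]
print(incompatible_artifacts(deps))
```

A check like this only tells you which artifacts would need a 2.13 build; whether such a build is actually published has to be looked up per library.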

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Hyukjin Kwon
While I support going forward with a higher version, actually using Scala 2.13 by default is a big deal especially in a way that:
- Users would likely download the built-in version assuming that it’s backward binary compatible.
- PyPI doesn't allow specifying the Scala version, meaning

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-28 Thread Jia Fan
Thanks Dongjoon! There are some tickets I want to share.
SPARK-39420 Support ANALYZE TABLE on v2 tables
SPARK-42750 Support INSERT INTO by name
SPARK-43521 Support CREATE TABLE LIKE FILE
On Mon, May 29, 2023 at 08:42, Dongjoon Hyun wrote:
> Hi, All.
>
> Apache Spark 3.5.0 is scheduled for August (1st Release

Apache Spark 3.5.0 Expectations (?)

2023-05-28 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.5.0 is scheduled for August (1st Release Candidate) and currently a few notable things are under discussion on the mailing list. I believe it's a good time to share a short summary list (containing both completed and in-progress items) to give a highlight in advance and