@Bjørn Jørgensen

I did some investigation into upgrading Guava after Spark dropped Hadoop 2 
support, but unfortunately Hive still depends on it. Worse, Guava's classes 
are marked as shared in IsolatedClientLoader[1], which means Spark cannot 
upgrade Guava without breaking old versions of the Hive Metastore client, 
even after upgrading the built-in Hive from the current 2.3.9 to a newer 
version that no longer pins an old Guava.
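
For context, this is roughly how IsolatedClientLoader decides (a simplified 
sketch, not the actual Spark code; the real prefix list is longer):

    // Classes whose names match a "shared" prefix are loaded from Spark's
    // own classloader instead of the isolated Hive client classloader, so
    // both sides must agree on a single Guava binary.
    def isSharedClass(name: String): Boolean = {
      val sharedPrefixes = Seq(
        "scala.",
        "org.slf4j",
        "org.apache.spark.",
        "com.google")           // Guava: the entry in question
      sharedPrefixes.exists(name.startsWith)
    }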

I couldn't find any clues as to why the Guava classes need to be marked as 
shared. Can anyone provide some background?

[1] 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L215

Thanks,
Cheng Pan


> On May 31, 2023, at 03:49, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
> 
> @Dongjoon Hyun Thank you.
> 
> I have two points to discuss. 
> First, we are currently conducting tests with Python versions 3.8 and 3.9. 
> Should we consider replacing 3.9 with 3.11?
> 
> Secondly, I'd like to know the status of Google Guava. 
> With Hadoop 2 no longer in use, is there anything else blocking this?
> 
> On Tue, 30 May 2023 at 10:39, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> I don't know whether it is related, but Scala 2.12.17 is fine for the Spark 3 
> family (compile and run). I spent a day compiling Spark 3.4.0 against 
> Scala 2.13.8 with Maven and was getting all sorts of weird and wonderful 
> errors at runtime.
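> 
> For reference, the documented way to do such a build (per the Spark 
> building guide; exact flags may vary by version) is roughly:
> 
>     # switch the POMs to Scala 2.13, then build with the matching profile
>     ./dev/change-scala-version.sh 2.13
>     ./build/mvn -Pscala-2.13 -DskipTests clean package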
> 
> HTH
> 
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
> London
> United Kingdom
> 
>  https://en.everybodywiki.com/Mich_Talebzadeh
>  Disclaimer: Use it at your own risk. Any and all responsibility for any 
> loss, damage or destruction of data or any other property which may arise 
> from relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction. 
> 
> On Tue, 30 May 2023 at 01:59, Jungtaek Lim <kabhwan.opensou...@gmail.com> 
> wrote:
> Shall we initiate a new discussion thread for making Scala 2.13 the default? 
> While I'm not an expert in this area, it sounds like the change is major and 
> (probably) breaking. It seems worth having a separate discussion thread 
> rather than treating it as just one of 25 items.
> 
> On Tue, May 30, 2023 at 9:54 AM Sean Owen <sro...@gmail.com> wrote:
> It does seem risky; there are likely still libs out there that don't 
> cross-compile for 2.13. I would make it the default at 4.0, myself.
> 
> On Mon, May 29, 2023 at 7:16 PM Hyukjin Kwon <gurwls...@apache.org> wrote:
> While I support going forward with a higher version, actually using Scala 
> 2.13 by default is a big deal, especially in that:
>     • Users would likely download the built-in version assuming that it’s 
> backward binary compatible.
>     • PyPI doesn't allow specifying the Scala version, meaning that users 
> wouldn’t have a way to 'pip install pyspark' based on Scala 2.12.
> I wonder if it’s safer to do it in Spark 4 (which I believe will be discussed 
> soon).
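> 
> A quick way to check which Scala version a given Spark binary was built 
> against (an illustrative snippet, not from the thread; run it in spark-shell):
> 
>     // prints the Scala version of the running build,
>     // e.g. "version 2.12.17" vs "version 2.13.8"
>     println(scala.util.Properties.versionString)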
> 
> 
> On Mon, 29 May 2023 at 13:21, Jia Fan <fan...@apache.org> wrote:
> Thanks Dongjoon!
> There are some tickets I want to share; a rough sketch of what they enable 
> follows the list.
> SPARK-39420 Support ANALYZE TABLE on v2 tables
> SPARK-42750 Support INSERT INTO by name
> SPARK-43521 Support CREATE TABLE LIKE FILE
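> 
> As an illustration (syntax as proposed in the tickets and subject to 
> change; 'target' and 'source' are hypothetical tables):
> 
>     // SPARK-42750: match INSERT columns by name rather than by position
>     spark.sql("INSERT INTO target BY NAME SELECT b, a FROM source")
>     // SPARK-39420: compute statistics for a DataSourceV2 table
>     spark.sql("ANALYZE TABLE target COMPUTE STATISTICS")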
> 
> On Mon, May 29, 2023 at 08:42, Dongjoon Hyun <dongj...@apache.org> wrote:
> Hi, All.
> 
> Apache Spark 3.5.0 is scheduled for August (1st Release Candidate), and a 
> few notable things are currently under discussion on the mailing list.
> 
> I believe it's a good time to share a short summary list (containing both 
> completed and in-progress items) to highlight them in advance and to 
> collect your targets too.
> 
> Please share your expectations or working items if you want the community 
> to prioritize them more in the Apache Spark 3.5.0 timeframe.
> 
> (Sorted by ID)
> SPARK-40497 Upgrade Scala 2.13.11
> SPARK-42452 Remove hadoop-2 profile from Apache Spark 3.5.0
> SPARK-42913 Upgrade to Hadoop 3.3.5 (aws-java-sdk-bundle: 1.12.262 -> 
> 1.12.316)
> SPARK-43024 Upgrade Pandas to 2.0.0
> SPARK-43200 Remove Hadoop 2 reference in docs
> SPARK-43347 Remove Python 3.7 Support
> SPARK-43348 Support Python 3.8 in PyPy3
> SPARK-43351 Add Spark Connect Go prototype code and example
> SPARK-43379 Deprecate old Java 8 versions prior to 8u371
> SPARK-43394 Upgrade to Maven 3.8.8
> SPARK-43436 Upgrade to RocksDbjni 8.1.1.1
> SPARK-43446 Upgrade to Apache Arrow 12.0.0
> SPARK-43447 Support R 4.3.0
> SPARK-43489 Remove protobuf 2.5.0
> SPARK-43519 Bump Parquet to 1.13.1
> SPARK-43581 Upgrade kubernetes-client to 6.6.2
> SPARK-43588 Upgrade to ASM 9.5
> SPARK-43600 Update K8s doc to recommend K8s 1.24+
> SPARK-43738 Upgrade to DropWizard Metrics 4.2.18
> SPARK-43831 Build and Run Spark on Java 21
> SPARK-43832 Upgrade to Scala 2.12.18
> SPARK-43836 Make Scala 2.13 as default in Spark 3.5
> SPARK-43842 Upgrade gcs-connector to 2.2.14
> SPARK-43844 Update to ORC 1.9.0
> UMBRELLA: Add SQL functions into Scala, Python and R API
> 
> Thanks,
> Dongjoon.
> 
> PS. The above is not a list of release blockers; rather, each item could be 
> a nice-to-have from someone's perspective.
> 
> 
> -- 
> Bjørn Jørgensen 
> Vestre Aspehaug 4, 6010 Ålesund 
> Norge
> 
> +47 480 94 297

