I would recommend cutting them.
+ historically they've fixed the version of aws-sdk jar used in spark
releases, meaning s3a connector through spark rarely used the same sdk
release as that qualified through the hadoop sdk update process, so if
there were incompatibilities, it'd be up to the spark
I would like to know how we should handle the two Kinesis-related modules in
Spark 4.0. They have a very low frequency of code updates, and because the
corresponding tests are not continuously executed in any GitHub Actions
pipeline, I think they significantly lack quality assurance. On top
What do you think about removing HiveContext and even SQLContext?
And as an extension of this question, should we re-implement Hive support using
the DSv2 API in Spark 4?
Developers who want to implement a custom DataSource plugin may
want to learn something from the Spark built-in one[1],
> Are there old Hive/Hadoop version combos we should just stop supporting?
Dropping support for Java 8 means dropping support for Hive versions lower than
2.0 (exclusive)[1].
IsolatedClientLoader is meant to allow using different Hive jars to communicate
with different versions of HMS. AFAIK, the
I think the principle is we should remove things that block us from
supporting new things like Java 21, or come with a significant
maintenance cost. If there is no benefit to removing deprecated APIs (just
to keep the codebase clean?), I'd prefer to leave them there and not bother.
On Tue, Aug 8,
Thanks, Sean, for opening this discussion.
1. I think dropping Scala 2.12 is a good option.
2. Personally, I think we should remove most methods that have been deprecated since
2.x/1.x unless there is no good replacement. There is already a 3.x version
as a buffer and I don't think it is good practice
While we're noodling on the topic, what else might be worth removing in
Spark 4?
For example, it looks like we're finally hitting problems supporting Java 8
through 21 all at once, related to Scala 2.13.x updates. It would be
reasonable to require Java 11, or even 17, as a baseline for the