Re: What else could be removed in Spark 4?

2023-08-24 Thread Steve Loughran
I would recommend cutting them. Historically, they've fixed the version of the aws-sdk jar used in Spark releases, meaning the S3A connector running through Spark rarely used the same SDK release as the one qualified through the Hadoop SDK update process, so if there were incompatibilities, it'd be up to the Spark

Re: What else could be removed in Spark 4?

2023-08-16 Thread Yang Jie
I would like to know how we should handle the two Kinesis-related modules in Spark 4.0. Their code is updated very infrequently, and because the corresponding tests are not continuously executed in any GitHub Actions pipeline, I think they significantly lack quality assurance. On top

Re: What else could be removed in Spark 4?

2023-08-08 Thread Cheng Pan
What do you think about removing HiveContext and even SQLContext? And, as an extension of this question, should we re-implement Hive support using the DSv2 API in Spark 4? Developers who want to implement a custom DataSource plugin may want to learn something from the Spark built-in one[1],

Re: What else could be removed in Spark 4?

2023-08-08 Thread Cheng Pan
> Are there old Hive/Hadoop version combos we should just stop supporting? Dropping support for Java 8 means dropping support for Hive versions below 2.0[1]. IsolatedClientLoader is intended to allow using different Hive jars to communicate with different versions of HMS. AFAIK, the

Re: What else could be removed in Spark 4?

2023-08-07 Thread Wenchen Fan
I think the principle should be that we remove things that block us from supporting new things like Java 21, or that come with a significant maintenance cost. If there is no benefit to removing deprecated APIs (just to keep the codebase clean?), I'd prefer to leave them there and not bother.

Re: What else could be removed in Spark 4?

2023-08-07 Thread Jia Fan
Thanks, Sean, for opening this discussion. 1. I think dropping Scala 2.12 is a good option. 2. Personally, I think we should remove most methods that have been deprecated since 2.x/1.x unless there is no good replacement. There is already a 3.x version as a buffer and I don't think it is good practice

What else could be removed in Spark 4?

2023-08-07 Thread Sean Owen
While we're noodling on the topic, what else might be worth removing in Spark 4? For example, it looks like we're finally hitting problems supporting Java 8 through 21 all at once, related to Scala 2.13.x updates. It would be reasonable to require Java 11, or even 17, as a baseline for the