What do you think about removing HiveContext and even SQLContext? And as an extension of this question, should we re-implement the Hive data source using the DSv2 API in Spark 4?
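For context, removing those entry points would leave SparkSession as the single entry point, which has subsumed both since Spark 2.0. A minimal migration sketch (the table name `some_hive_table` is just a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Since Spark 2.0, SparkSession replaces both SQLContext and HiveContext;
// enableHiveSupport() provides the Hive metastore / SerDe / UDF integration
// that HiveContext used to offer.
val spark = SparkSession.builder()
  .appName("hive-example")
  .enableHiveSupport()   // replaces `new HiveContext(sc)`
  .getOrCreate()

// What used to be `hiveContext.sql(...)` is now:
spark.sql("SELECT * FROM some_hive_table").show()
```

This requires a Spark runtime with Hive support on the classpath, so it is a sketch of the API surface rather than a standalone program.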
Developers who want to implement a custom DataSource plugin may want to learn from the Spark built-in ones [1], and Hive is a good candidate; a legacy-style implementation may confuse them. This was discussed/requested in [2][3][4][5].

There have also been requests for multiple Hive metastore support [6], and I have seen users choose Presto/Trino over Spark because the former supports multiple HMS instances. BTW, there are known third-party Hive DSv2 implementations [7][8].

[1] https://www.mail-archive.com/dev@spark.apache.org/msg30353.html
[2] https://www.mail-archive.com/dev@spark.apache.org/msg25715.html
[3] https://issues.apache.org/jira/browse/SPARK-31241
[4] https://issues.apache.org/jira/browse/SPARK-39797
[5] https://issues.apache.org/jira/browse/SPARK-44518
[6] https://www.mail-archive.com/dev@spark.apache.org/msg30228.html
[7] https://github.com/permanentstar/spark-sql-dsv2-extension
[8] https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive

Thanks,
Cheng Pan

> On Aug 8, 2023, at 10:09, Wenchen Fan <cloud0...@gmail.com> wrote:
>
> I think the principle is that we should remove things that block us from
> supporting new things like Java 21, or that come with a significant
> maintenance cost. If there is no benefit to removing deprecated APIs
> (just to keep the codebase clean?), I'd prefer to leave them there and
> not bother.
>
> On Tue, Aug 8, 2023 at 9:00 AM Jia Fan <fanjiaemi...@qq.com.invalid> wrote:
> Thanks Sean for opening this discussion.
>
> 1. I think dropping Scala 2.12 is a good option.
>
> 2. Personally, I think we should remove most methods that have been
> deprecated since 1.x/2.x unless a good replacement can't be found. The 3.x
> line has already served as a buffer, and I don't think it is good practice
> to use methods deprecated in 2.x on 4.x.
>
> 3. For Mesos, I think we should remove it from the docs first.
> ________________________
>
> Jia Fan
>
>
>> On Aug 8, 2023, at 05:47, Sean Owen <sro...@gmail.com> wrote:
>>
>> While we're noodling on the topic, what else might be worth removing in
>> Spark 4?
>>
>> For example, it looks like we're finally hitting problems supporting Java 8
>> through 21 all at once, related to the Scala 2.13.x updates. It would be
>> reasonable to require Java 11, or even 17, as a baseline for the multi-year
>> lifecycle of Spark 4.
>>
>> Dare I ask: drop Scala 2.12? Supporting 2.12 / 2.13 / 3.0 might get hard
>> otherwise.
>>
>> There was a good discussion about whether old deprecated methods should be
>> removed. They can't be removed at other times, but that doesn't mean they
>> all should be. createExternalTable was brought up as a first example. Which
>> deprecated methods are worth removing?
>>
>> There's Mesos support, long since deprecated, which seems like something to
>> prune.
>>
>> Are there old Hive/Hadoop version combos we should just stop supporting?