What do you think about removing HiveContext and even SQLContext?

And as an extension of this question: should we re-implement the Hive data 
source using the DSv2 API in Spark 4?

Developers who want to implement a custom DataSource plugin may want to learn 
from the Spark built-in ones[1], and Hive is a good candidate. A legacy 
implementation may confuse them.

This has been discussed/requested in [2][3][4][5].

There have also been requests for multiple Hive metastore (HMS) support[6], and 
I have seen users choose Presto/Trino over Spark because the former supports 
multiple HMS instances.
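For reference, with a DSv2 catalog plugin this becomes plain per-catalog configuration. A hypothetical spark-defaults.conf fragment (the class name follows the Kyuubi Hive connector[8] and is an assumption on my part; hosts and catalog names are placeholders):

```properties
# Two catalogs, each pointing at a different Hive metastore
spark.sql.catalog.hms_prod=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hms_prod.hive.metastore.uris=thrift://metastore-prod:9083
spark.sql.catalog.hms_legacy=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hms_legacy.hive.metastore.uris=thrift://metastore-legacy:9083
```

A single session could then query hms_prod.db.t and hms_legacy.db.t side by side, which is essentially what those users get from Trino's multiple catalogs today.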

BTW, there are known third-party Hive DSv2 implementations[7][8].

[1] https://www.mail-archive.com/dev@spark.apache.org/msg30353.html
[2] https://www.mail-archive.com/dev@spark.apache.org/msg25715.html
[3] https://issues.apache.org/jira/browse/SPARK-31241
[4] https://issues.apache.org/jira/browse/SPARK-39797
[5] https://issues.apache.org/jira/browse/SPARK-44518
[6] https://www.mail-archive.com/dev@spark.apache.org/msg30228.html
[7] https://github.com/permanentstar/spark-sql-dsv2-extension
[8] https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive

Thanks,
Cheng Pan


> On Aug 8, 2023, at 10:09, Wenchen Fan <cloud0...@gmail.com> wrote:
> 
> I think the principle is we should remove things that block us from 
> supporting new things like Java 21, or come with a significant maintenance 
> cost. If there is no benefit to removing deprecated APIs (just to keep the 
> codebase clean?), I'd prefer to leave them there and not bother.
> 
> On Tue, Aug 8, 2023 at 9:00 AM Jia Fan <fanjiaemi...@qq.com.invalid> wrote:
> Thanks Sean for opening this discussion.
> 
> 1. I think dropping Scala 2.12 is a good option.
> 
> 2. Personally, I think we should remove most methods deprecated since 1.x/2.x 
> unless no good replacement exists. The 3.x line has already served as a 
> buffer, and I don't think it is good practice to keep using methods deprecated 
> in 2.x on 4.x.
> 
> 3. For Mesos, I think we should remove it from the docs first.
> ________________________
> 
> Jia Fan
> 
> 
> 
>> On Aug 8, 2023, at 05:47, Sean Owen <sro...@gmail.com> wrote:
>> 
>> While we're noodling on the topic, what else might be worth removing in 
>> Spark 4?
>> 
>> For example, looks like we're finally hitting problems supporting Java 8 
>> through 21 all at once, related to Scala 2.13.x updates. It would be 
>> reasonable to require Java 11, or even 17, as a baseline for the multi-year 
>> lifecycle of Spark 4.
>> 
>> Dare I ask: drop Scala 2.12? Supporting 2.12 / 2.13 / 3.0 might get hard 
>> otherwise.
>> 
>> There was a good discussion about whether old deprecated methods should be 
>> removed. They can't be removed at other times, but that doesn't mean they all 
>> should be. createExternalTable was brought up as a first example. What 
>> deprecated methods are worth removing?
>> 
>> There's Mesos support, long since deprecated, which seems like something to 
>> prune.
>> 
>> Are there old Hive/Hadoop version combos we should just stop supporting?
> 

