People do use it, and the maintenance cost is pretty low, so I don't think
we should just drop it. We can be explicit that there is not a lot of
development going on and that we are unlikely to add many new features to
it. Users are also welcome to use other JDBC/ODBC endpoint
implementations built by the ecosystem, so the Spark project itself is not
under pressure to keep adding features.
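
As a pointer for anyone going the ecosystem route: any HiveServer2-compatible
endpoint (including the built-in thriftserver) is reachable with plain JDBC.
Here is a minimal Scala sketch, assuming a server on localhost:10000 and the
Hive JDBC driver on the classpath (host, port, and database are illustrative
values, not project defaults):

  import java.sql.DriverManager

  // Connect to a HiveServer2-compatible endpoint over the thrift protocol.
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT 1")
  while (rs.next()) println(rs.getInt(1))
  rs.close(); stmt.close(); conn.close()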


On Fri, Oct 26, 2018 at 10:48 AM Sean Owen <sro...@gmail.com> wrote:

> Maybe that's what I really mean (you can tell I don't follow the Hive part
> closely).
> In my travels, the thrift server has indeed been viewed as an older
> solution to a problem that is probably better met by others.
> From my perspective it's worth dropping, but that's just anecdotal.
> Any other arguments for or against the thrift server?
>
> On Fri, Oct 26, 2018 at 12:30 PM Marco Gaido <marcogaid...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> One big problem with getting rid of the Hive fork is the thriftserver,
>> which relies on the HiveServer from the Hive fork.
>> We might migrate to an apache/hive dependency, but I am not sure this would
>> help that much.
>> I think a broader topic is whether having a thriftserver directly in Spark
>> is still worthwhile. It has many well-known limitations (not
>> fault tolerant, no security/impersonation, etc.) and there are other
>> projects which aim to provide a thrift/JDBC interface to Spark. Just to
>> be clear, I am not proposing to remove the thriftserver in 3.0, but maybe it
>> is something we could evaluate in the long term.
>>
>> Thanks,
>> Marco
>>
>>
>> On Fri, Oct 26, 2018 at 7:07 PM Sean Owen <sro...@gmail.com>
>> wrote:
>>
>>> OK, let's keep this about Hive.
>>>
>>> Right, good point: this is really about supporting metastore versions,
>>> and there is a good argument for retaining backwards compatibility with
>>> older metastores. I don't know how far back, but I guess as far as is practical?
>>>
>>> Isn't there still a lot of Hive 0.x test code? Is that something that's
>>> safe to drop for 3.0?
>>>
>>> And, basically, what must we do to get rid of the Hive fork? That seems
>>> like a must-do.
>>>
>>>
>>>
>>> On Fri, Oct 26, 2018 at 11:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Hi, Sean and All.
>>>>
>>>> For the first question: today we support only Hive Metastore 1.x ~ 2.x,
>>>> and we can add Hive Metastore 3.0 support alongside it. Spark is designed
>>>> to work with multiple metastore versions.
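>>>>
>>>> For reference, that pluggability is driven by configuration rather than
>>>> code; a minimal Scala sketch (the version string "2.3.3" and the "maven"
>>>> jar resolution are illustrative values):
>>>>
>>>>   import org.apache.spark.sql.SparkSession
>>>>
>>>>   // Ask Spark to talk to a specific Hive Metastore version and to
>>>>   // resolve the matching client jars from Maven.
>>>>   val spark = SparkSession.builder()
>>>>     .appName("metastore-compat-check")
>>>>     .config("spark.sql.hive.metastore.version", "2.3.3")
>>>>     .config("spark.sql.hive.metastore.jars", "maven")
>>>>     .enableHiveSupport()
>>>>     .getOrCreate()
>>>>
>>>>   spark.sql("SHOW DATABASES").show()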
>>>>
>>>> I don't think we need to drop old Hive Metastore support. Is the goal
>>>> to avoid sharing a Hive Metastore between Spark 2 and Spark 3 clusters?
>>>>
>>>> I think we should allow that use case, especially for new Spark 3
>>>> clusters. What do you think?
>>>>
>>>>
>>>> For the second question: Apache Spark 2.x doesn't officially support
>>>> Hive. It's only a best-effort approach within the boundaries of Spark.
>>>>
>>>>
>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive-functionality
>>>>
>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#incompatible-hive-udf
>>>>
>>>>
>>>> Beyond the documented differences, decimal literal handling (HIVE-17186)
>>>> causes query result differences even in a well-known benchmark like TPC-H.
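>>>>
>>>> As a hedged illustration, using a TPC-H Q6-style predicate (the table
>>>> and data are assumed to exist): whether a literal like 0.06 is typed as
>>>> DECIMAL or DOUBLE changes which rows the BETWEEN matches, which is
>>>> enough to change the benchmark answer.
>>>>
>>>>   // Assumes `spark` is a Hive-enabled SparkSession and a TPC-H
>>>>   // `lineitem` table is registered in the metastore.
>>>>   spark.sql("""
>>>>     SELECT sum(l_extendedprice * l_discount) AS revenue
>>>>     FROM lineitem
>>>>     WHERE l_shipdate >= date '1994-01-01'
>>>>       AND l_shipdate < date '1995-01-01'
>>>>       AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
>>>>       AND l_quantity < 24
>>>>   """).show()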
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> PS. For Hadoop, let's have another thread if needed. I expect another
>>>> long story. :)
>>>>
>>>>
>>>> On Fri, Oct 26, 2018 at 7:11 AM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Here's another thread to start considering, and I know it's been
>>>>> raised before.
>>>>> What version(s) of Hive should Spark 3 support?
>>>>>
>>>>> If we at least know it won't include Hive 0.x, could we go ahead and
>>>>> remove those tests from master? It might significantly reduce the run time
>>>>> and flakiness.
>>>>>
>>>>> It seems that maintaining even the Hive 1.x fork is untenable going
>>>>> forward, right? Does that also imply this support will almost certainly
>>>>> not be maintained in 3.0?
>>>>>
>>>>> Per below, it seems like it might even be hard to support Hive 3
>>>>> and Hadoop 2 at the same time?
>>>>>
>>>>> And while we're at it, what are the pros and cons of supporting only
>>>>> Hadoop 3 in Spark 3? Is the difference in client / HDFS APIs even that big?
>>>>> Or what about focusing only on Hadoop 2.9.x and 3.x support?
>>>>>
>>>>> Lots of questions, just interested now in informal reactions, not a
>>>>> binding decision.
>>>>>
>>>>> On Thu, Oct 25, 2018 at 11:49 PM Dagang Wei <notificati...@github.com>
>>>>> wrote:
>>>>>
>>>>>> Do we really want to switch to Hive 2.3? From this page
>>>>>> https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop
>>>>>> 2.x (Hive 3.x works with Hadoop 3.x).
>>>>>>
>>>>>
