Maybe that's what I really mean (you can tell I don't follow the Hive part closely). In my travels, the thrift server has indeed been viewed as an older solution to a problem probably better met by others. From my perspective it's worth dropping, but that's just anecdotal. Any other arguments for or against the thrift server?
On Fri, Oct 26, 2018 at 12:30 PM Marco Gaido <marcogaid...@gmail.com> wrote:

> Hi all,
>
> one big problem about getting rid of the Hive fork is the thriftserver,
> which relies on the HiveServer from the Hive fork.
> We might migrate to an apache/hive dependency, but I am not sure this would
> help that much.
> I think a broader topic is whether we actually want a thriftserver
> directly in Spark. It has many well-known limitations (not fault
> tolerant, no security/impersonation, etc.) and there are other projects
> which aim to provide a thrift/JDBC interface to Spark. Just to be clear,
> I am not proposing to remove the thriftserver in 3.0, but maybe it is
> something we could evaluate in the long term.
>
> Thanks,
> Marco
>
>
> On Fri, Oct 26, 2018 at 19:07 Sean Owen <sro...@gmail.com> wrote:
>
>> OK, let's keep this about Hive.
>>
>> Right, good point: this is really about supporting metastore versions,
>> and there is a good argument for retaining backwards compatibility with
>> older metastores. I don't know how far back, but I guess as far as is
>> practical?
>>
>> Isn't there still a lot of Hive 0.x test code? Is that something that's
>> safe to drop for 3.0?
>>
>> And, basically, what must we do to get rid of the Hive fork? That seems
>> like a must-do.
>>
>>
>> On Fri, Oct 26, 2018 at 11:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, Sean and All.
>>>
>>> For the first question: we support only Hive Metastore 1.x ~ 2.x, and
>>> we can support Hive Metastore 3.0 simultaneously. Spark is designed
>>> that way.
>>>
>>> I don't think we need to drop old Hive Metastore support. Is the goal
>>> to avoid sharing a Hive Metastore between Spark 2 and Spark 3 clusters?
>>>
>>> I think we should allow those use cases, especially for new Spark 3
>>> clusters. What do you think?
>>>
>>>
>>> For the second question: Apache Spark 2.x doesn't support Hive
>>> officially.
>>> It's only a best-effort approach within the boundaries of Spark.
>>>
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive-functionality
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#incompatible-hive-udf
>>>
>>> Beyond the documented differences, the decimal literal change
>>> (HIVE-17186) causes a query result difference even in a well-known
>>> benchmark like TPC-H.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> PS. For Hadoop, let's have another thread if needed. I expect another
>>> long story. :)
>>>
>>>
>>> On Fri, Oct 26, 2018 at 7:11 AM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Here's another thread to start considering, and I know it's been
>>>> raised before: what version(s) of Hive should Spark 3 support?
>>>>
>>>> If we at least know it won't include Hive 0.x, could we go ahead and
>>>> remove those tests from master? It might significantly reduce the run
>>>> time and flakiness.
>>>>
>>>> It seems that maintaining even the Hive 1.x fork is untenable going
>>>> forward, right? Does that also imply this support is almost certainly
>>>> not maintained in 3.0?
>>>>
>>>> Per below, it seems like it might even be hard to support Hive 3 and
>>>> Hadoop 2 at the same time?
>>>>
>>>> And while we're at it, what are the pros and cons of supporting only
>>>> Hadoop 3 in Spark 3? Is the difference in the client / HDFS API even
>>>> that big? Or what about focusing only on Hadoop 2.9.x support + 3.x
>>>> support?
>>>>
>>>> Lots of questions; just interested now in informal reactions, not a
>>>> binding decision.
>>>>
>>>> On Thu, Oct 25, 2018 at 11:49 PM Dagang Wei <notificati...@github.com>
>>>> wrote:
>>>>
>>>>> Do we really want to switch to Hive 2.3? According to
>>>>> https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop
>>>>> 2.x (Hive 3.x works with Hadoop 3.x).
>>>>> (Comment on GitHub:
>>>>> https://github.com/apache/spark/pull/21588#issuecomment-433285287)
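[Editor's note] For context on the multi-version metastore support Dongjoon describes: Spark can talk to a Hive Metastore whose version differs from the built-in Hive client via its metastore configs. A minimal sketch, assuming a local spark-shell; the version number and warehouse path are illustrative, not taken from this thread:

```shell
# Point Spark SQL at an external Hive Metastore of a specific version.
# spark.sql.hive.metastore.version selects the metastore client version;
# spark.sql.hive.metastore.jars=maven fetches matching Hive jars at runtime.
# The version and path below are illustrative assumptions.
spark-shell \
  --conf spark.sql.hive.metastore.version=2.3.3 \
  --conf spark.sql.hive.metastore.jars=maven \
  --conf spark.sql.warehouse.dir=/tmp/spark-warehouse
```

This per-application setting is what lets a Spark 2 and a Spark 3 cluster share one metastore, as discussed above.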