Maybe that's what I really mean (you can tell I don't follow the Hive part closely). In my travels, the thrift server has indeed been viewed as an older solution to a problem probably better met by others. From my perspective it's worth dropping, but that's just anecdotal. Any other arguments for or against the thrift server?
On Fri, Oct 26, 2018 at 12:30 PM Marco Gaido <marcogaid...@gmail.com> wrote:

> Hi all,
>
> one big problem about getting rid of the Hive fork is the thriftserver,
> which relies on the HiveServer from the Hive fork.
> We might migrate to an apache/hive dependency, but I am not sure this would
> help that much.
> I think a broader topic is whether we actually want a thriftserver
> directly in Spark. It has many well-known limitations (not fault
> tolerant, no security/impersonation, etc.) and there are other projects
> which aim to provide a thrift/JDBC interface to Spark. Just to be clear,
> I am not proposing to remove the thriftserver in 3.0, but maybe it is
> something we could evaluate in the long term.
>
> Thanks,
> Marco
>
>
> On Fri, Oct 26, 2018 at 19:07 Sean Owen <sro...@gmail.com> wrote:
>
>> OK, let's keep this about Hive.
>>
>> Right, good point: this is really about supporting metastore versions,
>> and there is a good argument for retaining backwards compatibility with
>> older metastores. I don't know how far back, but I guess as far as is
>> practical?
>>
>> Isn't there still a lot of Hive 0.x test code? Is that something that's
>> safe to drop for 3.0?
>>
>> And, basically, what must we do to get rid of the Hive fork? That seems
>> like a must-do.
>>
>>
>> On Fri, Oct 26, 2018 at 11:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, Sean and All.
>>>
>>> For the first question: we support only Hive Metastore 1.x ~ 2.x, and
>>> we can support Hive Metastore 3.0 simultaneously. Spark is designed
>>> that way.
>>>
>>> I don't think we need to drop old Hive Metastore support. Is the goal
>>> to avoid sharing a Hive Metastore between Spark 2 and Spark 3 clusters?
>>>
>>> I think we should allow those use cases, especially for new Spark 3
>>> clusters. What do you think?
>>>
>>>
>>> For the second question: Apache Spark 2.x doesn't support Hive
>>> officially.
>>> It's only a best-effort approach within the boundaries of Spark.
>>>
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive-functionality
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#incompatible-hive-udf
>>>
>>> Beyond the documented differences, the decimal literal change
>>> (HIVE-17186) causes a query result difference even in a well-known
>>> benchmark like TPC-H.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> PS. For Hadoop, let's have another thread if needed. I expect another
>>> long story. :)
>>>
>>>
>>> On Fri, Oct 26, 2018 at 7:11 AM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Here's another thread to start considering, and I know it's been
>>>> raised before: what version(s) of Hive should Spark 3 support?
>>>>
>>>> If we at least know it won't include Hive 0.x, could we go ahead and
>>>> remove those tests from master? It might significantly reduce the run
>>>> time and flakiness.
>>>>
>>>> It seems that maintaining even the Hive 1.x fork is untenable going
>>>> forward, right? Does that also imply this support is almost certainly
>>>> not maintained in 3.0?
>>>>
>>>> Per below, it seems like it might even be hard to support Hive 3 and
>>>> Hadoop 2 at the same time?
>>>>
>>>> And while we're at it, what are the pros and cons of supporting only
>>>> Hadoop 3 in Spark 3? Is the difference in the client / HDFS API even
>>>> that big? Or what about focusing only on Hadoop 2.9.x support + 3.x
>>>> support?
>>>>
>>>> Lots of questions; just interested now in informal reactions, not a
>>>> binding decision.
>>>>
>>>> On Thu, Oct 25, 2018 at 11:49 PM Dagang Wei <notificati...@github.com>
>>>> wrote:
>>>>
>>>>> Do we really want to switch to Hive 2.3? According to
>>>>> https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop
>>>>> 2.x (Hive 3.x works with Hadoop 3.x).
>>>>> (Comment on GitHub:
>>>>> https://github.com/apache/spark/pull/21588#issuecomment-433285287)
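[Editor's note] For context on the multi-version metastore support Dongjoon describes: Spark can talk to a Hive Metastore whose version differs from the built-in Hive client via its metastore configs. A minimal sketch, assuming a local spark-shell; the version number and warehouse path are illustrative, not taken from this thread:

```shell
# Point Spark SQL at an external Hive Metastore of a specific version.
# spark.sql.hive.metastore.version selects the metastore client version;
# spark.sql.hive.metastore.jars=maven fetches matching Hive jars at runtime.
# The version and path below are illustrative assumptions.
spark-shell \
  --conf spark.sql.hive.metastore.version=2.3.3 \
  --conf spark.sql.hive.metastore.jars=maven \
  --conf spark.sql.warehouse.dir=/tmp/spark-warehouse
```

This per-application setting is what lets a Spark 2 and a Spark 3 cluster share one metastore, as discussed above.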