People do use it, and the maintenance cost is pretty low, so I don't think we should just drop it. We can be explicit that not much development is going on and that we are unlikely to add many new features to it. Users are also welcome to use other JDBC/ODBC endpoint implementations built by the ecosystem, so the Spark project itself is under no pressure to keep adding features.
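For anyone who hasn't looked at it recently, the usage we'd be keeping alive is just the stock HiveServer2-compatible JDBC flow; a minimal sketch, assuming the default port and a local deployment (adjust host/port and master for your cluster):

    # start the Spark Thrift Server and connect with the bundled beeline client
    ./sbin/start-thriftserver.sh --master local[*]
    ./bin/beeline -u jdbc:hive2://localhost:10000

To the extent the ecosystem alternatives speak the same HiveServer2 thrift protocol on the wire, existing JDBC/ODBC clients should be able to move over without changes.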
On Fri, Oct 26, 2018 at 10:48 AM Sean Owen <sro...@gmail.com> wrote:

> Maybe that's what I really mean (you can tell I don't follow the Hive
> part closely).
> In my travels, indeed the thrift server has been viewed as an older
> solution to a problem probably better met by others.
> From my perspective it's worth dropping, but that's just anecdotal.
> Any other arguments for or against the thrift server?
>
> On Fri, Oct 26, 2018 at 12:30 PM Marco Gaido <marcogaid...@gmail.com> wrote:
>
>> Hi all,
>>
>> One big problem with getting rid of the Hive fork is the thriftserver,
>> which relies on the HiveServer from the Hive fork.
>> We might migrate to an apache/hive dependency, but I'm not sure this
>> would help that much.
>> A broader topic is whether having a thriftserver directly in Spark is
>> still worthwhile. It has many well-known limitations (not fault
>> tolerant, no security/impersonation, etc.), and there are other
>> projects which aim to provide a thrift/JDBC interface to Spark. Just
>> to be clear, I am not proposing to remove the thriftserver in 3.0, but
>> maybe it is something we could evaluate in the long term.
>>
>> Thanks,
>> Marco
>>
>> On Fri, Oct 26, 2018 at 7:07 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> OK, let's keep this about Hive.
>>>
>>> Right, good point: this is really about supporting metastore
>>> versions, and there is a good argument for retaining backwards
>>> compatibility with older metastores. I don't know how far back, but I
>>> guess as far as is practical?
>>>
>>> Isn't there still a lot of Hive 0.x test code? Is that something
>>> that's safe to drop for 3.0?
>>>
>>> And, basically, what must we do to get rid of the Hive fork? That
>>> seems like a must-do.
>>>
>>> On Fri, Oct 26, 2018 at 11:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>
>>>> Hi, Sean and all.
>>>>
>>>> For the first question, we support only Hive Metastore 1.x ~ 2.x,
>>>> and we can support Hive Metastore 3.0 simultaneously; Spark is
>>>> designed that way.
>>>>
>>>> I don't think we need to drop old Hive Metastore support. Is the
>>>> goal to avoid sharing a Hive Metastore between Spark 2 and Spark 3
>>>> clusters?
>>>>
>>>> I think we should allow those use cases, especially for new Spark 3
>>>> clusters. What do you think?
>>>>
>>>> For the second question, Apache Spark 2.x doesn't support Hive
>>>> officially; it's only a best-effort approach within the boundaries
>>>> of Spark.
>>>>
>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive-functionality
>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#incompatible-hive-udf
>>>>
>>>> Beyond the documented differences, the decimal literal change
>>>> (HIVE-17186) causes query result differences even in a well-known
>>>> benchmark like TPC-H.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> PS. For Hadoop, let's have another thread if needed. I expect
>>>> another long story. :)
>>>>
>>>> On Fri, Oct 26, 2018 at 7:11 AM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Here's another thread to start considering, and I know it's been
>>>>> raised before:
>>>>> what version(s) of Hive should Spark 3 support?
>>>>>
>>>>> If we at least know it won't include Hive 0.x, could we go ahead
>>>>> and remove those tests from master? It might significantly reduce
>>>>> the run time and flakiness.
>>>>>
>>>>> It seems that maintaining even the Hive 1.x fork is untenable going
>>>>> forward, right? Does that also imply this support will almost
>>>>> certainly not be maintained in 3.0?
>>>>>
>>>>> Per below, it seems like it might even be hard to support Hive 3
>>>>> and Hadoop 2 at the same time?
>>>>>
>>>>> And while we're at it, what are the pros and cons of supporting
>>>>> only Hadoop 3 in Spark 3? Is the difference in the client / HDFS
>>>>> API even that big? Or what about focusing only on Hadoop 2.9.x
>>>>> support + 3.x support?
>>>>>
>>>>> Lots of questions; for now I'm just interested in informal
>>>>> reactions, not a binding decision.
>>>>>
>>>>> On Thu, Oct 25, 2018 at 11:49 PM Dagang Wei <notificati...@github.com> wrote:
>>>>>
>>>>>> Do we really want to switch to Hive 2.3? From this page,
>>>>>> https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop
>>>>>> 2.x (Hive 3.x works with Hadoop 3.x).
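
PS. On the metastore-version point above, note that the client version is already decoupled through configuration; a minimal sketch (the version string here is illustrative, and the supported range depends on the Spark release):

    # run against a non-builtin Hive metastore client version
    spark-shell \
      --conf spark.sql.hive.metastore.version=2.3.3 \
      --conf spark.sql.hive.metastore.jars=maven

With spark.sql.hive.metastore.jars=maven, Spark pulls the matching Hive client jars at runtime rather than using the built-in ones, so supporting newer metastores is largely a matter of certifying new values for these settings.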