I'm fine as long as we are committed to fixing the shading problems before the release. Ideally, though, I think we should fix the shading problems first and then remove the hive-exec:core jar (which is why I said it's a bit premature to do it now).
On Thu, Nov 18, 2021 at 8:28 AM Stamatis Zampetakis <zabe...@gmail.com> wrote:
> Hello,
>
> I don't see any risk in committing this to master right now. It will only
> affect the new Hive release when and if it ever goes out.
> Till then we have plenty of time to fix shading problems and help other
> projects migrate to the "recommended" way to use Hive.
>
> Moreover, I don't know many projects relying on this kind of "double"
> (core vs. fat) publication of dependencies. For Hive, it creates
> additional maintenance cost, and for its users, confusion about what they
> should use.
> If, for whatever reason, another project does not want to include
> everything coming in the fat jar, Maven provides ways to do it. I wouldn't
> recommend going down this path but there are alternatives.
>
> Best,
> Stamatis
>
> On Wed, Nov 17, 2021 at 8:15 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>
> > On 11/17/21 7:46 PM, Chao Sun wrote:
> > >> We have a working hive-exec jar
> > >
> > > I'm not sure about this. The issue comes when the fat hive-exec jar
> > > shades some jars but doesn't relocate them. In this case there is no
> > > way for the downstream projects to resolve the conflict.
> >
> > Exactly - I think those should be hammered out for good; fix the
> > shading/relocation!
> >
> > > On the Spark side IIUC we had issues with Apache Commons as well as
> > > ORC (see HIVE-25317 for an effort on this), and there could be more.
> > > Spark is using Hive 2.3 though, but the same applies to master/4.0 if
> > > dependency versions differ between Hive and the downstream projects.
> >
> > This change is only about master - it won't change Hive 2.3. HIVE-25317
> > was for branch-2 as well.
> > I've seen weird stuff in a few places because they were not able to use
> > the hive-exec jar as-is.
> > Folks in the Impala project for example went in a direction to
> > re-shade/re-filter the hive-exec jar and relocate some stuff in it -
> > most likely because it conflicted with their stuff.
> >
> > https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml
> >
> > Taking a quick look at https://github.com/apache/spark/pull/33989/files
> > it seems like you've also done something similar... but instead of using
> > the base artifact, you have created a new shader.
> > I don't think this is better than having an artifact which simply works
> > out-of-the-box.
> >
> > cheers,
> > Zoltan
> >
> > > On Wed, Nov 17, 2021 at 10:35 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >
> > >> On 11/17/21 7:07 PM, Daniel Fritsi wrote:
> > >>> For Oozie we've decided to use the fat Jar downstream (Cloudera) as
> > >>> there we have processes to ensure 3rd-party library versions are
> > >>> kept in sync.
> > >>>
> > >>> Since we don't have such a process in Apache, there we'll continue
> > >>> to use the core Jar.
> > >>
> > >> It might be possible to evade some problems by using a 3rd party lib
> > >> syncer - but if we've done a good job shading this stuff, it should
> > >> not cause any trouble even in case other 3rd party stuff is
> > >> present... but in any case to check things out you will need a Hive
> > >> release in some form.
> > >>
> > >> cheers,
> > >> Zoltan
> > >>
> > >>> Dan
> > >>>
> > >>> On 2021. 11. 17. 18:50, Chao Sun wrote:
> > >>>>> the idea is to fix the issues they bump into - because people who
> > >>>>> load the jdbc driver may also see those issues.
> > >>>>
> > >>>> I don’t get what you mean here, could you elaborate a bit more?
> > >>>>
> > >>>> IMO it's a bit premature to do this without a working hive-exec jar
> > >>>> for downstream projects like Spark/Trino/Presto. At the current
> > >>>> state there is no way to upgrade these projects to use the fat
> > >>>> hive-exec jar.
> > >>>>
> > >>>> On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >>>>
> > >>>>> Hey all,
> > >>>>>
> > >>>>> I wanted to get back to this - but had other things going on.
> > >>>>>
> > >>>>> Chao> it is still being used today by some other popular projects
> > >>>>> the idea is to fix the issues they bump into - because people who
> > >>>>> load the jdbc driver may also see those issues.
> > >>>>>
> > >>>>> Edward> [...] You all must like enjoy shading jars.
> > >>>>> I totally agree that they may use a shell action as well.
> > >>>>> I wonder how you propose to solve issues related to clients using
> > >>>>> a different version of the guava library?
> > >>>>>
> > >>>>> The changes which will remove the core artifact are ready:
> > >>>>> https://github.com/apache/hive/pull/2648
> > >>>>>
> > >>>>> cheers,
> > >>>>> Zoltan
> > >>>>>
> > >>>>> On 9/21/21 8:23 PM, Edward Capriolo wrote:
> > >>>>>> recommendation from the Hive team is to use the hive-exec.jar
> > >>>>>> artifact.
> > >>>>>>
> > >>>>>> You know, about 10 years ago I mentioned that oozie should just
> > >>>>>> use hive-service or hive jdbc. After a big fight where folks kept
> > >>>>>> bringing up concurrency bugs in hive-server-1 my prs were rejected
> > >>>>>> (even though hive server2 would not have these bugs). I still
> > >>>>>> cannot fathom why someone using oozie would want a fat jar of hive
> > >>>>>> (as opposed to hive server or hivejdbc). If I had to do that, I
> > >>>>>> would just use shell action..... You all must like enjoy shading
> > >>>>>> jars.
> > >>>>>>
> > >>>>>> Edward
> > >>>>>>
> > >>>>>> On Thu, Sep 16, 2021 at 2:30 PM Chao Sun <sunc...@apache.org> wrote:
> > >>>>>>
> > >>>>>>> I'm not sure whether it is a good idea to remove `hive-exec-core`
> > >>>>>>> completely - it is still being used today by some other popular
> > >>>>>>> projects including Spark and Trino/Presto. Sticking to
> > >>>>>>> `hive-exec-core` gives the other projects more flexibility to
> > >>>>>>> shade & relocate those classes according to their needs, without
> > >>>>>>> waiting for new Hive releases. Hive also needs to make sure it
> > >>>>>>> relocates everything properly. Otherwise, if some classes are
> > >>>>>>> shaded & included in `hive-exec` but not relocated, there is no
> > >>>>>>> way for the other projects to exclude them and avoid potential
> > >>>>>>> conflicts.
> > >>>>>>>
> > >>>>>>> Chao
> > >>>>>>>
> > >>>>>>> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >>>>>>>
> > >>>>>>>> Hey
> > >>>>>>>>
> > >>>>>>>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > >>>>>>>>> Indeed this may lead to binary incompatibility problems such as
> > >>>>>>>>> the one you mentioned. If I understood correctly, the problem
> > >>>>>>>>> you cite comes up if library B in this case is not relocated.
> > >>>>>>>>> If Hive systematically relocates shaded deps, do you think
> > >>>>>>>>> there will still be binary incompatibility issues?
> > >>>>>>>>> If the relocating solution works, I would personally prefer
> > >>>>>>>>> going down this path instead of introducing an entirely new
> > >>>>>>>>> module just for the sake of dependency management. Most of the
> > >>>>>>>>> time when there are problems with shading the answer comes from
> > >>>>>>>>> relocating the problematic dependencies and people are more or
> > >>>>>>>>> less accustomed to this route.
> > >>>>>>>> I totally agree with you Stamatis - with the addition that we
> > >>>>>>>> should work together with the owners of other projects to help
> > >>>>>>>> them use the correct artifact to gain access to Hive's internal
> > >>>>>>>> parts.
> > >>>>>>>> I've opened HIVE-25531 to remove the core classified artifact -
> > >>>>>>>> and ensure that we will be uncovering and fixing future issues
> > >>>>>>>> with the hive-exec artifact.
> > >>>>>>>>
> > >>>>>>>> cheers,
> > >>>>>>>> Zoltan
> > >>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Stamatis
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi
> > >>>>>>>>> <fdan...@cloudera.com.invalid> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Dear Hive developers,
> > >>>>>>>>>>
> > >>>>>>>>>> I am Dan from the Oozie team and I would like to bring up the
> > >>>>>>>>>> hive-exec.jar vs. hive-exec-core.jar topic.
> > >>>>>>>>>> The reason is that, as far as we understand, the official
> > >>>>>>>>>> recommendation from the Hive team is to use the hive-exec.jar
> > >>>>>>>>>> artifact.
> > >>>>>>>>>> However, in Oozie that can end up in a binary incompatibility.
> > >>>>>>>>>>
> > >>>>>>>>>> The reason for that is:
> > >>>>>>>>>>
> > >>>>>>>>>>   * Let's say library A is included in the fat Jar.
> > >>>>>>>>>>   * And library B, which is using library A, is also included
> > >>>>>>>>>>     in the fat Jar.
> > >>>>>>>>>>   * Let's also say that library A's com.library.alib package is
> > >>>>>>>>>>     relocated to org.apache.hive.com.library.alib,
> > >>>>>>>>>>     meaning that com.library.alib.SomeClass becomes
> > >>>>>>>>>>     org.apache.hive.com.library.alib.SomeClass.
> > >>>>>>>>>>   * So if B has a method like
> > >>>>>>>>>>         public void someMethod(com.library.alib.SomeClass)
> > >>>>>>>>>>     then the signature of this method will be changed to:
> > >>>>>>>>>>         public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> > >>>>>>>>>>   * If Oozie is also using B directly, meaning we'll have b.jar
> > >>>>>>>>>>     on our classpath, but with the unchanged signature, then
> > >>>>>>>>>>     when hive-exec tries to invoke someMethod, depending on
> > >>>>>>>>>>     whether the b.jar coming from us or hive-exec is loaded
> > >>>>>>>>>>     first, we can end up with a NoSuchMethodError if hive-exec
> > >>>>>>>>>>     tries to pass an org.apache.hive.com.library.alib.SomeClass
> > >>>>>>>>>>     instance to the someMethod which was loaded from the
> > >>>>>>>>>>     original b.jar.
> > >>>>>>>>>>
> > >>>>>>>>>> Hence in Oozie a long time ago (OOZIE-2621
> > >>>>>>>>>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision
> > >>>>>>>>>> was made to use the hive-exec-core Jar.
> > >>>>>>>>>>
> > >>>>>>>>>> Now since the shading process actually removes those
> > >>>>>>>>>> dependencies from the hive-exec pom which are included in the
> > >>>>>>>>>> fat Jar, we manually had to add some dependencies to Oozie to
> > >>>>>>>>>> compensate for this.
> > >>>>>>>>>> However, these dependencies are not used by Oozie directly, and
> > >>>>>>>>>> with the growing features of hive-exec we had to repeat the
> > >>>>>>>>>> same process over and over, which is a bit unmaintainable.
> > >>>>>>>>>>
> > >>>>>>>>>> Today I'm writing to you to propose a long-term solution where
> > >>>>>>>>>> basically nothing would change in the generated hive artifacts
> > >>>>>>>>>> and poms, and at the same time we wouldn't have to manually
> > >>>>>>>>>> declare dependencies in Oozie which are not explicitly used by
> > >>>>>>>>>> us.
> > >>>>>>>>>>
> > >>>>>>>>>> The solution:
> > >>>>>>>>>>
> > >>>>>>>>>>   1. We would create a new module named hive-exec-dependencies
> > >>>>>>>>>>      which would be a pom-packaging module without any Java
> > >>>>>>>>>>      source files.
> > >>>>>>>>>>   2. All the dependencies declared in hive-exec would be moved
> > >>>>>>>>>>      to hive-exec-dependencies.
> > >>>>>>>>>>   3. We would make the hive-exec-dependencies module the parent
> > >>>>>>>>>>      of hive-exec, and with this hive-exec would still have
> > >>>>>>>>>>      access to the same dependencies as before.
> > >>>>>>>>>>   4. The maven shade plugin would still strip the dependencies
> > >>>>>>>>>>      which are included in the fat Jar from the generated
> > >>>>>>>>>>      hive-exec pom.
> > >>>>>>>>>>   5. And with a small maven plugin we'd change hive-exec's
> > >>>>>>>>>>      parent back from hive-exec-dependencies to the root hive
> > >>>>>>>>>>      project in the generated hive-exec pom file.
> > >>>>>>>>>>
> > >>>>>>>>>> I have a change ready locally and it works as described above.
> > >>>>>>>>>>
> > >>>>>>>>>> With this, on the Oozie side we could add a dependency on
> > >>>>>>>>>> hive-exec-dependencies and hence all the required libraries
> > >>>>>>>>>> which are included in the fat Jar would be pulled into Oozie.
> > >>>>>>>>>> The next time a new dependency is added to
> > >>>>>>>>>> hive-exec-dependencies, the Oozie build would pull it in
> > >>>>>>>>>> automatically without us having to explicitly declare it.
> > >>>>>>>>>>
> > >>>>>>>>>> Please let me know what you think.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Dan
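
To make the library A / library B scenario from Dan's original message concrete, here is a minimal sketch in plain Java. Everything in it is a hypothetical stand-in: `OriginalSomeClass` plays the role of com.library.alib.SomeClass, `RelocatedSomeClass` plays the role of org.apache.hive.com.library.alib.SomeClass, and `LibraryB` plays the role of the unmodified b.jar. None of these names come from Hive or Oozie. The sketch uses a reflective lookup, which fails with NoSuchMethodException; that is only an analogue of the NoSuchMethodError the JVM would raise at link time for a compiled call site, under the assumption that the caller was rewritten by shading and the callee was not.

```java
import java.lang.invoke.MethodType;

// Hypothetical stand-ins only; nothing here is taken from Hive or Oozie.
class RelocationDemo {
    // Plays the role of com.library.alib.SomeClass (library A, not relocated).
    static class OriginalSomeClass {}

    // Plays the role of org.apache.hive.com.library.alib.SomeClass (relocated copy).
    static class RelocatedSomeClass {}

    // Plays the role of library B as shipped in the plain, unshaded b.jar.
    static class LibraryB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // JVM method descriptor a caller would link against for someMethod(paramType).
    static String descriptorFor(Class<?> paramType) {
        return MethodType.methodType(void.class, paramType).toMethodDescriptorString();
    }

    // True if LibraryB exposes someMethod with exactly this parameter type.
    static boolean resolves(Class<?> paramType) {
        try {
            LibraryB.class.getMethod("someMethod", paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two descriptors differ only in the parameter's class name.
        System.out.println(descriptorFor(OriginalSomeClass.class));
        System.out.println(descriptorFor(RelocatedSomeClass.class));
        // A caller compiled against the original type finds the method...
        System.out.println("original resolves:  " + resolves(OriginalSomeClass.class));
        // ...while a caller rewritten to the relocated type does not.
        System.out.println("relocated resolves: " + resolves(RelocatedSomeClass.class));
    }
}
```

The point of the sketch is that the parameter type is part of the method's descriptor, so a caller rewritten to the relocated name and a callee that still declares the original name no longer match, whichever copy of B the class loader happens to pick up first.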