I'm fine as long as we are committed to fixing the shading problems before
the release. Ideally I think we should fix the shading problems first and
then remove the hive-exec:core jar though (which is why I said it's a bit
premature to do it now).
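
For reference, the binary incompatibility Dan describes at the bottom of this
thread can be sketched in a few lines of Java. This is only an illustration
under stated assumptions: the class names below are hypothetical stand-ins
(they are not real Hive or Oozie classes), and it uses a reflective lookup,
which fails with NoSuchMethodException, to mimic what the JVM reports as
NoSuchMethodError when already-compiled code links against the wrong signature.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins only; none of these are real Hive/Oozie classes.
class RelocationMismatchDemo {
    // Plays the role of com.library.alib.SomeClass (library A as published).
    static class OriginalSomeClass {}

    // Plays the role of org.apache.hive.com.library.alib.SomeClass
    // (the copy of library A relocated inside the fat jar).
    static class RelocatedSomeClass {}

    // Plays the role of library B loaded from the unmodified b.jar:
    // its method still takes the original, non-relocated type.
    static class LibraryB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // True if LibraryB exposes someMethod with exactly this parameter type.
    static boolean hasSomeMethod(Class<?> paramType) {
        try {
            Method m = LibraryB.class.getMethod("someMethod", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            // The reflective analogue of the NoSuchMethodError seen at link time.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("original type resolves:  "
                + hasSomeMethod(OriginalSomeClass.class));
        System.out.println("relocated type resolves: "
                + hasSomeMethod(RelocatedSomeClass.class));
    }
}
```

The lookup with the relocated type fails because method resolution matches on
the exact parameter type, so which copy of B gets loaded first decides whether
the call works.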

On Thu, Nov 18, 2021 at 8:28 AM Stamatis Zampetakis <zabe...@gmail.com>
wrote:

> Hello,
>
> I don't see any risk committing this right now in master. It will only
> affect the new Hive release when and if it ever goes out.
> Till then we have plenty of time to fix shading problems and help other
> projects migrate to the "recommended" way to use Hive.
>
> Moreover, I don't know many projects relying on this kind of "double" (core
> vs. fat) publication of dependencies. For Hive, it creates additional
> maintenance cost and for its users confusion on what they should use.
> If for whatever reason, another project does not want to include everything
> coming in the fat jar, maven provides ways to do it. I wouldn't recommend
> going down this path but there are alternatives.
>
> Best,
> Stamatis
>
> On Wed, Nov 17, 2021 at 8:15 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>
> >
> >
> > On 11/17/21 7:46 PM, Chao Sun wrote:
> > >> We have a working hive-exec jar
> > >
> > > I'm not sure about this. The issue comes when the fat hive-exec jar
> > shades
> > > some jars but doesn't relocate them. In this case there is no way for
> the
> > > downstream projects to resolve the conflict.
> >
> > Exactly - I think those should be hammered out for good; fix the
> > shading/relocation!
> >
> > >
> > > On the Spark side IIUC we had issues with Apache Commons as well as ORC
> > > (see HIVE-25317 for an effort on this), and there could be more. Spark
> is
> > > using Hive 2.3 though but the same applies for master/4.0 if dependency
> > > versions differ between Hive and the downstream projects.
> >
> > This change is only about master - it won't change Hive 2.3. HIVE-25317
> > was for branch-2 as well.
> > I've seen weird stuff in a few places because they were not able to use the
> > hive-exec jar as-is.
> > Folks in the Impala project for example went in a direction to
> > re-shade/re-filter the hive-exec jar and relocate some stuff in it - most
> > likely because it conflicted with
> > their stuff.
> >
> >
> https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml
> > Taking a quick look at https://github.com/apache/spark/pull/33989/files
> > it seems like you've also done something similar....but instead of using
> > the base artifact; you have
> > created a new shader.
> > I don't think this is better than having an artifact which simply works
> > out-of-the-box.
> >
> >
> > cheers,
> > Zoltan
> >
> > >
> > > On Wed, Nov 17, 2021 at 10:35 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >
> > >> On 11/17/21 7:07 PM, Daniel Fritsi wrote:
> > >>> For Oozie we've decided to use fat Jar downstream (Cloudera) as there
> > we
> > >> have processes to ensure 3rd-party library versions are kept in sync.
> > >>>
> > >>> Since we don't have such a process in Apache, there we'll continue to
> > >> use the core Jar.
> > >>
> > >> It might be possible to evade some problems by using a 3rd party lib
> > >> syncer - but if we've done a good job shading this stuff; it should
> not
> > >> cause any trouble even in case
> > >> other 3rd party stuff is present....but in any case to check things
> out
> > >> you will need a Hive release in some form
> > >>
> > >> cheers,
> > >> Zoltan
> > >>
> > >>>
> > >>> Dan
> > >>>
> > >>> On 2021. 11. 17. 18:50, Chao Sun wrote:
> > >>>>> the idea is to fix the issues they bump into - because people who
> > load
> > >>>> the jdbc driver may also see those issues.
> > >>>>
> > >>>> I don’t get what you mean here, could you elaborate a bit more?
> > >>>>
> > >>>> IMO it's a bit premature to do this without a working hive-exec jar
> > for
> > >>>> downstream projects like Spark/Trino/Presto. At the current state
> > there
> > >> is
> > >>>> no way to upgrade these projects to use the fat hive-exec jar.
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich<k...@rxd.hu>
> wrote:
> > >>>>
> > >>>>> Hey all,
> > >>>>>
> > >>>>> I wanted to get back to this - but had other things going on.
> > >>>>>
> > >>>>> Chao> it is still being used today by some other popular projects
> > >>>>> the idea is to fix the issues they bump into - because people who
> > load
> > >> the
> > >>>>> jdbc driver may also see those issues.
> > >>>>>
> > >>>>> Edward> [...] You all must like enjoy shading jars.
> > >>>>> I totally agree that they may use a shell action as well.
> > >>>>> I wonder how do you propose to solve issues related to clients
> using
> > a
> > >>>>> different version of the guava library?
> > >>>>>
> > >>>>> The changes which will remove the core artifact stuff are ready:
> > >>>>> https://github.com/apache/hive/pull/2648
> > >>>>>
> > >>>>> cheers,
> > >>>>> Zoltan
> > >>>>>
> > >>>>> On 9/21/21 8:23 PM, Edward Capriolo wrote:
> > >>>>>> recommendation from the Hive team is to use the hive-exec.jar
> > >> artifact.
> > >>>>>>
> > >>>>>> You know about 10 years ago. I mentioned that oozie should just
> use
> > >>>>>> hive-service or hive jdbc. After a big fight where folks kept
> > >> bringing up
> > >>>>>> concurrency bugs in hive-server-1 my prs were rejected (even
> though
> > >> hive
> > >>>>>> server2 would not have these bugs). I still cannot fathom why
> > someone
> > >>>>> using
> > >>>>>> oozie would want a fat jar of hive (as opposed to hive server or
> > >>>>> hivejdbc)
> > >>>>>> . If I had to do that, i would just use shell action..... You all
> > must
> > >>>>> like
> > >>>>>> enjoy shading jars.
> > >>>>>>
> > >>>>>> Edward
> > >>>>>>
> > >>>>>> On Thu, Sep 16, 2021 at 2:30 PM Chao Sun<sunc...@apache.org>
> > wrote:
> > >>>>>>
> > >>>>>>> I'm not sure whether it is a good idea to remove `hive-exec-core`
> > >>>>>>> completely - it is still being used today by some other popular
> > >> projects
> > >>>>>>> including Spark and Trino/Presto. By sticking to `hive-exec-core`
> > it
> > >>>>> gives
> > >>>>>>> more flexibility to the other projects to shade & relocate those
> > >> classes
> > >>>>>>> according to their need, without waiting for new Hive releases.
> > Hive
> > >>>>> also
> > >>>>>>> needs to make sure it relocates everything properly. Otherwise, if
> > >> some
> > >>>>>>> classes are shaded & included in `hive-exec` but not relocated,
> > there
> > >>>>> is no
> > >>>>>>> way for the other projects to exclude them and avoid potential
> > >>>>> conflicts.
> > >>>>>>> Chao
> > >>>>>>>
> > >>>>>>> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich<k...@rxd.hu>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> Hey
> > >>>>>>>>
> > >>>>>>>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > >>>>>>>>> Indeed this may lead to binary incompatibility problems as the
> > one
> > >> you
> > >>>>>>>>> mentioned. If I understood correctly the problem you cite comes
> > up
> > >> if
> > >>>>>>>>> library B in this case is not relocated. If Hive systematically
> > >>>>>>> relocates
> > >>>>>>>>> shaded deps do you think there will still be binary
> > incompatibility
> > >>>>>>>> issues?
> > >>>>>>>>> If the relocating solution works, I would personally prefer
> going
> > >> down
> > >>>>>>>> this
> > >>>>>>>>> path instead of introducing an entirely new module just for the
> > >> sake
> > >>>>> of
> > >>>>>>>>> dependency management. Most of the time when there are problems
> > >> with
> > >>>>>>>>> shading the answer comes from relocating the problematic
> > >> dependencies
> > >>>>>>> and
> > >>>>>>>>> people are more or less accustomed with this route.
> > >>>>>>>> I totally agree with you Stamatis - with the addition that we
> > should
> > >>>>> work
> > >>>>>>>> together with the owners of other projects to help them use the
> > >> correct
> > >>>>>>>> artifact to gain access to
> > >>>>>>>> Hive's internal parts.
> > >>>>>>>> I've opened HIVE-25531 to remove the core classified artifact -
> > and
> > >>>>>>> ensure
> > >>>>>>>> that we will be uncovering and fixing future issues with the
> > >> hive-exec
> > >>>>>>>> artifact.
> > >>>>>>>>
> > >>>>>>>> cheers,
> > >>>>>>>> Zoltan
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Stamatis
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi
> > >>>>>>>> <fdan...@cloudera.com.invalid>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Dear Hive developers,
> > >>>>>>>>>>
> > >>>>>>>>>> I am Dan from the Oozie team and I would like to bring up the
> > >>>>>>>>>> hive-exec.jar vs. hive-exec-core.jar topic.
> > >>>>>>>>>> The reason for that is because as far as we understand the
> > >> official
> > >>>>>>>>>> recommendation from the Hive team is to use the hive-exec.jar
> > >>>>>>> artifact.
> > >>>>>>>>>> However in Oozie that can end-up in a binary incompatibility.
> > >>>>>>>>>>
> > >>>>>>>>>> The reason for that is:
> > >>>>>>>>>>
> > >>>>>>>>>>       * Let's say library A is included in the fat Jar.
> > >>>>>>>>>>
> > >>>>>>>>>>       * And library B which is using library A is also included in
> > >>>>>>>>>>         the fat Jar.
> > >>>>>>>>>>       * Let's also say that library A's com.library.alib package is
> > >>>>>>>>>>         relocated to org.apache.hive.com.library.alib,
> > >>>>>>>>>>         meaning the com.library.alib.SomeClass becomes
> > >>>>>>>>>>         org.apache.hive.com.library.alib.SomeClass
> > >>>>>>>>>>
> > >>>>>>>>>>       * So if B has a method like public void
> > >>>>>>>>>>         someMethod(com.library.alib.SomeClass) then the signature of
> > >>>>>>>>>>         this method will be changed to:
> > >>>>>>>>>>         public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> > >>>>>>>>>>       * If Oozie is also using B directly, meaning we'll have b.jar on
> > >>>>>>>>>>         our classpath, but with the unchanged signature,
> > >>>>>>>>>>         so when hive-exec tries to invoke someMethod then depending on
> > >>>>>>>>>>         whether b.jar coming from us will be loaded first or hive-exec
> > >>>>>>>>>>         will, we can end up with a NoSuchMethodError if hive-exec tries
> > >>>>>>>>>>         to pass an org.apache.hive.com.library.alib.SomeClass instance
> > >>>>>>>>>>         to the someMethod which was loaded from the original b.jar.
> > >>>>>>>>>>
> > >>>>>>>>>> Hence in Oozie a long time ago (OOZIE-2621
> > >>>>>>>>>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the
> > decision
> > >> was
> > >>>>>>>>>> made to use the hive-exec-core Jar.
> > >>>>>>>>>>
> > >>>>>>>>>> Now since the shading process actually removes those dependencies
> > >>>>>>>>>> from the hive-exec pom which are included in the fat Jar, we had to
> > >>>>>>>>>> manually add some dependencies to Oozie to compensate for this.
> > >>>>>>>>>> However these dependencies are not used by Oozie directly and with
> > >>>>>>>>>> the growing features of hive-exec we had to repeat the same process
> > >>>>>>>>>> over-and-over which is a bit unmaintainable.
> > >>>>>>>>>>
> > >>>>>>>>>> Today I'm writing to you to propose a long-term solution where
> > >>>>>>>>>> basically nothing would change in the generated hive artifacts and
> > >>>>>>>>>> poms, and at the same time we wouldn't have to manually declare
> > >>>>>>>>>> dependencies in Oozie which are not explicitly used by us.
> > >>>>>>>>>>
> > >>>>>>>>>> The solution:
> > >>>>>>>>>>
> > >>>>>>>>>>      1. We would create a new module named hive-exec-dependencies
> > >>>>>>>>>>         which would be a pom-packaging module without any Java
> > >>>>>>>>>>         source files.
> > >>>>>>>>>>      2. All the dependencies declared in hive-exec would be moved to
> > >>>>>>>>>>         hive-exec-dependencies.
> > >>>>>>>>>>      3. We would make the hive-exec-dependencies module the parent of
> > >>>>>>>>>>         hive-exec and with this hive-exec would still have access to
> > >>>>>>>>>>         the same dependencies as before.
> > >>>>>>>>>>      4. The maven shade plugin would still strip the dependencies
> > >>>>>>>>>>         from the generated hive-exec pom which are included in the
> > >>>>>>>>>>         fat Jar.
> > >>>>>>>>>>      5. And with a small maven plugin we'd change hive-exec's parent
> > >>>>>>>>>>         back from hive-exec-dependencies to the root hive project in
> > >>>>>>>>>>         the generated hive-exec pom file.
> > >>>>>>>>>>
> > >>>>>>>>>> I have a change ready locally and it works as described above.
> > >>>>>>>>>>
> > >>>>>>>>>> With this on the Oozie side we could add a dependency on
> > >>>>>>>>>> hive-exec-dependencies and hence all the required libraries
> > which
> > >> are
> > >>>>>>>>>> included in the fat Jar would be pulled into Oozie.
> > >>>>>>>>>> The next time a new dependency would be added to
> > >>>>>>> hive-exec-dependencies,
> > >>>>>>>>>> the Oozie build would pull it in automatically without us
> having
> > >> to
> > >>>>>>>>>> explicitly declare it.
> > >>>>>>>>>>
> > >>>>>>>>>> Please let me know what you think.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Dan
> > >>>>>>>>>>
> > >>>
> > >>
> > >
> >
>
