> the idea is to fix the issues they bump into - because people who load
> the jdbc driver may also see those issues.

I don’t get what you mean here, could you elaborate a bit more?

IMO it's a bit premature to do this without a working hive-exec jar for
downstream projects like Spark/Trino/Presto. As things stand, there is
no way to upgrade these projects to use the fat hive-exec jar.



On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey all,
>
> I wanted to get back to this - but had other things going on.
>
> Chao> it is still being used today by some other popular projects
> the idea is to fix the issues they bump into - because people who load the
> jdbc driver may also see those issues.
>
> Edward> [...] You all must like enjoy shading jars.
> I totally agree that they may use a shell action as well.
> I wonder how you propose to solve issues related to clients using a
> different version of the guava library?
>
> The change which removes the core artifact is ready:
> https://github.com/apache/hive/pull/2648
>
> cheers,
> Zoltan
>
> On 9/21/21 8:23 PM, Edward Capriolo wrote:
> > recommendation from the Hive team is to use the hive-exec.jar artifact.
> >
> > You know about 10 years ago. I mentioned that oozie should just use
> > hive-service or hive jdbc. After a big fight where folks kept bringing up
> > concurrency bugs in hive-server-1 my prs were rejected (even though hive
> > server2 would not have these bugs). I still cannot fathom why someone using
> > oozie would want a fat jar of hive (as opposed to hive server or hivejdbc).
> > If I had to do that, i would just use shell action..... You all must like
> > enjoy shading jars.
> >
> > Edward
> >
> > On Thu, Sep 16, 2021 at 2:30 PM Chao Sun <sunc...@apache.org> wrote:
> >
> >> I'm not sure whether it is a good idea to remove `hive-exec-core`
> >> completely - it is still being used today by some other popular projects
> >> including Spark and Trino/Presto. Sticking to `hive-exec-core` gives
> >> the other projects more flexibility to shade & relocate those classes
> >> according to their needs, without waiting for new Hive releases. Hive also
> >> needs to make sure it relocates everything properly. Otherwise, if some
> >> classes are shaded & included in `hive-exec` but not relocated, there is no
> >> way for the other projects to exclude them and avoid potential conflicts.
> >>
> >> Chao
> >>
> >> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> >>
> >>> Hey
> >>>
> >>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> >>>> Indeed this may lead to binary incompatibility problems as the one you
> >>>> mentioned. If I understood correctly the problem you cite comes up if
> >>>> library B in this case is not relocated. If Hive systematically relocates
> >>>> shaded deps do you think there will still be binary incompatibility issues?
> >>>>
> >>>> If the relocating solution works, I would personally prefer going down this
> >>>> path instead of introducing an entirely new module just for the sake of
> >>>> dependency management. Most of the time when there are problems with
> >>>> shading the answer comes from relocating the problematic dependencies and
> >>>> people are more or less accustomed to this route.
> >>>
> >>> I totally agree with you Stamatis - with the addition that we should work
> >>> together with the owners of other projects to help them use the correct
> >>> artifact to gain access to Hive's internal parts.
> >>> I've opened HIVE-25531 to remove the core classified artifact - and ensure
> >>> that we will be uncovering and fixing future issues with the hive-exec
> >>> artifact.
> >>>
> >>> cheers,
> >>> Zoltan
> >>>
> >>>
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid> wrote:
> >>>>
> >>>>> Dear Hive developers,
> >>>>>
> >>>>> I am Dan from the Oozie team and I would like to bring up the
> >>>>> hive-exec.jar vs. hive-exec-core.jar topic.
> >>>>> The reason is that, as far as we understand, the official
> >>>>> recommendation from the Hive team is to use the hive-exec.jar artifact.
> >>>>>
> >>>>> However, in Oozie that can end up causing a binary incompatibility.
> >>>>>
> >>>>> The reason for that is:
> >>>>>
> >>>>>     * Let's say library A is included in the fat Jar.
> >>>>>
> >>>>>     * And library B, which uses library A, is also included in the fat Jar.
> >>>>>
> >>>>>     * Let's also say that library A's com.library.alib package is
> >>>>>       relocated to org.apache.hive.com.library.alib,
> >>>>>       meaning that com.library.alib.SomeClass becomes
> >>>>>       org.apache.hive.com.library.alib.SomeClass
> >>>>>
> >>>>>     * So if B has a method like
> >>>>>       public void someMethod(com.library.alib.SomeClass)
> >>>>>       then the signature of this method will be changed to:
> >>>>>       public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> >>>>>
> >>>>>     * If Oozie is also using B directly, we'll have b.jar on our
> >>>>>       classpath, but with the unchanged signature. So when hive-exec
> >>>>>       tries to invoke someMethod, then depending on whether the b.jar
> >>>>>       coming from us or hive-exec is loaded first, we can end up with
> >>>>>       a NoSuchMethodError if hive-exec tries to pass an
> >>>>>       org.apache.hive.com.library.alib.SomeClass instance to the
> >>>>>       someMethod which was loaded from the original b.jar.
> >>>>>
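The scenario above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from Hive or Oozie: the nested classes are hypothetical stand-ins for com.library.alib.SomeClass, its relocated copy, and the two variants of library B, and a reflective lookup stands in for the JVM's link-time method resolution (which would raise NoSuchMethodError rather than the NoSuchMethodException caught here).

```java
public class RelocationDemo {
    // Hypothetical stand-in for com.library.alib.SomeClass (library A, original package).
    static class OriginalSomeClass {}

    // Hypothetical stand-in for org.apache.hive.com.library.alib.SomeClass
    // (the relocated copy of library A inside the fat jar).
    static class RelocatedSomeClass {}

    // Stand-in for library B as shipped in the plain b.jar:
    // its signature still refers to the original type.
    static class UnshadedB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // Stand-in for library B as rewritten inside the fat jar:
    // shading changed the parameter type to the relocated class.
    static class ShadedB {
        public void someMethod(RelocatedSomeClass arg) {}
    }

    // Returns true if bClass has a public someMethod taking exactly argType.
    // Like JVM method resolution, the lookup needs an exact parameter-type match.
    static boolean resolves(Class<?> bClass, Class<?> argType) {
        try {
            bClass.getMethod("someMethod", argType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The caller (hive-exec in the scenario) was compiled against the relocated type.
        System.out.println("B from fat jar loaded first: "
                + resolves(ShadedB.class, RelocatedSomeClass.class));
        System.out.println("B from plain b.jar loaded first: "
                + resolves(UnshadedB.class, RelocatedSomeClass.class));
    }
}
```

Which of the two lookups corresponds to reality depends only on which copy of B the classloader picks up first, which is why the failure looks intermittent from the outside.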
> >>>>> Hence in Oozie a long time ago (OOZIE-2621
> >>>>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> >>>>> made to use the hive-exec-core Jar.
> >>>>>
> >>>>> Now since the shading process actually removes from the hive-exec pom
> >>>>> those dependencies which are included in the fat Jar, we had to manually
> >>>>> add some dependencies to Oozie to compensate for this.
> >>>>> However these dependencies are not used by Oozie directly, and with the
> >>>>> growing features of hive-exec we had to repeat the same process
> >>>>> over and over, which is a bit unmaintainable.
> >>>>>
> >>>>> Today I'm writing to you to propose a long-term solution where basically
> >>>>> nothing would change in the generated hive artifacts and poms, and at the
> >>>>> same time we wouldn't have to manually declare dependencies in Oozie which
> >>>>> are not explicitly used by us.
> >>>>>
> >>>>> The solution:
> >>>>>
> >>>>>    1. We would create a new module named hive-exec-dependencies which
> >>>>>       would be a pom-packaging module without any Java source files.
> >>>>>    2. All the dependencies declared in hive-exec would be moved to
> >>>>>       hive-exec-dependencies.
> >>>>>    3. We would make the hive-exec-dependencies module the parent of
> >>>>>       hive-exec and with this hive-exec would still have access to the
> >>>>>       same dependencies as before.
> >>>>>    4. The maven shade plugin would still strip the dependencies from the
> >>>>>       generated hive-exec pom which are included in the fat Jar.
> >>>>>    5. And with a small maven plugin we'd change hive-exec's parent back
> >>>>>       from hive-exec-dependencies to the root hive project in the
> >>>>>       generated hive-exec pom file.
> >>>>>
> >>>>> I have a change ready locally and it works as described above.
> >>>>>
> >>>>> With this on the Oozie side we could add a dependency on
> >>>>> hive-exec-dependencies and hence all the required libraries which are
> >>>>> included in the fat Jar would be pulled into Oozie.
> >>>>> The next time a new dependency is added to hive-exec-dependencies,
> >>>>> the Oozie build would pull it in automatically without us having to
> >>>>> explicitly declare it.
> >>>>>
> >>>>> Please let me know what you think.
> >>>>>
> >>>>> Best,
> >>>>> Dan
> >>>>>
> >>>>
> >>>
> >>
> >
>
