You know, about 10 years ago I suggested that Oozie should just use
hive-service or Hive JDBC. After a big fight, where folks kept bringing up
concurrency bugs in hive-server-1, my PRs were rejected (even though
HiveServer2 would not have these bugs). I still cannot fathom why someone
using Oozie would want a fat jar of Hive (as opposed to HiveServer or Hive
JDBC). If I had to do that, I would just use a shell action. You all must
enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun <sunc...@apache.org> wrote:

> I'm not sure whether it is a good idea to remove `hive-exec-core`
> completely - it is still being used today by some other popular projects,
> including Spark and Trino/Presto. Sticking to `hive-exec-core` gives the
> other projects more flexibility to shade & relocate those classes
> according to their needs, without waiting for new Hive releases. Hive also
> needs to make sure it relocates everything properly. Otherwise, if some
> classes are shaded & included in `hive-exec` but not relocated, there is no
> way for the other projects to exclude them and avoid potential conflicts.
>
> Chao
>
> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>
> > Hey
> >
> > On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > > Indeed this may lead to binary incompatibility problems such as the one
> > > you mentioned. If I understood correctly, the problem you cite comes up
> > > if library B in this case is not relocated. If Hive systematically
> > > relocates shaded deps, do you think there will still be binary
> > > incompatibility issues?
> > >
> > > If the relocating solution works, I would personally prefer going down
> > > this path instead of introducing an entirely new module just for the
> > > sake of dependency management. Most of the time when there are problems
> > > with shading, the answer comes from relocating the problematic
> > > dependencies, and people are more or less accustomed to this route.
> >
> > I totally agree with you Stamatis - with the addition that we should work
> > together with the owners of other projects to help them use the correct
> > artifact to gain access to Hive's internal parts.
> > I've opened HIVE-25531 to remove the core classified artifact - and
> > ensure that we will be uncovering and fixing future issues with the
> > hive-exec artifact.
> >
> > cheers,
> > Zoltan
> >
> >
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid> wrote:
> > >
> > >> Dear Hive developers,
> > >>
> > >> I am Dan from the Oozie team and I would like to bring up the
> > >> hive-exec.jar vs. hive-exec-core.jar topic.
> > >> The reason is that, as far as we understand, the official
> > >> recommendation from the Hive team is to use the hive-exec.jar artifact.
> > >>
> > >> However, in Oozie that can end up in a binary incompatibility.
> > >>
> > >> The reason for that is:
> > >>
> > >>    * Let's say library A is included in the fat Jar.
> > >>
> > >>    * And library B, which is using library A, is also included in the
> > >>      fat Jar.
> > >>
> > >>    * Let's also say that library A's com.library.alib package is
> > >>      relocated to org.apache.hive.com.library.alib,
> > >>      meaning the com.library.alib.SomeClass becomes
> > >>      org.apache.hive.com.library.alib.SomeClass
> > >>
> > >>    * So if B has a method like public void
> > >>      someMethod(com.library.alib.SomeClass) then the signature of this
> > >>      method will be changed to:
> > >>      public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> > >>
> > >>    * If Oozie also uses B directly, we will have b.jar on our
> > >>      classpath, but with the unchanged signature. So when hive-exec
> > >>      tries to invoke someMethod, then depending on whether the b.jar
> > >>      coming from us or hive-exec is loaded first, we can end up with
> > >>      a NoSuchMethodError if hive-exec tries to pass an
> > >>      org.apache.hive.com.library.alib.SomeClass instance to the
> > >>      someMethod which was loaded from the original b.jar.
> > >>
> > >> Hence in Oozie a long time ago (OOZIE-2621
> > >> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> > >> made to use the hive-exec-core Jar.
> > >>
> > >> Now, since the shading process actually removes from the hive-exec pom
> > >> those dependencies which are included in the fat Jar, we had to
> > >> manually add some dependencies to Oozie to compensate for this.
> > >> However, these dependencies are not used by Oozie directly, and with
> > >> the growing features of hive-exec we had to repeat the same process
> > >> over and over, which is a bit unmaintainable.
> > >>
> > >> Today I'm writing to you to propose a long-term solution where
> > >> basically nothing would change in the generated Hive artifacts and
> > >> poms, and at the same time we wouldn't have to manually declare
> > >> dependencies in Oozie which are not explicitly used by us.
> > >>
> > >> The solution:
> > >>
> > >>   1. We would create a new module named hive-exec-dependencies which
> > >>      would be a pom-packaging module without any Java source files.
> > >>   2. All the dependencies declared in hive-exec would be moved to
> > >>      hive-exec-dependencies.
> > >>   3. We would make the hive-exec-dependencies module the parent of
> > >>      hive-exec and with this hive-exec would still have access to the
> > >>      same dependencies as before.
> > >>   4. The maven shade plugin would still strip the dependencies from
> > >>      the generated hive-exec pom which are included in the fat Jar.
> > >>   5. And with a small maven plugin we'd change hive-exec's parent back
> > >>      from hive-exec-dependencies to the root hive project in the
> > >>      generated hive-exec pom file.
> > >>
> > >> I have a change ready locally and it works as described above.
> > >>
> > >> With this, on the Oozie side we could add a dependency on
> > >> hive-exec-dependencies and hence all the required libraries which are
> > >> included in the fat Jar would be pulled into Oozie.
> > >> The next time a new dependency is added to hive-exec-dependencies,
> > >> the Oozie build would pull it in automatically without us having to
> > >> explicitly declare it.
> > >>
> > >> Please let me know what you think.
> > >>
> > >> Best,
> > >> Dan
> > >>
> > >
> >
>
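
As an aside for readers of this thread: the mismatch Dan describes in his
original message comes down to the JVM linking methods by name plus
descriptor, so a method whose parameter type was relocated is a different
method from the one in the original b.jar. Below is a minimal Java sketch
under stated assumptions. The names RelocationDemo, SomeClass,
RelocatedSomeClass and B are hypothetical stand-ins: two nested classes play
the role of com.library.alib.SomeClass and its relocated copy, and reflection
only imitates the lookup that the real JVM performs at link time, where the
failure would surface as NoSuchMethodError rather than
NoSuchMethodException.

```java
import java.lang.invoke.MethodType;

public class RelocationDemo {
    // Hypothetical stand-in for com.library.alib.SomeClass.
    static class SomeClass {}

    // Hypothetical stand-in for org.apache.hive.com.library.alib.SomeClass.
    static class RelocatedSomeClass {}

    // Stand-in for library B as shipped in the original, unshaded b.jar.
    static class B {
        public void someMethod(SomeClass arg) {}
    }

    // Builds the JVM method descriptor for a void method taking one parameter.
    static String descriptorFor(Class<?> paramType) {
        return MethodType.methodType(void.class, paramType)
                .toMethodDescriptorString();
    }

    // Returns true if owner declares a public method with this exact
    // name and parameter type, false otherwise.
    static boolean canResolve(Class<?> owner, String name, Class<?> paramType) {
        try {
            owner.getMethod(name, paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two descriptors differ only in the parameter's class name.
        System.out.println(descriptorFor(SomeClass.class));
        System.out.println(descriptorFor(RelocatedSomeClass.class));

        // A caller compiled against the original type finds the method.
        System.out.println(canResolve(B.class, "someMethod", SomeClass.class));

        // A caller rewritten to use the relocated type does not.
        System.out.println(
                canResolve(B.class, "someMethod", RelocatedSomeClass.class));
    }
}
```

In this sketch the second lookup fails because B was never rewritten, which
mirrors the case where the unshaded b.jar wins on the classpath while
hive-exec's own bytecode expects the relocated parameter type.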
