For Oozie we've decided to use the fat Jar downstream (Cloudera), since there we have processes to ensure that 3rd-party library versions are kept in sync.

Since we don't have such a process in Apache, there we'll continue to use the core Jar.

Dan

On 2021. 11. 17. 18:50, Chao Sun wrote:
the idea is to fix the issues they bump into - because people who load
the jdbc driver may also see those issues.

I don’t get what you mean here, could you elaborate a bit more?

IMO it's a bit premature to do this without a working hive-exec jar for
downstream projects like Spark/Trino/Presto. In the current state there is
no way to upgrade these projects to use the fat hive-exec jar.



On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich<k...@rxd.hu>  wrote:

Hey all,

I wanted to get back to this - but had other things going on.

Chao> it is still being used today by some other popular projects
the idea is to fix the issues they bump into - because people who load the
jdbc driver may also see those issues.

Edward> [...] You all must like enjoy shading jars.
I totally agree that they may use a shell action as well.
I wonder how you propose to solve issues related to clients using a
different version of the guava library?

The change which removes the core artifact is ready:
https://github.com/apache/hive/pull/2648

cheers,
Zoltan

On 9/21/21 8:23 PM, Edward Capriolo wrote:
recommendation from the Hive team is to use the hive-exec.jar artifact.

You know, about 10 years ago I mentioned that oozie should just use
hive-service or hive jdbc. After a big fight where folks kept bringing up
concurrency bugs in hive-server-1 my prs were rejected (even though hive
server2 would not have these bugs). I still cannot fathom why someone using
oozie would want a fat jar of hive (as opposed to hive server or hivejdbc).
If I had to do that, I would just use shell action..... You all must like
enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun<sunc...@apache.org>  wrote:

I'm not sure whether it is a good idea to remove `hive-exec-core`
completely - it is still being used today by some other popular projects
including Spark and Trino/Presto. Sticking to `hive-exec-core` gives the
other projects more flexibility to shade & relocate those classes
according to their need, without waiting for new Hive releases. Hive also
needs to make sure it relocates everything properly. Otherwise, if some
classes are shaded & included in `hive-exec` but not relocated, there is no
way for the other projects to exclude them and avoid potential conflicts.

Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich<k...@rxd.hu>  wrote:

Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
Indeed this may lead to binary incompatibility problems like the one you
mentioned. If I understood correctly the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?
If the relocating solution works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time when there are problems with
shading the answer comes from relocating the problematic dependencies and
people are more or less accustomed to this route.

I totally agree with you Stamatis - with the addition that we should work
together with the owners of other projects to help them use the correct
artifact to gain access to Hive's internal parts.
I've opened HIVE-25531 to remove the core classified artifact - and ensure
that we will be uncovering and fixing future issues with the hive-exec
artifact.

cheers,
Zoltan


Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid> wrote:

Dear Hive developers,

I am Dan from the Oozie team and I would like to bring up the
hive-exec.jar vs. hive-exec-core.jar topic.
The reason is that, as far as we understand, the official
recommendation from the Hive team is to use the hive-exec.jar artifact.
However, in Oozie that can end up in a binary incompatibility.

The reason for that is:

     * Let's say library A is included in the fat Jar.
     * And library B, which is using library A, is also included in the
       fat Jar.
     * Let's also say that library A's com.library.alib package is
       relocated to org.apache.hive.com.library.alib,
       meaning that com.library.alib.SomeClass becomes
       org.apache.hive.com.library.alib.SomeClass.
     * So if B has a method like
       public void someMethod(com.library.alib.SomeClass)
       then the signature of this method will be changed to:
       public void someMethod(org.apache.hive.com.library.alib.SomeClass)
     * If Oozie is also using B directly, we'll have b.jar on our
       classpath, but with the unchanged signature. So when hive-exec
       tries to invoke someMethod, then depending on whether the b.jar
       coming from us or hive-exec is loaded first, we can end up with a
       NoSuchMethodError if hive-exec tries to pass an
       org.apache.hive.com.library.alib.SomeClass instance to the
       someMethod which was loaded from the original b.jar.
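
The mismatch in the list above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: all class names are hypothetical stand-ins (OriginalSomeClass for com.library.alib.SomeClass, RelocatedSomeClass for org.apache.hive.com.library.alib.SomeClass, OriginalB for the unshaded b.jar), and a reflective lookup is used as a stand-in for the JVM's link-time method resolution, so a failed lookup here surfaces as NoSuchMethodException rather than the NoSuchMethodError seen at runtime.

```java
public class RelocationDemo {
    // Hypothetical stand-in for com.library.alib.SomeClass (library A, original package).
    public static class OriginalSomeClass {}

    // Hypothetical stand-in for org.apache.hive.com.library.alib.SomeClass
    // (the same class after relocation; to the JVM it is an unrelated type).
    public static class RelocatedSomeClass {}

    // Hypothetical stand-in for library B as shipped in the plain b.jar:
    // its method still takes the original, non-relocated parameter type.
    public static class OriginalB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // Returns true if 'owner' declares a public method with exactly this
    // name and parameter type; this mirrors lookup by name and descriptor.
    public static boolean hasMethod(Class<?> owner, String name, Class<?> paramType) {
        try {
            owner.getMethod(name, paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A caller compiled against the original signature finds the method.
        System.out.println(hasMethod(OriginalB.class, "someMethod", OriginalSomeClass.class)); // true
        // A caller rewritten by shading asks for the relocated parameter type
        // and finds nothing on the original B.
        System.out.println(hasMethod(OriginalB.class, "someMethod", RelocatedSomeClass.class)); // false
    }
}
```

The second lookup fails because the parameter type is part of the method's identity, which is why the outcome depends on which copy of B is loaded first.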

Hence in Oozie a long time ago (OOZIE-2621
<https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
made to use the hive-exec-core Jar.

Now since the shading process actually removes those dependencies from
the hive-exec pom which are included in the fat Jar, we manually had to
add some dependencies to Oozie to compensate for this.
However these dependencies are not used by Oozie directly, and with the
growing features of hive-exec we had to repeat the same process
over and over, which is a bit unmaintainable.

Today I'm writing to you to propose a long-term solution where basically
nothing would change in the generated hive artifacts and poms, and at the
same time we wouldn't have to manually declare dependencies in Oozie which
are not explicitly used by us.

The solution:

    1. We would create a new module named hive-exec-dependencies which
       would be a pom-packaging module without any Java source files.
    2. All the dependencies declared in hive-exec would be moved to
       hive-exec-dependencies.
    3. We would make the hive-exec-dependencies module the parent of
       hive-exec and with this hive-exec would still have access to the
       same dependencies as before.
    4. The maven shade plugin would still strip the dependencies from the
       generated hive-exec pom which are included in the fat Jar.
    5. And with a small maven plugin we'd change hive-exec's parent back
       from hive-exec-dependencies to the root hive project in the
       generated hive-exec pom file.

I have a change ready locally and it works as described above.

With this on the Oozie side we could add a dependency on
hive-exec-dependencies and hence all the required libraries which are
included in the fat Jar would be pulled into Oozie.
The next time a new dependency would be added to hive-exec-dependencies,
the Oozie build would pull it in automatically without us having to
explicitly declare it.

Please let me know what you think.

Best,
Dan
