Hi Dan,

Thanks for kicking off this discussion and taking the time to propose
solutions.

As you correctly mentioned, the recommendation from the Hive team is to
always depend on hive-exec.jar and never rely on hive-exec-core.jar.

Indeed, this may lead to binary incompatibility problems such as the one you
mentioned. If I understood correctly, the problem you cite comes up when
library B is not itself relocated. If Hive systematically relocates its
shaded deps, do you think there will still be binary incompatibility issues?
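
To check that I read the failure mode correctly, here is a minimal sketch of
it. All class names are hypothetical stand-ins for your example (they are not
real Hive or Oozie classes), and it uses a reflective lookup, which fails with
NoSuchMethodException, as an analogue of the NoSuchMethodError the JVM raises
at link time:

```java
import java.lang.reflect.Method;

public class RelocationDemo {
    // Hypothetical stand-in for com.library.alib.SomeClass (original package).
    static class OriginalSomeClass {}

    // Hypothetical stand-in for org.apache.hive.com.library.alib.SomeClass
    // (the same class after relocation by the shade plugin).
    static class RelocatedSomeClass {}

    // Hypothetical stand-in for library B as found in the unshaded b.jar:
    // its method still takes the original, non-relocated type.
    static class UnshadedB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // Returns true if `owner` has a public someMethod taking exactly `paramType`.
    static boolean resolves(Class<?> owner, Class<?> paramType) {
        try {
            Method m = owner.getMethod("someMethod", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            // Method lookup is by exact parameter type, so a relocated
            // type does not match the original signature.
            return false;
        }
    }

    public static void main(String[] args) {
        // A caller compiled against the original signature resolves fine.
        System.out.println(resolves(UnshadedB.class, OriginalSomeClass.class));
        // A caller rewritten to use the relocated type does not.
        System.out.println(resolves(UnshadedB.class, RelocatedSomeClass.class));
    }
}
```

If B were relocated together with A, the rewritten callers inside hive-exec
would refer to the relocated copy of B rather than to the one in b.jar, which
is why I am asking whether systematic relocation would be enough.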

If the relocation approach works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time, when there are problems with
shading, the answer is to relocate the problematic dependencies, and
people are more or less accustomed to this route.

Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid>
wrote:

> Dear Hive developers,
>
> I am Dan from the Oozie team and I would like to bring up the
> hive-exec.jar vs. hive-exec-core.jar topic.
> The reason is that, as far as we understand, the official
> recommendation from the Hive team is to use the hive-exec.jar artifact.
>
> However, in Oozie that can end up in a binary incompatibility.
>
> The reason for that is:
>
>   * Let's say library A is included in the fat Jar.
>
>   * And library B which is using library A is also included in the fat Jar.
>
>   * Let's also say that library A's com.library.alib package is
>     relocated to org.apache.hive.com.library.alib,
>     meaning the com.library.alib.SomeClass becomes
>     org.apache.hive.com.library.alib.SomeClass
>
>   * So if B has a method like public void
>     someMethod(com.library.alib.SomeClass) then the signature of this
>     method will be changed to:
>     public void someMethod(org.apache.hive.com.library.alib.SomeClass)
>
>   * If Oozie is also using B directly, we'll have b.jar on our
>     classpath, but with the unchanged signature.
>     So when hive-exec tries to invoke someMethod, then depending on
>     whether our b.jar or hive-exec is loaded first,
>     we can end up with a NoSuchMethodError if hive-exec tries to pass an
>     org.apache.hive.com.library.alib.SomeClass instance to the
>     someMethod which was loaded from the original b.jar.
>
> Hence in Oozie a long time ago (OOZIE-2621
> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> made to use the hive-exec-core Jar.
>
> Now, since the shading process removes from the hive-exec pom those
> dependencies which are included in the fat Jar, we had to manually
> add some dependencies to Oozie to compensate for this.
> However, these dependencies are not used by Oozie directly, and with the
> growing features of hive-exec we had to repeat the same process
> over and over, which is a bit unmaintainable.
>
> Today I'm writing to you to propose a long-term solution where basically
> nothing would change in the generated hive artifacts and poms, and at the
> same time we wouldn't have to manually declare dependencies in Oozie which
> are not explicitly used by us.
>
> The solution:
>
>  1. We would create a new module named hive-exec-dependencies which
>     would be a pom-packaging module without any Java source files.
>  2. All the dependencies declared in hive-exec would be moved to
>     hive-exec-dependencies.
>  3. We would make the hive-exec-dependencies module the parent of
>     hive-exec and with this hive-exec would still have access to the
>     same dependencies as before.
>  4. The maven shade plugin would still strip the dependencies from the
>     generated hive-exec pom which are included in the fat Jar.
>  5. And with a small maven plugin we'd change hive-exec's parent back
>     from hive-exec-dependencies to the root hive project in the
>     generated hive-exec pom file.
>
> I have a change ready locally and it works as described above.
>
> With this on the Oozie side we could add a dependency on
> hive-exec-dependencies and hence all the required libraries which are
> included in the fat Jar would be pulled into Oozie.
> The next time a new dependency would be added to hive-exec-dependencies,
> the Oozie build would pull it in automatically without us having to
> explicitly declare it.
>
> Please let me know what you think.
>
> Best,
> Dan
>
