recommendation from the Hive team is to use the hive-exec.jar artifact.

You know, about 10 years ago I mentioned that Oozie should just use hive-service or Hive JDBC. After a big fight where folks kept bringing up concurrency bugs in hive-server-1, my PRs were rejected (even though HiveServer2 would not have these bugs). I still cannot fathom why someone using Oozie would want a fat jar of Hive (as opposed to Hive server or Hive JDBC). If I had to do that, I would just use a shell action... You all must really enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun <sunc...@apache.org> wrote:

> I'm not sure whether it is a good idea to remove `hive-exec-core`
> completely - it is still being used today by some other popular projects
> including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
> more flexibility to the other projects to shade & relocate those classes
> according to their need, without waiting for new Hive releases. Hive also
> needs to make sure it relocates everything properly. Otherwise, if some
> classes are shaded & included in `hive-exec` but not relocated, there is no
> way for the other projects to exclude them and avoid potential conflicts.
>
> Chao
>
> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>
> > Hey
> >
> > On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > > Indeed this may lead to binary incompatibility problems as the one you
> > > mentioned. If I understood correctly the problem you cite comes up if
> > > library B in this case is not relocated. If Hive systematically relocates
> > > shaded deps do you think there will still be binary incompatibility issues?
> > >
> > > If the relocating solution works, I would personally prefer going down this
> > > path instead of introducing an entirely new module just for the sake of
> > > dependency management. Most of the time when there are problems with
> > > shading the answer comes from relocating the problematic dependencies and
> > > people are more or less accustomed to this route.
> >
> > I totally agree with you Stamatis - with the addition that we should work
> > together with the owners of other projects to help them use the correct
> > artifact to gain access to Hive's internal parts.
> > I've opened HIVE-25531 to remove the core classified artifact - and ensure
> > that we will be uncovering and fixing future issues with the hive-exec
> > artifact.
> >
> > cheers,
> > Zoltan
> >
> >
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid>
> > > wrote:
> > >
> > >> Dear Hive developers,
> > >>
> > >> I am Dan from the Oozie team and I would like to bring up the
> > >> hive-exec.jar vs. hive-exec-core.jar topic.
> > >> The reason for that is because as far as we understand the official
> > >> recommendation from the Hive team is to use the hive-exec.jar artifact.
> > >>
> > >> However in Oozie that can end up in a binary incompatibility.
> > >>
> > >> The reason for that is:
> > >>
> > >>   * Let's say library A is included in the fat Jar.
> > >>
> > >>   * And library B which is using library A is also included in the fat Jar.
> > >>
> > >>   * Let's also say that library A's com.library.alib package is
> > >>     relocated to org.apache.hive.com.library.alib,
> > >>     meaning the com.library.alib.SomeClass becomes
> > >>     org.apache.hive.com.library.alib.SomeClass
> > >>
> > >>   * So if B has a method like public void
> > >>     someMethod(com.library.alib.SomeClass) then the signature of this
> > >>     method will be changed to:
> > >>     public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> > >>
> > >>   * If Oozie is also using B directly meaning we'll have b.jar on our
> > >>     classpath, but with the unchanged signature,
> > >>     so when hive-exec tries to invoke someMethod then depending on
> > >>     whether b.jar coming from us will be loaded first or hive-exec will,
> > >>     we can end up with a NoSuchMethodError if hive-exec tries to pass an
> > >>     org.apache.hive.com.library.alib.SomeClass instance to the
> > >>     someMethod which was loaded from the original b.jar.
> > >>
> > >> Hence in Oozie a long time ago (OOZIE-2621
> > >> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> > >> made to use the hive-exec-core Jar.
> > >>
> > >> Now since the shading process actually removes those dependencies from
> > >> the hive-exec pom which are included in the fat Jar, we manually had to
> > >> add some dependencies to Oozie to compensate for this.
> > >> However these dependencies are not used by Oozie directly and with the
> > >> growing features of hive-exec we had to repeat the same process
> > >> over-and-over which is a bit unmaintainable.
> > >>
> > >> Today I'm writing to you to propose a long-term solution where basically
> > >> nothing would change in the generated hive artifacts, poms and at the same
> > >> time we wouldn't have to manually declare dependencies in Oozie which
> > >> are not explicitly used by us.
> > >>
> > >> The solution:
> > >>
> > >>  1. We would create a new module named hive-exec-dependencies which
> > >>     would be a pom-packaging module without any Java source files.
> > >>  2. All the dependencies declared in hive-exec would be moved to
> > >>     hive-exec-dependencies.
> > >>  3. We would make the hive-exec-dependencies module the parent of
> > >>     hive-exec and with this hive-exec would still have access to the
> > >>     same dependencies as before.
> > >>  4. The maven shade plugin would still strip the dependencies from the
> > >>     generated hive-exec pom which are included in the fat Jar.
> > >>  5. And with a small maven plugin we'd change hive-exec's parent back
> > >>     from hive-exec-dependencies to the root hive project in the
> > >>     generated hive-exec pom file.
> > >>
> > >> I have a change ready locally and it works as described above.
> > >>
> > >> With this on the Oozie side we could add a dependency on
> > >> hive-exec-dependencies and hence all the required libraries which are
> > >> included in the fat Jar would be pulled into Oozie.
> > >> The next time a new dependency would be added to hive-exec-dependencies,
> > >> the Oozie build would pull it in automatically without us having to
> > >> explicitly declare it.
> > >>
> > >> Please let me know what you think.
> > >>
> > >> Best,
> > >> Dan
> > >>
> > >
> > >
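
The NoSuchMethodError scenario described in the original message comes down to the JVM resolving a method by its exact name and parameter types, so a caller compiled against the relocated class cannot find a method that was compiled against the original one. Below is a minimal Java sketch of that mismatch. Everything in it is a hypothetical stand-in, not Hive or Oozie code: `SomeClass` plays the role of com.library.alib.SomeClass, `RelocatedSomeClass` plays org.apache.hive.com.library.alib.SomeClass, and `B` plays library B as shipped in the unmodified b.jar. It uses a method-handle lookup to imitate the exact-signature resolution rather than reproducing a real classpath conflict.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class RelocationDemo {
    // Hypothetical stand-in for com.library.alib.SomeClass (original library A).
    public static class SomeClass {}

    // Hypothetical stand-in for org.apache.hive.com.library.alib.SomeClass
    // (the copy of the same class after relocation inside the fat jar).
    public static class RelocatedSomeClass {}

    // Hypothetical stand-in for library B loaded from the unmodified b.jar:
    // its method still takes the original, non-relocated parameter type.
    public static class B {
        public void someMethod(SomeClass arg) {}
    }

    // Returns true if B has a someMethod whose parameter type is exactly
    // the given class. The lookup matches on the exact signature, which is
    // the same kind of match the JVM performs when linking a call site.
    public static boolean resolves(Class<?> parameterType) {
        try {
            MethodHandles.lookup().findVirtual(
                B.class, "someMethod", MethodType.methodType(void.class, parameterType));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Caller compiled against the original class: the method is found.
        System.out.println("original signature resolves:  " + resolves(SomeClass.class));
        // Caller rewritten by shading to use the relocated class: not found.
        System.out.println("relocated signature resolves: " + resolves(RelocatedSomeClass.class));
    }
}
```

In this sketch the two parameter classes are unrelated types as far as the JVM is concerned, even though they would have identical contents after relocation, which is why which copy of B gets loaded first decides whether the call links.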