I think the shading should be fixed instead of restoring this core jar.
Providing a core jar means that we support it, and I think that would be a bad
move:
I believe it's unreasonable to expect any project to use the same or
compatible deps as the ones hive-exec was compiled against!
For example, hive-exec uses an ancient guava which was released back in 2017
https://mvnrepository.com/artifact/com.google.guava/guava/22.0
and has 3 CVEs listed... and that's just one of the many deps the core jar will
pull into a build.
Also note that guava tends to break its API quite frequently, so I guess anyone
using a more recent guava will have a hard time consuming the artifact...

Downstream projects have had the opportunity to try the alpha releases and
report issues before 4.0 came out, haven't they?
If they didn't do that, I think that's not our fault!

A middle ground could be to suggest that they try the shaded hive-exec jar (we still have nightly builds [1]): notify these projects, ask them to try it and report back any issues, give them some time, fix up any further shading issues, and done.

[1] http://ci.hive.apache.org/job/hive-nightly/

cheers,
Zoltan

On 4/29/24 09:16, Stamatis Zampetakis wrote:
I shared the reasons behind the removal of the jar and my concerns around bringing it back. I'm still not convinced that it's needed but if the rest of the community feels that it's the right path forward then I am ok with this.

Best,
Stamatis

On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena <ayush...@gmail.com> wrote:

    Stamatis,
    Isn't the removal itself an incompatible change? There are a lot of projects
    using it, and we suddenly removed a jar because some people were not sure how
    to use it properly and were complaining about it.

    What about the projects which are now stuck? Reading the thread at [1],
    promises were made that everything would be relocated and sorted out before
    the release, but we couldn't do it. AFAIK it isn't a trivial task to just
    relocate all the dependencies.

    As I see here, @Chao Sun even raised concerns [2] that the removal just
    blocks downstream projects from upgrading, and that was countered with the
    argument that the folks pushing for the removal would help get all the
    dependencies relocated or solve the issues for downstream. I think no one
    volunteered.

    I would recommend either:
    * Best case, we relocate all the dependencies present in hive-exec, not just
      one or two. Somebody volunteers to raise one PR relocating "all", we commit
      that, and we should be sorted.
    * Restore the core jar, because a lot of projects depend on it and the
      removal itself was incompatible. I don't think the removal had a clear
      community agreement; it was a conditional agreement whose condition I
      don't think was met, so we should roll back.

    On a lighter note, we might release with some 5000+ commits, with the best
    performance or so, but if nobody is able to consume those release bits, I
    think those efforts are just going to waste. Eventually people will just
    stick to their older versions and not even try to upgrade, and we will be
    releasing for nobody, or maybe for the few folks who have only Hive in their
    stack (I don't know if there are folks like that). No matter how good a
    product is, if people don't use it, it is gonna die :-(


    I think we have a ticket which talks about relocating all dependencies. I
    agree we should drop the core jar for sure, since it leads to all the
    problems Stamatis mentioned, but let's restore the core jar and drop it
    when that relocation ticket is resolved. Does that sound convincing, or
    even worth a thought?

    btw. having jars with one set of dependencies shaded and other ones unshaded
    is done in Hadoop as well (hadoop-minicluster vs hadoop-client-minicluster),
    and such problems from users keep coming, e.g. [3]

    Anyone else, any thoughts?

    -Ayush

    [1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
    [2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
    [3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x



    On Fri, 26 Apr 2024 at 16:37, Stamatis Zampetakis <zabe...@gmail.com> wrote:

        Hey Simhadri, thanks for starting this discussion.

        Maven has many limitations when it comes to publishing multiple
        artifacts from the same module. In most cases, the end result is
        broken and hard to use. The pom file that is published for a given
        module is not able to describe correctly all artifacts of the module
        and that's why there is one main artifact for every module; dependency
        declarations are usually correct for the main artifact but are not
        representative for the rest.

        For example, end-users who consume the hive-exec-core module tend to
        think that maven will automatically resolve all transitive
        dependencies and things will work as usual, which is not the case. In
        the past, this kind of assumption created a lot of confusion among
        consumers of the hive-exec-core.jar, with tickets and open debates that
        spanned multiple months. The discussions even reached a point
        where people requested certain features of Hive to be reverted in
        order to rectify some things around transitive dependencies and the
        core jar.

        I think we should stick to the usual maven convention and just publish
        one artifact for each module. Adding back and claiming to support the
        "core" jar is a step backwards that just postpones the real problems
        that we need to tackle.

        Furthermore, I don't think that the hive-exec module was ever meant to
        be used as a dependency. This is mainly an application module and not
        a library module and that's why shading takes place. Clearly some
        parts from hive-exec could be considered to become a library and that
        would be a promising direction going forward (splitting hive-exec into
        other modules) but a bit outside the scope of the current discussion.

        From the issues outlined above, the only actionable item that I see
        concerns the joda library so we could try to simply relocate it if it
        is causing issues.

        Finally, if someone wants to create a jar with specific contents from
        the hive-exec module it is rather easy to do so. I created a small POC
        project [1] on how someone can create something similar to the
        hive-exec-core.jar and incorporate it in their build. Each project has
        separate needs so for such customization I feel that the burden
        shouldn't fall on the Hive community.

        Best,
        Stamatis

        [1] https://github.com/zabetak/hive-core-poc

        On Thu, Apr 25, 2024 at 11:12 AM Simhadri G <simhad...@apache.org> wrote:
         >
         > Hi Everyone,
         >
         > The hive-exec:core jar is used by Spark, Oozie, Hudi and many other
         > projects. Removal of the hive-exec:core jar has caused the following issues.
         >
         > Spark: https://lists.apache.org/list?dev@hive.apache.org:lte=1M:joda
         > Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
         > Hudi: apache/hudi#8147
         > Apache IoTDB: https://lists.apache.org/thread/wdqsyj89w9cvyk1pyxr83hlxpg6zp1go
         > Guava: https://github.com/google/guava/issues/6666
         > joda-time: https://lists.apache.org/thread/sphgcvod3qx9wtc51ltpfyr8dpx9p294
         >
         > I understand that there is prior discussion about why the
         > hive-exec:core jar was removed here:
         > https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
         >
         > We agreed that ultimately the hive-exec jar should be used over
         > hive-exec:core, but there are quite a few dependencies that need to be
         > shaded and relocated for this:
         > https://issues.apache.org/jira/browse/HIVE-26220
         >
         > Until we shade & relocate dependencies in hive-exec, we should restore
         > the hive-exec:core jar. The intention is to provide a smoother
         > transition from the hive-exec:core jar to the hive-exec jar for projects
         > that depend on Hive.
         >
         > Seeking inputs from the community and a way to move forward on this topic.
         >
         > I apologize in advance if I have missed anything.
         >
         > Thanks!
         >
         > Simhadri G
