I think the shading should be fixed instead of restoring this core jar. Providing a core jar means that we support it, and I think that would be a bad move: I believe it's an unreasonable expectation for any project to use the same or compatible dependencies as the ones hive-exec was compiled against! For example, hive-exec uses an ancient Guava that was released back in 2017 (https://mvnrepository.com/artifact/com.google.guava/guava/22.0) and has 3 CVEs listed... and that's just one of the many dependencies the core jar would pull into a build. Also note that Guava tends to break its API quite frequently, so I guess anyone using a somewhat more recent Guava will have a hard time consuming the artifact.
Downstream projects had the opportunity to try the alpha releases and report issues before 4.0 came out, didn't they? If they did not do that, I don't think that's our fault! A middle ground could be to suggest that they try the shaded hive-exec jar (we still have nightly builds [1]): notify these projects to try it and report back any issues, give them some time, fix any remaining shading issues, and we are done.
[1] http://ci.hive.apache.org/job/hive-nightly/

cheers,
Zoltan

On 4/29/24 09:16, Stamatis Zampetakis wrote:
I shared the reasons behind the removal of the jar and my concerns about bringing it back. I'm still not convinced that it's needed, but if the rest of the community feels that it's the right path forward, then I am OK with it.

Best,
Stamatis

On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena <ayush...@gmail.com> wrote:

Stamatis,

Isn't the removal itself an incompatible change? A lot of projects use it, and we suddenly removed a jar because some people were not sure how to use it properly and were complaining about it. What about the projects that are now stuck?

Reading the thread at [1], promises were made that everything would be relocated and sorted out before the release, but we couldn't do it; AFAIK relocating all the dependencies is not a trivial task. As I see here, @Chao Sun even raised concerns [2] that the removal blocks downstream projects from upgrading, and the response was that the folks pushing for the removal would help get all the dependencies relocated or solve the issues for downstream projects. I don't think anyone volunteered.

I would recommend one of the following:
* Best case, we relocate all the dependencies present in hive-exec, not just one or two. Somebody volunteers to raise one PR relocating "all" of them, we commit it, and we are sorted.
* Restore the core jar, because a lot of projects depend on it and the removal itself was incompatible. I don't think the removal had clear community agreement; it was a conditional agreement, and I don't think the condition was ever met, so we should roll it back.
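Whether "all" the dependencies bundled in hive-exec are relocated can be checked mechanically. Below is a minimal Java sketch, not part of Hive's build, assuming that an unrelocated dependency shows up as a .class entry whose path starts with none of an allowed set of prefixes (Hive's own packages plus whichever relocation prefix is chosen). The class name UnrelocatedClassFinder is hypothetical, and the prefixes are supplied by whoever runs it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class UnrelocatedClassFinder {

    /**
     * Returns the .class entries whose path starts with none of the allowed prefixes.
     * Entry names use '/' as separator, e.g. "com/google/common/base/Preconditions.class".
     * Entries under META-INF/ (manifests, multi-release classes) are skipped.
     */
    public static List<String> findUnrelocated(List<String> entryNames, List<String> allowedPrefixes) {
        List<String> offenders = new ArrayList<>();
        for (String name : entryNames) {
            if (!name.endsWith(".class") || name.startsWith("META-INF/")) {
                continue; // not a regular class entry
            }
            boolean allowed = false;
            for (String prefix : allowedPrefixes) {
                if (name.startsWith(prefix)) {
                    allowed = true;
                    break;
                }
            }
            if (!allowed) {
                offenders.add(name);
            }
        }
        Collections.sort(offenders);
        return offenders;
    }

    /** Reads all entry names from a jar on disk. */
    public static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: UnrelocatedClassFinder <jar> [allowed-prefix ...]");
            return;
        }
        // Remaining arguments are the allowed path prefixes, e.g. "org/apache/hadoop/hive/".
        List<String> prefixes = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            prefixes.add(args[i]);
        }
        for (String offender : findUnrelocated(entryNames(args[0]), prefixes)) {
            System.out.println(offender);
        }
    }
}
```

Run against a shaded hive-exec jar with Hive's package prefixes and the chosen relocation prefix as arguments, it would print the bundled classes that still live under their original packages (for instance com/google/common/ entries, if Guava were bundled without relocation). Classes under META-INF/versions/ are deliberately skipped and would need a separate check.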
On a lighter note, we might release with 5000+ commits and the best performance, but if nobody is able to consume those release bits, those efforts are simply wasted. People will stick to their older versions and not even try to upgrade, and we will be releasing for nobody, or maybe for the few folks who have only Hive in their stack (I don't know if there are folks like that). No matter how good a product is, if people don't use it, it is going to die :-(

I think we have a ticket about relocating all dependencies. I agree we should definitely drop the core jar, since it leads to all the problems Stamatis mentioned, but let's restore it for now and drop it once that relocation ticket is resolved. Does that sound convincing, or even worth a thought?

BTW, having jars with one set of dependencies shaded and others unshaded is done in Hadoop as well (hadoop-minicluster vs. hadoop-client-minicluster), and users keep running into such problems, e.g. [3].

Anyone else, any thoughts?

-Ayush

[1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
[2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
[3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x

On Fri, 26 Apr 2024 at 16:37, Stamatis Zampetakis <zabe...@gmail.com> wrote:

Hey Simhadri,

Thanks for starting this discussion. Maven has many limitations when it comes to publishing multiple artifacts from the same module. In most cases, the end result is broken and hard to use.
The pom file published for a given module cannot correctly describe all artifacts of the module, which is why there is one main artifact per module; dependency declarations are usually correct for the main artifact but are not representative of the rest. For example, end users who consume the hive-exec-core module tend to assume that Maven will automatically resolve all transitive dependencies and things will work as usual, which is not the case. In the past, this kind of assumption created a lot of confusion among consumers of hive-exec-core.jar, with tickets and open debates that spanned multiple months. The discussions even reached a point where people requested that certain Hive features be reverted in order to rectify issues around transitive dependencies and the core jar.

I think we should stick to the usual Maven convention and publish just one artifact per module. Adding the "core" jar back and claiming to support it is a step backwards that just postpones the real problems we need to tackle.

Furthermore, I don't think that the hive-exec module was ever meant to be used as a dependency. It is mainly an application module rather than a library module, and that's why shading takes place. Clearly some parts of hive-exec could be considered for a library, and splitting hive-exec into other modules would be a promising direction going forward, but that is a bit outside the scope of the current discussion.

From the issues outlined above, the only actionable item I see concerns the joda library, so we could try to simply relocate it if it is causing issues.

Finally, if someone wants to create a jar with specific contents from the hive-exec module, it is rather easy to do so. I created a small POC project [1] showing how someone can create something similar to hive-exec-core.jar and incorporate it into their build.
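To make "simply relocate it" concrete: relocation, as performed by the Maven Shade Plugin's <relocation> element (<pattern> / <shadedPattern>), renames a package prefix inside the shaded jar so that the bundled copy cannot clash with the version a downstream project puts on its classpath. The Java sketch below only models that name mapping and does not rewrite any bytecode. The class name RelocationSketch and the target prefix org.apache.hive.shaded.org.joda.time are hypothetical, not Hive's actual configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelocationSketch {

    /**
     * Maps a fully qualified class name through the first matching relocation rule.
     * A rule matches only classes inside the pattern package or its subpackages
     * (the pattern must be followed by '.'), so "org.joda.timeextra" is not caught
     * by an "org.joda.time" rule.
     */
    public static String relocate(String className, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String pattern = rule.getKey();
            if (className.startsWith(pattern + ".")) {
                // Keep the part after the pattern, put the shaded prefix in front of it.
                return rule.getValue() + className.substring(pattern.length());
            }
        }
        return className; // no rule matched: the class keeps its original name
    }

    public static void main(String[] args) {
        // Hypothetical rule: move joda-time under a Hive-private prefix.
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("org.joda.time", "org.apache.hive.shaded.org.joda.time");

        System.out.println(relocate("org.joda.time.DateTime", rules));
        System.out.println(relocate("java.time.Instant", rules));
    }
}
```

The dot-boundary check is a choice made for this sketch; how the plugin itself matches patterns should be taken from its documentation. The point is that after relocation, Hive's internal copy of joda-time would be referenced only under the shaded name, leaving org.joda.time free for whatever version a project such as Spark brings.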
Each project has separate needs, so for such customization I feel that the burden shouldn't fall on the Hive community.

Best,
Stamatis

[1] https://github.com/zabetak/hive-core-poc

On Thu, Apr 25, 2024 at 11:12 AM Simhadri G <simhad...@apache.org> wrote:
>
> Hi Everyone,
>
> The hive-exec:core jar is used by Spark, Oozie, Hudi and many other projects. Removing the hive-exec:core jar has caused the following issues:
>
> Spark: https://lists.apache.org/list?dev@hive.apache.org:lte=1M:joda
> Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
> Hudi: apache/hudi#8147
> Apache IoTDB: https://lists.apache.org/thread/wdqsyj89w9cvyk1pyxr83hlxpg6zp1go
> Guava: https://github.com/google/guava/issues/6666
> joda-time: https://lists.apache.org/thread/sphgcvod3qx9wtc51ltpfyr8dpx9p294
>
> I understand that there was a prior discussion about why the hive-exec:core jar was removed here:
> https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
>
> We agreed that ultimately the hive-exec jar should be used instead of hive-exec:core, but quite a few dependencies need to be shaded and relocated for this: https://issues.apache.org/jira/browse/HIVE-26220
>
> Until we shade and relocate the dependencies in hive-exec, we should restore the hive-exec:core jar. The intention is to provide a smoother transition from the hive-exec:core jar to the hive-exec jar for projects that depend on Hive.
>
> Seeking input from the community and a way to move forward on this topic.
>
> I apologize in advance if I have missed anything.
>
> Thanks!
>
> Simhadri G