+1. Multiple projects will benefit from this. Thanks, Simhadri, for driving this discussion.
Regards,
Sourabh Badhya

On Mon, Apr 29, 2024 at 12:46 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:

> I shared the reasons behind the removal of the jar and my concerns around bringing it back. I'm still not convinced that it's needed, but if the rest of the community feels that it's the right path forward then I am ok with this.
>
> Best,
> Stamatis
>
> On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena <ayush...@gmail.com> wrote:
>
>> Stamatis,
>> Isn't the removal itself an incompatible change? There are a lot of projects using it, and we suddenly removed a jar because some people were not sure how to use it properly and were complaining about it.
>>
>> What about the projects which are now stuck? Reading the thread at [1], there were promises made that everything would be relocated and sorted out before the release, but we couldn't do it; AFAIK it isn't a trivial task to just relocate all the dependencies.
>>
>> As I see here, @Chao Sun even raised concerns [2] that the removal just blocks downstream projects from upgrading, and it was countered with the claim that the folks pushing for the removal would help get all the dependencies relocated or solve the issues for downstream. I think no one volunteered.
>>
>> I would recommend either:
>> * Best case, we relocate all the dependencies present in hive-exec, not just one or two. Somebody volunteers to raise one PR relocating "all", we commit that, and we should be sorted.
>> * Restore the core jar, because a lot of projects depend on it and the removal itself was incompatible. I don't think the removal had clear community agreement; it was a conditional agreement, which I don't think got sorted out, so we should roll back.
>>
>> On a lighter note, we might release with some 5000+ commits, with best performance or so, but if nobody is able to consume those release bits, I think those efforts are just getting wasted. Eventually people will just stick to their older versions and not even try to upgrade, and we will be releasing for nobody, or maybe for a few folks who have only Hive in their stack (I don't know if there are folks like that). No matter how good a product is, if people don't use it, it is gonna die :-(
>>
>> I think we have a ticket which talks about relocating all dependencies. I agree we should drop the core jar for sure; it leads to all the problems that Stamatis mentioned. But let's restore the core jar, and we can drop it when that relocation ticket is resolved. Does that sound convincing, or even worth a thought?
>>
>> BTW, having jars with one set of dependencies shaded and other ones unshaded is done in Hadoop as well (hadoop-minicluster vs hadoop-client-minicluster), and such problems reported by users keep coming, e.g. [3].
>>
>> Anyone else, any thoughts?
>>
>> -Ayush
>>
>> [1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
>> [2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
>> [3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x
>>
>> On Fri, 26 Apr 2024 at 16:37, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>>
>>> Hey Simhadri, thanks for starting this discussion.
>>>
>>> Maven has many limitations when it comes to publishing multiple artifacts from the same module. In most cases, the end result is broken and hard to use. The pom file that is published for a given module is not able to describe correctly all artifacts of the module, and that's why there is one main artifact for every module; dependency declarations are usually correct for the main artifact but are not representative for the rest.
>>>
>>> For example, end-users who consume the hive-exec-core module tend to think that Maven will automatically resolve all transitive dependencies and things will work as usual, which is not the case. In the past, this kind of assumption created a lot of confusion among consumers of the hive-exec-core.jar, with tickets and open debates that spanned multiple months. The discussions even reached a point where people requested certain features of Hive to be reverted in order to rectify some things around transitive dependencies and the core jar.
>>>
>>> I think we should stick to the usual Maven convention and just publish one artifact for each module. Adding back and claiming to support the "core" jar is a step backwards that just postpones the real problems that we need to tackle.
>>>
>>> Furthermore, I don't think that the hive-exec module was ever meant to be used as a dependency. This is mainly an application module and not a library module, and that's why shading takes place. Clearly some parts of hive-exec could be considered to become a library, and that would be a promising direction going forward (splitting hive-exec into other modules), but it is a bit outside the scope of the current discussion.
>>>
>>> From the issues outlined above, the only actionable item that I see concerns the joda library, so we could try to simply relocate it if it is causing issues.
>>>
>>> Finally, if someone wants to create a jar with specific contents from the hive-exec module, it is rather easy to do so. I created a small POC project [1] on how someone can create something similar to the hive-exec-core.jar and incorporate it in their build. Each project has separate needs, so for such customization I feel that the burden shouldn't fall on the Hive community.
>>>
>>> Best,
>>> Stamatis
>>>
>>> [1] https://github.com/zabetak/hive-core-poc
>>>
>>> On Thu, Apr 25, 2024 at 11:12 AM Simhadri G <simhad...@apache.org> wrote:
>>> >
>>> > Hi Everyone,
>>> >
>>> > The hive-exec:core jar is used by Spark, Oozie, Hudi and many other projects. Removal of the hive-exec:core jar has caused the following issues.
>>> >
>>> > Spark: https://lists.apache.org/list?dev@hive.apache.org:lte=1M:joda
>>> > Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
>>> > Hudi: apache/hudi#8147
>>> > Apache IoTDB: https://lists.apache.org/thread/wdqsyj89w9cvyk1pyxr83hlxpg6zp1go
>>> > Guava: https://github.com/google/guava/issues/6666
>>> > joda-time: https://lists.apache.org/thread/sphgcvod3qx9wtc51ltpfyr8dpx9p294
>>> >
>>> > I understand that there is prior discussion about why the hive-exec:core jar was removed here:
>>> > https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
>>> >
>>> > We agreed that ultimately the hive-exec jar should be used over hive-exec:core, but there are quite a few dependencies that need to be shaded and relocated for this: https://issues.apache.org/jira/browse/HIVE-26220
>>> >
>>> > Until we shade and relocate dependencies in hive-exec, we should restore the hive-exec:core jar. The intention is to provide a smoother transition from hive-exec:core to the hive-exec jar for projects that depend on Hive.
>>> >
>>> > Seeking inputs from the community and a way to move forward on this topic.
>>> >
>>> > I apologize in advance if I have missed anything.
>>> >
>>> > Thanks!
>>> >
>>> > Simhadri G