+1. Multiple projects will benefit from this.

Thanks Simhadri for driving this discussion.

Regards,
Sourabh Badhya

On Mon, Apr 29, 2024 at 12:46 PM Stamatis Zampetakis <zabe...@gmail.com>
wrote:

> I shared the reasons behind the removal of the jar and my concerns around
> bringing it back. I'm still not convinced that it's needed but if the rest
> of the community feels that it's the right path forward then I am ok with
> this.
>
> Best,
> Stamatis
>
> On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena <ayush...@gmail.com> wrote:
>
>> Stamatis,
>> Isn't the removal itself an incompatible change? There are a lot of
>> projects using it & we suddenly removed a jar because there were some
>> people not sure how to properly use it and were complaining about it.
>>
>> What about the projects which are now stuck? Reading the thread at [1],
>> promises were made that everything would be relocated and sorted out
>> before the release, but we couldn't do it. AFAIK it isn't a trivial task
>> to just relocate all the dependencies.
>>
>> As I see here, @Chao Sun even raised concerns [2] that the removal just
>> blocks downstream projects from upgrading, and the counter was that the
>> folks chasing the removal would help get all the dependencies relocated
>> or solve the issues for downstream. I think nobody volunteered.
>>
>> I would recommend either:
>> * Best case, we relocate all the dependencies present in hive-exec, not
>> just one or two. Somebody volunteers to raise one PR relocating "all",
>> we commit that, and we should be sorted.
>> * Restore the core jar, because a lot of projects depend on it and the
>> removal itself was incompatible. I don't think the removal had a clear
>> community agreement; it was a conditional agreement, which I don't think
>> got sorted out, so we should roll back.
>>
>> On a lighter note, we might release with some 5000+ commits, with the best
>> performance or so, but if nobody is able to consume those release bits, I
>> think those efforts just go to waste. Eventually people will just
>> stick to their older versions and not even try to upgrade, & we will be
>> releasing for nobody, or maybe for a few folks who have only Hive in
>> their stack (I don't know if there are folks like that). No matter how good
>> a product is, if people don't use it, it is gonna die :-(
>>
>>
>> I think we have a ticket which talks about relocating all dependencies. I
>> agree we should drop the core jar for sure, since it leads to all the
>> problems Stamatis mentioned, but let's restore the core jar & drop it when
>> that relocation ticket is resolved. Does that sound convincing, or even
>> worth a thought?
>>
>> BTW, having jars with one set of dependencies shaded and others
>> unshaded is done in Hadoop as well (hadoop-minicluster vs
>> hadoop-client-minicluster), & such problems from users keep coming, e.g. [3]
>>
>> Anyone else, any thoughts?
>>
>> -Ayush
>>
>> [1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
>> [2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
>> [3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x
>>
>>
>>
>> On Fri, 26 Apr 2024 at 16:37, Stamatis Zampetakis <zabe...@gmail.com>
>> wrote:
>>
>>> Hey Simhadri, thanks for starting this discussion.
>>>
>>> Maven has many limitations when it comes to publishing multiple
>>> artifacts from the same module. In most cases, the end result is
>>> broken and hard to use. The pom file that is published for a given
>>> module is not able to describe correctly all artifacts of the module
>>> and that's why there is one main artifact for every module; dependency
>>> declarations are usually correct for the main artifact but are not
>>> representative for the rest.
>>>
>>> For example, end-users who consume the hive-exec-core module tend to
>>> think that maven will automatically resolve all transitive
>>> dependencies and things will work as usual, which is not the case. In
>>> the past, this kind of assumption created a lot of confusion among
>>> consumers of the hive-exec-core.jar, with tickets and open debates that
>>> spanned multiple months. The discussions even reached a point
>>> where people requested certain features of Hive to be reverted in
>>> order to rectify some things around transitive dependencies and the
>>> core jar.
>>>
>>> I think we should stick to the usual maven convention and just publish
>>> one artifact for each module. Adding back and claiming to support the
>>> "core" jar is a step backwards that just postpones the real problems
>>> that we need to tackle.
>>>
>>> Furthermore, I don't think that the hive-exec module was ever meant to
>>> be used as a dependency. This is mainly an application module and not
>>> a library module and that's why shading takes place. Clearly some
>>> parts from hive-exec could be considered to become a library and that
>>> would be a promising direction going forward (splitting hive-exec into
>>> other modules) but a bit outside the scope of the current discussion.
>>>
>>> From the issues outlined above the only actionable item that I see
>>> concerns the joda library so we could try to simply relocate it if it
>>> is causing issues.
>>>
>>> Finally, if someone wants to create a jar with specific contents from
>>> the hive-exec module it is rather easy to do so. I created a small POC
>>> project [1] on how someone can create something similar to the
>>> hive-exec-core.jar and incorporate it in their build. Each project has
>>> separate needs so for such customization I feel that the burden
>>> shouldn't fall on the Hive community.
>>>
>>> Best,
>>> Stamatis
>>>
>>> [1] https://github.com/zabetak/hive-core-poc
>>>
>>> On Thu, Apr 25, 2024 at 11:12 AM Simhadri G <simhad...@apache.org>
>>> wrote:
>>> >
>>> > Hi Everyone,
>>> >
>>> > The hive-exec:core jar is used by Spark, Oozie, Hudi and many other
>>> > projects. Removal of the hive-exec:core jar has caused the following
>>> > issues.
>>> >
>>> > Spark: https://lists.apache.org/list?dev@hive.apache.org:lte=1M:joda
>>> > Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
>>> > Hudi: apache/hudi#8147
>>> > Apache IoTDB: https://lists.apache.org/thread/wdqsyj89w9cvyk1pyxr83hlxpg6zp1go
>>> > Guava: https://github.com/google/guava/issues/6666
>>> > joda-time: https://lists.apache.org/thread/sphgcvod3qx9wtc51ltpfyr8dpx9p294
>>> >
>>> > I understand that there is prior discussion about why the
>>> > hive-exec:core jar was removed here:
>>> > https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
>>> >
>>> > We agreed that ultimately the hive-exec jar should be used over
>>> > hive-exec:core, but there are quite a few dependencies that need to be
>>> > shaded and relocated for this.
>>> > https://issues.apache.org/jira/browse/HIVE-26220
>>> >
>>> > Until we shade & relocate dependencies in hive-exec, we should restore
>>> > the hive-exec:core jar. The intention is to provide a smoother
>>> > transition from hive-exec:core to the hive-exec jar for projects that
>>> > depend on Hive.
>>> >
>>> > Seeking input from the community and a way to move forward on this
>>> > topic.
>>> >
>>> > I apologize in advance if I have missed anything.
>>> >
>>> > Thanks!
>>> >
>>> > Simhadri G
>>>
>>
