Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-05-29 Thread Mergu Ravi
When is the hive-exec-core:4.1.0 expected to be released?
HIVE-28211 


Thanks & Regards,
Ravi Mergu


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-05-03 Thread Denys Kuzmenko
I agree that a shaded hive-exec should be the proper way to go; however, at the moment it's
a show-stopper for many downstream projects that want to upgrade.
Also, based on the mail threads, they clearly understand the risks of using an
unshaded jar but still insist on keeping it.
If we'd like to improve the project's adoption, perhaps we could allow some
flexibility.


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-05-03 Thread Zoltan Haindrich

I think the shading should be fixed instead of restoring this core jar.
Providing a core jar means that we support it, and I think that would be a bad
move:
I believe it's an irrational expectation for any project to use the same or
compatible deps as the ones hive-exec was compiled against!
For example, hive-exec uses an ancient Guava which was released back in 2017
https://mvnrepository.com/artifact/com.google.guava/guava/22.0
and has 3 CVEs listed... and that's just one of the many deps the core jar will
pull into a build.
Also note that Guava tends to break its API quite frequently, so I guess anyone
using a slightly more recent Guava will have a hard time consuming the artifact.
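When an unshaded hive-exec:core jar and an application both put Guava (or joda-time) on a flat classpath, the JVM loads only one copy of each class, typically from the first jar on the classpath that contains it; that is how the API breaks described above surface at runtime. A minimal Java sketch for checking which jar actually supplied a class follows. The class name `ClassOrigin` is hypothetical, and the default class names in `main` are only examples.

```java
import java.security.CodeSource;

/** Reports which jar or directory supplied a class at runtime (hypothetical helper). */
public final class ClassOrigin {
    private ClassOrigin() {}

    /**
     * Returns the code-source location of the named class, "<jdk>" for classes
     * without a code source (such as core JDK classes from the bootstrap loader),
     * or "<not found>". The class is looked up without being initialized.
     */
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "<jdk>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // Example names only: one Guava class and one joda-time class.
        String[] names = args.length > 0
                ? args
                : new String[] {"com.google.common.base.Preconditions", "org.joda.time.DateTime"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run against the application's real classpath, the printed jar path (which usually contains the version) shows whether the Guava copy in use is the one the application expects or the one pulled in next to hive-exec.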

Downstream projects had the opportunity to try the alpha releases and report
issues before 4.0 came out, didn't they?
If they didn't do that, I think that's not our fault!

A middle ground could be to suggest that they try the shaded hive-exec jar (we still have nightly builds [1]): notify these projects to try it and report back issues, give
them some time to fix any further shading issues, and we're done.


[1] http://ci.hive.apache.org/job/hive-nightly/

cheers,
Zoltan


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-05-01 Thread Denys Kuzmenko
Just found out that the Amoro project is also using hive-exec:jar:core.
+1 to restore


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Denys Kuzmenko
Would relocating just Guava and joda-time fix the problem?
Here is how it's done in Impala:
https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L70-L77
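Relocation, as in the Impala pom linked above, renames packages inside the shaded jar so that Hive's copies of Guava and joda-time can no longer collide with the versions a downstream project brings. The Java sketch below only models that renaming on class-name strings; the real maven-shade-plugin rewrites the class files and the references inside them. The class name `RelocationModel` and the `org.example.hive.shaded` prefix are hypothetical, not what Hive or Impala actually use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Models the package renaming performed by a shade "relocation" on plain
 * class-name strings. Illustration only: the real plugin rewrites class
 * files and every reference to the relocated packages inside them.
 */
public final class RelocationModel {
    // pattern -> shadedPattern, checked in insertion order; first match wins.
    private final Map<String, String> rules = new LinkedHashMap<>();

    public RelocationModel relocate(String pattern, String shadedPattern) {
        rules.put(pattern, shadedPattern);
        return this;
    }

    /** Returns the relocated class name, or the input unchanged if no rule matches. */
    public String apply(String className) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            // Match whole package segments: "com.google.common." but not "com.google.commonx."
            String prefix = rule.getKey() + ".";
            if (className.startsWith(prefix)) {
                return rule.getValue() + "." + className.substring(prefix.length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        // Hypothetical shaded prefix; not the one Hive or Impala actually uses.
        RelocationModel shade = new RelocationModel()
                .relocate("com.google.common", "org.example.hive.shaded.com.google.common")
                .relocate("org.joda.time", "org.example.hive.shaded.org.joda.time");
        String[] names = {
            "com.google.common.base.Preconditions",
            "org.joda.time.DateTime",
            "org.apache.hadoop.hive.ql.exec.Utilities"
        };
        for (String name : names) {
            System.out.println(name + " -> " + shade.apply(name));
        }
    }
}
```

Running `main` maps com.google.common.base.Preconditions to org.example.hive.shaded.com.google.common.base.Preconditions and leaves the Hive class name untouched, which is why a downstream project can then bring its own Guava: the two copies no longer share class names.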
 


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Sourabh Badhya
+1. Multiple projects will benefit from this.

Thanks Simhadri for driving this discussion.

Regards,
Sourabh Badhya


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Stamatis Zampetakis
I shared the reasons behind the removal of the jar and my concerns around
bringing it back. I'm still not convinced that it's needed but if the rest
of the community feels that it's the right path forward then I am ok with
this.

Best,
Stamatis


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-26 Thread Ayush Saxena
Stamatis,
Isn't the removal itself an incompatible change? There are a lot of
projects using it, and we suddenly removed a jar because some people
were not sure how to use it properly and were complaining about it.

What about the projects which are now stuck? Reading the thread at [1],
there were promises made that everything would be relocated and sorted
before the release, but we couldn't do it; AFAIK it isn't a trivial task to
just relocate all the dependencies.

As I see here, @Chao Sun even raised concerns [2] that the removal just
blocks the way for upgrading downstream projects, and it was countered with
the idea that the folks pushing for the removal would help get all the
dependencies relocated or solve the issues for downstream. I think no one
volunteered.

I would recommend either:
* Best case, we relocate all the dependencies present in hive-exec, not just
one or two. Somebody volunteers to raise one PR relocating "all", we commit
that, and we should be sorted.
* Or restore the core jar, because a lot of projects depend on it. The
removal itself was incompatible, and I don't think it had a clear community
agreement; it was a conditional agreement, and I don't think the condition
was met, so we should roll back.

On a lighter note, we might release with some 5000+ commits, with the best
performance or so, but if nobody is able to consume those release bits, I
think those efforts are just going to waste. Eventually people will just
stick to their older versions and not even try to upgrade, and we will be
releasing for nobody, or maybe for the few folks who have only Hive in
their stack (I don't know if there are folks like that). No matter how good
a product is, if people don't use it, it is gonna die :-(


I think we have a ticket which talks about relocating all dependencies. I
agree we should drop the core jar for sure, since it leads to all the
problems Stamatis mentioned, but let's restore the core jar for now, and we
can drop it when that relocation ticket is resolved. Does that sound
convincing, or even worth a thought?

btw, having jars with one set of dependencies shaded and others unshaded
is done in Hadoop as well (hadoop-minicluster vs hadoop-client-minicluster),
and users keep running into such problems, e.g. [3].

Anyone else, any thoughts?

-Ayush

[1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
[2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
[3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x




Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-26 Thread Stamatis Zampetakis
Hey Simhadri, thanks for starting this discussion.

Maven has many limitations when it comes to publishing multiple
artifacts from the same module. In most cases, the end result is
broken and hard to use. The pom file that is published for a given
module cannot correctly describe all artifacts of the module,
and that's why there is one main artifact for every module; dependency
declarations are usually correct for the main artifact but are not
representative of the rest.

For example, end users who consume the hive-exec-core jar tend to
think that Maven will automatically resolve all transitive
dependencies and things will work as usual, which is not the case. In
the past, this kind of assumption created a lot of confusion among
consumers of the hive-exec-core.jar, with tickets and open debates that
spanned multiple months. The discussions even reached a point
where people requested that certain features of Hive be reverted in
order to rectify some things around transitive dependencies and the
core jar.
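To make the single-POM point concrete: in coordinates of the form groupId:artifactId[:packaging[:classifier]]:version, the core jar was the "core" classifier of the hive-exec artifact, and in the repository both jars sit next to one POM file whose file name carries no classifier. The hypothetical `Coordinates` class below parses such strings and derives those file names. It assumes the packaging equals the file extension (true for jar) and defaults the packaging to jar when omitted; the version 3.1.3 in the test is only illustrative.

```java
/**
 * Parses Maven-style coordinates groupId:artifactId[:packaging[:classifier]]:version
 * and derives repository file names. Assumes packaging == file extension (true for jar).
 */
public final class Coordinates {
    final String groupId;
    final String artifactId;
    final String packaging;
    final String classifier; // null when the coordinates carry no classifier
    final String version;

    private Coordinates(String groupId, String artifactId, String packaging,
                        String classifier, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.packaging = packaging;
        this.classifier = classifier;
        this.version = version;
    }

    public static Coordinates parse(String text) {
        String[] p = text.split(":");
        switch (p.length) {
            case 3: return new Coordinates(p[0], p[1], "jar", null, p[2]); // packaging defaults to jar
            case 4: return new Coordinates(p[0], p[1], p[2], null, p[3]);
            case 5: return new Coordinates(p[0], p[1], p[2], p[3], p[4]);
            default: throw new IllegalArgumentException("unexpected coordinates: " + text);
        }
    }

    /** File name of the artifact itself, e.g. hive-exec-3.1.3-core.jar. */
    public String artifactFile() {
        String suffix = classifier == null ? "" : "-" + classifier;
        return artifactId + "-" + version + suffix + "." + packaging;
    }

    /** File name of the POM: it never includes the classifier, so all classified jars share it. */
    public String pomFile() {
        return artifactId + "-" + version + ".pom";
    }

    /** Repository directory for this artifact, e.g. org/apache/hive/hive-exec/3.1.3/. */
    public String directory() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/";
    }
}
```

Parsing org.apache.hive:hive-exec:jar:core:3.1.3 gives hive-exec-3.1.3-core.jar, parsing org.apache.hive:hive-exec:3.1.3 gives hive-exec-3.1.3.jar, and both return hive-exec-3.1.3.pom from `pomFile()`. Whatever dependency list that one POM declares has to serve both jars, which is the mismatch described above.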

I think we should stick to the usual Maven convention and publish just
one artifact per module. Adding back, and claiming to support, the
"core" jar is a step backwards that just postpones the real problems
that we need to tackle.

Furthermore, I don't think that the hive-exec module was ever meant to
be used as a dependency. It is mainly an application module rather than
a library module, and that's why shading takes place. Clearly some
parts of hive-exec could become a library, and that
would be a promising direction going forward (splitting hive-exec into
other modules), but it is a bit outside the scope of the current discussion.

From the issues outlined above, the only actionable item that I see
concerns the joda library, so we could try to simply relocate it if it
is causing issues.

Finally, if someone wants to create a jar with specific contents from
the hive-exec module, it is rather easy to do so. I created a small POC
project [1] showing how someone can create something similar to the
hive-exec-core.jar and incorporate it in their build. Each project has
different needs, so for such customization I feel that the burden
shouldn't fall on the Hive community.
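One way to build such a custom jar without any Maven plugin is to copy a chosen subset of entries out of the hive-exec jar with the JDK's java.util.jar API. This is a swapped-in technique, not necessarily how the linked POC works; the class name `JarSubset` and the package prefixes are hypothetical examples. A filtered jar still needs its remaining dependencies supplied and declared by the consuming project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Copies a chosen subset of entries from one jar into a new jar (hypothetical helper). */
public final class JarSubset {
    private JarSubset() {}

    /**
     * Copies every file entry whose path starts with one of {@code prefixes}
     * (e.g. "org/apache/hadoop/hive/ql/") and returns how many were copied.
     */
    public static int copy(Path source, Path target, List<String> prefixes) throws IOException {
        int copied = 0;
        try (JarFile in = new JarFile(source.toFile());
             JarOutputStream out = new JarOutputStream(Files.newOutputStream(target))) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                boolean wanted = !entry.isDirectory()
                        && prefixes.stream().anyMatch(name::startsWith);
                if (!wanted) {
                    continue;
                }
                out.putNextEntry(new JarEntry(name));
                try (InputStream data = in.getInputStream(entry)) {
                    data.transferTo(out); // stream the entry bytes unchanged
                }
                out.closeEntry();
                copied++;
            }
        }
        return copied;
    }

    /** Builds a tiny two-entry jar in a temp directory, filters it, and returns the copy count. */
    public static int demo() {
        try {
            Path dir = Files.createTempDirectory("jar-subset");
            Path source = dir.resolve("source.jar");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(source))) {
                for (String name : List.of("org/apache/hadoop/hive/ql/Example.class",
                                           "com/google/common/base/Example.class")) {
                    out.putNextEntry(new JarEntry(name));
                    out.write(new byte[] {0}); // placeholder content, not a real class file
                    out.closeEntry();
                }
            }
            return copy(source, dir.resolve("subset.jar"), List.of("org/apache/hadoop/hive/"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `JarSubset.copy(Path.of("hive-exec.jar"), Path.of("hive-exec-subset.jar"), List.of("org/apache/hadoop/hive/ql/"))` (file names hypothetical) keeps only the listed packages. Unlike a shaded jar, the result relocates nothing, so any third-party classes it leaves out must come from the consumer's own dependencies.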

Best,
Stamatis

[1] https://github.com/zabetak/hive-core-poc



[Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-25 Thread Simhadri G
Hi Everyone,

The hive-exec:core jar is used by Spark, Oozie, Hudi and many other
projects. Removal of the hive-exec:core jar has caused the following issues.

   - Spark: https://lists.apache.org/list?dev@hive.apache.org:lte=1M:joda
   - Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
   - Hudi: apache/hudi#8147
   - Apache IoTDB: https://lists.apache.org/thread/wdqsyj89w9cvyk1pyxr83hlxpg6zp1go
   - Guava: https://github.com/google/guava/issues/
   - joda-time: https://lists.apache.org/thread/sphgcvod3qx9wtc51ltpfyr8dpx9p294

I understand that there was a prior discussion about why the hive-exec:core
jar was removed here:
https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg

We agreed that ultimately the hive-exec jar should be used over hive-exec:core,
but there are quite a few dependencies that need to be shaded and relocated
for this: https://issues.apache.org/jira/browse/HIVE-26220

Until we shade and relocate the dependencies in hive-exec, we should restore the
hive-exec:core jar. The intention is to provide a smoother transition from
hive-exec:core to the hive-exec jar for projects that depend on Hive.

I am seeking input from the community and a way to move forward on this topic.

I apologize in advance if I have missed anything.

Thanks!

Simhadri G