Re:Re: Re: Support java/11/17/21

2024-07-10 Thread lisoda
Thank you for your reply. Thank you for letting me know about this.

At 2024-07-10 16:16:16, "Ayush Saxena"  wrote:
>I know about that, I only flagged that [1] :-)
>
>That got sorted by upgrading the protobuf to 3.23, [2], & merging [3]
>
>Hadoop can't even compile on anything newer than JDK-8; there are a bunch of
>issues, Jersey being the biggest one. All the JDK upgrade tickets are open:
>https://issues.apache.org/jira/browse/HADOOP-16795
>https://issues.apache.org/jira/browse/HADOOP-17177
>
>And they aren't being actively chased. There is no plan to drop JDK-8
>in 3.4.x. Hive will move to JDK-17 in 2-3 months anyway.
>
>-Ayush
>
>
>[1] 
>https://issues.apache.org/jira/browse/HADOOP-18197?focusedCommentId=17818651=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17818651
>[2] 
>https://issues.apache.org/jira/browse/HADOOP-18197?focusedCommentId=17820711=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17820711
>[3] https://github.com/apache/hadoop/pull/6593
>
>On Wed, 10 Jul 2024 at 13:15, lisoda  wrote:
>>
>> Hello Sir.
>> Hadoop upgraded the protobuf version in 3.4.0, which caused incompatibility
>> problems with some JDK 8 versions. They plan to drop support for Java 8
>> outright in 3.4.x. You can check the changelog for Hadoop 3.4.0.
>>
>> At 2024-07-10 15:38:35, "Ayush Saxena" wrote:
>>
>> We are working towards supporting JDK-17; it should take a couple of months,
>> but we don’t have a planned deadline for that as of now.
>>
>> Btw, Hadoop didn’t drop support for JDK-8.
>>
>> -Ayush
>>
>> On 10 Jul 2024, at 12:52 PM, lisoda  wrote:
>>
>> 
>> Hi. Currently, Iceberg/hadoop/spark and the rest of the third-party 
>> frameworks have dropped support for JAVA8 (or are planning to do so). When 
>> will HIVE be able to support a higher version of the JDK and what progress 
>> has been made in this regard?


Re: Re: Support java/11/17/21

2024-07-10 Thread Ayush Saxena
I know about that, I only flagged that [1] :-)

That got sorted by upgrading the protobuf to 3.23, [2], & merging [3]

Hadoop can't even compile on anything newer than JDK-8; there are a bunch of
issues, Jersey being the biggest one. All the JDK upgrade tickets are open:
https://issues.apache.org/jira/browse/HADOOP-16795
https://issues.apache.org/jira/browse/HADOOP-17177

And they aren't being actively chased. There is no plan to drop JDK-8
in 3.4.x. Hive will move to JDK-17 in 2-3 months anyway.

-Ayush


[1] 
https://issues.apache.org/jira/browse/HADOOP-18197?focusedCommentId=17818651=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17818651
[2] 
https://issues.apache.org/jira/browse/HADOOP-18197?focusedCommentId=17820711=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17820711
[3] https://github.com/apache/hadoop/pull/6593

On Wed, 10 Jul 2024 at 13:15, lisoda  wrote:
>
> Hello Sir.
> Hadoop upgraded the protobuf version in 3.4.0, which caused incompatibility
> problems with some JDK 8 versions. They plan to drop support for Java 8
> outright in 3.4.x. You can check the changelog for Hadoop 3.4.0.
>
> At 2024-07-10 15:38:35, "Ayush Saxena" wrote:
>
> We are working towards supporting JDK-17; it should take a couple of months,
> but we don’t have a planned deadline for that as of now.
>
> Btw, Hadoop didn’t drop support for JDK-8.
>
> -Ayush
>
> On 10 Jul 2024, at 12:52 PM, lisoda  wrote:
>
> 
> Hi. Currently, Iceberg/hadoop/spark and the rest of the third-party 
> frameworks have dropped support for JAVA8 (or are planning to do so). When 
> will HIVE be able to support a higher version of the JDK and what progress 
> has been made in this regard?


Re:Re: Support java/11/17/21

2024-07-10 Thread lisoda
Hello Sir.
Hadoop upgraded the protobuf version in 3.4.0, which caused incompatibility
problems with some JDK 8 versions. They plan to drop support for Java 8
outright in 3.4.x. You can check the changelog for Hadoop 3.4.0.

At 2024-07-10 15:38:35, "Ayush Saxena" wrote:

We are working towards supporting JDK-17; it should take a couple of months,
but we don’t have a planned deadline for that as of now.

Btw, Hadoop didn’t drop support for JDK-8.



-Ayush

On 10 Jul 2024, at 12:52 PM, lisoda  wrote:



Hi. Currently, Iceberg/hadoop/spark and the rest of the third-party frameworks 
have dropped support for JAVA8 (or are planning to do so). When will HIVE be 
able to support a higher version of the JDK and what progress has been made in 
this regard?

Re: Support java/11/17/21

2024-07-10 Thread Ayush Saxena
We are working towards supporting JDK-17; it should take a couple of months,
but we don’t have a planned deadline for that as of now.

Btw, Hadoop didn’t drop support for JDK-8.

-Ayush

> On 10 Jul 2024, at 12:52 PM, lisoda  wrote:
> 
> 
> Hi. Currently, Iceberg/hadoop/spark and the rest of the third-party 
> frameworks have dropped support for JAVA8 (or are planning to do so). When 
> will HIVE be able to support a higher version of the JDK and what progress 
> has been made in this regard?


Support java/11/17/21

2024-07-10 Thread lisoda
Hi. Currently, Iceberg/hadoop/spark and the rest of the third-party frameworks 
have dropped support for JAVA8 (or are planning to do so). When will HIVE be 
able to support a higher version of the JDK and what progress has been made in 
this regard?

Re: Re: Next Hive 4.0.1 minor release

2024-06-26 Thread Okumin
Hi,

I'm posting some more suggestions.

# We may label HIVE-28352 as "hive-4.0.1-must"
I found that users were unable to upgrade some system tables from Hive
3 through schematool. It would be better to fix.
https://issues.apache.org/jira/browse/HIVE-28352

# We may drop HIVE-24167
I am the ticket owner, and we are struggling to figure out how to
resolve the problem. We can potentially skip it as branch-4.0 includes
the following workaround.
https://issues.apache.org/jira/browse/HIVE-27856

Thanks,
Okumin

On Wed, May 15, 2024 at 5:18 PM dengzhhu653  wrote:
>
>
> Thank you for your feedback and testing, Okumin! I've tagged them with 
> "hive-4.0.1-must".
>
>
> Thanks,
>
> Zhihua
>
> At 2024-05-15 10:43:59, "Okumin"  wrote:
> >Hi Zhihua,
> >
> >Thanks for driving the next release. We are actively testing 4.0.0 and
> >would like to give some suggestions.
> >
> ># HIVE-27847: Prevent query Failures on Numeric <-> Timestamp
> >We hit the issue when we ran Hive 4 with the option. I believe it is
> >worth resolving for those who want to try Hive 4, keeping
> >compatibilities with a previous version.
> >https://issues.apache.org/jira/browse/HIVE-27847
> >
> ># HIVE-28098: Fails to copy empty column statistics of materialized CTE
> >This follows up on HIVE-28080, but the current 4.0.0 includes only
> >HIVE-28080. The reasonable option to me is to revert HIVE-28080 or
> >cherry-pick HIVE-28098, all or nothing.
> >https://issues.apache.org/jira/browse/HIVE-28098
> >
> >Thanks,
> >Okumin
> >
> >On Sat, May 11, 2024 at 9:45 AM dengzhhu653  wrote:
> >>
> >> Hello Community,
> >>
> >> As you have noticed, we are going to propose the next 4.0.1 release on top
> >> of 4.0.0, with some critical bug fixes and improvements [1]. As of now we
> >> are putting the label "hive-4.0.1-must" on the tickets, and we plan to make
> >> sure those get cherry-picked to branch-4.0 [2]. Please suggest any other
> >> important fixes that could be included in this release.
> >>
> >> We will get this minor release out as soon as possible once all the
> >> tickets marked with "hive-4.0.1-must" get resolved and tested.
> >>
> >> [1] https://lists.apache.org/thread/rkw2toj5d74t8n5jvnkrfw77hyzn7qh3
> >> [2] https://issues.apache.org/jira/browse/HIVE-28204?jql=labels%20%3D%20hive-4.0.1-must
> >>
> >> Thanks,
> >> Zhihua


[ANNOUNCE] Hive 2.x EOL

2024-05-20 Thread Ayush Saxena
Hi All,
The Apache Hive Community has voted to declare the 2.x release line as End of 
Life. This means no further updates or releases will be made for this release 
line.

We urge all Hive 2.x users to upgrade to the latest versions promptly to 
benefit from new features and ongoing support.

-Ayush Saxena
(On Behalf of Apache Hive PMC)

Re:Re: Next Hive 4.0.1 minor release

2024-05-15 Thread dengzhhu653



Thank you for your feedback and testing, Okumin! I've tagged them with
"hive-4.0.1-must".

Thanks,
Zhihua

At 2024-05-15 10:43:59, "Okumin"  wrote:
>Hi Zhihua,
>
>Thanks for driving the next release. We are actively testing 4.0.0 and
>would like to give some suggestions.
>
># HIVE-27847: Prevent query Failures on Numeric <-> Timestamp
>We hit the issue when we ran Hive 4 with the option. I believe it is
>worth resolving for those who want to try Hive 4, keeping
>compatibilities with a previous version.
>https://issues.apache.org/jira/browse/HIVE-27847
>
># HIVE-28098: Fails to copy empty column statistics of materialized CTE
>This follows up on HIVE-28080, but the current 4.0.0 includes only
>HIVE-28080. The reasonable option to me is to revert HIVE-28080 or
>cherry-pick HIVE-28098, all or nothing.
>https://issues.apache.org/jira/browse/HIVE-28098
>
>Thanks,
>Okumin
>
>On Sat, May 11, 2024 at 9:45 AM dengzhhu653  wrote:
>>
>> Hello Community,
>>
>> As you have noticed, we are going to propose the next 4.0.1 release on top
>> of 4.0.0, with some critical bug fixes and improvements [1]. As of now we
>> are putting the label "hive-4.0.1-must" on the tickets, and we plan to make
>> sure those get cherry-picked to branch-4.0 [2]. Please suggest any other
>> important fixes that could be included in this release.
>>
>> We will get this minor release out as soon as possible once all the tickets
>> marked with "hive-4.0.1-must" get resolved and tested.
>>
>> [1] https://lists.apache.org/thread/rkw2toj5d74t8n5jvnkrfw77hyzn7qh3
>> [2] https://issues.apache.org/jira/browse/HIVE-28204?jql=labels%20%3D%20hive-4.0.1-must
>>
>> Thanks,
>> Zhihua


Re: Next Hive 4.0.1 minor release

2024-05-14 Thread Okumin
Hi Zhihua,

Thanks for driving the next release. We are actively testing 4.0.0 and
would like to give some suggestions.

# HIVE-27847: Prevent query Failures on Numeric <-> Timestamp
We hit the issue when we ran Hive 4 with the option. I believe it is
worth resolving for those who want to try Hive 4, keeping
compatibilities with a previous version.
https://issues.apache.org/jira/browse/HIVE-27847

# HIVE-28098: Fails to copy empty column statistics of materialized CTE
This follows up on HIVE-28080, but the current 4.0.0 includes only
HIVE-28080. The reasonable option to me is to revert HIVE-28080 or
cherry-pick HIVE-28098, all or nothing.
https://issues.apache.org/jira/browse/HIVE-28098

Thanks,
Okumin

On Sat, May 11, 2024 at 9:45 AM dengzhhu653  wrote:
>
> Hello Community,
>
> As you have noticed, we are going to propose the next 4.0.1 release on top of
> 4.0.0, with some critical bug fixes and improvements [1]. As of now we are
> putting the label "hive-4.0.1-must" on the tickets, and we plan to make sure
> those get cherry-picked to branch-4.0 [2]. Please suggest any other important
> fixes that could be included in this release.
>
> We will get this minor release out as soon as possible once all the tickets
> marked with "hive-4.0.1-must" get resolved and tested.
>
> [1] https://lists.apache.org/thread/rkw2toj5d74t8n5jvnkrfw77hyzn7qh3
> [2] https://issues.apache.org/jira/browse/HIVE-28204?jql=labels%20%3D%20hive-4.0.1-must
>
> Thanks,
> Zhihua


Next Hive 4.0.1 minor release

2024-05-10 Thread dengzhhu653
Hello Community,

As you have noticed, we are going to propose the next 4.0.1 release on top of
4.0.0, with some critical bug fixes and improvements [1]. As of now we are
putting the label "hive-4.0.1-must" on the tickets, and we plan to make sure
those get cherry-picked to branch-4.0 [2]. Please suggest any other important
fixes that could be included in this release.

We will get this minor release out as soon as possible once all the tickets
marked with "hive-4.0.1-must" get resolved and tested.

[1] https://lists.apache.org/thread/rkw2toj5d74t8n5jvnkrfw77hyzn7qh3
[2] https://issues.apache.org/jira/browse/HIVE-28204?jql=labels%20%3D%20hive-4.0.1-must

Thanks,
Zhihua

CVE-2023-35701: Apache Hive: Arbitrary command execution via JDBC driver

2024-05-03 Thread Stamatis Zampetakis
Severity: moderate

Affected versions:

- Apache Hive 4.0.0-alpha-1 before 4.0.0

Description:

Improper Control of Generation of Code ('Code Injection') vulnerability in 
Apache Hive.

The vulnerability affects the Hive JDBC driver component and it can potentially 
lead to arbitrary code execution on the machine/endpoint that the JDBC driver 
(client) is running. The malicious user must have sufficient permissions to 
specify/edit JDBC URL(s) in an endpoint relying on the Hive JDBC driver and the 
JDBC client process must run under a privileged user to fully exploit the 
vulnerability. 

The attacker can set up a malicious HTTP server and specify a JDBC URL pointing
towards this server. When a JDBC connection is attempted, the malicious HTTP
server can provide a special response with a customized payload that can trigger
the execution of certain commands in the JDBC client.

This issue affects Apache Hive: from 4.0.0-alpha-1 before 4.0.0.

Users are recommended to upgrade to version 4.0.0, which fixes the issue.
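
As a rough illustration of the affected range stated above, a deployment
inventory script could flag vulnerable driver versions along these lines.
This is only a sketch: the function name is hypothetical, and it assumes that
Hive 4.0.0 pre-releases are named `4.0.0-alpha-N` or `4.0.0-beta-N`, which the
advisory itself does not spell out.

```python
import re

# Hypothetical helper: the name and the parsing rules are illustrative
# assumptions, not part of Hive or of the advisory text.
# Assumed naming scheme for pre-releases: 4.0.0-alpha-N / 4.0.0-beta-N.
_PRE_RELEASE = re.compile(r"^4\.0\.0-(alpha|beta)-\d+$")


def is_affected_by_cve_2023_35701(version: str) -> bool:
    """Return True for Hive 4.0.0 pre-releases such as 4.0.0-alpha-1.

    The advisory gives the affected range as "4.0.0-alpha-1 before 4.0.0",
    so the final 4.0.0 release and all other versions are reported as
    unaffected.
    """
    return bool(_PRE_RELEASE.match(version.strip().lower()))


if __name__ == "__main__":
    # Print the verdict for a few sample version strings.
    for v in ("3.1.3", "4.0.0-alpha-1", "4.0.0-beta-1", "4.0.0", "4.0.1"):
        print(v, is_affected_by_cve_2023_35701(v))
```

A version string that does not match the assumed naming scheme (for example a
vendor-specific build) is reported as unaffected by this sketch, so such
builds would need to be checked by hand.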

This issue is being tracked as HIVE-27554 

Credit:

Kostya Kortchinsky (reporter)

References:

https://hive.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-35701
https://issues.apache.org/jira/browse/HIVE-27554



Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-19 Thread Simhadri G
Thanks again everyone :)

On Fri, Apr 19, 2024, 2:15 AM Rajesh Balamohan 
wrote:

> Congratulations Simhadri. :)
>
> ~Rajesh.B
>
> On Fri, Apr 19, 2024 at 2:02 AM Aman Sinha  wrote:
>
>> Congrats Simhadri !
>>
>> On Thu, Apr 18, 2024 at 12:25 PM Naveen Gangam
>>  wrote:
>>
>>> Congrats Simhadri. Looking forward to many more contributions in the
>>> future.
>>>
>>> On Thu, Apr 18, 2024 at 12:25 PM Sai Hemanth Gantasala
>>>  wrote:
>>>
 Congratulations Simhadri  well deserved

 On Thu, Apr 18, 2024 at 8:41 AM Pau Tallada  wrote:

> Congratulations
>
> Missatge de Alessandro Solimando  del
> dia dj., 18 d’abr. 2024 a les 17:40:
>
>> Great news, Simhadri, very well deserved!
>>
>> On Thu, 18 Apr 2024 at 15:07, Simhadri G 
>> wrote:
>>
>>> Thanks everyone!
>>> I really appreciate it, it means a lot to me :)
>>> The Apache Hive project and its community have truly inspired me .
>>> I'm grateful for the chance to contribute to such a remarkable project.
>>>
>>> Thanks!
>>> Simhadri Govindappa
>>>
>>> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>>>  wrote:
>>>
 Congrats Simhadri!



 -Sankar



 *From:* Butao Zhang 
 *Sent:* Thursday, April 18, 2024 5:39 PM
 *To:* user@hive.apache.org; dev 
 *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri
 Govindappa



 Congratulations Simhadri !!!



 Thanks.


 --

 *From:* user-return-28075-butaozhang1=163@hive.apache.org <
 user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush
 Saxena
 *Sent:* Thursday, April 18, 2024 7:50 PM
 *To:* dev ; user@hive.apache.org <
 user@hive.apache.org>
 *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa



 Hi All,

 Apache Hive's Project Management Committee (PMC) has invited
 Simhadri Govindappa to become a committer, and we are pleased to 
 announce
 that he has accepted.



 Please join me in congratulating him, Congratulations Simhadri,
 Welcome aboard!!!



 -Ayush Saxena

 (On behalf of Apache Hive PMC)

>>>
>
> --
> --
> Pau Tallada Crespí
> Departament de Serveis
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> --
>
>


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Rajesh Balamohan
Congratulations Simhadri. :)

~Rajesh.B

On Fri, Apr 19, 2024 at 2:02 AM Aman Sinha  wrote:

> Congrats Simhadri !
>
> On Thu, Apr 18, 2024 at 12:25 PM Naveen Gangam
>  wrote:
>
>> Congrats Simhadri. Looking forward to many more contributions in the
>> future.
>>
>> On Thu, Apr 18, 2024 at 12:25 PM Sai Hemanth Gantasala
>>  wrote:
>>
>>> Congratulations Simhadri  well deserved
>>>
>>> On Thu, Apr 18, 2024 at 8:41 AM Pau Tallada  wrote:
>>>
 Congratulations

 Missatge de Alessandro Solimando  del
 dia dj., 18 d’abr. 2024 a les 17:40:

> Great news, Simhadri, very well deserved!
>
> On Thu, 18 Apr 2024 at 15:07, Simhadri G 
> wrote:
>
>> Thanks everyone!
>> I really appreciate it, it means a lot to me :)
>> The Apache Hive project and its community have truly inspired me .
>> I'm grateful for the chance to contribute to such a remarkable project.
>>
>> Thanks!
>> Simhadri Govindappa
>>
>> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>>  wrote:
>>
>>> Congrats Simhadri!
>>>
>>>
>>>
>>> -Sankar
>>>
>>>
>>>
>>> *From:* Butao Zhang 
>>> *Sent:* Thursday, April 18, 2024 5:39 PM
>>> *To:* user@hive.apache.org; dev 
>>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri
>>> Govindappa
>>>
>>>
>>>
>>> Congratulations Simhadri !!!
>>>
>>>
>>>
>>> Thanks.
>>>
>>>
>>> --
>>>
>>> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
>>> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush
>>> Saxena
>>> *Sent:* Thursday, April 18, 2024 7:50 PM
>>> *To:* dev ; user@hive.apache.org <
>>> user@hive.apache.org>
>>> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>>>
>>>
>>>
>>> Hi All,
>>>
>>> Apache Hive's Project Management Committee (PMC) has invited
>>> Simhadri Govindappa to become a committer, and we are pleased to 
>>> announce
>>> that he has accepted.
>>>
>>>
>>>
>>> Please join me in congratulating him, Congratulations Simhadri,
>>> Welcome aboard!!!
>>>
>>>
>>>
>>> -Ayush Saxena
>>>
>>> (On behalf of Apache Hive PMC)
>>>
>>

 --
 --
 Pau Tallada Crespí
 Departament de Serveis
 Port d'Informació Científica (PIC)
 Tel: +34 93 170 2729
 --




Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Aman Sinha
Congrats Simhadri !

On Thu, Apr 18, 2024 at 12:25 PM Naveen Gangam 
wrote:

> Congrats Simhadri. Looking forward to many more contributions in the
> future.
>
> On Thu, Apr 18, 2024 at 12:25 PM Sai Hemanth Gantasala
>  wrote:
>
>> Congratulations Simhadri  well deserved
>>
>> On Thu, Apr 18, 2024 at 8:41 AM Pau Tallada  wrote:
>>
>>> Congratulations
>>>
>>> Missatge de Alessandro Solimando  del
>>> dia dj., 18 d’abr. 2024 a les 17:40:
>>>
 Great news, Simhadri, very well deserved!

 On Thu, 18 Apr 2024 at 15:07, Simhadri G  wrote:

> Thanks everyone!
> I really appreciate it, it means a lot to me :)
> The Apache Hive project and its community have truly inspired me . I'm
> grateful for the chance to contribute to such a remarkable project.
>
> Thanks!
> Simhadri Govindappa
>
> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>  wrote:
>
>> Congrats Simhadri!
>>
>>
>>
>> -Sankar
>>
>>
>>
>> *From:* Butao Zhang 
>> *Sent:* Thursday, April 18, 2024 5:39 PM
>> *To:* user@hive.apache.org; dev 
>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri
>> Govindappa
>>
>>
>>
>> Congratulations Simhadri !!!
>>
>>
>>
>> Thanks.
>>
>>
>> --
>>
>> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
>> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush
>> Saxena
>> *Sent:* Thursday, April 18, 2024 7:50 PM
>> *To:* dev ; user@hive.apache.org <
>> user@hive.apache.org>
>> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>>
>>
>>
>> Hi All,
>>
>> Apache Hive's Project Management Committee (PMC) has invited Simhadri
>> Govindappa to become a committer, and we are pleased to announce that he
>> has accepted.
>>
>>
>>
>> Please join me in congratulating him, Congratulations Simhadri,
>> Welcome aboard!!!
>>
>>
>>
>> -Ayush Saxena
>>
>> (On behalf of Apache Hive PMC)
>>
>
>>>
>>> --
>>> --
>>> Pau Tallada Crespí
>>> Departament de Serveis
>>> Port d'Informació Científica (PIC)
>>> Tel: +34 93 170 2729
>>> --
>>>
>>>


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Naveen Gangam
Congrats Simhadri. Looking forward to many more contributions in the future.

On Thu, Apr 18, 2024 at 12:25 PM Sai Hemanth Gantasala
 wrote:

> Congratulations Simhadri  well deserved
>
> On Thu, Apr 18, 2024 at 8:41 AM Pau Tallada  wrote:
>
>> Congratulations
>>
>> Missatge de Alessandro Solimando  del
>> dia dj., 18 d’abr. 2024 a les 17:40:
>>
>>> Great news, Simhadri, very well deserved!
>>>
>>> On Thu, 18 Apr 2024 at 15:07, Simhadri G  wrote:
>>>
 Thanks everyone!
 I really appreciate it, it means a lot to me :)
 The Apache Hive project and its community have truly inspired me . I'm
 grateful for the chance to contribute to such a remarkable project.

 Thanks!
 Simhadri Govindappa

 On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
  wrote:

> Congrats Simhadri!
>
>
>
> -Sankar
>
>
>
> *From:* Butao Zhang 
> *Sent:* Thursday, April 18, 2024 5:39 PM
> *To:* user@hive.apache.org; dev 
> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri
> Govindappa
>
>
>
> Congratulations Simhadri !!!
>
>
>
> Thanks.
>
>
> --
>
> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush
> Saxena
> *Sent:* Thursday, April 18, 2024 7:50 PM
> *To:* dev ; user@hive.apache.org <
> user@hive.apache.org>
> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>
>
>
> Hi All,
>
> Apache Hive's Project Management Committee (PMC) has invited Simhadri
> Govindappa to become a committer, and we are pleased to announce that he
> has accepted.
>
>
>
> Please join me in congratulating him, Congratulations Simhadri,
> Welcome aboard!!!
>
>
>
> -Ayush Saxena
>
> (On behalf of Apache Hive PMC)
>

>>
>> --
>> --
>> Pau Tallada Crespí
>> Departament de Serveis
>> Port d'Informació Científica (PIC)
>> Tel: +34 93 170 2729
>> --
>>
>>


unsubscribe

2024-04-18 Thread Rajbir singh
-- 
Regards,
Rajbir


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Sai Hemanth Gantasala
Congratulations Simhadri  well deserved

On Thu, Apr 18, 2024 at 8:41 AM Pau Tallada  wrote:

> Congratulations
>
> Missatge de Alessandro Solimando  del dia
> dj., 18 d’abr. 2024 a les 17:40:
>
>> Great news, Simhadri, very well deserved!
>>
>> On Thu, 18 Apr 2024 at 15:07, Simhadri G  wrote:
>>
>>> Thanks everyone!
>>> I really appreciate it, it means a lot to me :)
>>> The Apache Hive project and its community have truly inspired me . I'm
>>> grateful for the chance to contribute to such a remarkable project.
>>>
>>> Thanks!
>>> Simhadri Govindappa
>>>
>>> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>>>  wrote:
>>>
 Congrats Simhadri!



 -Sankar



 *From:* Butao Zhang 
 *Sent:* Thursday, April 18, 2024 5:39 PM
 *To:* user@hive.apache.org; dev 
 *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri Govindappa



 Congratulations Simhadri !!!



 Thanks.


 --

 *From:* user-return-28075-butaozhang1=163@hive.apache.org <
 user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush Saxena
 *Sent:* Thursday, April 18, 2024 7:50 PM
 *To:* dev ; user@hive.apache.org <
 user@hive.apache.org>
 *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa



 Hi All,

 Apache Hive's Project Management Committee (PMC) has invited Simhadri
 Govindappa to become a committer, and we are pleased to announce that he
 has accepted.



 Please join me in congratulating him, Congratulations Simhadri, Welcome
 aboard!!!



 -Ayush Saxena

 (On behalf of Apache Hive PMC)

>>>
>
> --
> --
> Pau Tallada Crespí
> Departament de Serveis
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> --
>
>


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Pau Tallada
Congratulations

Missatge de Alessandro Solimando  del dia
dj., 18 d’abr. 2024 a les 17:40:

> Great news, Simhadri, very well deserved!
>
> On Thu, 18 Apr 2024 at 15:07, Simhadri G  wrote:
>
>> Thanks everyone!
>> I really appreciate it, it means a lot to me :)
>> The Apache Hive project and its community have truly inspired me . I'm
>> grateful for the chance to contribute to such a remarkable project.
>>
>> Thanks!
>> Simhadri Govindappa
>>
>> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>>  wrote:
>>
>>> Congrats Simhadri!
>>>
>>>
>>>
>>> -Sankar
>>>
>>>
>>>
>>> *From:* Butao Zhang 
>>> *Sent:* Thursday, April 18, 2024 5:39 PM
>>> *To:* user@hive.apache.org; dev 
>>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri Govindappa
>>>
>>>
>>>
>>> Congratulations Simhadri !!!
>>>
>>>
>>>
>>> Thanks.
>>>
>>>
>>> --
>>>
>>> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
>>> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush Saxena <
>>> ayush...@gmail.com>
>>> *Sent:* Thursday, April 18, 2024 7:50 PM
>>> *To:* dev ; user@hive.apache.org <
>>> user@hive.apache.org>
>>> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>>>
>>>
>>>
>>> Hi All,
>>>
>>> Apache Hive's Project Management Committee (PMC) has invited Simhadri
>>> Govindappa to become a committer, and we are pleased to announce that he
>>> has accepted.
>>>
>>>
>>>
>>> Please join me in congratulating him, Congratulations Simhadri, Welcome
>>> aboard!!!
>>>
>>>
>>>
>>> -Ayush Saxena
>>>
>>> (On behalf of Apache Hive PMC)
>>>
>>

-- 
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Alessandro Solimando
Great news, Simhadri, very well deserved!

On Thu, 18 Apr 2024 at 15:07, Simhadri G  wrote:

> Thanks everyone!
> I really appreciate it, it means a lot to me :)
> The Apache Hive project and its community have truly inspired me . I'm
> grateful for the chance to contribute to such a remarkable project.
>
> Thanks!
> Simhadri Govindappa
>
> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>  wrote:
>
>> Congrats Simhadri!
>>
>>
>>
>> -Sankar
>>
>>
>>
>> *From:* Butao Zhang 
>> *Sent:* Thursday, April 18, 2024 5:39 PM
>> *To:* user@hive.apache.org; dev 
>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri Govindappa
>>
>>
>>
>> Congratulations Simhadri !!!
>>
>>
>>
>> Thanks.
>>
>>
>> --
>>
>> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
>> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush Saxena <
>> ayush...@gmail.com>
>> *Sent:* Thursday, April 18, 2024 7:50 PM
>> *To:* dev ; user@hive.apache.org <
>> user@hive.apache.org>
>> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>>
>>
>>
>> Hi All,
>>
>> Apache Hive's Project Management Committee (PMC) has invited Simhadri
>> Govindappa to become a committer, and we are pleased to announce that he
>> has accepted.
>>
>>
>>
>> Please join me in congratulating him, Congratulations Simhadri, Welcome
>> aboard!!!
>>
>>
>>
>> -Ayush Saxena
>>
>> (On behalf of Apache Hive PMC)
>>
>


Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Krisztian Kasa
Congratulations Simhadri!

Regards,
Krisztian

On Thu, Apr 18, 2024 at 3:25 PM kokila narayanan <
kokilanarayana...@gmail.com> wrote:

> Congratulations Simhadri 
>
> On Thu, 18 Apr, 2024, 17:22 Ayush Saxena,  wrote:
>
>> Hi All,
>> Apache Hive's Project Management Committee (PMC) has invited Simhadri
>> Govindappa to become a committer, and we are pleased to announce that he
>> has accepted.
>>
>> Please join me in congratulating him, Congratulations Simhadri, Welcome
>> aboard!!!
>>
>> -Ayush Saxena
>> (On behalf of Apache Hive PMC)
>>
>


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Simhadri G
Thanks everyone!
I really appreciate it, it means a lot to me :)
The Apache Hive project and its community have truly inspired me. I'm
grateful for the chance to contribute to such a remarkable project.

Thanks!
Simhadri Govindappa

On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
 wrote:

> Congrats Simhadri!
>
>
>
> -Sankar
>
>
>
> *From:* Butao Zhang 
> *Sent:* Thursday, April 18, 2024 5:39 PM
> *To:* user@hive.apache.org; dev 
> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri Govindappa
>
>
>
> Congratulations Simhadri !!!
>
>
>
> Thanks.
>
>
> --
>
> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush Saxena <
> ayush...@gmail.com>
> *Sent:* Thursday, April 18, 2024 7:50 PM
> *To:* dev ; user@hive.apache.org <
> user@hive.apache.org>
> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>
>
>
> Hi All,
>
> Apache Hive's Project Management Committee (PMC) has invited Simhadri
> Govindappa to become a committer, and we are pleased to announce that he
> has accepted.
>
>
>
> Please join me in congratulating him, Congratulations Simhadri, Welcome
> aboard!!!
>
>
>
> -Ayush Saxena
>
> (On behalf of Apache Hive PMC)
>


RE: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Sankar Hariappan via user
Congrats Simhadri!

-Sankar

From: Butao Zhang 
Sent: Thursday, April 18, 2024 5:39 PM
To: user@hive.apache.org; dev 
Subject: [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri Govindappa

Congratulations Simhadri !!!

Thanks.


From: user-return-28075-butaozhang1=163@hive.apache.org on behalf of Ayush Saxena <ayush...@gmail.com>
Sent: Thursday, April 18, 2024 7:50 PM
To: dev <d...@hive.apache.org>; user@hive.apache.org
Subject: [ANNOUNCE] New Committer: Simhadri Govindappa

Hi All,
Apache Hive's Project Management Committee (PMC) has invited Simhadri 
Govindappa to become a committer, and we are pleased to announce that he has 
accepted.

Please join me in congratulating him, Congratulations Simhadri, Welcome 
aboard!!!

-Ayush Saxena
(On behalf of Apache Hive PMC)


Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Butao Zhang

  
  
  

Congratulations Simhadri !!!
Thanks.
  

From: user-return-28075-butaozhang1=163@hive.apache.org on behalf of Ayush Saxena
Sent: Thursday, April 18, 2024 7:50 PM
To: dev; user@hive.apache.org
Subject: [ANNOUNCE] New Committer: Simhadri Govindappa

Hi All,
Apache Hive's Project Management Committee (PMC) has invited Simhadri
Govindappa to become a committer, and we are pleased to announce that he
has accepted.

Please join me in congratulating him, Congratulations Simhadri, Welcome
aboard!!!

-Ayush Saxena
(On behalf of Apache Hive PMC)




[ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Ayush Saxena
Hi All,
Apache Hive's Project Management Committee (PMC) has invited Simhadri
Govindappa to become a committer, and we are pleased to announce that he
has accepted.

Please join me in congratulating him, Congratulations Simhadri, Welcome
aboard!!!

-Ayush Saxena
(On behalf of Apache Hive PMC)


[ANNOUNCE] Hive 1.x EOL

2024-04-11 Thread Ayush Saxena
Hi All,
The Apache Hive Community has voted to declare the 1.x release line as End of 
Life (EOL). This means no further updates or releases will be made for this 
series.

We urge all Hive 1.x users to upgrade to the latest versions promptly to 
benefit from new features and ongoing support.

-Ayush Saxena
(On Behalf of Apache Hive PMC)

Unsubscribe

2024-04-06 Thread Mahendra prabhu
Unsubscribe


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-04 Thread Sungwoo Park
Congratulations and huge thanks to the Apache Hive team and contributors for
releasing Hive 4. We have been watching the development of Hive 4 since the
release of Hive 3.1, and it's truly satisfying to witness the resolution of
all the critical issues at last after 5 years. Hive 4 comes with a lot of
great new features, and our initial performance benchmarking indicates that
it is significantly faster than Hive 3.

--- Sungwoo

On Wed, Apr 3, 2024 at 10:30 PM Okumin  wrote:

> I'm really excited to see the news! I can easily imagine the
> difficulty of testing and shipping Hive 4.0.0 with more than 5k
> commits. I'm proud to have witnessed this moment here.
>
> Thank you!
>
> On Wed, Apr 3, 2024 at 3:07 AM Naveen Gangam  wrote:
> >
> > Thank you for the tremendous amount of work put in by many many folks to
> make this release happen, including projects hive is dependent upon like
> tez.
> >
> > Thank you to all the PMC members, committers and contributors for all
> the work over the past 5+ years in shaping this release.
> >
> > THANK YOU!!!
> >
> > On Sun, Mar 31, 2024 at 8:54 AM Battula, Brahma Reddy 
> wrote:
> >>
> >> Thank you for your hard work and dedication in releasing Apache Hive
> version 4.0.0.
> >>
> >>
> >>
> >> Congratulations to the entire team on this achievement. Keep up the
> great work!
> >>
> >>
> >>
> >> Is this considered GA?
> >>
> >>
> >>
> >> It also looks like the following location needs to be updated:
> >>
> >> https://hive.apache.org/general/downloads/
> >>
> >>
> >>
> >>
> >>
> >> From: Denys Kuzmenko 
> >> Date: Saturday, March 30, 2024 at 00:07
> >> To: user@hive.apache.org , d...@hive.apache.org <
> d...@hive.apache.org>
> >> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
> >>
> >> The Apache Hive team is proud to announce the release of Apache Hive
> >>
> >> version 4.0.0.
> >>
> >>
> >>
> >> The Apache Hive (TM) data warehouse software facilitates querying and
> >>
> >> managing large datasets residing in distributed storage. Built on top
> >>
> >> of Apache Hadoop (TM), it provides, among others:
> >>
> >>
> >>
> >> * Tools to enable easy data extract/transform/load (ETL)
> >>
> >>
> >>
> >> * A mechanism to impose structure on a variety of data formats
> >>
> >>
> >>
> >> * Access to files stored either directly in Apache HDFS (TM) or in other
> >>
> >>   data storage systems such as Apache HBase (TM)
> >>
> >>
> >>
> >> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
> Spark frameworks. (MapReduce is deprecated, and Spark has been removed so
> the text needs to be modified depending on the release version)
> >>
> >>
> >>
> >> For Hive release details and downloads, please visit:
> >>
> >> https://hive.apache.org/downloads.html
> >>
> >>
> >>
> >> Hive 4.0.0 Release Notes are available here:
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
> >>
> >>
> >>
> >> We would like to thank the many contributors who made this release
> >>
> >> possible.
> >>
> >>
> >>
> >> Regards,
> >>
> >>
> >>
> >> The Apache Hive Team
>


Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone,

As part of The ASF’s 25th anniversary campaign[1], we will be celebrating
projects and communities in multiple ways.

We invite all projects and contributors to participate in the following
ways:

* Individuals - submit your first contribution:
https://news.apache.org/foundation/entry/the-asf-launches-firstasfcontribution-campaign
* Projects - share your public good story:
https://docs.google.com/forms/d/1vuN-tUnBwpTgOE5xj3Z5AG1hsOoDNLBmGIqQHwQT6k8/viewform?edit_requested=true
* Projects - submit a project spotlight for the blog:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278466116
* Projects - contact the Voice of Apache podcast (formerly Feathercast) to
be featured: https://feathercast.apache.org/help/
* Projects - use the 25th anniversary template and the #ASF25Years hashtag
on social media:
https://docs.google.com/presentation/d/1oDbMol3F_XQuCmttPYxBIOIjRuRBksUjDApjd8Ve3L8/edit#slide=id.g26b0919956e_0_13

If you have questions, email the Marketing & Publicity team at
mark...@apache.org.

Peace,
BKP

[1] https://apache.org/asf25years/

[NOTE: You are receiving this message because you are a contributor to an
Apache Software Foundation project. The ASF will very occasionally send out
messages relating to the Foundation to contributors and members, such as
this one.]

Brian Proffitt
VP, Marketing & Publicity
VP, Conferences


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-03 Thread Okumin
I'm really excited to see the news! I can easily imagine the
difficulty of testing and shipping Hive 4.0.0 with more than 5k
commits. I'm proud to have witnessed this moment here.

Thank you!

On Wed, Apr 3, 2024 at 3:07 AM Naveen Gangam  wrote:
>
> Thank you for the tremendous amount of work put in by many many folks to make 
> this release happen, including projects hive is dependent upon like tez.
>
> Thank you to all the PMC members, committers and contributors for all the 
> work over the past 5+ years in shaping this release.
>
> THANK YOU!!!
>
> On Sun, Mar 31, 2024 at 8:54 AM Battula, Brahma Reddy  
> wrote:
>>
>> Thank you for your hard work and dedication in releasing Apache Hive version 
>> 4.0.0.
>>
>>
>>
>> Congratulations to the entire team on this achievement. Keep up the great 
>> work!
>>
>>
>>
>> Is this considered GA?
>>
>>
>>
>> It also looks like the following location needs to be updated:
>>
>> https://hive.apache.org/general/downloads/
>>
>>
>>
>>
>>
>> From: Denys Kuzmenko 
>> Date: Saturday, March 30, 2024 at 00:07
>> To: user@hive.apache.org , d...@hive.apache.org 
>> 
>> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>>
>> The Apache Hive team is proud to announce the release of Apache Hive
>>
>> version 4.0.0.
>>
>>
>>
>> The Apache Hive (TM) data warehouse software facilitates querying and
>>
>> managing large datasets residing in distributed storage. Built on top
>>
>> of Apache Hadoop (TM), it provides, among others:
>>
>>
>>
>> * Tools to enable easy data extract/transform/load (ETL)
>>
>>
>>
>> * A mechanism to impose structure on a variety of data formats
>>
>>
>>
>> * Access to files stored either directly in Apache HDFS (TM) or in other
>>
>>   data storage systems such as Apache HBase (TM)
>>
>>
>>
>> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
>> frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
>> needs to be modified depending on the release version)
>>
>>
>>
>> For Hive release details and downloads, please visit:
>>
>> https://hive.apache.org/downloads.html
>>
>>
>>
>> Hive 4.0.0 Release Notes are available here:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
>>
>>
>>
>> We would like to thank the many contributors who made this release
>>
>> possible.
>>
>>
>>
>> Regards,
>>
>>
>>
>> The Apache Hive Team


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Naveen Gangam
Thank you for the tremendous amount of work put in by many many folks to
make this release happen, including projects hive is dependent upon like
tez.

Thank you to all the PMC members, committers and contributors for all the
work over the past 5+ years in shaping this release.

THANK YOU!!!

On Sun, Mar 31, 2024 at 8:54 AM Battula, Brahma Reddy 
wrote:

> Thank you for your hard work and dedication in releasing Apache Hive
> version 4.0.0.
>
>
>
> Congratulations to the entire team on this achievement. Keep up the great
> work!
>
>
>
> Is this considered GA?
>
>
>
> It also looks like the following location needs to be updated:
>
> https://hive.apache.org/general/downloads/
>
>
>
>
>
> *From: *Denys Kuzmenko 
> *Date: *Saturday, March 30, 2024 at 00:07
> *To: *user@hive.apache.org , d...@hive.apache.org <
> d...@hive.apache.org>
> *Subject: *[ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
>
> version 4.0.0.
>
>
>
> The Apache Hive (TM) data warehouse software facilitates querying and
>
> managing large datasets residing in distributed storage. Built on top
>
> of Apache Hadoop (TM), it provides, among others:
>
>
>
> * Tools to enable easy data extract/transform/load (ETL)
>
>
>
> * A mechanism to impose structure on a variety of data formats
>
>
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>
>   data storage systems such as Apache HBase (TM)
>
>
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
> frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
> needs to be modified depending on the release version)
>
>
>
> For Hive release details and downloads, please visit:
>
> https://hive.apache.org/downloads.html
>
>
>
> Hive 4.0.0 Release Notes are available here:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
>
>
>
> We would like to thank the many contributors who made this release
>
> possible.
>
>
>
> Regards,
>
>
>
> The Apache Hive Team
>
>


RE: [EXTERNAL] Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Sankar Hariappan via user
Absolutely exciting news! Congrats to the entire Hive community for making this 
release happen!

-Sankar

From: Pau Tallada 
Sent: Tuesday, April 2, 2024 2:31 PM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: [EXTERNAL] Re: [ANNOUNCE] Apache Hive 4.0.0 Released

Congrats to all for the hard work

On Tue, 2 Apr 2024 at 10:58, Butao Zhang <butaozha...@163.com> wrote:
I'm thrilled to see the official release of Apache Hive 4.0.0, marking another 
milestone in the development of the Hive community. I want to extend my 
gratitude to all the partners in the community for their hard work.
Also special thanks to Denys for your diligent code reviews and efforts in 
completing the version release process, which I deeply admire.

Wishing the Apache Hive community continued growth and success. Keep up the 
great work!


Thanks,
Butao Zhang


 Replied Message 
From: Stamatis Zampetakis
Date: 4/2/2024 16:39
To:
Cc: user@hive.apache.org
Subject: Re: [ANNOUNCE] Apache Hive 4.0.0 Released
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys who went above and beyond his role of
release manager triaging release blockers, reviewing and fixing many
of those tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
<bbatt...@visa.com.invalid> wrote:

Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Is this considered GA?

It also looks like the following location needs to be updated:
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko <dkuzme...@apache.org>
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org <user@hive.apache.org>, d...@hive.apache.org <d...@hive.apache.org>
Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive

version 4.0.0.



The Apache Hive (TM) data warehouse software facilitates querying and

managing large datasets residing in distributed storage. Built on top

of Apache Hadoop (TM), it provides, among others:



* Tools to enable easy data extract/transform/load (ETL)



* A mechanism to impose structure on a variety of data formats



* Access to files stored either directly in Apache HDFS (TM) or in other

data storage systems such as Apache HBase (TM)



* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
needs to be modified depending on the release version)



For Hive release details and downloads, please visit:

https://hive.apache.org/downloads.html



Hive 4.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843



We would like to thank the many contributors who made this release

possible.



Regards,



The Apache Hive Team


--
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--



Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Pau Tallada
Congrats to all for the hard work

On Tue, 2 Apr 2024 at 10:58, Butao Zhang wrote:

> I'm thrilled to see the official release of Apache Hive 4.0.0, marking
> another milestone in the development of the Hive community. I want to
> extend my gratitude to all the partners in the community for their hard
> work.
> Also special thanks to Denys for your diligent code reviews and efforts in
> completing the version release process, which I deeply admire.
>
> Wishing the Apache Hive community continued growth and success. Keep up
> the great work!
>
>
> Thanks,
> Butao Zhang
>
>
>  Replied Message 
> From Stamatis Zampetakis 
> Date 4/2/2024 16:39
> To  
> Cc user@hive.apache.org 
> Subject Re: [ANNOUNCE] Apache Hive 4.0.0 Released
> The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
> Apache Hive 3.1.3) and it's probably the biggest release so far in the
> history of the project. The numbers clearly show that this is a
> collective effort that wouldn't be possible without a strong community
> and many volunteers along the years. Many thanks to everyone involved!
>
> A special mention to Denys who went above and beyond his role of
> release manager triaging release blockers, reviewing and fixing many
> of those tickets that were blocking us for the past few months.
>
> Best,
> Stamatis
>
> On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
>  wrote:
>
>
> Thank you for your hard work and dedication in releasing Apache Hive
> version 4.0.0.
>
> Congratulations to the entire team on this achievement. Keep up the great
> work!
>
> Is this considered GA?
>
> It also looks like the following location needs to be updated:
> https://hive.apache.org/general/downloads/
>
>
> From: Denys Kuzmenko 
> Date: Saturday, March 30, 2024 at 00:07
> To: user@hive.apache.org , d...@hive.apache.org <
> d...@hive.apache.org>
> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
>
> version 4.0.0.
>
>
>
> The Apache Hive (TM) data warehouse software facilitates querying and
>
> managing large datasets residing in distributed storage. Built on top
>
> of Apache Hadoop (TM), it provides, among others:
>
>
>
> * Tools to enable easy data extract/transform/load (ETL)
>
>
>
> * A mechanism to impose structure on a variety of data formats
>
>
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>
> data storage systems such as Apache HBase (TM)
>
>
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
> frameworks. (MapReduce is deprecated, and Spark has been removed so the
> text needs to be modified depending on the release version)
>
>
>
> For Hive release details and downloads, please visit:
>
> https://hive.apache.org/downloads.html
>
>
>
> Hive 4.0.0 Release Notes are available here:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
>
>
>
> We would like to thank the many contributors who made this release
>
> possible.
>
>
>
> Regards,
>
>
>
> The Apache Hive Team
>
>

-- 
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Butao Zhang
I'm thrilled to see the official release of Apache Hive 4.0.0, marking another 
milestone in the development of the Hive community. I want to extend my 
gratitude to all the partners in the community for their hard work. 

Also special thanks to Denys for your diligent code reviews and efforts in 
completing the version release process, which I deeply admire.


Wishing the Apache Hive community continued growth and success. Keep up the 
great work!




Thanks,
Butao Zhang




 Replied Message 
| From | Stamatis Zampetakis |
| Date | 4/2/2024 16:39 |
| To |  |
| Cc | user@hive.apache.org |
| Subject | Re: [ANNOUNCE] Apache Hive 4.0.0 Released |
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys who went above and beyond his role of
release manager triaging release blockers, reviewing and fixing many
of those tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
 wrote:

Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Is this considered GA?

It also looks like the following location needs to be updated:
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko 
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org , d...@hive.apache.org 

Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive

version 4.0.0.



The Apache Hive (TM) data warehouse software facilitates querying and

managing large datasets residing in distributed storage. Built on top

of Apache Hadoop (TM), it provides, among others:



* Tools to enable easy data extract/transform/load (ETL)



* A mechanism to impose structure on a variety of data formats



* Access to files stored either directly in Apache HDFS (TM) or in other

data storage systems such as Apache HBase (TM)



* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
needs to be modified depending on the release version)



For Hive release details and downloads, please visit:

https://hive.apache.org/downloads.html



Hive 4.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843



We would like to thank the many contributors who made this release

possible.



Regards,



The Apache Hive Team


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Stamatis Zampetakis
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys who went above and beyond his role of
release manager triaging release blockers, reviewing and fixing many
of those tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
 wrote:
>
> Thank you for your hard work and dedication in releasing Apache Hive version 
> 4.0.0.
>
> Congratulations to the entire team on this achievement. Keep up the great 
> work!
>
> Is this considered GA?
>
> It also looks like the following location needs to be updated:
> https://hive.apache.org/general/downloads/
>
>
> From: Denys Kuzmenko 
> Date: Saturday, March 30, 2024 at 00:07
> To: user@hive.apache.org , d...@hive.apache.org 
> 
> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
>
> version 4.0.0.
>
>
>
> The Apache Hive (TM) data warehouse software facilitates querying and
>
> managing large datasets residing in distributed storage. Built on top
>
> of Apache Hadoop (TM), it provides, among others:
>
>
>
> * Tools to enable easy data extract/transform/load (ETL)
>
>
>
> * A mechanism to impose structure on a variety of data formats
>
>
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>
>   data storage systems such as Apache HBase (TM)
>
>
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
> frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
> needs to be modified depending on the release version)
>
>
>
> For Hive release details and downloads, please visit:
>
> https://hive.apache.org/downloads.html
>
>
>
> Hive 4.0.0 Release Notes are available here:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
>
>
>
> We would like to thank the many contributors who made this release
>
> possible.
>
>
>
> Regards,
>
>
>
> The Apache Hive Team


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-03-31 Thread Battula, Brahma Reddy
Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Is this considered GA?

It also looks like the following location needs to be updated:
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko 
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org , d...@hive.apache.org 

Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive

version 4.0.0.



The Apache Hive (TM) data warehouse software facilitates querying and

managing large datasets residing in distributed storage. Built on top

of Apache Hadoop (TM), it provides, among others:



* Tools to enable easy data extract/transform/load (ETL)



* A mechanism to impose structure on a variety of data formats



* Access to files stored either directly in Apache HDFS (TM) or in other

  data storage systems such as Apache HBase (TM)



* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
needs to be modified depending on the release version)



For Hive release details and downloads, please visit:

https://hive.apache.org/downloads.html



Hive 4.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843



We would like to thank the many contributors who made this release

possible.



Regards,



The Apache Hive Team


[ANNOUNCE] Apache Hive 4.0.0 Released

2024-03-29 Thread Denys Kuzmenko
The Apache Hive team is proud to announce the release of Apache Hive
version 4.0.0.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
Spark frameworks. (MapReduce is deprecated, and Spark has been removed
so the text needs to be modified depending on the release version)

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 4.0.0 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF
  project dev or user mailing lists; it is not being sent to you
  directly. It is important that we reach all of our users and
  contributors/committers so that they may get a chance to benefit
  from this. We apologise in advance if this doesn't interest you, but
  it is on topic for the mailing lists of the Apache Software
  Foundation, so please do not mark this as spam in your email client.
  Thank you! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA in Denver, Colorado,
from October 7th to 10th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request). This
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to
apply now so that you have enough time in case of interview delays. Do
not wait until you know whether you have been accepted.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Hive-MR3 1.10 released

2024-03-19 Thread Sungwoo Park
Hello Hive users,

We have released Hive on MR3 1.10. MR3 is an execution engine similar to
MapReduce and Tez, and it supports Hadoop, Kubernetes, and standalone mode.
Hive-MR3 uses MR3 for its execution backend in Hive 3.1.3. If you are
interested, please give it a try.

In MR3 1.10, we have re-written the shuffle library in Tez. In the previous
version, all tasks manage fetchers independently of each other. Now all
fetchers inside a container are managed by a common shuffle server.

For those interested in performance comparison, here are the latest results
of testing Hive-MR3 1.9/1.10, Trino 435, and Spark 3.4.1 using the
(original) TPC-DS benchmark with 10TB scale. All the systems were tested
with Java 17.

Hive-MR3 1.9: total 6473 seconds, geo-mean 25.0 seconds.
Hive-MR3 1.10: total 6138 seconds, geo-mean 24.4 seconds.
Trino 435: total 6950 seconds, geo-mean 19.2 seconds. Query 23 returns
wrong results. Query 72 fails.
Spark 3.4.1 (using Parquet instead of ORC): total 19044 seconds, geo-mean
35.9 seconds.
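
For readers comparing the two summary figures above: "total" is presumably the sum of the per-query running times and "geo-mean" their geometric mean. The following is a minimal Python sketch of how the two can diverge. The timings in it are made up for illustration and are NOT the benchmark data, and the function name is hypothetical.

```python
import math


def total_and_geomean(times_sec):
    """Return (sum, geometric mean) of a list of positive per-query times."""
    if not times_sec or any(t <= 0 for t in times_sec):
        raise ValueError("need a non-empty list of positive times")
    total = sum(times_sec)
    # Geometric mean computed through logarithms to stay numerically stable.
    geomean = math.exp(sum(math.log(t) for t in times_sec) / len(times_sec))
    return total, geomean


# Hypothetical timings (not from the benchmark): a single slow query
# dominates the total but moves the geometric mean far less.
mostly_fast = [10.0, 10.0, 10.0, 1000.0]
total, geomean = total_and_geomean(mostly_fast)
print(f"total={total:.1f}s geo-mean={geomean:.1f}s")
```

This is one way a system can show a higher total yet a lower geo-mean than another: a few long-running queries inflate the sum while most queries finish quickly.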

Thank you,

--- Sungwoo


unsubscribe

2024-03-15 Thread stephane . davy
Orange Restricted




Some problems encountered when reading ICEBERG with vectorisation turned on

2024-03-10 Thread lisoda
Hi.

I am using Hive 4.0.0 to read Iceberg tables and have run into some problems.
I would be grateful if someone could guide me.


Env: Hadoop 3.3.6, Hive 4.0.0, Tez 0.10.2, Iceberg 1.4.3


Iceberg table: hadoop-catalog-table/location_based_table


Question 1: How does tez.mrreader.config.update.properties work?


I'm testing hive-iceberg. With vectorisation turned on, I cannot read all of
the non-partition columns of a partitioned table.
Reading through the code, I found that vectorised reads depend on the value of
"hive.io.file.readcolumn.ids".
When vectorisation is turned on, the Tez map task relies on the values of the
following two properties:
hive.io.file.readcolumn.names  and  hive.io.file.readcolumn.ids
Currently, these two values are set dynamically in the Tez driver, depending on
the SQL submitted by the user.
According to https://issues.apache.org/jira/browse/TEZ-4248, the authors seem
to expect both values to be passed to the Tez worker.
However, in TezChild I am not able to get the value of
hive.io.file.readcolumn.ids that is set in the Tez ApplicationMaster.
When I set "hive.io.file.readcolumn.ids" directly from the console, the Iceberg
partitioned table is read just fine, but I can't do this in a production
environment.
So how should I troubleshoot this problem?


Question 2: Does reading an Iceberg non-partitioned table from Hive depend on
"hive.io.file.readcolumn.ids"?


For non-partitioned tables, I found that even when the value of
"hive.io.file.readcolumn.ids" is missing or wrong, I can still read the Iceberg
non-partitioned tables just fine.
But from the code, both cases go through the same code path.
So why does this work?


I'm very confused at the moment and would be grateful if someone could help me.
Thank you.







Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-03-01 Thread Takanobu Asanuma
I should have mentioned earlier, but we encountered our problem with
queries between Trino and Hive3 MetaStore.
The tests I reported were also querying Hive1/3 MetaStore using Trino. The
problem might only exist between Trino and Hive3 MetaStore.
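
As a rough aid for reading the QueryTime column in the table quoted below, here is a minimal Python sketch that converts each timing to seconds and expresses it relative to environment No.1. It assumes the two fields of QueryTime are minutes and seconds; if they are hours and minutes instead, the ratios come out the same. The function and variable names are illustrative only.

```python
def to_seconds(mmss):
    """Convert an 'MM:SS' string to seconds (assumed unit: minutes and seconds)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (No., Version, rawstore.impl, connectionPoolingType, HIVE-14187, QueryTime)
# Values copied from the table quoted in this thread.
results = [
    (1, "1.2.1", "ObjectStore", "None", "Not Applied", "11:38"),
    (2, "3.1.3", "ObjectStore", "None", "Applied", "34:00"),
    (3, "3.1.3", "CachedStore", "None", "Applied", "25:00"),
    (4, "3.1.3", "ObjectStore", "HikariCP", "Applied", "21:10"),
    (5, "3.1.3", "CachedStore", "HikariCP", "Applied", "14:30"),
    (6, "3.1.3", "ObjectStore", "None", "Reverted", "13:00"),
    (7, "3.1.3", "ObjectStore", "HikariCP", "Reverted", "11:23"),
]

# Environment No.1 (Hive 1.2.1) is used as the baseline.
baseline = to_seconds(results[0][5])
ratios = {no: to_seconds(t) / baseline for no, *_, t in results}

for no, ratio in ratios.items():
    print(f"No.{no}: {ratio:.2f}x of No.1")
```

Read this way, environment No.2 takes roughly three times as long as No.1, while No.7 is about on par with it.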

- Takanobu

On Fri, Mar 1, 2024 at 14:53 Takanobu Asanuma wrote:

> Yes, for now, we believe that HIVE-14187 has caused performance
> degradation in Hive3 MetaStore.
>
> We also use HiveServer2, but our HiveServer2 directly accesses the backend
> DB without going through the Hive MetaStore, because it enhances
> performance to directly access the DB in a heavily loaded cluster.
> Therefore, we might not encounter the issue of HIVE-20600. We provide the
> Hive MetaStore only for Trino/SparkSQL/etc.
>
> - Takanobu
>
> On Fri, Mar 1, 2024 at 11:32 Sungwoo Park wrote:
>
>> Thank you for sharing the result. (Does your result imply that HIVE-14187
>> is introducing an unintended bug?)
>>
>> Another issue that could be of your interest is the connection leak
>> problem reported in HIVE-20600. Do you see the connection leak problem, or
>> is it not relevant to your environment (e.g., because you don't use
>> HiveServer2)?
>>
>> --- Sungwoo
>>
>> On Fri, Mar 1, 2024 at 9:45 AM Takanobu Asanuma <
>> takanobu.asan...@gmail.com> wrote:
>>
>>> Hi Pau and Sungwoo,
>>>
>>> Thanks for sharing the information.
>>>
>>> We tested a set of simple queries which just referenced the Hive table
>>> and didn't execute any Hive jobs. The result is below.
>>>
>>> No.  Version  rawstore.impl  connectionPoolingType  HIVE-14187   QueryTime
>>> ---------------------------------------------------------------------------
>>> 1    1.2.1    ObjectStore    None                   Not Applied  11:38
>>> 2    3.1.3    ObjectStore    None                   Applied      34:00
>>> 3    3.1.3    CachedStore    None                   Applied      25:00
>>> 4    3.1.3    ObjectStore    HikariCP               Applied      21:10
>>> 5    3.1.3    CachedStore    HikariCP               Applied      14:30
>>> 6    3.1.3    ObjectStore    None                   Reverted     13:00
>>> 7    3.1.3    ObjectStore    HikariCP               Reverted     11:23
>>> ---------------------------------------------------------------------------
>>>
>>> Initially, we encountered an issue of Hive MetaStore slowness when we
>>> upgraded from environment No.1 to No.2. As shown in the table, environment
>>> No.2 showed the worst test results.
>>>
>>> A unique aspect of our environment is that we don't use connection
>>> pooling. After some investigation, we thought that the combination of
>>> HIVE-14187 and connectionPoolingType=None was negatively impacting
>>> performance.
>>> The fastest case in our tests was when we reverted HIVE-14187 and set
>>> connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
>>> set to None, the environment where we reverted HIVE-14187 still performed
>>> reasonably well (see No.6).
>>>
>>> Please note our investigation is still ongoing and we haven't yet come
>>> to a conclusion.
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Thu, Feb 29, 2024 at 12:18 Sungwoo Park wrote:
>>>
>>>> We didn't make any other attempt to fix the problem and just decided
>>>> not to use CachedStore. However, I think our installation of Metastore
>>>> based on Hive 3.1.3 is running without any serious problems.
>>>>
>>>> Could you share how long it takes to compile typical queries in your
>>>> environment (with Hive 1 and with Hive 3)?
>>>>
>>>> FYI, in our environment, sometimes it takes about 10 seconds to compile
>>>> a query on TPC-DS 10TB datasets. Specifically, the average compilation time
>>>> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
>>>> compilation time is 9.6 seconds (query 49). The compilation time includes
>>>> the time for accessing Metastore.
>>>>
>>>> Thanks,
>>>>
>>>> --- Sungwoo
>>>>
>>>>
>>>> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma
>>>> wrote:
>>>>
>>>>> Thanks for your detailed answer!
>>>>>
>>>>> In the original email, you reported "the query compilation takes long"
>>>>> in Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
>>>>> Thank you for sharing the issue with CachedStore and the JIRA tickets.
>>>>> I will also try out metastore.stats.fetch.bitvector=true.
>>>>>
>>>>> Regards,
>>>>> - Takanobu
>>>>>
>>>>> On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:
>>>>>
>>>>>> Hello Takanobu,
>>>>>>
>>>>>> We did not test with vanilla Hive 3.1.3 and Metastore databases can
>>>>>> be different, so I don't know why Metastore responses are very slow. I can
>>>>>> only share some results of testing CachedStore in Metastore. Please note
>>>>>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>>>>>> Hive 3.1.3 (which applies many additional patches).
>>>>>>
>>>>>> 1.
>>>>>> When CachedStore is enabled, column stats are not computed. As a
>>>>>> result, some queries generate very inefficient plans because of
>>>>>> wrong/inaccurate stats.
>>>>>>
>>>>>> Perhaps this is because not all patches for CachedStore have been
>>>>>> merged to Hive 3.1.3. For example, these

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Takanobu Asanuma
Yes, for now, we believe that HIVE-14187 has caused performance degradation
in Hive3 MetaStore.

We also use HiveServer2, but our HiveServer2 directly accesses the backend
DB without going through the Hive MetaStore, because it enhances
performance to directly access the DB in a heavily loaded cluster.
Therefore, we might not encounter the issue of HIVE-20600. We provide the
Hive MetaStore only for Trino/SparkSQL/etc.

- Takanobu

On Fri, Mar 1, 2024 at 11:32 Sungwoo Park wrote:

> Thank you for sharing the result. (Does your result imply that HIVE-14187
> is introducing an intended bug?)
>
> Another issue that could be of your interest is the connection leak
> problem reported in HIVE-20600. Do you see the connection leak problem, or
> is it not relevant to your environment (e.g., because you don't use
> HiveServer2)?
>
> --- Sungwoo
>
> On Fri, Mar 1, 2024 at 9:45 AM Takanobu Asanuma <
> takanobu.asan...@gmail.com> wrote:
>
>> Hi Pau and Sungwoo,
>>
>> Thanks for sharing the information.
>>
>> We tested a set of simple queries which just referenced the Hive table
>> and didn't execute any Hive jobs. The result is below.
>>
>> No.  Version  rawstore.impl  connectionPoolingType  HIVE-14187   QueryTime
>> --------------------------------------------------------------------------
>> 1    1.2.1    ObjectStore    None                   Not Applied  11:38
>> 2    3.1.3    ObjectStore    None                   Applied      34:00
>> 3    3.1.3    CachedStore    None                   Applied      25:00
>> 4    3.1.3    ObjectStore    HikariCP               Applied      21:10
>> 5    3.1.3    CachedStore    HikariCP               Applied      14:30
>> 6    3.1.3    ObjectStore    None                   Reverted     13:00
>> 7    3.1.3    ObjectStore    HikariCP               Reverted     11:23
>> --------------------------------------------------------------------------
>>
>> Initially, we encountered an issue of Hive MetaStore slowness when we
>> upgraded from environment No.1 to No.2. As shown in the table, environment
>> No.2 showed the worst test results.
>>
>> A unique aspect of our environment is that we don't use connection
>> pooling. After some investigation, we thought that the combination of
>> HIVE-14187 and connectionPoolingType=None was negatively impacting
>> performance.
>> The fastest case in our tests was when we reverted HIVE-14187 and set
>> connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
>> set to None, the environment where we reverted HIVE-14187 still performed
>> reasonably well (see No.6).
>>
>> Please note our investigation is still ongoing and we haven't yet come to
>> a conclusion.
>>
>> Regards,
>> - Takanobu
>>
>> On Thu, Feb 29, 2024 at 12:18 Sungwoo Park wrote:
>>
>>> We didn't make any other attempt to fix the problem and just decided not
>>> to use CachedStore. However, I think our installation of Metastore based on
>>> Hive 3.1.3 is running without any serious problems.
>>>
>>> Could you share how long it takes to compile typical queries in your
>>> environment (with Hive 1 and with Hive 3)?
>>>
>>> FYI, in our environment, sometimes it takes about 10 seconds to compile
>>> a query on TPC-DS 10TB datasets. Specifically, the average compilation time
>>> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
>>> compilation time is 9.6 seconds (query 49). The compilation time includes
>>> the time for accessing Metastore.
>>>
>>> Thanks,
>>>
>>> --- Sungwoo
>>>
>>>
>>> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
>>> wrote:
>>>
>>>> Thanks for your detailed answer!
>>>>
>>>> In the original email, you reported "the query compilation takes long"
>>>> in Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
>>>> Thank you for sharing the issue with CachedStore and the JIRA tickets.
>>>> I will also try out metastore.stats.fetch.bitvector=true.
>>>>
>>>> Regards,
>>>> - Takanobu
>>>>
>>>> On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:
>>>>
>>>>> Hello Takanobu,
>>>>>
>>>>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>>>>> different, so I don't know why Metastore responses are very slow. I can
>>>>> only share some results of testing CachedStore in Metastore. Please note
>>>>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>>>>> Hive 3.1.3 (which applies many additional patches).
>>>>>
>>>>> 1.
>>>>> When CachedStore is enabled, column stats are not computed. As a
>>>>> result, some queries generate very inefficient plans because of
>>>>> wrong/inaccurate stats.
>>>>>
>>>>> Perhaps this is because not all patches for CachedStore have been
>>>>> merged to Hive 3.1.3. For example, these patches are not merged. Or, there
>>>>> might be some way to properly configure CachedStore so that it correctly
>>>>> computes column stats.
>>>>>
>>>>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>>>>> HIVE-21063: Support statistics in cachedStore for transactional table
>>>>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>>>>> constraint
>>>>>
>>>>> So, we decided that CachedStore

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Sungwoo Park
Thank you for sharing the result. (Does your result imply that HIVE-14187
is introducing an unintended bug?)

Another issue that could be of interest to you is the connection leak problem
reported in HIVE-20600. Do you see the connection leak problem, or is it
not relevant to your environment (e.g., because you don't use HiveServer2)?

--- Sungwoo

On Fri, Mar 1, 2024 at 9:45 AM Takanobu Asanuma 
wrote:

> Hi Pau and Sungwoo,
>
> Thanks for sharing the information.
>
> We tested a set of simple queries which just referenced the Hive table and
> didn't execute any Hive jobs. The result is below.
>
> No.  Version  rawstore.impl  connectionPoolingType  HIVE-14187   QueryTime
> --------------------------------------------------------------------------
> 1    1.2.1    ObjectStore    None                   Not Applied  11:38
> 2    3.1.3    ObjectStore    None                   Applied      34:00
> 3    3.1.3    CachedStore    None                   Applied      25:00
> 4    3.1.3    ObjectStore    HikariCP               Applied      21:10
> 5    3.1.3    CachedStore    HikariCP               Applied      14:30
> 6    3.1.3    ObjectStore    None                   Reverted     13:00
> 7    3.1.3    ObjectStore    HikariCP               Reverted     11:23
> --------------------------------------------------------------------------
>
> Initially, we encountered an issue of Hive MetaStore slowness when we
> upgraded from environment No.1 to No.2. As shown in the table, environment
> No.2 showed the worst test results.
>
> A unique aspect of our environment is that we don't use connection
> pooling. After some investigation, we thought that the combination of
> HIVE-14187 and connectionPoolingType=None was negatively impacting
> performance.
> The fastest case in our tests was when we reverted HIVE-14187 and set
> connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
> set to None, the environment where we reverted HIVE-14187 still performed
> reasonably well (see No.6).
>
> Please note our investigation is still ongoing and we haven't yet come to
> a conclusion.
>
> Regards,
> - Takanobu
>
> On Thu, Feb 29, 2024 at 12:18 Sungwoo Park wrote:
>
>> We didn't make any other attempt to fix the problem and just decided not
>> to use CachedStore. However, I think our installation of Metastore based on
>> Hive 3.1.3 is running without any serious problems.
>>
>> Could you share how long it takes to compile typical queries in your
>> environment (with Hive 1 and with Hive 3)?
>>
>> FYI, in our environment, sometimes it takes about 10 seconds to compile a
>> query on TPC-DS 10TB datasets. Specifically, the average compilation time
>> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
>> compilation time is 9.6 seconds (query 49). The compilation time includes
>> the time for accessing Metastore.
>>
>> Thanks,
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
>> wrote:
>>
>>> Thanks for your detailed answer!
>>>
>>> In the original email, you reported "the query compilation takes long"
>>> in Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
>>> Thank you for sharing the issue with CachedStore and the JIRA tickets.
>>> I will also try out metastore.stats.fetch.bitvector=true.
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:
>>>
>>>> Hello Takanobu,
>>>>
>>>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>>>> different, so I don't know why Metastore responses are very slow. I can
>>>> only share some results of testing CachedStore in Metastore. Please note
>>>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>>>> Hive 3.1.3 (which applies many additional patches).
>>>>
>>>> 1.
>>>> When CachedStore is enabled, column stats are not computed. As a
>>>> result, some queries generate very inefficient plans because of
>>>> wrong/inaccurate stats.
>>>>
>>>> Perhaps this is because not all patches for CachedStore have been
>>>> merged to Hive 3.1.3. For example, these patches are not merged. Or, there
>>>> might be some way to properly configure CachedStore so that it correctly
>>>> computes column stats.
>>>>
>>>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>>>> HIVE-21063: Support statistics in cachedStore for transactional table
>>>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>>>> constraint
>>>>
>>>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>>>
>>>> (If anyone is running Hive Metastore 3.1.3 in production with
>>>> CachedStore enabled, please let us know how you configure it.)
>>>>
>>>> 2.
>>>> Setting metastore.stats.fetch.bitvector=true can also help generate
>>>> more efficient query plans.
>>>>
>>>> --- Sungwoo
>>>>
>>>>
>>>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma
>>>> wrote:
>>>>
>>>>> Hi Sungwoo Park,
>>>>>
>>>>> I'm sorry for the late reply to this old email.
>>>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>>>> noticed that the response of the Hive3 MetaStore is

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Takanobu Asanuma
Hi Pau and Sungwoo,

Thanks for sharing the information.

We tested a set of simple queries which just referenced the Hive table and
didn't execute any Hive jobs. The result is below.

No.  Version  rawstore.impl  connectionPoolingType  HIVE-14187   QueryTime
--------------------------------------------------------------------------
1    1.2.1    ObjectStore    None                   Not Applied  11:38
2    3.1.3    ObjectStore    None                   Applied      34:00
3    3.1.3    CachedStore    None                   Applied      25:00
4    3.1.3    ObjectStore    HikariCP               Applied      21:10
5    3.1.3    CachedStore    HikariCP               Applied      14:30
6    3.1.3    ObjectStore    None                   Reverted     13:00
7    3.1.3    ObjectStore    HikariCP               Reverted     11:23
--------------------------------------------------------------------------

Initially, we encountered an issue of Hive MetaStore slowness when we
upgraded from environment No.1 to No.2. As shown in the table, environment
No.2 showed the worst test results.

A unique aspect of our environment is that we don't use connection pooling.
After some investigation, we thought that the combination of HIVE-14187 and
connectionPoolingType=None was negatively impacting performance.
The fastest case in our tests was when we reverted HIVE-14187 and set
connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
set to None, the environment where we reverted HIVE-14187 still performed
reasonably well (see No.6).

Please note our investigation is still ongoing and we haven't yet come to a
conclusion.
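
For a quick reading of the table, the slowdown of each environment relative to No.1 can be computed with a short Python sketch. The helper names are hypothetical, and QueryTime is assumed to be two base-60 fields because the thread does not state the unit; the ratios do not depend on that choice.

```python
# Rows copied from the results table in this message:
# (no, version, rawstore_impl, pooling_type, hive_14187, query_time)
RESULTS = [
    (1, "1.2.1", "ObjectStore", "None", "Not Applied", "11:38"),
    (2, "3.1.3", "ObjectStore", "None", "Applied", "34:00"),
    (3, "3.1.3", "CachedStore", "None", "Applied", "25:00"),
    (4, "3.1.3", "ObjectStore", "HikariCP", "Applied", "21:10"),
    (5, "3.1.3", "CachedStore", "HikariCP", "Applied", "14:30"),
    (6, "3.1.3", "ObjectStore", "None", "Reverted", "13:00"),
    (7, "3.1.3", "ObjectStore", "HikariCP", "Reverted", "11:23"),
]


def to_units(query_time: str) -> int:
    """Convert 'A:B' to a single number, assuming B counts sixtieths of A."""
    major, minor = query_time.split(":")
    return int(major) * 60 + int(minor)


def slowdown_vs_baseline(results, baseline_no=1):
    """Return {environment number: QueryTime divided by the baseline QueryTime}."""
    baseline = next(to_units(r[5]) for r in results if r[0] == baseline_no)
    return {r[0]: round(to_units(r[5]) / baseline, 2) for r in results}


if __name__ == "__main__":
    for no, ratio in slowdown_vs_baseline(RESULTS).items():
        print(f"No.{no}: {ratio:.2f}x of No.1")
```

Read this way, No.2 takes roughly three times as long as No.1, while No.6 and No.7 are close to the Hive 1 baseline.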

Regards,
- Takanobu

On Thu, Feb 29, 2024 at 12:18 Sungwoo Park wrote:

> We didn't make any other attempt to fix the problem and just decided not
> to use CachedStore. However, I think our installation of Metastore based on
> Hive 3.1.3 is running without any serious problems.
>
> Could you share how long it takes to compile typical queries in your
> environment (with Hive 1 and with Hive 3)?
>
> FYI, in our environment, sometimes it takes about 10 seconds to compile a
> query on TPC-DS 10TB datasets. Specifically, the average compilation time
> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
> compilation time is 9.6 seconds (query 49). The compilation time includes
> the time for accessing Metastore.
>
> Thanks,
>
> --- Sungwoo
>
>
> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
> wrote:
>
>> Thanks for your detailed answer!
>>
>> In the original email, you reported "the query compilation takes long" in
>> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
>> Thank you for sharing the issue with CachedStore and the JIRA tickets.
>> I will also try out metastore.stats.fetch.bitvector=true.
>>
>> Regards,
>> - Takanobu
>>
>> On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:
>>
>>> Hello Takanobu,
>>>
>>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>>> different, so I don't know why Metastore responses are very slow. I can
>>> only share some results of testing CachedStore in Metastore. Please note
>>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>>> Hive 3.1.3 (which applies many additional patches).
>>>
>>> 1.
>>> When CachedStore is enabled, column stats are not computed. As a result,
>>> some queries generate very inefficient plans because of wrong/inaccurate
>>> stats.
>>>
>>> Perhaps this is because not all patches for CachedStore have been merged
>>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>>> be some way to properly configure CachedStore so that it correctly computes
>>> column stats.
>>>
>>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>>> HIVE-21063: Support statistics in cachedStore for transactional table
>>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>>> constraint
>>>
>>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>>
>>> (If anyone is running Hive Metastore 3.1.3 in production with
>>> CachedStore enabled, please let us know how you configure it.)
>>>
>>> 2.
>>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>>> efficient query plans.
>>>
>>> --- Sungwoo
>>>
>>>
>>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
>>> wrote:
>>>
>>>> Hi Sungwoo Park,
>>>>
>>>> I'm sorry for the late reply to this old email.
>>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>>> noticed that the response of the Hive3 MetaStore is very slow.
>>>> We suspect that HIVE-14187 might be causing this slowness.
>>>> Could you tell me if you have resolved this problem? Are there still
>>>> any problems when you enable CachedStore?
>>>>
>>>> Regards,
>>>> - Takanobu
>>>>
>>>> On Wed, Jun 13, 2018 at 0:37 Sungwoo Park wrote:
>>>>
>>>>> Hello Hive users,
>>>>>
>>>>> I am experiencing a problem with MetaStore in Hive 3.0.
>>>>>
>>>>> 1. Start MetaStore with
>>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>>>>
>>>>> 2. Generate TPC-DS data.
>>>>>
>>>>> 3.

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Sungwoo Park
We didn't make any other attempt to fix the problem and just decided not to
use CachedStore. However, I think our installation of Metastore based on
Hive 3.1.3 is running without any serious problems.

Could you share how long it takes to compile typical queries in your
environment (with Hive 1 and with Hive 3)?

FYI, in our environment, sometimes it takes about 10 seconds to compile a
query on TPC-DS 10TB datasets. Specifically, the average compilation time
of 103 queries is 1.7 seconds (as reported by Hive), and the longest
compilation time is 9.6 seconds (query 49). The compilation time includes
the time for accessing Metastore.

Thanks,

--- Sungwoo


On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
wrote:

> Thanks for your detailed answer!
>
> In the original email, you reported "the query compilation takes long" in
> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
> Thank you for sharing the issue with CachedStore and the JIRA tickets.
> I will also try out metastore.stats.fetch.bitvector=true.
>
> Regards,
> - Takanobu
>
> On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:
>
>> Hello Takanobu,
>>
>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>> different, so I don't know why Metastore responses are very slow. I can
>> only share some results of testing CachedStore in Metastore. Please note
>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>> Hive 3.1.3 (which applies many additional patches).
>>
>> 1.
>> When CachedStore is enabled, column stats are not computed. As a result,
>> some queries generate very inefficient plans because of wrong/inaccurate
>> stats.
>>
>> Perhaps this is because not all patches for CachedStore have been merged
>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>> be some way to properly configure CachedStore so that it correctly computes
>> column stats.
>>
>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>> HIVE-21063: Support statistics in cachedStore for transactional table
>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>> constraint
>>
>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>
>> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
>> enabled, please let us know how you configure it.)
>>
>> 2.
>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>> efficient query plans.
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
>> wrote:
>>
>>> Hi Sungwoo Park,
>>>
>>> I'm sorry for the late reply to this old email.
>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>> noticed that the response of the Hive3 MetaStore is very slow.
>>> We suspect that HIVE-14187 might be causing this slowness.
>>> Could you tell me if you have resolved this problem? Are there still any
>>> problems when you enable CachedStore?
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Wed, Jun 13, 2018 at 0:37 Sungwoo Park wrote:
>>>
>>>> Hello Hive users,
>>>>
>>>> I am experiencing a problem with MetaStore in Hive 3.0.
>>>>
>>>> 1. Start MetaStore with
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>>>
>>>> 2. Generate TPC-DS data.
>>>>
>>>> 3. TPC-DS queries run okay and produce correct results. E.g., from
>>>> query 1:
>>>> +---+
>>>> |   c_customer_id   |
>>>> +---+
>>>> | CHAA  |
>>>> | DCAA  |
>>>> | DDAA  |
>>>> ...
>>>> | AAAILIAA  |
>>>> +---+
>>>> 100 rows selected (69.901 seconds)
>>>>
>>>> However, the query compilation takes long (
>>>> https://issues.apache.org/jira/browse/HIVE-16520).
>>>>
>>>> 4. Now, restart MetaStore with
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>>>
>>>> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query
>>>> 1:
>>>> ++
>>>> | c_customer_id  |
>>>> ++
>>>> ++
>>>> No rows selected (37.448 seconds)
>>>>
>>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>>>> HiveServer2 produces such log messages:
>>>>
>>>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59]

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Pau Tallada
Hi,

We also had to disable CachedStore as it was producing wrong results in our
queries.
I'm sorry I cannot provide more detailed info.

Cheers,

Pau.

On Wed, Feb 28, 2024 at 13:59 Takanobu Asanuma wrote:

> Thanks for your detailed answer!
>
> In the original email, you reported "the query compilation takes long" in
> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
> Thank you for sharing the issue with CachedStore and the JIRA tickets.
> I will also try out metastore.stats.fetch.bitvector=true.
>
> Regards,
> - Takanobu
>
> On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:
>
>> Hello Takanobu,
>>
>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>> different, so I don't know why Metastore responses are very slow. I can
>> only share some results of testing CachedStore in Metastore. Please note
>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>> Hive 3.1.3 (which applies many additional patches).
>>
>> 1.
>> When CachedStore is enabled, column stats are not computed. As a result,
>> some queries generate very inefficient plans because of wrong/inaccurate
>> stats.
>>
>> Perhaps this is because not all patches for CachedStore have been merged
>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>> be some way to properly configure CachedStore so that it correctly computes
>> column stats.
>>
>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>> HIVE-21063: Support statistics in cachedStore for transactional table
>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>> constraint
>>
>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>
>> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
>> enabled, please let us know how you configure it.)
>>
>> 2.
>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>> efficient query plans.
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
>> wrote:
>>
>>> Hi Sungwoo Park,
>>>
>>> I'm sorry for the late reply to this old email.
>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>> noticed that the response of the Hive3 MetaStore is very slow.
>>> We suspect that HIVE-14187 might be causing this slowness.
>>> Could you tell me if you have resolved this problem? Are there still any
>>> problems when you enable CachedStore?
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Wed, Jun 13, 2018 at 0:37 Sungwoo Park wrote:
>>>
>>>> Hello Hive users,
>>>>
>>>> I am experiencing a problem with MetaStore in Hive 3.0.
>>>>
>>>> 1. Start MetaStore with
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>>>
>>>> 2. Generate TPC-DS data.
>>>>
>>>> 3. TPC-DS queries run okay and produce correct results. E.g., from
>>>> query 1:
>>>> +---+
>>>> |   c_customer_id   |
>>>> +---+
>>>> | CHAA  |
>>>> | DCAA  |
>>>> | DDAA  |
>>>> ...
>>>> | AAAILIAA  |
>>>> +---+
>>>> 100 rows selected (69.901 seconds)
>>>>
>>>> However, the query compilation takes long (
>>>> https://issues.apache.org/jira/browse/HIVE-16520).
>>>>
>>>> 4. Now, restart MetaStore with
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>>>
>>>> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query
>>>> 1:
>>>> ++
>>>> | c_customer_id  |
>>>> ++
>>>> ++
>>>> No rows selected (37.448 seconds)
>>>>
>>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>>>> HiveServer2 produces such log messages:
>>>>
>>>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>>> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>>> c_customer_id
>>>> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>>

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Takanobu Asanuma
Thanks for your detailed answer!

In the original email, you reported "the query compilation takes long" in
Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
Thank you for sharing the issue with CachedStore and the JIRA tickets.
I will also try out metastore.stats.fetch.bitvector=true.

Regards,
- Takanobu

On Wed, Feb 28, 2024 at 18:49 Sungwoo Park wrote:

> Hello Takanobu,
>
> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
> different, so I don't know why Metastore responses are very slow. I can
> only share some results of testing CachedStore in Metastore. Please note
> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
> Hive 3.1.3 (which applies many additional patches).
>
> 1.
> When CachedStore is enabled, column stats are not computed. As a result,
> some queries generate very inefficient plans because of wrong/inaccurate
> stats.
>
> Perhaps this is because not all patches for CachedStore have been merged
> to Hive 3.1.3. For example, these patches are not merged. Or, there might
> be some way to properly configure CachedStore so that it correctly computes
> column stats.
>
> HIVE-20896: CachedStore fail to cache stats in multiple code paths
> HIVE-21063: Support statistics in cachedStore for transactional table
> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
> constraint
>
> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>
> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
> enabled, please let us know how you configure it.)
>
> 2.
> Setting metastore.stats.fetch.bitvector=true can also help generate more
> efficient query plans.
>
> --- Sungwoo
>
>
> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
> wrote:
>
>> Hi Sungwoo Park,
>>
>> I'm sorry for the late reply to this old email.
>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>> noticed that the response of the Hive3 MetaStore is very slow.
>> We suspect that HIVE-14187 might be causing this slowness.
>> Could you tell me if you have resolved this problem? Are there still any
>> problems when you enable CachedStore?
>>
>> Regards,
>> - Takanobu
>>
>> On Wed, Jun 13, 2018 at 0:37 Sungwoo Park wrote:
>>
>>> Hello Hive users,
>>>
>>> I am experiencing a problem with MetaStore in Hive 3.0.
>>>
>>> 1. Start MetaStore with
>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>>
>>> 2. Generate TPC-DS data.
>>>
>>> 3. TPC-DS queries run okay and produce correct results. E.g., from query
>>> 1:
>>> +---+
>>> |   c_customer_id   |
>>> +---+
>>> | CHAA  |
>>> | DCAA  |
>>> | DDAA  |
>>> ...
>>> | AAAILIAA  |
>>> +---+
>>> 100 rows selected (69.901 seconds)
>>>
>>> However, the query compilation takes long (
>>> https://issues.apache.org/jira/browse/HIVE-16520).
>>>
>>> 4. Now, restart MetaStore with
>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>>
>>> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
>>> ++
>>> | c_customer_id  |
>>> ++
>>> ++
>>> No rows selected (37.448 seconds)
>>>
>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>>> HiveServer2 produces such log messages:
>>>
>>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>> c_customer_id
>>> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>> c_customer_id
>>>
>>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>>> Invalid column stats: No of nulls > cardinality
>>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>> 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Sungwoo Park
Hello Takanobu,

We did not test with vanilla Hive 3.1.3 and Metastore databases can be
different, so I don't know why Metastore responses are very slow. I can
only share some results of testing CachedStore in Metastore. Please note
that we did not use vanilla Hive 3.1.3 and instead used our own fork of
Hive 3.1.3 (which applies many additional patches).

1.
When CachedStore is enabled, column stats are not computed. As a result,
some queries generate very inefficient plans because of wrong/inaccurate
stats.

Perhaps this is because not all patches for CachedStore have been merged to
Hive 3.1.3. For example, these patches are not merged. Or, there might be
some way to properly configure CachedStore so that it correctly computes
column stats.

HIVE-20896: CachedStore fail to cache stats in multiple code paths
HIVE-21063: Support statistics in cachedStore for transactional table
HIVE-24258: Data mismatch between CachedStore and ObjectStore for constraint

So, we decided that CachedStore should not be enabled in Hive 3.1.3.

(If anyone is running Hive Metastore 3.1.3 in production with CachedStore
enabled, please let us know how you configure it.)

2.
Setting metastore.stats.fetch.bitvector=true can also help generate more
efficient query plans.

--- Sungwoo


On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
wrote:

> Hi Sungwoo Park,
>
> I'm sorry for the late reply to this old email.
> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
> noticed that the response of the Hive3 MetaStore is very slow.
> We suspect that HIVE-14187 might be causing this slowness.
> Could you tell me if you have resolved this problem? Are there still any
> problems when you enable CachedStore?
>
> Regards,
> - Takanobu
>
> On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:
>
>> Hello Hive users,
>>
>> I am experiencing a problem with MetaStore in Hive 3.0.
>>
>> 1. Start MetaStore
>> with 
>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>
>> 2. Generate TPC-DS data.
>>
>> 3. TPC-DS queries run okay and produce correct results. E.g., from query
>> 1:
>> +---+
>> |   c_customer_id   |
>> +---+
>> | CHAA  |
>> | DCAA  |
>> | DDAA  |
>> ...
>> | AAAILIAA  |
>> +---+
>> 100 rows selected (69.901 seconds)
>>
>> However, the query compilation takes long (
>> https://issues.apache.org/jira/browse/HIVE-16520).
>>
>> 4. Now, restart MetaStore with
>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>
>> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
>> ++
>> | c_customer_id  |
>> ++
>> ++
>> No rows selected (37.448 seconds)
>>
>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>> HiveServer2 produces such log messages:
>>
>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>> c_customer_id
>> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>> c_customer_id
>>
>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>> Invalid column stats: No of nulls > cardinality
>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>> Invalid column stats: No of nulls > cardinality
>> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>> Invalid column stats: No of nulls > cardinality
>>
>> However, even after computing column stats, queries still return wrong
>> results, despite the fact that the above log messages disappear.
>>
>> I guess I am missing some configuration parameters 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-27 Thread Takanobu Asanuma
Hi Sungwoo Park,

I'm sorry for the late reply to this old email.
We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
noticed that the response of the Hive3 MetaStore is very slow.
We suspect that HIVE-14187 might be causing this slowness.
Could you tell me if you have resolved this problem? Are there still any
problems when you enable CachedStore?

Regards,
- Takanobu

On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:

> Hello Hive users,
>
> I am experiencing a problem with MetaStore in Hive 3.0.
>
> 1. Start MetaStore
> with 
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>
> 2. Generate TPC-DS data.
>
> 3. TPC-DS queries run okay and produce correct results. E.g., from query 1:
> +---+
> |   c_customer_id   |
> +---+
> | CHAA  |
> | DCAA  |
> | DDAA  |
> ...
> | AAAILIAA  |
> +---+
> 100 rows selected (69.901 seconds)
>
> However, the query compilation takes long (
> https://issues.apache.org/jira/browse/HIVE-16520).
>
> 4. Now, restart MetaStore with
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>
> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
> ++
> | c_customer_id  |
> ++
> ++
> No rows selected (37.448 seconds)
>
> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
> HiveServer2 produces such log messages:
>
> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
> c_customer_id
> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
> c_customer_id
>
> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
>
> However, even after computing column stats, queries still return wrong
> results, despite the fact that the above log messages disappear.
>
> I guess I am missing some configuration parameters (because I imported
> hive-site.xml from Hive 2). Any suggestion would be appreciated.
>
> Thanks a lot,
>
> --- Sungwoo Park
>
>


Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to apply
now so that you have enough time in case of interview delays. Do not
wait until you know whether you have been accepted.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: [EXTERNAL] Backport HIVE-21075 into 3.2.0 release

2024-02-07 Thread Aman Raj via user
Hi,

No, HIVE-21075 is not currently planned for the 3.2.0 release. However, we
will check the feasibility and implications of backporting it and add it to
the parent JIRA.

It's difficult to quote an exact date for the release since we are still
backporting some important tickets (linked to the parent JIRA), but hopefully
we will come up with a release date soon.

Thanks,
Aman.

From: Daniel Cristian 
Sent: Wednesday, February 7, 2024 8:20 PM
To: user@hive.apache.org 
Subject: [EXTERNAL] Backport HIVE-21075 into 3.2.0 release

Hi,

I saw that HIVE-21075 has a bug fix for a performance problem when dropping
partitions with PostgreSQL or MySQL, and the fix was merged only into the
4.0.0-alpha-1 version.

I also saw that you plan to release a 3.2.0 version with HIVE-26751. Do you
plan to backport HIVE-21075 into HIVE-26751 and have this bug fixed in the
next release?

Would you also let me know if you have an expected date for the new release?

I'm using an AWS RDS PostgreSQL DB for my Hive Metastore and suffer from this 
performance problem where my RDS keeps 100% CPU usage during the unregistering 
process.

Best Regards,
Daniel Cristian




Backport HIVE-21075 into 3.2.0 release

2024-02-07 Thread Daniel Cristian
Hi,

I saw that HIVE-21075 has a bug fix for a performance problem when dropping
partitions with PostgreSQL or MySQL, and the fix was merged only into the
4.0.0-alpha-1 version.

I also saw that you plan to release a 3.2.0 version with HIVE-26751. Do you
plan to backport HIVE-21075 into HIVE-26751 and have this bug fixed in the
next release?

Would you also let me know if you have an expected date for the new release?

I'm using an AWS RDS PostgreSQL DB for my Hive Metastore and suffer from
this performance problem where my RDS keeps 100% CPU usage during the
unregistering process.

Best Regards,
Daniel Cristian


Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3rd - 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to apply
now so that you have enough time in case of interview delays. Do not
wait until you know whether you have been accepted.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: [Hive Support] Query about StandardStructObjectInspector converting field names to lowercase

2024-02-02 Thread chang.wd
Hi,


I retried my use case on the Apache Hive 4.0.0-beta-1 release (using just the
CREATE TABLE statement). This behavior still exists.


Best regards,


Chang
-- Original message --
From: "user"
https://hive.apache.org/general/downloads/

On Mon, Jan 29, 2024 at 5:52 AM chang.wd

Re: [Hive Support] Query about StandardStructObjectInspector converting field names to lowercase

2024-02-01 Thread Stamatis Zampetakis
Hi Chang,

The hive-hcatalog-core-1.1.0-cdh5.13.1.jar file is not maintained by
Apache. For vendor-specific problems, you should reach out to the support
team of the vendor from which you obtained the product.

Apart from that, the version that you are using (5.13.1) is quite old.
Please retry your use case with the latest Apache Hive 4.0.0-beta-1
release [1] and report back if you still observe unexpected behavior.

Best,
Stamatis

[1] https://hive.apache.org/general/downloads/

On Mon, Jan 29, 2024 at 5:52 AM chang.wd  wrote:
>
> Dear Hive Support Team,
>
> I hope you are doing well. I am writing to inquire about a specific behavior 
> I encountered in Hive, related to the 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector 
> class.
>
> SQL to reproduce this behavior:
> ```
> -- add JsonSerDe jar
> ADD JAR hive-hcatalog-core-1.1.0-cdh5.13.1.jar;
> -- create json table; the `struct` field names are converted to lower case
> CREATE TABLE `test.hive_json_struct_schema`(
>   `cond_keys` struct
> )
> ROW FORMAT SERDE
>   'org.apache.hive.hcatalog.data.JsonSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> ```
>
> When using the StandardStructObjectInspector class, it appears that field 
> names are being automatically converted to lowercase in the following code 
> snippet:
>
> ```
> this.fieldName = fieldName.toLowerCase();
> ```
>
> This behavior subsequently causes issues when querying JSON formatted tables, 
> particularly when nested Struct field names within the JSON data contain a 
> mix of uppercase and lowercase characters. Since field names are being 
> changed to lowercase by the StandardStructObjectInspector class, the actual 
> field names no longer match the expected field names, which leads to errors
> when reading the data (not with SQL).
>
> I would appreciate it if you could kindly provide an explanation for this design
> choice and whether there are any available workarounds or alternative 
> solutions for this scenario. I understand that the class may have been 
> implemented to avoid case sensitivity issues, but in cases like mine where 
> field name case matters, it would be helpful to have a better understanding 
> of how to handle this situation.
>
> Thank you in advance for your assistance and guidance. I look forward to 
> hearing from you.
>
> Best regards,
>
> Chang
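
The mismatch described in the quoted message can be illustrated outside Hive. The following is only a minimal Python sketch, not Hive's SerDe code: the field names `userId` and `Region` are hypothetical (the original struct definition did not survive in this archive), and `lowercased_schema` merely mimics the quoted `fieldName.toLowerCase()` line. It shows why an exact key lookup with lowercased names finds nothing in mixed-case JSON, and what a case-insensitive lookup would return instead.

```python
import json


def lowercased_schema(field_names):
    """Mimic the quoted `fieldName.toLowerCase()` step on a list of names."""
    return [name.lower() for name in field_names]


def read_exact(record, schema):
    """Look up each schema field by exact key match; missing keys give None."""
    return {field: record.get(field) for field in schema}


def read_case_insensitive(record, schema):
    """Look up each schema field while ignoring the case of the JSON keys."""
    by_lower = {key.lower(): value for key, value in record.items()}
    return {field: by_lower.get(field) for field in schema}


# Hypothetical row with mixed-case keys inside the nested struct.
row = json.loads('{"cond_keys": {"userId": 7, "Region": "eu"}}')
schema = lowercased_schema(["userId", "Region"])

exact = read_exact(row["cond_keys"], schema)
relaxed = read_case_insensitive(row["cond_keys"], schema)
print(exact)    # every value is None: the lowercased names match no key
print(relaxed)  # values are found once key case is ignored
```

A case-insensitive lookup of this kind is ambiguous when two JSON keys differ only by case, so it is a possible workaround for a reader rather than a general fix.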


[Hive Support] Query about StandardStructObjectInspector converting field names to lowercase

2024-01-28 Thread chang.wd
Dear Hive Support Team,


I hope you are doing well. I am writing to inquire about a specific behavior I 
encountered in Hive, related to the 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector 
class.


SQL to reproduce this behavior:
```
-- add JsonSerDe jar
ADD JAR hive-hcatalog-core-1.1.0-cdh5.13.1.jar;
-- create json table, the `struct

TABLESAMPLE with buckets not working, it does not prune input

2024-01-24 Thread Pau Tallada
Hi all,

We have a web platform in production[1] that uses Hive to facilitate access
to massive cosmological datasets.
When launched in 2016 over Hive 2.1.2 we used the TABLESAMPLE clause on
clustered tables to allow quick subsampling of the data.
However, we have been unable to get the same behaviour using Hive 3.1.2.

Tables are clustered as described in the sampling documentation [2], but the
queries always read all the data in the table.
In fact, they read even more data (HDFS_BYTES_READ counter) when the
TABLESAMPLE clause is used.

Example query on a very large table:

SELECT SUM(float_column) FROM huge_clustered_table
=>
23552 tasks
HDFS_BYTES_READ  = 44475888168 (44G)

SELECT SUM(float_column) FROM huge_clustered_table
TABLESAMPLE(BUCKET 1 OUT OF 1024)
=>
*23552 tasks*
*HDFS_BYTES_READ  = 58372075670 (58G)*

However, using block sampling:

SELECT SUM(float_column) FROM huge_clustered_table
TABLESAMPLE(0.1 PERCENT)
=>
*25 tasks*
*HDFS_BYTES_READ  = 45484944 (45M)*

Please, any hint would be greatly appreciated!
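
For reference, the pruning that the sampling documentation [2] describes for a clustered table can be written down as simple arithmetic. The sketch below is only an illustration of that documented intent, checked against the documentation's own example (a table created with 32 buckets, sampled with `BUCKET 3 OUT OF 16 ON id` and `BUCKET 3 OUT OF 64 ON id`); the function name `clusters_to_scan` is hypothetical, it is not Hive's implementation, and it says nothing about what Hive 3.1.2 actually does.

```python
def clusters_to_scan(num_clusters, x, y):
    """Return (1-based cluster numbers, fraction of each cluster that is kept)
    for TABLESAMPLE(BUCKET x OUT OF y) on a table clustered into num_clusters."""
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    if num_clusters % y == 0:
        # y divides the cluster count: the sample is made of whole clusters,
        # namely x, x + y, x + 2y, ...
        return list(range(x, num_clusters + 1, y)), 1.0
    if y % num_clusters == 0:
        # y is a multiple of the cluster count: the sample is a slice
        # of a single cluster.
        return [(x - 1) % num_clusters + 1], num_clusters / y
    # Otherwise this sketch assumes no file-level pruning: every cluster is
    # read and roughly 1/y of its rows are kept.
    return list(range(1, num_clusters + 1)), 1.0 / y


print(clusters_to_scan(32, 3, 16))  # the documentation's "3rd and 19th clusters"
print(clusters_to_scan(32, 3, 64))  # the documentation's "half of the 3rd cluster"
```

Under this arithmetic, a sample bucket should touch only a subset of the bucket files whenever one of the two counts divides the other, which is the behaviour the task count and HDFS_BYTES_READ above do not show.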

[1] https://cosmohub.pic.es
[2] https://cwiki.apache.org/confluence/display/hive/languagemanual+sampling
-- 
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--


Re: Hive on Docker

2024-01-18 Thread Sanjay Gupta
Never mind, I had to set these properties on the Hive Metastore to resolve
the issue. I was setting them in HiveServer2.
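
One way to catch this kind of mix-up is to read the hive-site.xml that each service actually loads and list which of the expected properties are absent. The following is a minimal sketch using only the Python standard library; the helper names and the inline sample XML are illustrative assumptions, and in practice the text would come from the metastore container's own hive-site.xml.

```python
import xml.etree.ElementTree as ET

# The three properties mentioned in this thread.
REQUIRED = (
    "hive.metastore.event.db.notification.api.auth",
    "hadoop.proxyuser.hive.hosts",
    "hadoop.proxyuser.hive.groups",
)


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(xml_text, required=REQUIRED):
    """Return the required property names that the configuration lacks."""
    props = read_properties(xml_text)
    return [name for name in required if name not in props]


# Hypothetical metastore-side configuration used only for this example.
metastore_site = """
<configuration>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
</configuration>
"""

print(missing_properties(metastore_site))
print(missing_properties("<configuration/>"))
```

Running the same check against both the metastore's and HiveServer2's configuration files shows at a glance which side is missing the settings.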

On Wed, Jan 17, 2024 at 11:38 PM Sanjay Gupta  wrote:
>
> Hi,
> I get exactly the same issue as described here, in a Docker container which
> is running Hive Metastore and HS2.
>
> Using Hive version 3.1.3 (inside the Docker container)
>
> https://issues.apache.org/jira/browse/HIVE-19740
> It looks close to the above issue.
>
> I have also set the following in hive-site.xml as suggested there, but I
> still get the same issue in the log file
>
> 
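> <property>
>   <name>hive.metastore.event.db.notification.api.auth</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hadoop.proxyuser.hive.hosts</name>
>   <value>*</value>
> </property>
>
> <property>
>   <name>hadoop.proxyuser.hive.groups</name>
>   <value>*</value>
> </property>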
> 
>
>
> 2024-01-18T07:36:07,203  INFO [main] metastore.HiveMetaStoreClient:
> Connected to metastore.
> 2024-01-18T07:36:07,204  INFO [main] server.HiveServer2: Shutting down
> HiveServer2
> 2024-01-18T07:36:07,205  INFO [main] server.HiveServer2:
> Stopping/Disconnecting tez sessions.
> 2024-01-18T07:36:07,205  INFO [main] metastore.HiveMetaStoreClient:
> Closed a connection to metastore, current connections: 2
> 2024-01-18T07:36:07,205  WARN [main] server.HiveServer2: Error
> starting HiveServer2 on attempt 3, will retry in 6ms
> java.lang.RuntimeException: Error initializing notification event poll
> at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:274)
> ~[hive-service-3.1.3.jar:3.1.3]
> at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1038)
> [hive-service-3.1.3.jar:3.1.3]
> at 
> org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:139)
> [hive-service-3.1.3.jar:3.1.3]
> at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1307)
> [hive-service-3.1.3.jar:3.1.3]
> at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1151)
> [hive-service-3.1.3.jar:3.1.3]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_342]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_342]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> [hadoop-common-3.1.0.jar:?]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> [hadoop-common-3.1.0.jar:?]
> Caused by: java.io.IOException:
> org.apache.thrift.TApplicationException: Internal error processing
> get_current_notificationEventId
> at 
> org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:75)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.<init>(NotificationEventPoll.java:103)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:59)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:272)
> ~[hive-service-3.1.3.jar:3.1.3]
> ... 10 more
> Caused by: org.apache.thrift.TApplicationException: Internal error
> processing get_current_notificationEventId
> at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_current_notificationEventId(ThriftHiveMetastore.java:5575)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_current_notificationEventId(ThriftHiveMetastore.java:5563)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getCurrentNotificationEventId(HiveMetaStoreClient.java:2723)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_342]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_342]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at com.sun.proxy.$Proxy25.getCurrentNotificationEventId(Unknown Source) ~[?:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_342]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_342]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
> at 
> 

Hive on Docker

2024-01-17 Thread Sanjay Gupta
Hi,
I get exactly the same issue as described here, in a Docker container which
is running Hive Metastore and HS2.

Using Hive version 3.1.3 (inside the Docker container)

https://issues.apache.org/jira/browse/HIVE-19740
It looks close to the above issue.

I have also set the following in hive-site.xml as suggested there, but I
still get the same issue in the log file


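<property>
  <name>hive.metastore.event.db.notification.api.auth</name>
  <value>false</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>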



2024-01-18T07:36:07,203  INFO [main] metastore.HiveMetaStoreClient:
Connected to metastore.
2024-01-18T07:36:07,204  INFO [main] server.HiveServer2: Shutting down
HiveServer2
2024-01-18T07:36:07,205  INFO [main] server.HiveServer2:
Stopping/Disconnecting tez sessions.
2024-01-18T07:36:07,205  INFO [main] metastore.HiveMetaStoreClient:
Closed a connection to metastore, current connections: 2
2024-01-18T07:36:07,205  WARN [main] server.HiveServer2: Error
starting HiveServer2 on attempt 3, will retry in 6ms
java.lang.RuntimeException: Error initializing notification event poll
at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:274)
~[hive-service-3.1.3.jar:3.1.3]
at 
org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1038)
[hive-service-3.1.3.jar:3.1.3]
at org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:139)
[hive-service-3.1.3.jar:3.1.3]
at 
org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1307)
[hive-service-3.1.3.jar:3.1.3]
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1151)
[hive-service-3.1.3.jar:3.1.3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[?:1.8.0_342]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_342]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
[hadoop-common-3.1.0.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
[hadoop-common-3.1.0.jar:?]
Caused by: java.io.IOException:
org.apache.thrift.TApplicationException: Internal error processing
get_current_notificationEventId
at 
org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:75)
~[hive-exec-3.1.3.jar:3.1.3]
at 
org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.<init>(NotificationEventPoll.java:103)
~[hive-exec-3.1.3.jar:3.1.3]
at 
org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:59)
~[hive-exec-3.1.3.jar:3.1.3]
at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:272)
~[hive-service-3.1.3.jar:3.1.3]
... 10 more
Caused by: org.apache.thrift.TApplicationException: Internal error
processing get_current_notificationEventId
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
~[hive-exec-3.1.3.jar:3.1.3]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
~[hive-exec-3.1.3.jar:3.1.3]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_current_notificationEventId(ThriftHiveMetastore.java:5575)
~[hive-exec-3.1.3.jar:3.1.3]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_current_notificationEventId(ThriftHiveMetastore.java:5563)
~[hive-exec-3.1.3.jar:3.1.3]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getCurrentNotificationEventId(HiveMetaStoreClient.java:2723)
~[hive-exec-3.1.3.jar:3.1.3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[?:1.8.0_342]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_342]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
~[hive-exec-3.1.3.jar:3.1.3]
at com.sun.proxy.$Proxy25.getCurrentNotificationEventId(Unknown Source) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[?:1.8.0_342]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_342]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2773)
~[hive-exec-3.1.3.jar:3.1.3]
at com.sun.proxy.$Proxy25.getCurrentNotificationEventId(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:73)
~[hive-exec-3.1.3.jar:3.1.3]
at 

Re: Contributing doc

2024-01-14 Thread Stamatis Zampetakis
Hi Henri,

I gave you the necessary permissions to the wiki. Please check and if
you encounter any issues let us know.

Best,
Stamatis

On Fri, Jan 12, 2024 at 10:56 AM Henri Biestro  wrote:
>
>
> My Apache Id is hen...@apache.org.
> Cheers
>
> On 2024/01/12 09:52:20 Henri Biestro wrote:
> > Hello;
> > I'd like to contribute some documentation on Hive 4 - (
> > https://issues.apache.org/jira/browse/HIVE-27186  for instance)
> > May I get write access to the Wiki ( ie
> > https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0 ) ?
> > Thanks
> > Henri
> >


Re: Contributing doc

2024-01-12 Thread Henri Biestro


My Apache Id is hen...@apache.org.
Cheers

On 2024/01/12 09:52:20 Henri Biestro wrote:
> Hello;
> I'd like to contribute some documentation on Hive 4 - (
> https://issues.apache.org/jira/browse/HIVE-27186  for instance)
> May I get write access to the Wiki ( ie
> https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0 ) ?
> Thanks
> Henri
> 


Contributing doc

2024-01-12 Thread Henri Biestro
Hello;
I'd like to contribute some documentation on Hive 4 - (
https://issues.apache.org/jira/browse/HIVE-27186  for instance)
May I get write access to the Wiki ( ie
https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0 ) ?
Thanks
Henri


Re: "org.apache.thrift.transport.TTransportException: Invalid status -128" errors when SASL is enabled

2024-01-11 Thread Austin Hackett
For the benefit of anyone who comes across this error in future, it was solved 
by adding hive.metastore.sasl.enabled and hive.metastore.kerberos.principal to 
hive-site.xml on the client side, e.g. $SPARK_HOME/conf
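
A plausible reading of the number in the error, offered as an assumption rather than something confirmed in this thread: a client that has not enabled SASL opens with a plain Thrift binary-protocol message, whose strict header begins with the version word 0x8001, while a SASL-enabled server interprets the first byte it receives as a negotiation status. The byte 0x80 read as a signed value is -128, which is consistent with the fix being on the client side. The short sketch below only reproduces that arithmetic; the function name is hypothetical.

```python
import struct

THRIFT_VERSION_1 = 0x80010000  # strict binary-protocol version marker
MESSAGE_TYPE_CALL = 1          # message type of an ordinary RPC call


def first_byte_as_signed(header_word):
    """Return the first wire byte of a big-endian 32-bit header, read as a
    signed byte (the way a status byte would be interpreted in Java)."""
    raw = struct.pack(">I", header_word)
    (status,) = struct.unpack(">b", raw[:1])
    return status


status = first_byte_as_signed(THRIFT_VERSION_1 | MESSAGE_TYPE_CALL)
print(status)
```

If this reading is right, the same error would appear for any client that talks unauthenticated Thrift to a SASL-enabled metastore, regardless of the operation being called.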


> On 8 Jan 2024, at 16:18, Austin Hackett  wrote:
> 
> Hi List
>  
> I'm having an issue where Hive Metastore operations (e.g. show databases) are 
> failing with "org.apache.thrift.transport.TTransportException: Invalid status 
> -128" errors when I enable SASL.
>  
> I am a bit stuck on how to go about troubleshooting this further, and any
> pointers would be greatly appreciated...
>  
> Full details as follows:
>  
> - Ubuntu 22.04 & OpenJDK 8u342
> - Unpacked Hive 3.1.3 binary release 
> (https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz) to 
> /opt/hive
> - Unpacked Hadoop 3.1.0 binary release 
> (https://archive.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz)
>  to /opt/hadoop
> - Created /opt/hive/conf/metastore-site.xml (see below for contents) and 
> copied hdfs-site.xml and core-site.xml from the target HDFS cluster to 
> /opt/hive/conf
> - export HADOOP_HOME=/opt/hadoop
> - export HIVE_HOME=/opt/hive
> - Successfully started the metastore, i.e. hive --service metastore
> - Use a Hive Metastore client to "show databases" and get an error (see below 
> for the associated errors in the HMS log). I get the same error with 
> spark-shell running in local mode and the Python hive-metastore-client 
> (https://pypi.org/project/hive-metastore-client/)
>  
>  
> metastore-site.xml
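> ==
> <configuration>
>   <property>
>     <name>metastore.warehouse.dir</name>
>     <value>/user/hive/warehouse</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.postgresql.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:postgresql://postgres.example.net:5432/metastore_db</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>password</value>
>   </property>
>   <property>
>     <name>metastore.kerberos.principal</name>
>     <value>hive/_h...@example.net</value>
>   </property>
>   <property>
>     <name>metastore.kerberos.keytab.file</name>
>     <value>/etc/security/keytabs/hive.keytab</value>
>   </property>
>   <property>
>     <name>hive.metastore.sasl.enabled</name>
>     <value>true</value>
>   </property>
> </configuration>
> ==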
>  
> HMS log shows that it is able to authenticate using the specified keytab and 
> principal (and I have also checked this manually via the kinit command):
>  
> 
> 2024-01-08T13:12:33,463  WARN [main] security.HadoopThriftAuthBridge: Client-facing principal not set. Using server-side setting: hive/_h...@example.net
> 2024-01-08T13:12:33,464  INFO [main] security.HadoopThriftAuthBridge: Logging in via CLIENT based principal
> 2024-01-08T13:12:33,471 DEBUG [main] security.UserGroupInformation: Hadoop login
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: hadoop login commit
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: Using kerberos user: hive/metstore.example@example.net
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: Using user: "hive/metstore.example@example.net" with name: hive/metstore.example@example.net
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: User entry: "hive/metstore.example@example.net"
> 2024-01-08T13:12:33,472  INFO [main] security.UserGroupInformation: Login successful for user hive/metstore.example@example.net using keytab file hive.keytab. Keytab auto renewal enabled : false
> 2024-01-08T13:12:33,472  INFO [main] security.HadoopThriftAuthBridge: Logging in via SERVER based principal
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Hadoop login
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: hadoop login commit
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Using kerberos user: hive/metstore.example@example.net
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Using user: "hive/metstore.example@example.net" with name: hive/metstore.example@example.net
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: User entry: "hive/metstore.example@example.net"
> 2024-01-08T13:12:33,480  INFO [main] security.UserGroupInformation: Login successful for user hive/metstore.example@example.net using keytab file hive.keytab. Keytab auto renewal enabled : false
> 
>  
> 

Re: Docker Hive using tez without hdfs

2024-01-11 Thread Sanjay Gupta
Thanks Attila & Ayush,
I don't have permission to open a Jira ticket yet, but I have initiated the process.
I have tried Tez 0.9.1 and also 0.10.2, and I see the same issue with both.
I have noticed that when I set the default hive.execution.engine=mr in
hive-site.xml (and restart the hive service), then start the hive cli,
run set hive.execution.engine=tez on the command line and run a
query, it doesn't give an error.
However, when the default engine is set to tez in hive-site.xml, the hive cli
exits with this error:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/tez/dag/api/TezConfiguration

Thanks

On Wed, Jan 10, 2024 at 5:40 AM Attila Turoczy  wrote:
>
> Agree with Ayush.
>
> Back to the original issue, is it not related to the latest Tez fix? As I 
> remember there was an incompatibility issue, which the next tez release will 
> fix. Maybe this is related to that. Sanjay could you please create a JIRA 
> around it for the tracking, and the community or someone from the community 
> will check. (I know most of you don't like the jira but this could help the 
> tracking then a mail thread)
>
> -Attila
>
> On Wed, Jan 10, 2024 at 8:45 AM Ayush Saxena  wrote:
>>
>> Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OS 
>> thing, so, it is a vendor product just being tried to advertised in the 
>> ‘Apache’ Hive space
>>
>> So, it can be all mess, filled with security issues or bugs & we Apache Hive 
>> for the record aren’t responsible for that neither do we endorse usage of 
>> that or anything outside the scope of Apache
>>
>> -Ayush
>>
>> On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:
>>
>> 
>> As far as I know, Hive-Tez supports local mode, but does not standalone mode 
>> (like Spark). Hive-MR3 supports standalone mode, so you can run it in any 
>> type of cluster.
>>
>> --- Sungwoo
>>
>> On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:
>>>
>>> I can run hive with mr engine in local mode. Does Hive + Tez also
>>> works in standalone mode ?
>>>
>>> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>>> >
>>> > Hello,
>>> >
>>> > I don't have an answer to your problem, but if your goal is to quickly 
>>> > test Hive 3 using Docker, there is an alternative way which uses Hive on 
>>> > MR3.
>>> >
>>> > https://mr3docs.datamonad.com/docs/quick/docker/
>>> >
>>> > You can also run Hive on MR3 on Kubernetes.
>>> >
>>> > Thanks,
>>> >
>>> > --- Sungwoo
>>> >
>>> >
>>> >
>>> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
>>> >>
>>> >> Hi,
>>> >> Using following docker container to run meta , hiveserver2
>>> >>
>>> >> https://hub.docker.com/r/apache/hive
>>> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
>>> >>
>>> >> I have configured hive-site.xml to se S3
>>> >> When I set in hive.execution.engine to mr hive-site.xml, hive is
>>> >> running fine and I can perform queries but setting to tez fails with
>>> >> error.
>>> >> There is no hdfs but it is running in local mode.
>>> >>
>>> >> 
>>> >> hive.execution.engine
>>> >> tez
>>> >> 
>>> >>
>>> >> Any idea how to fix this issue ?
>>> >>
>>> >> hive
>>> >> SLF4J: Actual binding is of type 
>>> >> [org.apache.logging.slf4j.Log4jLoggerFactory]
>>> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>>> >>
>>> >> Logging initialized using configuration in
>>> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>>> >> Async: true
>>> >> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> >> org/apache/tez/dag/api/TezConfiguration
>>> >> at 
>>> >> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>>> >> at 
>>> >> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>>> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>>> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >> at 
>>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> >> at 
>>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >> at java.lang.reflect.Method.invoke(Method.java:498)
>>> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>>> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>>> >> Caused by: java.lang.ClassNotFoundException:
>>> >> org.apache.tez.dag.api.TezConfiguration
>>> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Thanks
>>> >> Sanjay Gupta
>>>
>>>
>>>
>>> --
>>>
>>> Thanks
>>> Sanjay Gupta



-- 

Thanks
Sanjay Gupta


Re: Docker Hive using tez without hdfs

2024-01-10 Thread Attila Turoczy
Agree with Ayush.

Back to the original issue: is it not related to the latest Tez fix? As I
remember, there was an incompatibility issue which the next Tez release
will fix. Maybe this is related to that. Sanjay, could you please create a
JIRA for it so it can be tracked, and someone from the community
will check. (I know most of you don't like Jira, but it
helps tracking more than a mail thread does.)

-Attila

On Wed, Jan 10, 2024 at 8:45 AM Ayush Saxena  wrote:

> Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OS
> thing, so, it is a vendor product just being tried to advertised in the
> ‘Apache’ Hive space
>
> So, it can be all mess, filled with security issues or bugs & we Apache
> Hive for the record aren’t responsible for that neither do we endorse usage
> of that or anything outside the scope of Apache
>
> -Ayush
>
> On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:
>
> 
> As far as I know, Hive-Tez supports local mode, but does not standalone
> mode (like Spark). Hive-MR3 supports standalone mode, so you can run it in
> any type of cluster.
>
> --- Sungwoo
>
> On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:
>
>> I can run hive with mr engine in local mode. Does Hive + Tez also
>> works in standalone mode ?
>>
>> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>> >
>> > Hello,
>> >
>> > I don't have an answer to your problem, but if your goal is to quickly
>> test Hive 3 using Docker, there is an alternative way which uses Hive on
>> MR3.
>> >
>> > https://mr3docs.datamonad.com/docs/quick/docker/
>> >
>> > You can also run Hive on MR3 on Kubernetes.
>> >
>> > Thanks,
>> >
>> > --- Sungwoo
>> >
>> >
>> >
>> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta 
>> wrote:
>> >>
>> >> Hi,
>> >> Using following docker container to run meta , hiveserver2
>> >>
>> >> https://hub.docker.com/r/apache/hive
>> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
>> >>
>> >> I have configured hive-site.xml to se S3
>> >> When I set in hive.execution.engine to mr hive-site.xml, hive is
>> >> running fine and I can perform queries but setting to tez fails with
>> >> error.
>> >> There is no hdfs but it is running in local mode.
>> >>
>> >> 
>> >> hive.execution.engine
>> >> tez
>> >> 
>> >>
>> >> Any idea how to fix this issue ?
>> >>
>> >> hive
>> >> SLF4J: Actual binding is of type
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>> >>
>> >> Logging initialized using configuration in
>> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> >> Async: true
>> >> Exception in thread "main" java.lang.NoClassDefFoundError:
>> >> org/apache/tez/dag/api/TezConfiguration
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >> at java.lang.reflect.Method.invoke(Method.java:498)
>> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> >> Caused by: java.lang.ClassNotFoundException:
>> >> org.apache.tez.dag.api.TezConfiguration
>> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>> >>
>> >>
>> >> --
>> >>
>> >> Thanks
>> >> Sanjay Gupta
>>
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta
>>
>


Re: Docker Hive using tez without hdfs

2024-01-10 Thread Zoltán Rátkai
Hi,

I am not sure whether the officially built Docker image contains Tez, but if
you build it yourself, you can have a look here:
https://github.com/apache/hive/blob/master/packaging/src/docker/Dockerfile#L41

To use Tez you need to place it inside the Docker container and configure
it.
Download it from here:
https://archive.apache.org/dist/tez/0.10.2/apache-tez-0.10.2-bin.tar.gz

and configure it with:
export TEZ_HOME="/tez"
export TEZ_CONF_DIR="/hive/conf/"
export HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH"

The above Dockerfile seems to do all of that, so try building the Docker image
with the official Dockerfile; it should work.
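
The export lines above can be wrapped in a small check, sketched here under some assumptions: setup_tez_env is a hypothetical helper name, and the sketch assumes the Tez tarball has been unpacked so that a tez-api-*.jar sits directly under TEZ_HOME (adjust the pattern if your unpacked layout differs). It refuses to touch the classpath when no such jar is visible, since a missing Tez jar is what produces the NoClassDefFoundError for org/apache/tez/dag/api/TezConfiguration.

```shell
#!/bin/sh
# Sketch: export the Tez variables from the message above, but only after
# confirming that a tez-api jar is actually present under TEZ_HOME.
# setup_tez_env is a hypothetical helper; the defaults mirror the paths
# given in the message (/tez and /hive/conf/).
setup_tez_env() {
  TEZ_HOME="${1:-/tez}"
  TEZ_CONF_DIR="${2:-/hive/conf/}"
  # ls exits non-zero when the pattern matches no file
  if ! ls "$TEZ_HOME"/tez-api-*.jar >/dev/null 2>&1; then
    echo "no tez-api jar found under $TEZ_HOME" >&2
    return 1
  fi
  # Prepend the Tez jars; append the old value only if it was non-empty,
  # so the classpath never ends in a stray ':'.
  HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
  export TEZ_HOME TEZ_CONF_DIR HADOOP_CLASSPATH
}
```

Calling setup_tez_env with no arguments uses the /tez and /hive/conf/ paths from above; a non-zero return means the jars are not where the classpath expects them.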

Regards,

Zoltan Ratkai

On Wed, Jan 10, 2024 at 8:45 AM Ayush Saxena  wrote:

> Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OS
> thing, so, it is a vendor product just being tried to advertised in the
> ‘Apache’ Hive space
>
> So, it can be all mess, filled with security issues or bugs & we Apache
> Hive for the record aren’t responsible for that neither do we endorse usage
> of that or anything outside the scope of Apache
>
> -Ayush
>
> On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:
>
> 
> As far as I know, Hive-Tez supports local mode, but does not standalone
> mode (like Spark). Hive-MR3 supports standalone mode, so you can run it in
> any type of cluster.
>
> --- Sungwoo
>
> On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:
>
>> I can run hive with mr engine in local mode. Does Hive + Tez also
>> works in standalone mode ?
>>
>> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>> >
>> > Hello,
>> >
>> > I don't have an answer to your problem, but if your goal is to quickly
>> test Hive 3 using Docker, there is an alternative way which uses Hive on
>> MR3.
>> >
>> > https://mr3docs.datamonad.com/docs/quick/docker/
>> >
>> > You can also run Hive on MR3 on Kubernetes.
>> >
>> > Thanks,
>> >
>> > --- Sungwoo
>> >
>> >
>> >
>> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta 
>> wrote:
>> >>
>> >> Hi,
>> >> Using following docker container to run meta , hiveserver2
>> >>
>> >> https://hub.docker.com/r/apache/hive
>> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
>> >>
>> >> I have configured hive-site.xml to se S3
>> >> When I set in hive.execution.engine to mr hive-site.xml, hive is
>> >> running fine and I can perform queries but setting to tez fails with
>> >> error.
>> >> There is no hdfs but it is running in local mode.
>> >>
>> >> 
>> >> hive.execution.engine
>> >> tez
>> >> 
>> >>
>> >> Any idea how to fix this issue ?
>> >>
>> >> hive
>> >> SLF4J: Actual binding is of type
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>> >>
>> >> Logging initialized using configuration in
>> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> >> Async: true
>> >> Exception in thread "main" java.lang.NoClassDefFoundError:
>> >> org/apache/tez/dag/api/TezConfiguration
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >> at java.lang.reflect.Method.invoke(Method.java:498)
>> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> >> Caused by: java.lang.ClassNotFoundException:
>> >> org.apache.tez.dag.api.TezConfiguration
>> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>> >>
>> >>
>> >> --
>> >>
>> >> Thanks
>> >> Sanjay Gupta
>>
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta
>>
>


Re: Docker Hive using tez without hdfs

2024-01-09 Thread Ayush Saxena
Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OS thing, so it is a vendor product that is just being advertised in the ‘Apache’ Hive space.

So it could be a complete mess, full of security issues or bugs, and for the record we at Apache Hive aren’t responsible for it, nor do we endorse the use of it or of anything outside the scope of Apache.

-Ayush

On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:

As far as I know, Hive-Tez supports local mode, but does not standalone mode (like Spark). Hive-MR3 supports standalone mode, so you can run it in any type of cluster.

--- Sungwoo

On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:

I can run hive with mr engine in local mode. Does Hive + Tez also
works in standalone mode ?

On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>
> Hello,
>
> I don't have an answer to your problem, but if your goal is to quickly test Hive 3 using Docker, there is an alternative way which uses Hive on MR3.
>
> https://mr3docs.datamonad.com/docs/quick/docker/
>
> You can also run Hive on MR3 on Kubernetes.
>
> Thanks,
>
> --- Sungwoo
>
>
>
> On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
>>
>> Hi,
>> Using following docker container to run meta , hiveserver2
>>
>> https://hub.docker.com/r/apache/hive
>> https://github.com/apache/hive/blob/master/packaging/src/docker/
>>
>> I have configured hive-site.xml to se S3
>> When I set in hive.execution.engine to mr hive-site.xml, hive is
>> running fine and I can perform queries but setting to tez fails with
>> error.
>> There is no hdfs but it is running in local mode.
>>
>>     
>>         hive.execution.engine
>>         tez
>>     
>>
>> Any idea how to fix this issue ?
>>
>> hive
>> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
>> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>>
>> Logging initialized using configuration in
>> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> Async: true
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/tez/dag/api/TezConfiguration
>> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.tez.dag.api.TezConfiguration
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta



-- 

Thanks
Sanjay Gupta



Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sungwoo Park
As far as I know, Hive-Tez supports local mode, but not standalone
mode (as Spark does). Hive-MR3 supports standalone mode, so you can run it in
any type of cluster.

--- Sungwoo

On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:

> I can run hive with mr engine in local mode. Does Hive + Tez also
> works in standalone mode ?
>
> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
> >
> > Hello,
> >
> > I don't have an answer to your problem, but if your goal is to quickly
> test Hive 3 using Docker, there is an alternative way which uses Hive on
> MR3.
> >
> > https://mr3docs.datamonad.com/docs/quick/docker/
> >
> > You can also run Hive on MR3 on Kubernetes.
> >
> > Thanks,
> >
> > --- Sungwoo
> >
> >
> >
> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
> >>
> >> Hi,
> >> Using following docker container to run meta , hiveserver2
> >>
> >> https://hub.docker.com/r/apache/hive
> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
> >>
> >> I have configured hive-site.xml to se S3
> >> When I set in hive.execution.engine to mr hive-site.xml, hive is
> >> running fine and I can perform queries but setting to tez fails with
> >> error.
> >> There is no hdfs but it is running in local mode.
> >>
> >> 
> >> hive.execution.engine
> >> tez
> >> 
> >>
> >> Any idea how to fix this issue ?
> >>
> >> hive
> >> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
> >>
> >> Logging initialized using configuration in
> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
> >> Async: true
> >> Exception in thread "main" java.lang.NoClassDefFoundError:
> >> org/apache/tez/dag/api/TezConfiguration
> >> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
> >> at
> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:498)
> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> >> Caused by: java.lang.ClassNotFoundException:
> >> org.apache.tez.dag.api.TezConfiguration
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> >>
> >>
> >> --
> >>
> >> Thanks
> >> Sanjay Gupta
>
>
>
> --
>
> Thanks
> Sanjay Gupta
>


Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sanjay Gupta
I can run Hive with the mr engine in local mode. Does Hive + Tez also
work in standalone mode?

On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>
> Hello,
>
> I don't have an answer to your problem, but if your goal is to quickly test 
> Hive 3 using Docker, there is an alternative way which uses Hive on MR3.
>
> https://mr3docs.datamonad.com/docs/quick/docker/
>
> You can also run Hive on MR3 on Kubernetes.
>
> Thanks,
>
> --- Sungwoo
>
>
>
> On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
>>
>> Hi,
>> Using following docker container to run meta , hiveserver2
>>
>> https://hub.docker.com/r/apache/hive
>> https://github.com/apache/hive/blob/master/packaging/src/docker/
>>
>> I have configured hive-site.xml to se S3
>> When I set in hive.execution.engine to mr hive-site.xml, hive is
>> running fine and I can perform queries but setting to tez fails with
>> error.
>> There is no hdfs but it is running in local mode.
>>
>> 
>> hive.execution.engine
>> tez
>> 
>>
>> Any idea how to fix this issue ?
>>
>> hive
>> SLF4J: Actual binding is of type 
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>>
>> Logging initialized using configuration in
>> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> Async: true
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/tez/dag/api/TezConfiguration
>> at 
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> at 
>> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.tez.dag.api.TezConfiguration
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta



-- 

Thanks
Sanjay Gupta


Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sungwoo Park
Hello,

I don't have an answer to your problem, but if your goal is to quickly test
Hive 3 using Docker, there is an alternative way which uses Hive on MR3.

https://mr3docs.datamonad.com/docs/quick/docker/

You can also run Hive on MR3 on Kubernetes.

Thanks,

--- Sungwoo



On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:

> Hi,
> Using following docker container to run meta , hiveserver2
>
> https://hub.docker.com/r/apache/hive
> https://github.com/apache/hive/blob/master/packaging/src/docker/
>
> I have configured hive-site.xml to se S3
> When I set in hive.execution.engine to mr hive-site.xml, hive is
> running fine and I can perform queries but setting to tez fails with
> error.
> There is no hdfs but it is running in local mode.
>
> 
> hive.execution.engine
> tez
> 
>
> Any idea how to fix this issue ?
>
> hive
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>
> Logging initialized using configuration in
> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
> Async: true
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/tez/dag/api/TezConfiguration
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
> at
> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.tez.dag.api.TezConfiguration
> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>
>
> --
>
> Thanks
> Sanjay Gupta
>


Docker Hive using tez without hdfs

2024-01-09 Thread Sanjay Gupta
Hi,
I am using the following Docker image to run the metastore and hiveserver2:

https://hub.docker.com/r/apache/hive
https://github.com/apache/hive/blob/master/packaging/src/docker/

I have configured hive-site.xml to use S3.
When I set hive.execution.engine to mr in hive-site.xml, hive
runs fine and I can perform queries, but setting it to tez fails with
an error.
There is no hdfs; it is running in local mode.

<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>

Any idea how to fix this issue ?

hive
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.TezConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)


-- 

Thanks
Sanjay Gupta


"org.apache.thrift.transport.TTransportException: Invalid status -128" errors when SASL is enabled

2024-01-08 Thread Austin Hackett
Hi List
 
I'm having an issue where Hive Metastore operations (e.g. show databases) are 
failing with "org.apache.thrift.transport.TTransportException: Invalid status 
-128" errors when I enable SASL.
 
I am a bit stuck on how to go about troubleshooting this further, and any 
pointers would be greatly appreciated...
 
Full details as follows:
 
- Ubuntu 22.04 & OpenJDK 8u342
- Unpacked Hive 3.1.3 binary release 
(https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz) to 
/opt/hive
- Unpacked Hadoop 3.1.0 binary release 
(https://archive.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz)
 to /opt/hadoop
- Created /opt/hive/conf/metastore-site.xml (see below for contents) and copied 
hdfs-site.xml and core-site.xml from the target HDFS cluster to /opt/hive/conf
- export HADOOP_HOME=/opt/hadoop
- export HIVE_HOME=/opt/hive
- Successfully started the metastore, i.e. hive --service metastore
- Use a Hive Metastore client to "show databases" and get an error (see below 
for the associated errors in the HMS log). I get the same error with 
spark-shell running in local mode and the Python hive-metastore-client 
(https://pypi.org/project/hive-metastore-client/)
 
 
metastore-site.xml
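==
<configuration>
  <property>
    <name>metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://postgres.example.net:5432/metastore_db</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>metastore.kerberos.principal</name>
    <value>hive/_h...@example.net</value>
  </property>
  <property>
    <name>metastore.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/hive.keytab</value>
  </property>
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
</configuration>
==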
 
HMS log shows that it is able to authenticate using the specified keytab and 
principal (and I have also checked this manually via the kinit command):
 

2024-01-08T13:12:33,463  WARN [main] security.HadoopThriftAuthBridge: Client-facing principal not set. Using server-side setting: hive/_h...@example.net
2024-01-08T13:12:33,464  INFO [main] security.HadoopThriftAuthBridge: Logging in via CLIENT based principal
2024-01-08T13:12:33,471 DEBUG [main] security.UserGroupInformation: Hadoop login
2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: hadoop login commit
2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: Using kerberos user: hive/metstore.example@example.net
2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: Using user: "hive/metstore.example@example.net" with name: hive/metstore.example@example.net
2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: User entry: "hive/metstore.example@example.net"
2024-01-08T13:12:33,472  INFO [main] security.UserGroupInformation: Login successful for user hive/metstore.example@example.net using keytab file hive.keytab. Keytab auto renewal enabled : false
2024-01-08T13:12:33,472  INFO [main] security.HadoopThriftAuthBridge: Logging in via SERVER based principal
2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Hadoop login
2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: hadoop login commit
2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Using kerberos user: hive/metstore.example@example.net
2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Using user: "hive/metstore.example@example.net" with name: hive/metstore.example@example.net
2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: User entry: "hive/metstore.example@example.net"
2024-01-08T13:12:33,480  INFO [main] security.UserGroupInformation: Login successful for user hive/metstore.example@example.net using keytab file hive.keytab. Keytab auto renewal enabled : false
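
For reference, the manual keytab check mentioned above is typically done with the MIT Kerberos klist and kinit tools. The sketch below only prints the commands so they can be reviewed before running; the function name keytab_check_commands is hypothetical, and the principal is passed as an argument because the real one is elided in this message.

```shell
# Print the commands for a manual keytab check:
#   1. list the entries stored in the keytab,
#   2. obtain a ticket for the principal using the keytab,
#   3. show the resulting ticket cache.
# The function name is made up for this sketch.
keytab_check_commands() (
  keytab="$1"
  principal="$2"
  printf 'klist -kt %s\n' "$keytab"
  printf 'kinit -kt %s %s\n' "$keytab" "$principal"
  printf 'klist\n'
)

# Example usage (the principal is a placeholder, not the real value):
#   keytab_check_commands /etc/security/keytabs/hive.keytab 'hive/<host>@<REALM>'
```

A successful kinit only confirms that the keytab and principal are valid for login, which matches the "Login successful" lines in the log; it does not by itself exercise the client-to-metastore SASL handshake.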

 
However, when I attempt to "show databases":
 

2024-01-08T13:59:08,068 DEBUG [pool-6-thread-1] security.UserGroupInformation: PrivilegedAction [as: hive/metstore.example@example.net (auth:KERBEROS)][action:org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1@1e655c9]
java.lang.Exception: 

MR3 1.9 and performance evaluation of Trino 435 and Hive-MR3 1.9 using TPC-DS

2024-01-08 Thread Sungwoo Park
Hello Hive users,

MR3 1.9 has been released. For changes, please see the release notes:

https://mr3docs.datamonad.com/docs/release/
https://mr3docs.datamonad.com/docs/release/#patches-backported-in-mr3-19

We evaluated the performance of Trino 435 and Hive on MR3 1.9 using the
TPC-DS benchmark. Please see the blog article:

https://www.datamonad.com/post/2024-01-07-trino-hive-performance-1.9/

Thanks,

--- Sungwoo


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-04 Thread Ayush Saxena
Thanks Laszlo,
I ran into an issue here [1]. If it is not just me, maybe we can either
drop this ticket or reduce the log level to debug.

-Ayush

[1] 
https://issues.apache.org/jira/browse/TEZ-4039?focusedCommentId=17800336=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17800336

On Thu, 4 Jan 2024 at 21:10, László Bodor  wrote:
>
> Thanks for the feedback so far, I believe it's time to make the release.
> Please let me know about blockers if any, otherwise, I'm happy to volunteer 
> to start making the release next week.
>
> (Just added user@tez now, as I added user@hive originally, 
> accidentally...that was a feature in this context, not a bug.)
>
> On Tue, 2 Jan 2024 at 07:13, Butao Zhang wrote:
>>
>> +1  (non-binding) Thanks Laszlo !
>>  Replied Message 
>> | From | Attila Turoczy |
>> | Date | 1/2/2024 00:06 |
>> | To |  |
>> | Cc |  |
>> | Subject | Re: [DISCUSS] Tez 0.10.3 Release Planning |
>> +1  (non-binding)
>> Thank you for the effort and happy new year!
>> -Attila
>>
>> On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:
>>
>> +1 (non-binding),
>> Thanx Laszlo for starting the thread.
>>
>> -Ayush
>>
>> On Mon, 1 Jan 2024 at 18:30, László Bodor 
>> wrote:
>>
>> Hi Everyone!
>>
>> Happy New Year!
>>
>> I think it's time to create a new Tez release.
>> It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
>> since 0.10.2, which are:
>> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
>> .
>>
>> Please let me know your opinions.
>>
>> Regards,
>> Laszlo Bodor
>> Tez PMC Chair
>>
>>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-04 Thread László Bodor
Thanks for the feedback so far, I believe it's time to make the release.
Please let me know about blockers if any, otherwise, I'm happy to volunteer
to start making the release next week.

(Just added user@tez now, as I added user@hive originally,
accidentally...that was a feature in this context, not a bug.)

On Tue, 2 Jan 2024 at 07:13, Butao Zhang wrote:

> +1  (non-binding) Thanks Laszlo !
>  Replied Message 
> | From | Attila Turoczy |
> | Date | 1/2/2024 00:06 |
> | To |  |
> | Cc |  |
> | Subject | Re: [DISCUSS] Tez 0.10.3 Release Planning |
> +1  (non-binding)
> Thank you for the effort and happy new year!
> -Attila
>
> On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:
>
> +1 (non-binding),
> Thanx Laszlo for starting the thread.
>
> -Ayush
>
> On Mon, 1 Jan 2024 at 18:30, László Bodor 
> wrote:
>
> Hi Everyone!
>
> Happy New Year!
>
> I think it's time to create a new Tez release.
> It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
> since 0.10.2, which are:
>
> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
> .
>
> Please let me know your opinions.
>
> Regards,
> Laszlo Bodor
> Tez PMC Chair
>
>
>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread Butao Zhang
+1  (non-binding) Thanks Laszlo !
 Replied Message 
| From | Attila Turoczy |
| Date | 1/2/2024 00:06 |
| To |  |
| Cc |  |
| Subject | Re: [DISCUSS] Tez 0.10.3 Release Planning |
+1  (non-binding)
Thank you for the effort and happy new year!
-Attila

On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:

+1 (non-binding),
Thanx Laszlo for starting the thread.

-Ayush

On Mon, 1 Jan 2024 at 18:30, László Bodor 
wrote:

Hi Everyone!

Happy New Year!

I think it's time to create a new Tez release.
It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
since 0.10.2, which are:
https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
.

Please let me know your opinions.

Regards,
Laszlo Bodor
Tez PMC Chair




Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread Attila Turoczy
+1  (non-binding)
Thank you for the effort and happy new year!
-Attila

On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:

> +1 (non-binding),
> Thanx Laszlo for starting the thread.
>
> -Ayush
>
> On Mon, 1 Jan 2024 at 18:30, László Bodor 
> wrote:
> >
> > Hi Everyone!
> >
> > Happy New Year!
> >
> > I think it's time to create a new Tez release.
> > It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
> since 0.10.2, which are:
> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
> .
> >
> > Please let me know your opinions.
> >
> > Regards,
> > Laszlo Bodor
> > Tez PMC Chair
> >
>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread Ayush Saxena
+1 (non-binding),
Thanx Laszlo for starting the thread.

-Ayush

On Mon, 1 Jan 2024 at 18:30, László Bodor  wrote:
>
> Hi Everyone!
>
> Happy New Year!
>
> I think it's time to create a new Tez release.
> It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes since 
> 0.10.2, which are: 
> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3).
>
> Please let me know your opinions.
>
> Regards,
> Laszlo Bodor
> Tez PMC Chair
>


[DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread László Bodor
Hi Everyone!

Happy New Year!

I think it's time to create a new Tez release.
It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
since 0.10.2, which are:
https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
.

Please let me know your opinions.

Regards,
Laszlo Bodor
Tez PMC Chair


Hive JDBC driver compatibility with Jakarta Servlet API 5.0 (Jetty 11)

2023-12-27 Thread Aditya Malhotra via user
Hi Hive developer community,

I hope this message finds you well.

I am encountering an issue while trying to establish a JDBC connection to Hive 
Server 2 using Java. I am currently working with Jetty 11, which requires 
Servlet classes to be instances of Jakarta instead of Javax.
This is leading to an exception, as shown in the snippet of the stacktrace 
below:

Caused by: org.springframework.boot.web.server.WebServerException: Unable to 
start embedded Jetty server
   ... 49 more
Caused by: jakarta.servlet.UnavailableException: Servlet class 
org.apache.jasper.servlet.JspServlet is not a jakarta.servlet.Servlet

I have experimented with several recent versions of the JDBC driver (e.g., 
org.apache.hive:hive-jdbc:3.1.3), but the issue persists. Are there known 
compatibility issues between the Hive JDBC driver and Jetty 11?
Additionally, could you advise on any possible workarounds or alternative 
driver versions that might resolve this issue?
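
The exception above names org.apache.jasper.servlet.JspServlet, which is being rejected because it is not a jakarta.servlet.Servlet. One way to narrow down which jar on the classpath brings in javax.servlet classes is to scan jar listings for the two namespaces. The sketch below is only a diagnostic aid under that assumption: the function name servlet_namespaces is hypothetical, it reads a class listing on stdin, and it makes no claim about what any particular Hive jar contains.

```shell
# Read a class listing on stdin (for example the output of `jar tf some.jar`)
# and print which servlet API namespaces appear in it, one per line.
# The function name is made up for this sketch.
servlet_namespaces() {
  awk '
    /^javax\/servlet\//   { javax = 1 }
    /^jakarta\/servlet\// { jakarta = 1 }
    END {
      if (javax)   print "javax.servlet"
      if (jakarta) print "jakarta.servlet"
    }
  '
}

# Example usage (the jar path is a placeholder):
#   jar tf /path/to/some-dependency.jar | servlet_namespaces
```

Running this over each jar in the application's lib directory would show which ones still ship javax.servlet classes; what to do about them (exclusions, a different artifact) depends on the build and is not covered here.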

I appreciate your time and assistance in this matter.

Best Regards,
Aditya



Aditya Malhotra
Senior Software Engineer - India
CEL
+91 9250311355




Re: Hive 3.1.3 Hadoop Compatability

2023-12-25 Thread Takanobu Asanuma
BigTop supports a specific version stack with some patches in place. It
should be helpful for you.

- Currently, the master branch consists of Hadoop-3.3.6, Hive-3.1.3,
Tez-0.10.2.
- https://github.com/apache/bigtop/blob/master/bigtop.bom
- Hive patches:
https://github.com/apache/bigtop/tree/master/bigtop-packages/src/common/hive
- Tez patches:
https://github.com/apache/bigtop/tree/master/bigtop-packages/src/common/tez

Disclaimer: I am not the developer of Hive/Tez/BigTop.

Thanks,
- Takanobu

On Fri, 22 Dec 2023 at 23:14, Austin Hackett wrote:

> Many thanks for clarifying Ayush - much appreciated
>
> > On 22 Dec 2023, at 08:41, Ayush Saxena  wrote:
> >
> > Ideally the hadoop should be on 3.1.0 only, that is what we support,
> > rest if there are no incompatibilities it might or might not work with
> > higher versions of hadoop, we at "hive" don't claim that it can work,
> > mostly it will create issues with hadoop-3.3.x line due to thirdparty
> > libs and stuff like that, Guava IIRC does create some mess.
> >
> > So, short answer: we officially only support the above said hadoop
> > versions only for a particular hive release.
> >
> > -Ayush
> >
> >> On Fri, 22 Dec 2023 at 03:03, Austin Hackett 
> wrote:
> >>
> >> Hi Ayush
> >>
> >> Many thanks for your response.
> >>
> >> I’d really appreciate a clarification if that’s OK?
> >>
> >> Does this just mean that the Hadoop 3.1.0 libraries need to be deployed
> with Hive, or does it also mean the Hadoop cluster itself cannot be on a
> version later than 3.1.0 (if using Hive 3.1.3).
> >>
> >> For example, if running the Hive 3.1.3 Metastore in standalone mode,
> can the HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0
> libraries are deployed alongside the HMS?
> >>
> >> Any help is much appreciated
> >>
> >> Thank you
> >>
> >>
> >>
> >>>> On 21 Dec 2023, at 12:18, Ayush Saxena wrote:
> >>>
> >>> Hi Austin,
> >>> Hive 3.1.3 & 4.0.0-alpha-1 work with Hadoop-3.1.0
> >>>
> >>> Hive 4.0.0-alpha-2 & 4.0.0-beta-1 work with Hadoop-3.3.1
> >>>
> >>> The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
> >>>
> >>> -Ayush
> >>>
> >>> On Thu, 21 Dec 2023 at 17:39, Austin Hackett 
> wrote:
> >>>>
> >>>> Hi List
> >>>>
> >>>> I was hoping that someone might be able to clarify which Hadoop
> versions Hive 3.1.3 is compatible with?
> >>>>
> >>>> https://hive.apache.org/general/downloads/ says that Hive release
> 3.1.3 works with Hadoop 3.x.y which is straightforward enough.
> >>>>
> >>>> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1,
> which makes me wonder if 3.1.3 doesn’t actually work with 3.3.1.
> >>>>
> >>>> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive
> 4.0.0, which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and
> not 3.3.1 as mentioned on the releases page.
> >>>>
> >>>> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which
> Hadoop 3.x.x versions are known to work?
> >>>>
> >>>> Any pointers would be greatly appreciated
> >>>>
> >>>> Thank you
> >>
>


Hive Tez llap creates/kills many containers after kerberized the cluster

2023-12-24 Thread onmstester onmstester via user
Hi,





I'm trying to set up LLAP on Hive 3.1.3. It has been enabled successfully on a 
non-secure cluster, and while running an insert into a table the output shows 
the LLAP daemon instead of a container, although it still allocates a separate 
Tez container for every simple insert (about 10+ seconds for a simple insert). 
I also set up LLAP on HDP 3, where it did not create a separate container for 
every insert and simple inserts took less than a second.



After kerberizing the cluster, the HS2 service runs successfully, but the 
state of the LLAP daemon is UNKNOWN. The llap0 app is running in YARN and its 
container num-1 is up and running with no errors, but another container is 
created and finishes with exeuteStatus=0 again and again. The Hive client also 
fails to connect, with "no LLAP daemons" errors.
I tried to apply all the configs from HDP, but still could not get LLAP working 
with Kerberos enabled.
Here is the log from container-01, which is responsible for creating the other 
container:


INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for 
Container container_1703049378182_0001_01_02

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] retrieve status after 0

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] retrieve localization statuses

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] Transitioned from INIT to STARTED on 
START event

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] cancelling localization retriever

WARN instance.ComponentInstance: Unable to process container ports mapping: {}

content to map due to end-of-input

[Source: (String)""; line: 1, column: 0]

INFO registry.YarnRegistryViewForProviders: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02]: Deleting registry path 
/users/hive/services/yarn-service/llap0/components/ctr-1703049378182-0001-01-02

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] new IP = [192.168.56.113], host = 
ambari-server, updating registry

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] IP = [192.168.56.113], host = 
ambari-server, cancel container status retriever

INFO component.Component: [COMPONENT llap] Requesting for 1 container(s)

INFO component.Component: [COMPONENT llap] Submitting container request : 
Capability[]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution 
Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null]

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02]: container_1703049378182_0001_01_02 
completed. Reinsert back to pending list and requested a new container.

diagnostics=

INFO instance.ComponentInstance: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02] Transitioned from STARTED to INIT on 
STOP event

INFO registry.YarnRegistryViewForProviders: [COMPINSTANCE llap-0 : 
container_1703049378182_0001_01_02]: Deleting registry path 
/users/hive/services/yarn-service/llap0/components/ctr-1703049378182-0001-01-02

INFO service.ServiceScheduler: 1 containers allocated.

INFO service.ServiceScheduler: [COMPONENT llap]: remove 1 outstanding container 
requests for allocateId 0
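
To get a feel for how often the component instance is being recycled, one option is to count the "completed. Reinsert back to pending list" lines in the application master log shown above. This is only a sketch: the function name count_container_restarts is hypothetical, and the match string is copied from the log excerpt in this message rather than from any documented log format.

```shell
# Count log lines (read from stdin) that report a container completing and
# being put back on the pending list. Prints 0 when there are none.
# The function name is made up for this sketch.
count_container_restarts() {
  # -F: match the text literally; -c: print the number of matching lines.
  # grep exits non-zero when the count is 0, so mask that for callers.
  grep -F -c 'completed. Reinsert back to pending list' || true
}

# Example usage (the log file name is a placeholder):
#   count_container_restarts < serviceam.log
```

A steadily growing count over a short time window would be consistent with the create/finish loop described above, but it does not explain why the container exits.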


Hive Standalone Metastore dependency on Hadoop

2023-12-23 Thread Sanjay Gupta
Hi,
https://repo1.maven.org/maven2/org/apache/hive/hive-standalone-metastore

Does the Hive Standalone Metastore depend on Hadoop? I am building my
own Docker container to run the Hive Metastore, and my image size is
1.4GB.
I installed Hadoop 3.3.6 and Standalone Metastore 3.1.3, and the
metastore service is running fine.
The hadoop-3.3.6 directory is 923 MB uncompressed.

To build hive-standalone-metastore, I followed this guide:
https://janakiev.com/blog/presto-trino-s3/#hive-standalone-metastore

How can I reduce the image size?
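
Before deciding what to trim, it may help to see where the space actually goes. The sketch below lists the largest entries directly under a directory using du, sort and head; the function name largest_entries is hypothetical and the /opt/hadoop path in the usage comment is an assumption, since the install location is not stated in this message.

```shell
# List the N largest entries directly under a directory, biggest first.
# Sizes are in kilobytes, as reported by `du -sk`.
# The function name is made up for this sketch.
largest_entries() (
  dir="$1"
  n="${2:-10}"
  du -sk "$dir"/* | sort -rn | head -n "$n"
)

# Example usage (the install path is an assumption):
#   largest_entries /opt/hadoop/share/hadoop 5
```

This only shows what is large; whether a given directory can be removed without breaking the metastore has to be verified separately.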

-- 

Thanks
Sanjay Gupta


Re: Blog article 'Performance Tuning for Single-table Queries'

2023-12-23 Thread lisoda




 Replied Message 
| From | Sungwoo Park |
| Date | 12/24/2023 00:06 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Blog article 'Performance Tuning for Single-table Queries' |
Hello Hive users,


I have published a new blog article 'Performance Tuning for Single-table 
Queries'. It shows how to change configuration parameters of Hive and Tez in 
order to make simple queries run faster than Spark. Although it uses Hive on 
MR3, the technique equally applies to Hive on Tez and Hive-LLAP.



https://www.datamonad.com/post/2023-12-23-optimize-bi-1.8/


Hope you find it useful.


Cheers,


--- Sungwoo

Blog article 'Performance Tuning for Single-table Queries'

2023-12-23 Thread Sungwoo Park
Hello Hive users,

I have published a new blog article 'Performance Tuning for Single-table
Queries'. It shows how to change configuration parameters of Hive and Tez
in order to make simple queries run faster than Spark. Although it
uses Hive on MR3, the technique equally applies to Hive on Tez and
Hive-LLAP.

https://www.datamonad.com/post/2023-12-23-optimize-bi-1.8/

Hope you find it useful.

Cheers,

--- Sungwoo


Re: How to use SKIP_SCHEMA_INIT=TRUE from command line

2023-12-22 Thread Sanjay Gupta
Thanks, that solves the issue. Much appreciated.


Thanks
Sanjay Gupta

From: Akshat m 
Sent: Friday, December 22, 2023 5:55:38 AM
To: user@hive.apache.org 
Subject: Re: How to use SKIP_SCHEMA_INIT=TRUE from command line

Hi Sanjay,

Instead of using  --env SKIP_SCHEMA_INIT=TRUE,

Please use --env IS_RESUME="true" while running,

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
--add-host=host.docker.internal:host-gateway \
 --env IS_RESUME="true" \
 --env DB_DRIVER=mysql \
 --env 
SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
 
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash

This should skip the schema initOrUpgrade process: 
https://github.com/apache/hive/blob/5022b85b5f50615f85da07bce42aebd414deb9b0/packaging/src/docker/entrypoint.sh#L24
from the 2nd time you run the container.

Regards,
Akshat


On Fri, Dec 22, 2023 at 11:53 AM Sanjay Gupta wrote:
Hi All,

If my metastore schema already exists with the correct version, what do I
need to do so that it doesn't run init or upgrade when starting the metastore
container?

I have tried the following command line:


On MAC environment variables
export HIVE_VERSION=3.1.3
and even
SKIP_SCHEMA_INIT=TRUE

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
--add-host=host.docker.internal:host-gateway \
 --env SKIP_SCHEMA_INIT=TRUE \
 --env DB_DRIVER=mysql \
 --env 
SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
 
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash



---

Docker Logs , it still tries to initSchema

docker logs 1c
+ : mysql
+ SKIP_SCHEMA_INIT=false
+ export HIVE_CONF_DIR=/opt/hive/conf
+ HIVE_CONF_DIR=/opt/hive/conf
+ '[' -d '' ']'
+ export 'HADOOP_CLIENT_OPTS= -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ HADOOP_CLIENT_OPTS=' -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ [[ false == \f\a\l\s\e ]]
+ initialize_hive
+ COMMAND=-initOrUpgradeSchema
++ cut -d . -f1
++ echo 3.1.3
+ '[' 3 -lt 4 ']'
+ COMMAND=-initSchema
+ /opt/hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:
jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql


Error: Table 'ctlgs' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema
initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
[WARN] Failed to create directory: /home/hive/.beeline
No such file or directory
+ '[' 1 -eq 0 ']'
+ echo 'Schema initialization failed!'
Schema initialization failed!
+ exit 1
-



The Docker entrypoint.sh has the following code:

SKIP_SCHEMA_INIT="${IS_RESUME:-false}"

function initialize_hive {
  COMMAND="-initOrUpgradeSchema"
  if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
    COMMAND="-${SCHEMA_COMMAND:-initSchema}"
  fi
  $HIVE_HOME/bin/schematool -dbType $DB_DRIVER $COMMAND
  if [ $? -eq 0 ]; then
    echo "Initialized schema successfully.."
  else
    echo "Schema initialization failed!"
    exit 1
  fi
}

export HIVE_CONF_DIR=$HIVE_HOME/conf
if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
  find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
ln -sfn {} 
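
The command selection inside initialize_hive above can be isolated into a small sketch to see why a 3.1.3 image ends up running -initSchema, which is the step that fails in the log with "Table 'ctlgs' already exists". The function name schema_command is hypothetical; the logic is copied from the quoted entrypoint.sh, with HIVE_VER and SCHEMA_COMMAND passed as arguments instead of being read from the environment.

```shell
# Print the schematool flag the quoted entrypoint logic would choose.
#   $1 = Hive version (HIVE_VER in the entrypoint)
#   $2 = optional override (SCHEMA_COMMAND in the entrypoint)
# The function name is made up for this sketch.
schema_command() (
  HIVE_VER="$1"
  SCHEMA_COMMAND="${2:-}"
  COMMAND="-initOrUpgradeSchema"
  # For major versions below 4, fall back to initSchema unless overridden.
  if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
    COMMAND="-${SCHEMA_COMMAND:-initSchema}"
  fi
  printf '%s\n' "$COMMAND"
)

# Example usage:
#   schema_command 3.1.3
#   schema_command 4.0.0
```

Note that this branch is only reached when SKIP_SCHEMA_INIT evaluates to false, which is what the `[[ false == \f\a\l\s\e ]]` line in the trace shows.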

Re: Hive 3.1.3 Hadoop Compatability

2023-12-22 Thread Austin Hackett
Many thanks for clarifying Ayush - much appreciated 

> On 22 Dec 2023, at 08:41, Ayush Saxena  wrote:
> 
> Ideally the hadoop should be on 3.1.0 only, that is what we support,
> rest if there are no incompatibilities it might or might not work with
> higher versions of hadoop, we at "hive" don't claim that it can work,
> mostly it will create issues with hadoop-3.3.x line due to thirdparty
> libs and stuff like that, Guava IIRC does create some mess.
> 
> So, short answer: we officially only support the above said hadoop
> versions only for a particular hive release.
> 
> -Ayush
> 
>> On Fri, 22 Dec 2023 at 03:03, Austin Hackett  wrote:
>> 
>> Hi Ayush
>> 
>> Many thanks for your response.
>> 
>> I’d really appreciate a clarification if that’s OK?
>> 
>> Does this just mean that the Hadoop 3.1.0 libraries need to be deployed with 
>> Hive, or does it also mean the Hadoop cluster itself cannot be on a version 
>> later than 3.1.0 (if using Hive 3.1.3).
>> 
>> For example, if running the Hive 3.1.3 Metastore in standalone mode, can the 
>> HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0 libraries are 
>> deployed alongside the HMS?
>> 
>> Any help is much appreciated
>> 
>> Thank you
>> 
>> 
>> 
>>>> On 21 Dec 2023, at 12:18, Ayush Saxena wrote:
>>> 
>>> Hi Austin,
>>> Hive 3.1.3 & 4.0.0-alpha-1 work with Hadoop-3.1.0
>>> 
>>> Hive 4.0.0-alpha-2 & 4.0.0-beta-1 work with Hadoop-3.3.1
>>> 
>>> The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
>>> 
>>> -Ayush
>>> 
>>> On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
 
>>>> Hi List
>>>> 
>>>> I was hoping that someone might be able to clarify which Hadoop versions 
>>>> Hive 3.1.3 is compatible with?
>>>> 
>>>> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 
>>>> works with Hadoop 3.x.y which is straightforward enough.
>>>> 
>>>> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which 
>>>> makes me wonder if 3.1.3 doesn’t actually work with 3.3.1.
>>>> 
>>>> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
>>>> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 
>>>> 3.3.1 as mentioned on the releases page.
>>>> 
>>>> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which 
>>>> Hadoop 3.x.x versions are known to work?
>>>> 
>>>> Any pointers would be greatly appreciated
>>>> 
>>>> Thank you
>> 


Re: How to use SKIP_SCHEMA_INIT=TRUE from command line

2023-12-22 Thread Akshat m
Hi Sanjay,

Instead of using  --env SKIP_SCHEMA_INIT=TRUE,

Please use --env IS_RESUME="true" while running,

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore \
 --add-host=host.docker.internal:host-gateway \
 --env IS_RESUME="true" \
 --env DB_DRIVER=mysql \
 --env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true -Djavax.jdo.option.ConnectionUserName=hive -Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash

This should skip the schema initOrUpgrade process:
https://github.com/apache/hive/blob/5022b85b5f50615f85da07bce42aebd414deb9b0/packaging/src/docker/entrypoint.sh#L24
from the 2nd time you run the container.
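
To illustrate why passing SKIP_SCHEMA_INIT directly has no effect, here is a minimal sketch of the assignment quoted from entrypoint.sh further down in this thread. The function name effective_skip_schema_init is hypothetical; it takes the two values as arguments instead of reading them from the container environment.

```shell
# Print the value SKIP_SCHEMA_INIT ends up with after the entrypoint's
# assignment, given what was passed via --env.
#   $1 = IS_RESUME as passed in (may be empty)
#   $2 = SKIP_SCHEMA_INIT as passed in (may be empty)
# The function name is made up for this sketch.
effective_skip_schema_init() (
  IS_RESUME="${1:-}"
  SKIP_SCHEMA_INIT="${2:-}"
  # Same assignment as in the quoted entrypoint.sh: any incoming
  # SKIP_SCHEMA_INIT value is overwritten from IS_RESUME.
  SKIP_SCHEMA_INIT="${IS_RESUME:-false}"
  printf '%s\n' "$SKIP_SCHEMA_INIT"
)

# Example usage:
#   effective_skip_schema_init "" TRUE     # IS_RESUME unset, SKIP_SCHEMA_INIT=TRUE
#   effective_skip_schema_init true ""     # IS_RESUME=true
```

This matches the `+ SKIP_SCHEMA_INIT=false` line in the docker logs below, even though SKIP_SCHEMA_INIT=TRUE was passed on the command line.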

Regards,
Akshat


On Fri, Dec 22, 2023 at 11:53 AM Sanjay Gupta  wrote:

> Hi All,
>
> If my metastore schema already exists with the correct version, what do I
> need to do so that it doesn't run init or upgrade when starting the metastore
> container?
>
> I have tried the following command line:
>
>
> On MAC environment variables
> export HIVE_VERSION=3.1.3
> and even
> SKIP_SCHEMA_INIT=TRUE
>
> docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
>  --env SKIP_SCHEMA_INIT=TRUE \
>  --env DB_DRIVER=mysql \
>  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>
>  
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=" \
>  --mount source=warehouse,target=/opt/hive/data/warehouse \
>  --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash
>
>
>
> ---
>
> Docker Logs , it still tries to initSchema
>
> docker logs 1c
> + : mysql
> + SKIP_SCHEMA_INIT=false
> + export HIVE_CONF_DIR=/opt/hive/conf
> + HIVE_CONF_DIR=/opt/hive/conf
> + '[' -d '' ']'
> + export 'HADOOP_CLIENT_OPTS= -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> + HADOOP_CLIENT_OPTS=' -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> + [[ false == \f\a\l\s\e ]]
> + initialize_hive
> + COMMAND=-initOrUpgradeSchema
> ++ cut -d . -f1
> ++ echo 3.1.3
> + '[' 3 -lt 4 ']'
> + COMMAND=-initSchema
> + /opt/hive/bin/schematool -dbType mysql -initSchema
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> Metastore connection URL:
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.1.0
> Initialization script hive-schema-3.1.0.mysql.sql
>
>
> Error: Table 'ctlgs' already exists (state=42S01,code=1050)
> org.apache.hadoop.hive.metastore.HiveMetaException: Schema
> initialization FAILED! Metastore state would be inconsistent !!
> Underlying cause: java.io.IOException : Schema script failed, errorcode 2
> Use --verbose for detailed stacktrace.
> *** schemaTool failed ***
> [WARN] Failed to create directory: /home/hive/.beeline
> No such file or directory
> + '[' 1 -eq 0 ']'
> + echo 'Schema initialization failed!'
> Schema initialization failed!
> + exit 1
> -
>
>
>
> The Docker entrypoint.sh has the following code:
>
> SKIP_SCHEMA_INIT="${IS_RESUME:-false}"
>
> function initialize_hive {
>   COMMAND="-initOrUpgradeSchema"
>   if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
>     COMMAND="-${SCHEMA_COMMAND:-initSchema}"
>   fi
>   $HIVE_HOME/bin/schematool -dbType $DB_DRIVER $COMMAND
>   if [ $? -eq 0 ]; then
>     echo "Initialized schema successfully.."
>   else
>     echo "Schema initialization failed!"
>     exit 1
>   fi
> }
>
> export HIVE_CONF_DIR=$HIVE_HOME/conf
> if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
>   find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
> ln -sfn {} "${HIVE_CONF_DIR}"/ \;
>   export HADOOP_CONF_DIR=$HIVE_CONF_DIR

Re: Hive 3.1.3 Hadoop Compatability

2023-12-22 Thread Ayush Saxena
Ideally Hadoop should be on 3.1.0 only; that is what we support.
Beyond that, if there are no incompatibilities, it might or might not work
with higher versions of Hadoop, but we at "hive" don't claim that it will.
Mostly it will create issues with the hadoop-3.3.x line due to third-party
libs and the like; Guava, IIRC, does create some mess.

So, short answer: for a particular Hive release we officially support only
the Hadoop versions mentioned above.
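
The pairings given earlier in this thread (quoted below) can be written down as a small lookup, for example as a guard in a deployment script. This is a sketch only: the function name supported_hadoop_for_hive is hypothetical, the table contains just the versions named in this thread, and the 4.0.0 entry reflects what was stated for the then-upcoming GA release.

```shell
# Print the Hadoop version named in this thread for a given Hive release.
# Prints "unknown" and returns non-zero for releases not mentioned here.
# The function name is made up for this sketch.
supported_hadoop_for_hive() {
  case "$1" in
    3.1.3|4.0.0-alpha-1)        echo "3.1.0" ;;
    4.0.0-alpha-2|4.0.0-beta-1) echo "3.3.1" ;;
    4.0.0)                      echo "3.3.6" ;;  # stated for the upcoming GA release
    *)                          echo "unknown"; return 1 ;;
  esac
}

# Example usage:
#   supported_hadoop_for_hive 3.1.3
```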

-Ayush

On Fri, 22 Dec 2023 at 03:03, Austin Hackett  wrote:
>
> Hi Ayush
>
> Many thanks for your response.
>
> I’d really appreciate a clarification if that’s OK?
>
> Does this just mean that the Hadoop 3.1.0 libraries need to be deployed with 
> Hive, or does it also mean the Hadoop cluster itself cannot be on a version 
> later than 3.1.0 (if using Hive 3.1.3).
>
> For example, if running the Hive 3.1.3 Metastore in standalone mode, can the 
> HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0 libraries are 
> deployed alongside the HMS?
>
> Any help is much appreciated
>
> Thank you
>
>
>
> > On 21 Dec 2023, at 12:18, Ayush Saxena  wrote:
> >
> > Hi Austin,
> > Hive 3.1.3 & 4.0.0-alpha-1 work with Hadoop-3.1.0
> >
> > Hive 4.0.0-alpha-2 & 4.0.0-beta-1 work with Hadoop-3.3.1
> >
> > The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
> >
> > -Ayush
> >
> > On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
> >>
> >> Hi List
> >>
> >> I was hoping that someone might be able to clarify which Hadoop versions 
> >> Hive 3.1.3 is compatible with?
> >>
> >> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 
> >> works with Hadoop 3.x.y which is straightforward enough.
> >>
> >> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which 
> >> makes we wonder if 3.1.3 doesn’t work actually work with 3.3.1.
> >>
> >> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
> >> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 
> >> 3.3.1 as mentioned on the releases page.
> >>
> >> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which 
> >> Hadoop 3.x.x versions are known to work?
> >>
> >> Any pointers would be greatly appreciated
> >>
> >> Thank you
>


How to use SKIP_SCHEMA_INIT=TRUE from command line

2023-12-21 Thread Sanjay Gupta
Hi All,

My metastore schema already exists with the correct version. What do I
need to do so that the metastore container does not run schema init or
upgrade on startup?

I have tried the following command line.


On Mac, I set these environment variables:
export HIVE_VERSION=3.1.3
and even
SKIP_SCHEMA_INIT=TRUE

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore \
 --add-host=host.docker.internal:host-gateway \
 --env SKIP_SCHEMA_INIT=TRUE \
 --env DB_DRIVER=mysql \
 --env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver \
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true \
-Djavax.jdo.option.ConnectionUserName=hive \
-Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash



---

Docker logs: it still tries to run initSchema

docker logs 1c
+ : mysql
+ SKIP_SCHEMA_INIT=false
+ export HIVE_CONF_DIR=/opt/hive/conf
+ HIVE_CONF_DIR=/opt/hive/conf
+ '[' -d '' ']'
+ export 'HADOOP_CLIENT_OPTS= -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ HADOOP_CLIENT_OPTS=' -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ [[ false == \f\a\l\s\e ]]
+ initialize_hive
+ COMMAND=-initOrUpgradeSchema
++ cut -d . -f1
++ echo 3.1.3
+ '[' 3 -lt 4 ']'
+ COMMAND=-initSchema
+ /opt/hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:
jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql


Error: Table 'ctlgs' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema
initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
[WARN] Failed to create directory: /home/hive/.beeline
No such file or directory
+ '[' 1 -eq 0 ']'
+ echo 'Schema initialization failed!'
Schema initialization failed!
+ exit 1
---



The Docker entrypoint.sh has the following code:

SKIP_SCHEMA_INIT="${IS_RESUME:-false}"

function initialize_hive {
  COMMAND="-initOrUpgradeSchema"
  if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
    COMMAND="-${SCHEMA_COMMAND:-initSchema}"
  fi
  $HIVE_HOME/bin/schematool -dbType $DB_DRIVER $COMMAND
  if [ $? -eq 0 ]; then
    echo "Initialized schema successfully.."
  else
    echo "Schema initialization failed!"
    exit 1
  fi
}

export HIVE_CONF_DIR=$HIVE_HOME/conf
if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
  find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
    ln -sfn {} "${HIVE_CONF_DIR}"/ \;
  export HADOOP_CONF_DIR=$HIVE_CONF_DIR
  export TEZ_CONF_DIR=$HIVE_CONF_DIR
fi

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx1G $SERVICE_OPTS"
if [[ "${SKIP_SCHEMA_INIT}" == "false" ]]; then
  # handles schema initialization
  initialize_hive
fi
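
Reading the script above, the first line assigns SKIP_SCHEMA_INIT from
IS_RESUME, so a SKIP_SCHEMA_INIT value passed with --env appears to be
overwritten before it is ever tested. The sketch below only reproduces that
one assignment to show the effect; resolve_skip_schema_init is a hypothetical
helper name for illustration and is not part of the image.

```shell
#!/usr/bin/env bash
# Minimal sketch of the assignment at the top of the quoted entrypoint.sh.
# resolve_skip_schema_init is a hypothetical name used only for illustration.
resolve_skip_schema_init() {
  # Mirrors: SKIP_SCHEMA_INIT="${IS_RESUME:-false}"
  # Whatever SKIP_SCHEMA_INIT held before is ignored; only IS_RESUME matters.
  skip="${IS_RESUME:-false}"
  printf '%s\n' "$skip"
}

# Case 1: only SKIP_SCHEMA_INIT is set, as in the docker run command above.
( unset IS_RESUME; SKIP_SCHEMA_INIT=TRUE; resolve_skip_schema_init )   # prints "false"

# Case 2: IS_RESUME is set instead.
( IS_RESUME=true; resolve_skip_schema_init )                           # prints "true"
```

If that reading is right, passing --env IS_RESUME=true instead of
--env SKIP_SCHEMA_INIT=TRUE should make the [[ ... == "false" ]] test fail and
skip initialize_hive. This is inferred from the quoted script only and has not
been checked against other image versions.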



Re: Hive 3.1.3 Hadoop Compatibility

2023-12-21 Thread Austin Hackett
Hi Ayush

Many thanks for your response.

I’d really appreciate a clarification if that’s OK?

Does this just mean that the Hadoop 3.1.0 libraries need to be deployed with 
Hive, or does it also mean that the Hadoop cluster itself cannot be on a 
version later than 3.1.0 (if using Hive 3.1.3)?

For example, if running the Hive 3.1.3 Metastore in standalone mode, can the 
HMS work with a 3.3.6 HDFS cluster, provided the Hadoop 3.1.0 libraries are 
deployed alongside the HMS?

Any help is much appreciated

Thank you



> On 21 Dec 2023, at 12:18, Ayush Saxena  wrote:
> 
> Hi Austin,
> Hive 3.1.3 & 4.0.0-alpha-1 works with Hadoop-3.1.0
> 
> HIve 4.0.0-alpha-2 & 4.0.0-beta-1 works with Hadoop-3.3.1
> 
> The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
> 
> -Ayush
> 
> On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
>> 
>> Hi List
>> 
>> I was hoping that someone might be able to clarify which Hadoop versions 
>> Hive 3.1.3 is compatible with?
>> 
>> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 
>> works with Hadoop 3.x.y which is straightforward enough.
>> 
>> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which 
>> makes we wonder if 3.1.3 doesn’t work actually work with 3.3.1.
>> 
>> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
>> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 3.3.1 
>> as mentioned on the releases page.
>> 
>> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which Hadoop 
>> 3.x.x versions are known to work?
>> 
>> Any pointers would be greatly appreciated
>> 
>> Thank you



Re: Hive 3.1.3 Hadoop Compatibility

2023-12-21 Thread Ayush Saxena
Hi Austin,
Hive 3.1.3 & 4.0.0-alpha-1 work with Hadoop-3.1.0

Hive 4.0.0-alpha-2 & 4.0.0-beta-1 work with Hadoop-3.3.1

The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
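
For scripting a deployment check, the pairings above could be captured in a
small lookup. This is only a sketch of the list as stated in this message;
hadoop_for_hive is a hypothetical helper name, and the 4.0.0 entry reflects
the stated plan for the GA release rather than a published release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Hive release to the Hadoop version it is
# stated to work with in this thread.
hadoop_for_hive() {
  case "$1" in
    3.1.3|4.0.0-alpha-1)        echo "3.1.0" ;;
    4.0.0-alpha-2|4.0.0-beta-1) echo "3.3.1" ;;
    4.0.0)                      echo "3.3.6" ;;  # planned for the GA release
    *)                          echo "unknown"; return 1 ;;
  esac
}

hadoop_for_hive 3.1.3          # prints "3.1.0"
hadoop_for_hive 4.0.0-beta-1   # prints "3.3.1"
```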

-Ayush

On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
>
> Hi List
>
> I was hoping that someone might be able to clarify which Hadoop versions Hive 
> 3.1.3 is compatible with?
>
> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 works 
> with Hadoop 3.x.y which is straightforward enough.
>
> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which makes 
> we wonder if 3.1.3 doesn’t work actually work with 3.3.1.
>
> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 3.3.1 
> as mentioned on the releases page.
>
> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which Hadoop 
> 3.x.x versions are known to work?
>
> Any pointers would be greatly appreciated
>
> Thank you


Hive 3.1.3 Hadoop Compatibility

2023-12-21 Thread Austin Hackett
Hi List

I was hoping that someone might be able to clarify which Hadoop versions Hive 
3.1.3 is compatible with?

https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 works 
with Hadoop 3.x.y which is straightforward enough.

However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which makes 
me wonder whether 3.1.3 actually works with 3.3.1.

Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, which 
makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 3.3.1 as 
mentioned on the releases page.

In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which Hadoop 
3.x.x versions are known to work?

Any pointers would be greatly appreciated 

Thank you

Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-20 Thread Akshat m
Hi,

You can download it from here:
https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.29/mysql-connector-java-8.0.29.jar
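
Once downloaded, the jar still has to end up in /opt/hive/lib/ inside the
container, as discussed earlier in this thread. One way to do that without
rebuilding the image is a bind mount; the sketch below assumes that approach
works for your setup. driver_mount_arg is a hypothetical helper name, and the
docker run line is left as a comment because it needs Docker and network
access.

```shell
#!/usr/bin/env bash
# URL as posted in this message.
DRIVER_URL="https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.29/mysql-connector-java-8.0.29.jar"

# Hypothetical helper: build the value for docker's --mount flag so that a
# jar on the host appears under /opt/hive/lib/ with the same file name.
driver_mount_arg() {
  jar_path="$1"
  printf 'type=bind,source=%s,target=/opt/hive/lib/%s\n' \
    "$jar_path" "$(basename "$jar_path")"
}

# Usage (not executed here):
#   wget "$DRIVER_URL"
#   docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --env DB_DRIVER=mysql \
#     --mount "$(driver_mount_arg "$PWD/mysql-connector-java-8.0.29.jar")" \
#     --name metastore-standalone apache/hive:${HIVE_VERSION}
# Add the SERVICE_OPTS and other flags from the original command as needed.
```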

Regards,
Akshat

On Thu, Dec 21, 2023 at 1:40 AM Sanjay Gupta  wrote:

> There is no mysql driver file in following repo
> https://repo1.maven.org/maven2/org
>
> On Mon, Dec 18, 2023 at 3:10 AM Simhadri G  wrote:
> >
> > We can modify the Dockerfile to wget the necessary driver and copy it to
> /opt/hive/lib/ .  This should make it work. The diff is attached below:
> >
> >
> > diff --git a/packaging/src/docker/Dockerfile
> b/packaging/src/docker/Dockerfile
> > --- a/packaging/src/docker/Dockerfile (revision
> dceaf810b32fc266e3e657fdaefcd4507f2191b5)
> > +++ b/packaging/src/docker/Dockerfile (date 1702897518609)
> > @@ -80,6 +80,9 @@
> >
> >  ENV PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
> >
> > +RUN wget
> https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
> > +RUN cp /postgresql-42.5.1.jar /opt/hive/lib/
> > +
> >  COPY entrypoint.sh /
> >  COPY conf $HIVE_HOME/conf
> >  RUN chmod +x /entrypoint.sh
> >
> > On Mon, Dec 18, 2023, 12:59 PM Ayush Saxena  wrote:
> >>
> >> I think the similar problem is being chased as part of
> >> https://github.com/apache/hive/pull/4948
> >>
> >> On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
> >> >
> >> >
> >> >
> >> >
> >> > Issue with Docker container using mysql RDBMS ( Failed to load driver)
> >> >
> >> > https://hub.docker.com/r/apache/hive
> >> >
> >> > According to readme
> >> >
> >> > Launch Standalone Metastore With External RDBMS
> (Postgres/Oracle/MySql/MsSql)
> >> >
> >> > I want to use MySQL
> >> >
> >> > I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
> >> >
> >> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >> >  --env DB_DRIVER=mysql \
> >> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >> >
> >> >
> >> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >> >  --env DB_DRIVER=mysql \
> >> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >> >
> >> > Docker logs shows this for both drivers ( same error )
> >> >
> >> > docker logs f3
> >> > + : mysql
> >> > + SKIP_SCHEMA_INIT=false
> >> > + export HIVE_CONF_DIR=/opt/hive/conf
> >> > + HIVE_CONF_DIR=/opt/hive/conf
> >> > + '[' -d '' ']'
> >> > + export 'HADOOP_CLIENT_OPTS= -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> >> > + HADOOP_CLIENT_OPTS=' -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> >> > + [[ false == \f\a\l\s\e ]]
> >> > + initialize_hive
> >> > + /opt/hive/bin/schematool -dbType mysql -initSchema
> >> > SLF4J: Class path contains multiple SLF4J bindings.
> >> > SLF4J: Found binding in
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> > SLF4J: Found binding in
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> >> > SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> >> > Metastore connection URL:
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> >> > Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> >> > Metastore connection User: hive
> >> > org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load
> driver
> >> > Underlying cause: java.lang.ClassNotFoundException :
> com.mysql.cj.jdbc.Driver
> >> > Use --verbose for detailed stacktrace.
> >> > *** schemaTool failed ***
> >> > + '[' 1 -eq 0 ']'
> >> > + echo 'Schema initialization failed!'
> >> > Schema initialization failed!
> >> > + exit 1

Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-20 Thread Sanjay Gupta
There is no MySQL driver file in the following repo:
https://repo1.maven.org/maven2/org

On Mon, Dec 18, 2023 at 3:10 AM Simhadri G  wrote:
>
> We can modify the Dockerfile to wget the necessary driver and copy it to 
> /opt/hive/lib/ .  This should make it work. The diff is attached below:
>
>
> diff --git a/packaging/src/docker/Dockerfile b/packaging/src/docker/Dockerfile
> --- a/packaging/src/docker/Dockerfile (revision 
> dceaf810b32fc266e3e657fdaefcd4507f2191b5)
> +++ b/packaging/src/docker/Dockerfile (date 1702897518609)
> @@ -80,6 +80,9 @@
>
>  ENV PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
>
> +RUN wget 
> https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
> +RUN cp /postgresql-42.5.1.jar /opt/hive/lib/
> +
>  COPY entrypoint.sh /
>  COPY conf $HIVE_HOME/conf
>  RUN chmod +x /entrypoint.sh
>
> On Mon, Dec 18, 2023, 12:59 PM Ayush Saxena  wrote:
>>
>> I think the similar problem is being chased as part of
>> https://github.com/apache/hive/pull/4948
>>
>> On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
>> >
>> >
>> >
>> >
>> > Issue with Docker container using mysql RDBMS ( Failed to load driver)
>> >
>> > https://hub.docker.com/r/apache/hive
>> >
>> > According to readme
>> >
>> > Launch Standalone Metastore With External RDBMS 
>> > (Postgres/Oracle/MySql/MsSql)
>> >
>> > I want to use MySQL
>> >
>> > I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
>> >
>> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
>> > --add-host=host.docker.internal:host-gateway \
>> >  --env DB_DRIVER=mysql \
>> >  --env 
>> > SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
>> >  
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=password" \
>> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
>> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
>> >
>> >
>> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
>> > --add-host=host.docker.internal:host-gateway \
>> >  --env DB_DRIVER=mysql \
>> >  --env 
>> > SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>> >   
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=password" \
>> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
>> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
>> >
>> > Docker logs shows this for both drivers ( same error )
>> >
>> > docker logs f3
>> > + : mysql
>> > + SKIP_SCHEMA_INIT=false
>> > + export HIVE_CONF_DIR=/opt/hive/conf
>> > + HIVE_CONF_DIR=/opt/hive/conf
>> > + '[' -d '' ']'
>> > + export 'HADOOP_CLIENT_OPTS= -Xmx1G 
>> > -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=hive'
>> > + HADOOP_CLIENT_OPTS=' -Xmx1G 
>> > -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=hive'
>> > + [[ false == \f\a\l\s\e ]]
>> > + initialize_hive
>> > + /opt/hive/bin/schematool -dbType mysql -initSchema
>> > SLF4J: Class path contains multiple SLF4J bindings.
>> > SLF4J: Found binding in 
>> > [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > SLF4J: Found binding in 
>> > [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
>> > explanation.
>> > SLF4J: Actual binding is of type 
>> > [org.apache.logging.slf4j.Log4jLoggerFactory]
>> > Metastore connection URL: 
>> > jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> > Metastore Connection Driver : com.mysql.cj.jdbc.Driver
>> > Metastore connection User: hive
>> > org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
>> > Underlying cause: java.lang.ClassNotFoundException : 
>> > com.mysql.cj.jdbc.Driver
>> > Use --verbose for detailed stacktrace.
>> > *** schemaTool failed ***
>> > + '[' 1 -eq 0 ']'
>> > + echo 'Schema initialization failed!'
>> > Schema initialization failed!
>> > + exit 1
>> >
>> > Any idea, why I am getting failed to load driver for MySQL DB.
>> >
>> > Isn't docker container comes with MySQL Driver ?
>> >
>> > Docker container exits so I can't check whether driver is already 
>> > installed.
>> >
>> > Let me know, what I can do 

when enable reducededuplication, count(distinct)+group by very slow

2023-12-19 Thread lisoda
Hi team.

I found that when I enable hive.optimize.reducededuplication, a 
count(distinct) combined with GROUP BY becomes very slow.
Is there a problem with reducededuplication?


test query info:
| CONFIG | SQL | TIME |
| hive.optimize.reducededuplication=true | select count(1) from(select uni_shop_id,partner,count(distinct uni_id) from default.b_std_trade_sampling group by uni_shop_id,partner) s1; | 400s |
| hive.optimize.reducededuplication=false | select count(1) from(select uni_shop_id,partner,count(distinct uni_id) from default.b_std_trade_sampling group by uni_shop_id,partner) s1; | 180s |


table basic info:
| info | row |
| select count(1) from default.b_std_trade_sampling | 9774285968 |
| select count(distinct uni_id) from default.b_std_trade_sampling | 5367720404 |
| select count(distinct partner),count(distinct uni_shop_id) from default.b_std_trade_sampling | 50, 13000 |

I'd be grateful if someone could guide me.




Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-18 Thread Simhadri G
We can modify the Dockerfile to wget the necessary driver and copy it to
/opt/hive/lib/ .  This should make it work. The diff is attached below:


diff --git a/packaging/src/docker/Dockerfile b/packaging/src/docker/Dockerfile
--- a/packaging/src/docker/Dockerfile (revision dceaf810b32fc266e3e657fdaefcd4507f2191b5)
+++ b/packaging/src/docker/Dockerfile (date 1702897518609)
@@ -80,6 +80,9 @@

 ENV PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH

+RUN wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
+RUN cp /postgresql-42.5.1.jar /opt/hive/lib/
+
 COPY entrypoint.sh /
 COPY conf $HIVE_HOME/conf
 RUN chmod +x /entrypoint.sh

On Mon, Dec 18, 2023, 12:59 PM Ayush Saxena  wrote:

> I think the similar problem is being chased as part of
> https://github.com/apache/hive/pull/4948
>
> On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
> >
> >
> >
> >
> > Issue with Docker container using mysql RDBMS ( Failed to load driver)
> >
> > https://hub.docker.com/r/apache/hive
> >
> > According to readme
> >
> > Launch Standalone Metastore With External RDBMS
> (Postgres/Oracle/MySql/MsSql)
> >
> > I want to use MySQL
> >
> > I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
> >
> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >  --env DB_DRIVER=mysql \
> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >
> >
> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >  --env DB_DRIVER=mysql \
> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >
> > Docker logs shows this for both drivers ( same error )
> >
> > docker logs f3
> > + : mysql
> > + SKIP_SCHEMA_INIT=false
> > + export HIVE_CONF_DIR=/opt/hive/conf
> > + HIVE_CONF_DIR=/opt/hive/conf
> > + '[' -d '' ']'
> > + export 'HADOOP_CLIENT_OPTS= -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> > + HADOOP_CLIENT_OPTS=' -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> > + [[ false == \f\a\l\s\e ]]
> > + initialize_hive
> > + /opt/hive/bin/schematool -dbType mysql -initSchema
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> > SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> > Metastore connection URL:
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> > Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> > Metastore connection User: hive
> > org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
> > Underlying cause: java.lang.ClassNotFoundException :
> com.mysql.cj.jdbc.Driver
> > Use --verbose for detailed stacktrace.
> > *** schemaTool failed ***
> > + '[' 1 -eq 0 ']'
> > + echo 'Schema initialization failed!'
> > Schema initialization failed!
> > + exit 1
> >
> > Any idea, why I am getting failed to load driver for MySQL DB.
> >
> > Isn't docker container comes with MySQL Driver ?
> >
> > Docker container exits so I can't check whether driver is already
> installed.
> >
> > Let me know, what I can do to make it work.
> >
> > --
> >
> >
> > Thanks
> > Sanjay Gupta
> >
> >
> >
> > --
> >
> > Thanks
> > Sanjay Gupta
> >
> >
> >
> > --
> >
> > Thanks
> > Sanjay Gupta
> >
>


Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-17 Thread Ayush Saxena
I think the similar problem is being chased as part of
https://github.com/apache/hive/pull/4948

On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
>
>
>
>
> Issue with Docker container using mysql RDBMS ( Failed to load driver)
>
> https://hub.docker.com/r/apache/hive
>
> According to readme
>
> Launch Standalone Metastore With External RDBMS (Postgres/Oracle/MySql/MsSql)
>
> I want to use MySQL
>
> I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
>
> docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
> --add-host=host.docker.internal:host-gateway \
>  --env DB_DRIVER=mysql \
>  --env 
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver 
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=password" \
>  --mount source=warehouse,target=/opt/hive/data/warehouse \
>  --name metastore-standalone apache/hive:${HIVE_VERSION}
>
>
> docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
> --add-host=host.docker.internal:host-gateway \
>  --env DB_DRIVER=mysql \
>  --env 
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>   
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=password" \
>  --mount source=warehouse,target=/opt/hive/data/warehouse \
>  --name metastore-standalone apache/hive:${HIVE_VERSION}
>
> Docker logs shows this for both drivers ( same error )
>
> docker logs f3
> + : mysql
> + SKIP_SCHEMA_INIT=false
> + export HIVE_CONF_DIR=/opt/hive/conf
> + HIVE_CONF_DIR=/opt/hive/conf
> + '[' -d '' ']'
> + export 'HADOOP_CLIENT_OPTS= -Xmx1G 
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=hive'
> + HADOOP_CLIENT_OPTS=' -Xmx1G 
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=hive'
> + [[ false == \f\a\l\s\e ]]
> + initialize_hive
> + /opt/hive/bin/schematool -dbType mysql -initSchema
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Metastore connection URL: 
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> Metastore connection User: hive
> org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
> Underlying cause: java.lang.ClassNotFoundException : com.mysql.cj.jdbc.Driver
> Use --verbose for detailed stacktrace.
> *** schemaTool failed ***
> + '[' 1 -eq 0 ']'
> + echo 'Schema initialization failed!'
> Schema initialization failed!
> + exit 1
>
> Any idea, why I am getting failed to load driver for MySQL DB.
>
> Isn't docker container comes with MySQL Driver ?
>
> Docker container exits so I can't check whether driver is already installed.
>
> Let me know, what I can do to make it work.
>
> --
>
>
> Thanks
> Sanjay Gupta
>
>
>
> --
>
> Thanks
> Sanjay Gupta
>
>
>
> --
>
> Thanks
> Sanjay Gupta
>


Fwd: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-17 Thread Sanjay Gupta
Issue with Docker container using mysql RDBMS ( Failed to load driver)

https://hub.docker.com/r/apache/hive

According to readme

Launch Standalone Metastore With External RDBMS
(Postgres/Oracle/MySql/MsSql)

I want to use MySQL

I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore \
 --add-host=host.docker.internal:host-gateway \
 --env DB_DRIVER=mysql \
 --env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver \
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true \
-Djavax.jdo.option.ConnectionUserName=hive \
-Djavax.jdo.option.ConnectionPassword=password" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION}


docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore \
 --add-host=host.docker.internal:host-gateway \
 --env DB_DRIVER=mysql \
 --env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver \
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true \
-Djavax.jdo.option.ConnectionUserName=hive \
-Djavax.jdo.option.ConnectionPassword=password" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION}

Docker logs show the same error for both drivers:

docker logs f3
+ : mysql
+ SKIP_SCHEMA_INIT=false
+ export HIVE_CONF_DIR=/opt/hive/conf
+ HIVE_CONF_DIR=/opt/hive/conf
+ '[' -d '' ']'
+ export 'HADOOP_CLIENT_OPTS= -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ HADOOP_CLIENT_OPTS=' -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ [[ false == \f\a\l\s\e ]]
+ initialize_hive
+ /opt/hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:   
jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :com.mysql.cj.jdbc.Driver
Metastore connection User:   hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
Underlying cause: java.lang.ClassNotFoundException : com.mysql.cj.jdbc.Driver
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
+ '[' 1 -eq 0 ']'
+ echo 'Schema initialization failed!'
Schema initialization failed!
+ exit 1

Any idea why I am getting "Failed to load driver" for the MySQL DB?

Doesn't the Docker container come with the MySQL driver?

The container exits, so I can't check whether the driver is already installed.

Let me know what I can do to make it work.

--


Thanks
Sanjay Gupta



