Re: [VOTE] Mark Hive 2.x EOL

2024-05-09 Thread Attila Turoczy
+1 (non-binding)

-Attila

On Fri, 10 May 2024 at 03:20, Butao Zhang  wrote:

> +1
>
>
> Thanks
> Butao Zhang
>  Replied Message 
> From Ayush Saxena 
> Date 5/10/2024 08:45
> To dev 
> Subject [VOTE] Mark Hive 2.x EOL
> Hi All,
> Following the discussion at [1], I am starting the official vote thread to
> mark the Hive 2.x release line as EOL.
>
> Marking a release line as EOL means there won't be any further
> releases made for that release line.
>
> I will start with my +1
>
> -Ayush
>
>
> [1] https://lists.apache.org/thread/91wk3oy1qo953md7941ojg2q97ofsl2d
>


Re: [Discuss] Enable Attachments for Hive mailing lists

2024-01-22 Thread Attila Turoczy
+1 for me as well. We need it.

-Attila

On Mon, Jan 22, 2024 at 1:25 PM Ayush Saxena  wrote:

> Hi All,
> As of now we don't allow attachments on the Hive mailing lists
> (apart from the security ML). This prevents us from attaching patches/design
> docs or even screenshots of issues being reported on our mailing lists.
>
> A lot of projects allow that; I feel we should enable this for our Hive
> mailing lists as well for a better dev experience.
>
> Let me know your thoughts!!!
>
> Obviously a +1 from me
>
> -Ayush
>


Re: Re: Cleanup remote feature/wip branches

2024-01-19 Thread Attila Turoczy
+1

On Fri, 19 Jan 2024 at 04:30, dengzhhu653  wrote:

> +1
> At 2024-01-19 19:58:49, "Krisztian Kasa" 
> wrote:
> >+1
> >
> >On Fri, Jan 19, 2024 at 11:28 AM Alessandro Solimando <
> >alessandro.solima...@gmail.com> wrote:
> >
> >> +1, thanks Stamatis
> >>
> >> On Fri, Jan 19, 2024, 11:14 Ayush Saxena  wrote:
> >>
> >> > +1
> >> >
> >> > -Ayush
> >> >
> >> > > On 19-Jan-2024, at 3:41 PM, Stamatis Zampetakis 
> >> > wrote:
> >> > >
> >> > > Hey everyone,
> >> > >
> >> > > I noticed that in our official git repo [1] we have some kind of
> >> > > feature/WIP branches (see list below). Most of them (if not all) are
> >> > > stale, add noise, and some of them eat CI resources (storage and CPU)
> >> > > since Jenkins picks them up for builds/precommits.
> >> > >
> >> > > I would like to drop the branches listed at the end of this email.
> >> > > Please +1 if you agree.
> >> > >
> >> > > Best,
> >> > > Stamatis
> >> > >
> >> > > [1] https://github.com/apache/hive/branches/all
> >> > >
> >> > > git branch -r | grep origin | grep -v "branch-" | grep -v "master"
> >> > >  origin/HIVE-23274_280_rb
> >> > >  origin/HIVE-23337_280_rb
> >> > >  origin/HIVE-23403_280_rb
> >> > >  origin/HIVE-23440_280_rb
> >> > >  origin/HIVE-23470_rb
> >> > >  origin/HIVE-4115
> >> > >  origin/branc-2.3
> >> > >  origin/cbo
> >> > >  origin/dependabot/maven/com.google.protobuf-protobuf-java-3.21.7
> >> > >  origin/dependabot/maven/itests/qtest-druid/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> >> > >  origin/dependabot/maven/org.apache.commons-commons-text-1.10.0
> >> > >  origin/dependabot/maven/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> >> > >  origin/dependabot/maven/org.postgresql-postgresql-42.4.3
> >> > >  origin/dependabot/maven/standalone-metastore/com.google.protobuf-protobuf-java-3.21.7
> >> > >  origin/dependabot/maven/standalone-metastore/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> >> > >  origin/dependabot/maven/standalone-metastore/org.postgresql-postgresql-42.4.3
> >> > >  origin/ptf-windowing
> >> > >  origin/release-1.1
> >> > >  origin/revert-1365-upgrade-guava
> >> > >  origin/revert-1855-HIVE-24624
> >> > >  origin/revert-2694-HIVE-25355
> >> > >  origin/revert-3624-HIVE-26567
> >> > >  origin/revert-4247-hive-23256
> >> > >  origin/revert-4306-HIVE-27330
> >> > >  origin/revert-4452-HIVE-57988-BetweenBugFix
> >> > >  origin/revert-4501-OptimizeGetPartitionAPI
> >> > >  origin/vectorization
> >> >
> >>
>
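The filter in Stamatis's one-liner is plain substring matching, which is why `origin/branc-2.3` (not matched by `branch-`) and `origin/release-1.1` still appear in the list while every `branch-*` ref is hidden. A minimal Java sketch of the same keep/drop rule, for example for reviewing such a list in a small tool; the class and method names are hypothetical and not part of Hive:

```java
import java.util.List;
import java.util.stream.Collectors;

// Mirrors: git branch -r | grep origin | grep -v "branch-" | grep -v "master"
// Each grep does a plain substring match on the whole line, so String.contains
// reproduces the same keep/drop decision for every ref.
class StaleBranchFilter {

    // True if the ref survives all three greps in the pipeline.
    static boolean isCandidate(String ref) {
        return ref.contains("origin")        // grep origin
                && !ref.contains("branch-")  // grep -v "branch-"
                && !ref.contains("master");  // grep -v "master"
    }

    // Applies the filter to raw `git branch -r` output lines.
    static List<String> candidates(List<String> lines) {
        return lines.stream()
                .filter(StaleBranchFilter::isCandidate)
                .map(String::trim) // git indents each remote ref with two spaces
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "  origin/HEAD -> origin/master", // dropped: contains "master"
                "  origin/branch-3.1",            // dropped: contains "branch-"
                "  origin/branc-2.3",             // kept: the typo defeats "branch-"
                "  origin/cbo");                  // kept
        System.out.println(candidates(sample)); // [origin/branc-2.3, origin/cbo]
    }
}
```

Because the match is by substring, a feature branch that merely mentions `master` anywhere in its name would also be hidden, so the list is a starting point for review rather than an exhaustive inventory.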


Re: [VOTE] Mark Hive 1.x EOL

2024-01-16 Thread Attila Turoczy
+1

-Attila

On Tue, 16 Jan 2024 at 22:18, Butao Zhang  wrote:

> +1
>
>
>
> Thanks,
> Butao Zhang
>  Replied Message 
> | From | Ayush Saxena |
> | Date | 1/17/2024 14:15 |
> | To | dev |
> | Subject | [VOTE] Mark Hive 1.x EOL |
> Hi All,
> Following the discussion in [1], I am starting an official thread to mark Hive
> 1.x EOL.
>
> Marking a release line EOL means there won't be any further releases for
> that release line.
>
> I will start with my +1
>
> -Ayush
>
> [1] https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>


Re: [EXTERNAL] Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2024-01-16 Thread Attila Turoczy
Dear PMC members,

Do we have a verdict / decision about this?

-Attila

On Wed, Jan 10, 2024 at 5:45 PM Chao Sun  wrote:

> On Hive 2.x, I'm still preparing for another release 2.3.10 (Hive 2.3
> branch is being actively maintained so far). Hopefully this will be
> the last release in the branch-2 line.
>
> +1 on making Hive 1 EOL for the time being.
>
> Chao
>
> On Wed, Jan 10, 2024 at 8:10 AM Sankar Hariappan
>  wrote:
> >
> > +1 for making both Hive 1&2 EOL
> >
> > -Sankar
> > -Original Message-
> > From: Attila Turoczy 
> > Sent: Wednesday, January 10, 2024 7:37 PM
> > To: dev@hive.apache.org
> > Subject: [EXTERNAL] Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x
> >
> > +1 for making it EOL for Hive 1 and Hive 2. I do not think these 2 product
> > branches are relevant in 2023.
> >
> > -Attila
> >
> > On Wed, Jan 10, 2024 at 12:59 PM Denys Kuzmenko 
> > wrote:
> >
> > > +1 for marking Hive 1.x EOL
> > >
> > > Assuming no volunteers are willing to take ownership of branch-2
> > > maintenance, +1 to declare it EOL as well.
> > >
> > > Regards,
> > > Denys
> > >
>


Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2024-01-10 Thread Attila Turoczy
+1 for making it EOL for Hive 1 and Hive 2. I do not think these 2 product
branches are relevant in 2023.

-Attila

On Wed, Jan 10, 2024 at 12:59 PM Denys Kuzmenko 
wrote:

> +1 for marking Hive 1.x EOL
>
> Assuming no volunteers are willing to take ownership of branch-2 maintenance,
> +1 to declare it EOL as well.
>
> Regards,
> Denys
>


Re: [DISCUSS] Deprecate/Drop upgrade-acid module from 4.x

2024-01-10 Thread Attila Turoczy
Big +1 from me. As we shift our focus from ACID to Iceberg, I do not think
it is relevant anymore. Also, as Butao highlighted, it has a CVE.
Let's remove it, and if something is eventually needed (which I highly
doubt), we can revisit the decision at that time.
Due to the extensive history of Hive and the numerous legacy components
that haven't been touched since 1972, it is crucial for all of us to be more
decisive in determining what to keep and maintain. The size of the codebase
makes it extremely challenging, time-consuming, and potentially frustrating
for OSS contributors to thoroughly review all 67 (just a number :) )
aspects of Hive.

-Attila

On Wed, Jan 10, 2024 at 2:55 AM Butao Zhang  wrote:

> +1. I am not sure about the use case of the upgrade-acid module, but it seems
> that this module is rarely used in my world. I think maybe the first safe
> step is deprecating this module to let users know that it should not be
> used any more.
>
> BTW, my IDEA tells me that this module uses the old Hive 2.3.3, which has
> some vulnerabilities. Should we consider upgrading this dependency to Hive 4?
> "Dependency maven:org.apache.hive:hive-metastore:2.3.3 is vulnerable,
> safe version 4.0.0-alpha-2"
> CVE-2021-34538 7.5 Missing Authentication for Critical Function
> vulnerability
>
>
>
> Thanks,
> Butao Zhang
>  Replied Message 
> From Ayush Saxena 
> Date 1/10/2024 07:45
> To dev 
> Subject [DISCUSS] Deprecate/Drop upgrade-acid module from 4.x
> Hi Folks,
> Wanted to know thoughts on removing the upgrade-acid module[1] from
> 4.x. The javadoc on one of the main files[2] reads "This utility is
> designed to help with upgrading Hive 2.x to Hive 3.0". I think this is
> a 2.x to 3.x thing and doesn't look relevant for Hive-4.x. Checking
> the git log, I don't find any relevant development happening on this
> either.
>
> The main challenge that this brings is that it depends on legacy
> Hive(2.3.3) & Hadoop(2.7.2) [3], which aren't JDK-11 compliant & it
> blocks the way for Hive JDK-11 compile time support.
>
> Let me know your thoughts!!!
>
> -Ayush
>
> [1] https://github.com/apache/hive/tree/master/upgrade-acid
> [2]
> https://github.com/apache/hive/blob/master/upgrade-acid/pre-upgrade/src/main/java/org/apache/hadoop/hive/upgrade/acid/PreUpgradeTool.java#L86C4-L86C72
> [3]
> https://github.com/apache/hive/blob/master/upgrade-acid/pre-upgrade/pom.xml#L38-L39
>


Re: [ANNOUNCE] Apache Hive 4.0.0 Branching

2023-12-01 Thread Attila Turoczy
Best news for the 1st of December :)

On Fri, Dec 1, 2023 at 4:54 PM Denys Kuzmenko  wrote:

> Hi All,
>
> Hive 4.0.0 release branch cut is happening today.
>
> Best,
> Denys
>


Re: Release of Hive 4 and TPC-DS benchmark

2023-11-23 Thread Attila Turoczy
Excellent news, Denys! Hive 4 is here! Can't wait :)

-Attila


On Thu, Nov 23, 2023 at 3:20 PM Denys Kuzmenko  wrote:

> Update:
> 1. Query 2, 71: Resolved in HIVE-27006 [1];
>
> 2. Query 97: Under review in HIVE-27269 [2];
> Thanks, Seonggon, for providing a reproducer qfile.
>
> 3. Query 14: Reported in HIVE-24167 [3];
> set hive.optimize.cte.materialize.threshold to -1 by default in Hive 4 and
> fix the issue in a later version.
>
> 4. HIVE-26986 [4] is a performance improvement that is nice to have, but
> not a blocker for the release.
>
> Be advised, next week we plan to cut 4.0.0 release branch from master and
> start testing.
>
> Thanks, Denys
>
> [1] https://issues.apache.org/jira/browse/HIVE-27006
> [2] https://issues.apache.org/jira/browse/HIVE-27269
> [3] https://issues.apache.org/jira/browse/HIVE-24167
> [4] https://issues.apache.org/jira/browse/HIVE-26986
>
>
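The workaround proposed for Query 14 is a configuration change rather than a code fix; within a session it can be applied with `SET hive.optimize.cte.materialize.threshold=-1;`. A minimal Java sketch of the decision this property controls, assuming that -1 switches CTE materialization off and, as a further unverified assumption, that a non-negative threshold triggers materialization once a CTE's reference count reaches it. The class is hypothetical and is not Hive's planner code:

```java
// Hypothetical model of the choice governed by hive.optimize.cte.materialize.threshold.
// Assumptions (not taken from Hive's source): a negative value such as -1 disables
// CTE materialization, and a non-negative value materializes a CTE once its
// reference count reaches the threshold.
class CteMaterializationPolicy {
    private final int threshold;

    CteMaterializationPolicy(int threshold) {
        this.threshold = threshold;
    }

    // Decides whether a CTE referenced `references` times would be materialized.
    boolean shouldMaterialize(int references) {
        if (threshold < 0) {
            return false; // -1: feature off, the CTE is never materialized
        }
        return references >= threshold;
    }

    public static void main(String[] args) {
        CteMaterializationPolicy disabled = new CteMaterializationPolicy(-1);
        CteMaterializationPolicy enabled = new CteMaterializationPolicy(2);
        System.out.println(disabled.shouldMaterialize(5)); // false
        System.out.println(enabled.shouldMaterialize(1));  // false
        System.out.println(enabled.shouldMaterialize(2));  // true
    }
}
```

With the default at -1, the code path that Query 14 trips over is simply never taken, which is why the fix itself can wait for a later version.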


Re: Discussion about HIVE-12679 to make IMetaStoreClient pluggable

2023-10-19 Thread Attila Turoczy
Hi All,

In my mind, the proxy option encapsulates the implementation and more or
less forces the IMetaStoreClient to know about RetryingMetaStoreClient or
HiveMetaStoreClientWithLocalCache, which should not always be necessary.
IMetaStoreClient should be just an abstraction on top of the current one.
The concrete implementation of the interface needs to define whether it
wants retry, caching, or any other special functionality.

Let's imagine the following scenario: Okumin comes back and creates a purely
memory-based HMS. In this case the caching layer that is enforced by the
design is not necessary. I believe that such capabilities should be
determined by the implemented MetaStoreClient itself. If the newly created
client needs Retry or LocalCache, that should be possible by having it
implement those interfaces as well. Currently the MetaStoreClient only
considers the current HMS and the current HMS strategies, which is fine
until we open it up. I think the abstraction should not be dictated by
implementation details like the Cache or Retry classes.

-Attila
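Attila's point is that retry and caching should be choices made by each concrete client rather than layers the plumbing always wraps around it. A minimal Java sketch of that shape, under loud assumptions: `CatalogClient` is a hypothetical one-method stand-in for `IMetaStoreClient`, and the factory, in-memory client, and retry decorator are illustrative names, not classes from the HIVE-12679 PR or from Hive:

```java
import java.util.List;

// Hypothetical, heavily reduced stand-in for IMetaStoreClient (one method only).
interface CatalogClient {
    List<String> getAllDatabases() throws Exception;
}

// A purely in-memory catalog: it has no network hops, so it needs neither
// retries nor a local cache, and nothing forces it to carry them.
class InMemoryCatalogClient implements CatalogClient {
    @Override
    public List<String> getAllDatabases() {
        return List.of("default");
    }
}

// Opt-in decorator: an implementation that talks to a remote service can
// choose to wrap itself in this, instead of Hive always adding it.
class RetryingCatalogClient implements CatalogClient {
    private final CatalogClient delegate;
    private final int maxAttempts;

    RetryingCatalogClient(CatalogClient delegate, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public List<String> getAllDatabases() throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.getAllDatabases();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}

// Hive only knows the interface and a configured class name; which extras
// (retry, cache, ...) are stacked on top is the implementation's business.
final class CatalogClientFactory {
    private CatalogClientFactory() {}

    static CatalogClient create(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(CatalogClient.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}

class PluggableCatalogDemo {
    public static void main(String[] args) throws Exception {
        CatalogClient client = CatalogClientFactory.create(InMemoryCatalogClient.class.getName());
        System.out.println(client.getAllDatabases()); // [default]
    }
}
```

The design choice being illustrated: the factory never decides on retry or caching, so a memory-only client stays lean while a remote client can compose `RetryingCatalogClient` (or a cache) around itself.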



On Thu, Oct 19, 2023 at 12:29 PM Stamatis Zampetakis 
wrote:

> Hey Okumin,
>
> Thanks for picking up this ticket and driving it forward.
>
> I don't have a strong opinion between the two options.
>
> On the surface the factory option seems simpler and possibly more
> efficient but I am not sure if the changes under the PR are sufficient
> to cover all usages in Hive.
>
> On the other hand, the proxy option looks more cumbersome to configure
> but maybe it is easier to integrate with the existing plumbing of
> RetryingMetaStoreClient in various places.
>
> Best,
> Stamatis
>
>
> On Mon, Oct 16, 2023 at 11:00 AM Attila Turoczy
>  wrote:
> >
> > Hi Okumin,
> >
> > I love this initiative. Every good platform should be pluggable.
> > In my mind the HMS should be just one option that the user can choose
> > from. Yes, that will be the default, but the world is way more open now,
> > and we need to provide freedom of choice. If you or others want to choose
> > a different metastore, it should be easy.
> >
> > Both option1 and option2 are acceptable. (Maybe the first one is easier;
> > it just needs another factory, which is so boring :) )
> >
> > Thank you for your PR and work. I will also check it soon.
> >
> > -Attila
> >
> > On Fri, Oct 13, 2023 at 5:04 PM Okumin  wrote:
> >
> > > Hi,
> > >
> > > I'm working on introducing a feature to make IMetaStoreClient
> pluggable.
> > > I'm sending this e-mail to gather opinions in a visible manner because
> it
> > > has controversial points.
> > >
> > > Some Hive users need the feature in order to integrate Hive with a data
> > > catalog other than HMS. Although the original patch was submitted more
> than
> > > 7 years ago and many users have wanted it, it has not been merged yet.
> > > I revived the ticket and PR so that we can maintain or improve it
> within
> > > the community.
> > >
> > > - JIRA: https://issues.apache.org/jira/browse/HIVE-12679
> > > - PR: https://github.com/apache/hive/pull/
> > >
> > > I initially created the above PR based on the original design. That's
> > > because I think it is reasonable enough and I can see some users have
> > > already ported the patch for the past 7 years. But there are also other
> > > opinions to suggest other designs. This is a summary for easy catch-up.
> > >
> > > https://gist.github.com/okumin/30b058b14db1b099ba37ba7dc257fe8e
> > >
> > > If you are interested in this problem and you have any opinions,
> please put
> > > a comment on the Pull Request.
> > >
> > > Regards,
> > > Okumin
> > >
>


Re: Discussion about HIVE-12679 to make IMetaStoreClient pluggable

2023-10-16 Thread Attila Turoczy
Hi Okumin,

I love this initiative. Every good platform should be pluggable.
In my mind the HMS should be just one option that the user can choose from.
Yes, that will be the default, but the world is way more open now, and we
need to provide freedom of choice. If you or others want to choose a
different metastore, it should be easy.

Both option1 and option2 are acceptable. (Maybe the first one is easier;
it just needs another factory, which is so boring :) )

Thank you for your PR and work. I will also check it soon.

-Attila

On Fri, Oct 13, 2023 at 5:04 PM Okumin  wrote:

> Hi,
>
> I'm working on introducing a feature to make IMetaStoreClient pluggable.
> I'm sending this e-mail to gather opinions in a visible manner because it
> has controversial points.
>
> Some Hive users need the feature in order to integrate Hive with a data
> catalog other than HMS. Although the original patch was submitted more than
> 7 years ago and many users have wanted it, it has not been merged yet.
> I revived the ticket and PR so that we can maintain or improve it within
> the community.
>
> - JIRA: https://issues.apache.org/jira/browse/HIVE-12679
> - PR: https://github.com/apache/hive/pull/
>
> I initially created the above PR based on the original design. That's
> because I think it is reasonable enough and I can see some users have
> already ported the patch for the past 7 years. But there are also other
> opinions to suggest other designs. This is a summary for easy catch-up.
>
> https://gist.github.com/okumin/30b058b14db1b099ba37ba7dc257fe8e
>
> If you are interested in this problem and you have any opinions, please put
> a comment on the Pull Request.
>
> Regards,
> Okumin
>


Re: Include ARM binaries with next release

2023-08-25 Thread Attila Turoczy
Love it! In 2023, ARM has become an industry standard. ARM also performs
very well, and cloud ARM VMs are much cheaper.

-Attila

On 2023. Aug 25., Fri at 12:48, Ayush Saxena  wrote:

> Hi All,
> Considering now we do support building Hive on both x86 & ARM, can we
> explore having additional binaries built for ARM architecture?
>
> A lot of projects release both x86 & ARM binaries, for example Hadoop
> [1]; see the Binary Download column in the 3.3.6 row.
>
> As for the process, the release vote is on the source code, which
> stays the same for both x86 & ARM. It is just an additional
> convenience binary built, signed & released. We can consider making
> this step optional as well.
>
> Let me know what people think!!!
>
> -Ayush
>
> [1] https://hadoop.apache.org/releases.html
>
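Since the vote stays on the source release, the ARM build is purely an extra convenience artifact next to the x86 one. A small, hypothetical Java sketch of how a download or install script might pick between the two based on the JVM's `os.arch` system property; the `-aarch64` file-name suffix is an assumed convention, not one Hive has adopted:

```java
import java.util.Locale;

// Hypothetical helper: maps the JVM's os.arch value to the convenience binary
// a user would download. The file-name scheme is an assumption, not Hive's.
final class ReleaseArtifactChooser {
    private ReleaseArtifactChooser() {}

    // os.arch reports e.g. "amd64" (Linux x86-64), "x86_64" (macOS), "aarch64".
    static String archClassifier(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "amd64":
            case "x86_64":
                return "x86_64";
            case "aarch64":
            case "arm64":
                return "aarch64";
            default:
                throw new IllegalArgumentException("no binary built for " + osArch);
        }
    }

    // The x86 tarball keeps its existing name; the ARM one gets a suffix (assumed).
    static String artifactName(String version, String osArch) {
        String arch = archClassifier(osArch);
        String base = "apache-hive-" + version + "-bin";
        return arch.equals("x86_64") ? base + ".tar.gz" : base + "-" + arch + ".tar.gz";
    }

    public static void main(String[] args) {
        System.out.println(artifactName("4.0.0", System.getProperty("os.arch")));
    }
}
```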


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2023-08-23 Thread Attila Turoczy
Thank you, Stamatis! And thanks to Zoltan for the "donation" :)

-Attila

On Wed, Aug 23, 2023 at 4:53 PM Ayush Saxena  wrote:

> +1,
> Thanks, Stamatis, for initiating this. This has been on my mind as well
> for a long time, but I couldn’t find the time.
>
> -Ayush
>
> > On 23-Aug-2023, at 6:19 PM, Zoltan Haindrich  wrote:
> >
> > Hey Stamatis!
> >
> > I'm happy to donate these repos / help with the migration!
> > I should have done it earlier, but it was never a top priority... Thank
> > you for initiating it!
> >
> > cheers,
> > Zoltan
> >
> >> On 8/23/23 14:00, Stamatis Zampetakis wrote:
> >> Hi all,
> >> Our precommit infrastructure uses code that resides in the following
> >> repos.
> >> * https://github.com/kgyrtkirk/hive-test-kube
> >> * https://github.com/kgyrtkirk/hive-toolbox
> >> * https://github.com/kgyrtkirk/hive-dev-box
> >> These are mainly maintained by Zoltán Haindrich who is always helpful
> >> and kind to investigate and resolve issues.
> >> For facilitating contributions from the apache community and also
> >> removing some burden from Zoltan's shoulders it may be a good time to
> >> migrate those and put them under the apache namespace.
> >> For the initial migration, we could have a straightforward 1 to 1
> >> mapping as shown below:
> >> * https://github.com/apache/hive-test-kube
> >> * https://github.com/apache/hive-toolbox
> >> * https://github.com/apache/hive-dev-box
> >> How do you feel about this?
> >> Best,
> >> Stamatis
>


[Twitter] Quickstart doc

2023-07-25 Thread Attila Turoczy
流 Interested in Apache Hive and @ApacheIceberg? Check out the QuickStart
documentation at https://iceberg.apache.org/hive-quickstart/ for all the
details! #ApacheHive #ApacheIceberg

First 2 chars are icons :)


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-07-13 Thread Attila Turoczy
Thanks for the update! Can't wait for the beta :)

-Attila

On Thu, Jul 13, 2023 at 5:19 PM Stamatis Zampetakis 
wrote:

> Hey everyone,
>
> As you may have noticed there have been various tickets around LICENSE
> and NOTICE files popping up recently. I just logged HIVE-27504 [1]
> which hopefully addresses all remaining issues that were found while I
> was working with the RC. After this gets resolved we should be good to
> go for putting up the RC for vote.
>
> The structure and content of the LICENSE and NOTICE file are very
> important for Apache releases so I would encourage other members of
> the community (especially PMC) to review the latest changes and
> current status and raise new JIRA tickets if they discover some
> problems. I would like to avoid having last minute -1 votes due to
> that.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-27504
>
> On Tue, Jun 20, 2023 at 11:09 PM Stamatis Zampetakis 
> wrote:
> >
> > Hey team,
> >
> > Small heads up regarding the progress of the 4.0.0-beta-1 release.
> >
> > Most of the release steps went out smoothly and I was able to get an
> > RC0 ready [1].
> >
> > However, I am afraid that our binary distribution does not comply
> > fully with the ASF Policy [2]. We bundle a lot of dependencies (jars)
> > within and I am not sure if we are fully covered in terms of licenses
> > and notice files. Thanks Ayush for reminding me to check the
> > binary-package-licenses directory [5].
> >
> > I am checking various resources such as [3, 4] to see what additional
> > steps we can take to be on the safe side and also looking for ways to
> > automate this so that we don't have to manually inspect the jars on
> > every release. I was playing a bit with license-maven-plugin [6] but I
> > am not yet completely happy with its output.
> >
> > The next few days will be a bit busy so most likely I will get back on
> > this during the weekend. If people have feedback or other ideas to
> > share please let me know.
> >
> > Best,
> > Stamatis
> >
> > [1] https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/
> > [2]
> https://www.apache.org/legal/src-headers.html#asf-source-header-and-copyright-notice-policy
> > [3] https://infra.apache.org/licensing-howto.html
> > [4] https://www.apache.org/legal/resolved.html
> > [5] https://github.com/apache/hive/tree/master/binary-package-licenses
> > [6] https://www.mojohaus.org/license-maven-plugin/
> >
> >
> > On Fri, Jun 2, 2023 at 10:03 PM Stamatis Zampetakis 
> wrote:
> > >
> > > I can start preparing the RC towards the end of next week. If somebody
> > > has more time and wants to start earlier I am fine to switch.
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Fri, Jun 2, 2023 at 5:36 PM Denys Kuzmenko 
> wrote:
> > > >
> > > > great, this is the current list of release managers:
> > > >
> > > > 4.0.0 Stamatis Zampetakis
> > > > 4.1.0 Denys Kuzmenko
> > > > 4.2.0 Sai Hemanth Gantasala
> > > >
> > > > Should we keep the same RM order and just shift the releases, or find
> > > > a volunteer for the 4.0.0-beta release? WDYT?
> > > >
> > > >
>


Re: Move to JDK-11

2023-07-11 Thread Attila Turoczy
Returning to this topic, I kindly request those who would like to advocate
for the continued support of JDK8 to please share their reasoning and
insights with us. Your input and perspective are greatly appreciated!
Thank you.

-Attila

On Fri, Jun 2, 2023 at 12:43 PM Attila Turoczy 
wrote:

> Hi All,
>
> I know my opinion might not be the most popular, but I advocate for using
> *JDK 17*. Here's why:
>
> Let's consider a scenario where a customer wants to use the latest version
> of Apache Hive. They would typically install it locally or on a small
> cluster. In 2023, is it realistic to assume that this customer won't be
> able to install JDK 17 on their cluster? Even in large enterprises, it
> should be feasible to install an LTS JDK, especially considering the
> widespread adoption of cloud computing. Sungwoo Park's measurements also
> support this recommendation to go with JDK 17. It outperforms JDK 11 by 8%
> in terms of runtime speed, and JDK 11 itself is 10+% faster than JDK 8.
> This is a significant value proposition. Who would be the customer that
> says, "I don't want faster query execution! I'd rather use JDK 8 and pay
> more for cloud or data center resources instead of using JDK 17!" It
> doesn't make sense to me.
>
> The tech industry has been evolving at an incredible pace, with
> improvements in serialization, IPC mechanisms, and parallelized frameworks
> since the release of JDK 8 ten years ago. We should leverage these
> advancements! A couple of years ago, we invested a lot to gain 1-2% in
> execution speed. We prayed to 3 gods, sacrificed 2 ships and traveled around the
> world to make it happen. :-) Now, the JDK itself provides a substantial
> amount of improvement. So, why would we resist progress just because there
> are a few lazy or conservative admins who don't want to spend two minutes
> installing a JDK?
>
> A platform needs to be modern and incorporate the latest technologies to
> attract developers and users. I understand that some may prefer to stay
> with JDK 8 as it seems like the safest position, but I believe in taking
> bold bets to achieve big wins. Even if we decide to go with JDK 11 instead, I
> would still be happy since we are moving forward and not dwelling on a JDK
> that is a decade old. Personally, I think focusing on one thing that brings
> more value to us and our users is the ideal path forward.
>
> -Attila
>
> On Thu, Jun 1, 2023 at 11:23 AM Stamatis Zampetakis 
> wrote:
>
>> Hey everyone,
>>
>> If we claim that Hive supports a certain JDK then we should compile and
>> run tests with it.
>>
>> The more JDKs we can support the better for everyone but this comes at a
>> cost (resources mostly). We should have a precommit run for every
>> supported JDK (frequency to be determined once per day/week) that compiles
>> and runs all tests.
>>
>> From my perspective, I would be pretty happy if we could cover the two
>> edge LTS releases at every point in time.
>>
>> Then we have to decide also which JDK we should use for the pull requests
>> and local dev environment. I think it makes sense to use the latest.
>> People like working on modern stuff and also it makes sense that newer
>> releases will also use newer versions. It would be pretty awkward if
>> someone wants to use the latest Hive version and it turns out that it can
>> only run on JDK8.
>>
>> Best,
>> Stamatis
>>
>> On Thu, Jun 1, 2023, 3:42 AM Sungwoo Park  wrote:
>>
>> > Hi, everyone.
>> >
>> > I have not tested the master branch with Java 11/17 yet, but I would like
>> > to share my experience with testing a fork of branch-3.1 with Java 11/17
>> > (as part of developing Hive-MR3), in case that it can be useful for the
>> > discussion. I merged the patches listed in [1] HIVE-22415 and updated the
>> > Maven configuration for Java 11.
>> >
>> > 1. Building Hive was fine and I was able to run it with Java 11 as well as
>> > Java 17. So, it seems that the work reported in [1] is indeed complete for
>> > upgrading to Java 11 (and Java 17) and getting Hive to work.
>> >
>> > 2. However, there was a problem with running tests, so this can be
>> > additional work for upgrading to Java 11.
>> >
>> > 3. For performance, Java 17 gives about 8 percent (free) performance
>> > improvement. When tested with 10TB TPC-DS, Java 8 takes 8074 seconds,
>> > whereas Java 17 takes 7415 seconds. Considering the maturity of Hive, I
>> > think this is not a small improvement because almost every query gets some
>> > speedup.

Re: Idea: Remove PowerMock

2023-07-10 Thread Attila Turoczy
+1 Kill it! :)
Mockito is a more modern approach. I think it is cool that we are modernizing
our platform and removing old and unsupported tools and components.


On Mon, Jul 10, 2023 at 5:36 PM Ayush Saxena  wrote:

> +1, PowerMock as far as I remember has issues with JDK-11+ as well,
> one such ref :
> https://stackoverflow.com/questions/52966897/powermock-java-11
>
> -Ayush
>
> On Mon, 10 Jul 2023 at 20:18, Zsolt Miskolczi 
> wrote:
> >
> > Hi,
> >
> > Hive heavily uses PowerMock. Its main purpose is static mocking.
> >
> > The sad thing is that PowerMock seems to be dead:
> > - The main branch got its last commit in 2022, and most of the
> > contributions last year were simple dependency upgrades:
> > https://github.com/powermock/powermock/commits/release/2.x
> > - The last release was in 2020.
> > - Their mailing list looks dead as well. This is the last email on that
> > list: https://groups.google.com/g/powermock/c/JdYY3naZlbU. It asked whether
> > the project was discontinued and didn't get an answer at all.
> >
> > So officially it is not dead, but it seems to be.
> >
> > Back when PowerMock development started, there was no static mocking
> > in Mockito. Since then, it has become possible using mockito-inline.
> >
> > I won't lie, it is hard to switch from PowerMock: it enables some coding
> > patterns that are considered bad patterns and it leads to code that is
> > harder to test. Last year I played with it and removed it from the
> > hive-exec module: https://github.com/apache/hive/pull/3798.
> >
> > The hard part in removing it is that PowerMock and mockito-inline don't
> > work together. So when we want to remove it, we have to do it in one pull
> > request for a given module. It cannot be separated into smaller steps.
> > The good news is as it relates to testing, pre commit tests can validate
> > the refactor.
> >
> > What do you think? Should we move away from PowerMock or keep it as it is?
> >
> > Thank you,
> > Zsolt Miskolczi
>
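Zsolt notes that PowerMock enables patterns that are hard to test, and that mockito-inline (through `Mockito.mockStatic`) is now the route for static mocking. Often the static call can be removed instead of mocked. A minimal, stdlib-only Java sketch of that refactoring, assuming the static dependency is the system time; `LeaseChecker` is a hypothetical class, not Hive code:

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical class. A "hard to test" version would call the static
// Instant.now() directly, which is the kind of code that pushes tests toward
// static mocking. Here the time source is an injected Clock instead.
class LeaseChecker {
    private final Clock clock;
    private final Duration timeout;

    LeaseChecker(Clock clock, Duration timeout) {
        this.clock = clock;
        this.timeout = timeout;
    }

    // Production wiring: the real system clock.
    static LeaseChecker withSystemClock(Duration timeout) {
        return new LeaseChecker(Clock.systemUTC(), timeout);
    }

    // True once strictly more than `timeout` has passed since `acquiredAt`.
    boolean isExpired(Instant acquiredAt) {
        return Duration.between(acquiredAt, Instant.now(clock)).compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2023-07-10T12:00:00Z");
        // In a test, a fixed Clock replaces static mocking entirely.
        Clock later = Clock.fixed(start.plusSeconds(61), ZoneOffset.UTC);
        LeaseChecker checker = new LeaseChecker(later, Duration.ofSeconds(60));
        System.out.println(checker.isExpired(start)); // true
    }
}
```

Refactorings like this can be done module by module before PowerMock is dropped, which shrinks the single all-or-nothing pull request Zsolt describes.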


Re: [apache/hive] HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible… (PR #4199)

2023-06-07 Thread Attila Turoczy
Sure thing.

On Wed, Jun 7, 2023 at 10:59 AM Dong Li  wrote:

> Hey team,
>
> Can anyone help review this PR?
>
> -- Forwarded message -
> From: github-actions[bot] 
> Date: Wed, 7 Jun 2023 at 10:22
> Subject: Re: [apache/hive] HIVE-24706: add the HiveHBaseTableInputFormatV2
> to fix the compatible… (PR #4199)
> To: apache/hive 
> Cc: alexdongli0829 , Author <
> aut...@noreply.github.com>
>
>
> This pull request has been automatically marked as stale because it has not
> had recent activity. It will be closed if no further activity occurs.
> Feel free to reach out on the dev@hive.apache.org list if the patch is in
> need of reviews.
>
>


Re: Move to JDK-11

2023-06-02 Thread Attila Turoczy
Hi All,

I know my opinion might not be the most popular, but I advocate for using
*JDK 17*. Here's why:

Let's consider a scenario where a customer wants to use the latest version
of Apache Hive. They would typically install it locally or on a small
cluster. In 2023, is it realistic to assume that this customer won't be
able to install JDK 17 on their cluster? Even in large enterprises, it
should be feasible to install an LTS JDK, especially considering the
widespread adoption of cloud computing. Sungwoo Park's measurements also
support this recommendation to go with JDK 17. It outperforms JDK 11 by 8%
in terms of runtime speed, and JDK 11 itself is 10+% faster than JDK 8.
This is a significant value proposition. Who would be the customer that
says, "I don't want faster query execution! I'd rather use JDK 8 and pay
more for cloud or data center resources instead of using JDK 17!" It
doesn't make sense to me.

The tech industry has been evolving at an incredible pace, with
improvements in serialization, IPC mechanisms, and parallelized frameworks
since the release of JDK 8 ten years ago. We should leverage these
advancements! A couple of years ago, we invested a lot to gain 1-2% in
execution speed. We prayed to 3 gods, sacrificed 2 ships and traveled around the
world to make it happen. :-) Now, the JDK itself provides a substantial
amount of improvement. So, why would we resist progress just because there
are a few lazy or conservative admins who don't want to spend two minutes
installing a JDK?

A platform needs to be modern and incorporate the latest technologies to
attract developers and users. I understand that some may prefer to stay
with JDK 8 as it seems like the safest position, but I believe in taking
bold bets to achieve big wins. Even if we decide to go with JDK 11 instead, I
would still be happy since we are moving forward and not dwelling on a JDK
that is a decade old. Personally, I think focusing on one thing that brings
more value to us and our users is the ideal path forward.

-Attila

On Thu, Jun 1, 2023 at 11:23 AM Stamatis Zampetakis 
wrote:

> Hey everyone,
>
> If we claim that Hive supports a certain JDK then we should compile and run
> tests with it.
>
> The more JDKs we can support the better for everyone but this comes at a
> cost (resources mostly). We should have a precommit run for every supported
> JDK (frequency to be determined once per day/week) that compiles and runs
> all tests.
>
> From my perspective, I would be pretty happy if we could cover the two edge
> LTS releases at every point in time.
>
> Then we have to decide also which JDK we should use for the pull requests
> and local dev environment. I think it makes sense to use the latest. People
> like working on modern stuff and also it makes sense that newer releases
> will also use newer versions. It would be pretty awkward if someone wants
> to use the latest Hive version and it turns out that it can only run on
> JDK8.
>
> Best,
> Stamatis
>
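Supporting several JDKs at once, as discussed here, usually means the same jar has to run on JDK 8 as well as on JDK 11/17. A minimal sketch of one recurring detail: `Runtime.version()` only exists from Java 9 on, so code that must still load on JDK 8 often reads the `java.specification.version` system property instead, which is "1.8" on JDK 8 and "11" or "17" on newer releases. The `JdkVersion` class and its methods are hypothetical illustrations, not part of Hive.

```java
public final class JdkVersion {
    private JdkVersion() {}

    /** Maps a specification version such as "1.8" -> 8, "11" -> 11, "17" -> 17. */
    public static int featureVersion(String spec) {
        if (spec.startsWith("1.")) {
            // JDK 8 and earlier use the legacy "1.x" numbering.
            return Integer.parseInt(spec.substring(2));
        }
        // JDK 9+ report the feature number first, e.g. "11" or "17".
        int dot = spec.indexOf('.');
        return Integer.parseInt(dot < 0 ? spec : spec.substring(0, dot));
    }

    /** Feature version of the JVM currently running this code. */
    public static int current() {
        return featureVersion(System.getProperty("java.specification.version"));
    }

    public static void main(String[] args) {
        int jdk = current();
        System.out.println("Running on JDK " + jdk);
        if (jdk >= 17) {
            // Hypothetical version-gated path; JDK 17-only code would go here.
            System.out.println("JDK 17+ code path enabled");
        }
    }
}
```

Because the class avoids any Java 9+ API, it compiles and runs on all three JDKs mentioned in the thread, which is the property a multi-JDK precommit run would verify.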
> On Thu, Jun 1, 2023, 3:42 AM Sungwoo Park  wrote:
>
> > Hi, everyone.
> >
> > I have not tested the master branch with Java 11/17 yet, but I would like
> > to share my experience with testing a fork of branch-3.1 with Java 11/17
> > (as part of developing Hive-MR3), in case it is useful for the
> > discussion. I merged the patches listed in [1] HIVE-22415 and updated the
> > Maven configuration for Java 11.
> >
> > 1. Building Hive was fine and I was able to run it with Java 11 as well as
> > Java 17. So, it seems that the work reported in [1] is indeed complete for
> > upgrading to Java 11 (and Java 17) and getting Hive to work.
> >
> > 2. However, there was a problem with running tests, so this can be
> > additional work for upgrading to Java 11.
> >
> > 3. For performance, Java 17 gives about 8 percent (free) performance
> > improvement. When tested with 10TB TPC-DS, Java 8 takes 8074 seconds,
> > whereas Java 17 takes 7415 seconds. Considering the maturity of Hive, I
> > think this is not a small improvement because almost every query gets some
> > speedup.
> >
> > Thanks,
> >
> > --- Sungwoo
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-22415
> >
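The "about 8 percent" figure can be checked from the two 10TB TPC-DS timings Sungwoo reports above. The two usual ways of expressing it differ slightly: the Java 17 run takes about 8.2% less wall-clock time, which corresponds to a speedup factor of about 1.09 (roughly 8.9% more throughput). The class below is only an illustrative calculation on those two numbers.

```java
public final class TpcdsSpeedup {
    private TpcdsSpeedup() {}

    /** Percentage of wall-clock time saved going from the baseline to the candidate run. */
    public static double percentReduction(double baselineSec, double candidateSec) {
        return (baselineSec - candidateSec) / baselineSec * 100.0;
    }

    /** How many times faster the candidate run is than the baseline. */
    public static double speedupFactor(double baselineSec, double candidateSec) {
        return baselineSec / candidateSec;
    }

    public static void main(String[] args) {
        double jdk8 = 8074;   // seconds, 10TB TPC-DS on Java 8 (from the thread)
        double jdk17 = 7415;  // seconds, same workload on Java 17
        System.out.printf("time saved: %.1f%%%n", percentReduction(jdk8, jdk17));
        System.out.printf("speedup:    %.3fx%n", speedupFactor(jdk8, jdk17));
    }
}
```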
> >
> > On Thu, Jun 1, 2023 at 3:53 AM Sai Hemanth Gantasala
> >  wrote:
> >
> > > Hi All,
> > >
> > > I would strongly advocate keeping support for JDK8.
> > > Between JDK11 and JDK17, depending on the amount of effort the upgrade
> > > takes, I'm inclined towards JDK17 (JDK21 LTS will be released in Sep 2023).
> > >
> > > Thanks,
> > > Sai.
> > >
> > > On Wed, May 31, 2023 at 5:39 AM László Bodor <bodorlaszlo0...@gmail.com>
> > > wrote:
> > >
> > > > *Hi!*
> > > >
> > > >
> > > > *Should we support both JDK-11 & JDK-8?*
> > > > IMO absolutely yes, let's not break up with JDK-8: according to its
> > > > lifecycle, it's going to stay with us for a long time.
> > > >
> > > > I believe
> > > > a) we should be able to compile on JDK8, JDK11, and JDK17 (github actions
> > > > can cover this 

Re: Re: Reg: Discussion on removal of deprecated APIs in the HMS thrift interface

2023-06-01 Thread Attila Turoczy
+1 from me as well. Let's clean it up. Since we are still struggling with
the data correctness issues, we have time to introduce these changes. If
they don't fit, that won't be a problem either, as the next release will
contain them. As I wrote earlier, once 4.0 is out I want to help us make
regular releases, even major ones. I have started a proposal document for a
public Hive roadmap and release roadmap that I want to share and discuss
with the community.

-Attila

On Thu, Jun 1, 2023 at 12:37 PM dengzhhu653  wrote:

> Hi
>
>
> Thanks Sai for driving this, the request based API makes sense to me.
> For the removal of deprecated API:
>  a) +1 if it is marked as deprecated in 3.x;
>  b) If the API is introduced after 4.0.0-alpha but tends to become
> obsolete in 4.x GA, I think we can remove it as well.
>
>
> Thanks,
> Zhihua.
> At 2023-06-01 17:56:03, "Ayush Saxena"  wrote:
> >+1 to what Stamatis said: if it is there in 3.X we can explore their
> >removal, else let them go into the 4.x GA release and we can remove them in
> >the subsequent release
> >
> >-Ayush
> >
> >> On 01-Jun-2023, at 3:08 PM, Stamatis Zampetakis
> >> wrote:
> >>
> >> Hello,
> >>
> >> Ideally we should deprecate APIs in one release and remove them in a
> >> subsequent major release. If the HMS deprecations were added in Hive
> >> 3.X then I am ok removing them now. Otherwise it is not really that we
> >> will remove deprecated APIs but we will remove regular APIs without
> >> any notice.
> >>
> >> Best,
> >> Stamatis
> >>
> >>> On Thu, Jun 1, 2023 at 2:57 AM Sai Hemanth Gantasala
> >>>  wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> This thread is to initiate a discussion on the removal of deprecated APIs
> >>> in the HMS thrift class. Any client, including HiveMetaStoreClient, talks
> >>> to the HiveMetaStore server via the thrift layer. Over the past few years,
> >>> the thrift class has become bloated with duplicated APIs with varying
> >>> parameters (function overloading) in the API definition. APIs get
> >>> deprecated because an API might need an additional argument, so a new API
> >>> is added with the additional argument and the old API is marked as
> >>> deprecated.
> >>>
> >>> I'm working on HIVE-26537 <https://issues.apache.org/jira/browse/HIVE-26537>
> >>> to clean up the code around the interaction between HiveMetaStoreClient
> >>> and HMS so that it no longer uses the deprecated APIs (the HMS client will
> >>> now use request-based APIs instead of APIs taking individual arguments).
> >>> Going forward, request-based APIs are ideal because we can just add a
> >>> field to the request object definition in the thrift class while the API
> >>> itself remains unchanged. This should keep future changes to the
> >>> client-server interaction minimal.
> >>>
> >>> I would like to hear community members' opinions regarding the
> >>> deprecated APIs:
> >>> 1) Keep the deprecated APIs for the 4.x release. HMSClient will use the
> >>> request-based APIs, and older clients will stay compatible with the new
> >>> HMS server.
> >>> 2) Remove the deprecated APIs for the 4.x release. This would break
> >>> backward compatibility with older clients, but we would have the
> >>> opportunity to clean up a lot of deprecated code. Since we are making a
> >>> major release after 5 years, I hope this incompatibility is acceptable.
> >>>
> >>> Please let me know your thoughts.
> >>>
> >>> Thanks,
> >>> Sai.
>
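A minimal sketch of the difference between the positional and request-based API styles described in this thread. Everything here (`RequestApiSketch`, `GetTablesRequest`, the toy in-memory catalog) is hypothetical and is not Hive's generated Thrift code; it only shows why a new optional field can be added to a request object without touching the method signature, and how a deprecated positional overload can delegate to the request-based method until it is removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class RequestApiSketch {

    /** Hypothetical request object (not one of Hive's generated Thrift classes). */
    public static final class GetTablesRequest {
        private final String dbName;          // ignored by the single-database toy catalog
        private String prefix = "";           // optional filter, defaults to "match all"
        private boolean includeViews = true;  // imagine this field was added in a later release

        public GetTablesRequest(String dbName) { this.dbName = dbName; }

        public GetTablesRequest setPrefix(String prefix) { this.prefix = prefix; return this; }

        public GetTablesRequest setIncludeViews(boolean includeViews) {
            this.includeViews = includeViews;
            return this;
        }
    }

    /** Hypothetical table entry: a name plus whether it is a view. */
    private static final class Table {
        final String name;
        final boolean isView;
        Table(String name, boolean isView) { this.name = name; this.isView = isView; }
    }

    // Toy in-memory catalog standing in for the metastore's backing store.
    private final List<Table> catalog = Arrays.asList(
            new Table("sales", false), new Table("sales_v", true), new Table("users", false));

    /**
     * Old positional style: every new parameter would need yet another overload.
     * @deprecated use {@link #getTables(GetTablesRequest)} instead.
     */
    @Deprecated
    public List<String> getTables(String dbName, String prefix) {
        // Keep old callers working by delegating to the request-based API.
        return getTables(new GetTablesRequest(dbName).setPrefix(prefix));
    }

    /** Request-based style: new optional fields extend the request, not the signature. */
    public List<String> getTables(GetTablesRequest req) {
        List<String> result = new ArrayList<>();
        for (Table t : catalog) {
            if (t.name.startsWith(req.prefix) && (req.includeViews || !t.isView)) {
                result.add(t.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RequestApiSketch hms = new RequestApiSketch();
        System.out.println(hms.getTables("default", "sales"));
        System.out.println(hms.getTables(new GetTablesRequest("default").setIncludeViews(false)));
    }
}
```

Under the deprecate-then-remove policy Stamatis describes, the `@Deprecated` overload would stay for at least one release so older callers keep working, and only the request-based method would survive the removal.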


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-06-01 Thread Attila Turoczy
Ayush just told me that the mailing list does not support images. It is a
very sad world :-(

Previous meme:
https://imgflip.com/i/7nuzql

On Thu, Jun 1, 2023 at 11:46 AM Attila Turoczy 
wrote:

> OK. Then let's go with beta. Please start a vote for a release (if not
> started already).
>
> I think we should have a more frequent release cadence. The community
> needs it, we need it, and without frequent releases nobody will believe
> this project is healthy. I know the first will be the hardest, but we need
> to fix those issues and release them. Even a monthly release could be
> possible for Hive. We are capable of doing this!
>
> -Attila
>
> On Thu, Jun 1, 2023 at 11:24 AM Stamatis Zampetakis 
> wrote:
>
>> +1 from me as well. Any alpha or beta name should be fine; I have no
>> strong preferences.
>>
>> On Wed, May 31, 2023, 2:30 PM László Bodor 
>> wrote:
>>
>> > Hi!
>> >
>> > +1 for creating a new release before GA in the presence of possible
>> > correctness problems. I'm not 100% sure about alpha or beta; I'm fine with
>> > alpha-3.
>> >
>> > Regards,
>> > Laszlo Bodor
>> >
>> > On Wed, May 31, 2023 at 14:22, Denys Kuzmenko wrote:
>> >
>> > > Hi folks,
>> > >
>> > > The master branch has many new features, bug fixes, and performance
>> > > improvements since alpha-2. However, we still have several correctness
>> > > bugs [HIVE-26654] and performance issues that should be eliminated
>> > > before the GA.
>> > >
>> > > Could we consider doing a beta release to keep at least a 6-month
>> > > release cadence and also show the community that 4.0.0 GA is the next
>> > > stop?
>> > >
>> > > Thanks,
>> > > Denys
>> > >
>> >
>>
>


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-06-01 Thread Attila Turoczy
OK. Then let's go with beta. Please start a vote for a release (if not
started already).

I think we should have a more frequent release cadence. The community needs
it, we need it, and without frequent releases nobody will believe this
project is healthy. I know the first will be the hardest, but we need to
fix those issues and release them. Even a monthly release could be possible
for Hive. We are capable of doing this!

-Attila

On Thu, Jun 1, 2023 at 11:24 AM Stamatis Zampetakis 
wrote:

> +1 from me as well. Any alpha or beta name should be fine; I have no
> strong preferences.
>
> On Wed, May 31, 2023, 2:30 PM László Bodor 
> wrote:
>
> > Hi!
> >
> > +1 for creating a new release before GA in the presence of possible
> > correctness problems. I'm not 100% sure about alpha or beta; I'm fine with
> > alpha-3.
> >
> > Regards,
> > Laszlo Bodor
> >
> > On Wed, May 31, 2023 at 14:22, Denys Kuzmenko wrote:
> >
> > > Hi folks,
> > >
> > > The master branch has many new features, bug fixes, and performance
> > > improvements since alpha-2. However, we still have several correctness
> > > bugs [HIVE-26654] and performance issues that should be eliminated
> > > before the GA.
> > >
> > > Could we consider doing a beta release to keep at least a 6-month
> > > release cadence and also show the community that 4.0.0 GA is the next
> > > stop?
> > >
> > > Thanks,
> > > Denys
> > >
> >
>


Re: Updating the Hive Committer Guide Wiki

2023-05-19 Thread Attila Turoczy
I prefer using a Slack / Discord channel, which has become a common trend
in the open source community. Unlike formal, broadcast-style communication
such as a mailing list, Slack/Discord/IRC allows for more peer-level
interactions. While it's not mandatory to check another communication
channel, it's beneficial to provide an opportunity for discussion. However,
important decisions or discussions should still be shared on the mailing
list to ensure *wider* visibility ("If it didn't happen on the Mailing
List, means it didn't happen", The Apache Way [1]). I like this quote.

Adding a Slack link to the Hive website should suffice, as it offers a
chance to attract new participants and foster community building. We need
to build a community; we need to provide opportunities to talk, to learn,
to share ideas and code. A community must be open and needs to open the
door to everybody who is interested in it.

-Attila


On Fri, May 19, 2023 at 11:45 AM Ayush Saxena  wrote:

> I wouldn't even suggest that people get into any dev-related discussions
> on Slack; some casual stuff/conversations are ok. It was there in the doc
> around the IRC channel; I just updated it and didn't want to say no to
> anyone interested in joining. The channel already had some ~140 people.
>
> I just added the interested people. But yes, as established any
> technical or project level discussion should happen on the relevant
> Mailing Lists. I suppose the contributors joining those will be mature
> enough to know what resource to be used in which way.
>
> As mentioned in the Apache Docs: "If it didn't happen on the Mailing
> List, means it didn't happen", The apache way [1]
>
> btw. there is a way to integrate Slack with mailing lists and stuff like
> that, but I don't think it is the time to chase that
>
> Yep, let's update the other wiki as well, it would be great if we get
> some more volunteers as well :-)
>
> -Ayush
>
> [1] https://theapacheway.com/on-list/
>
> On Fri, 19 May 2023 at 14:55, Stamatis Zampetakis 
> wrote:
> >
> > Thanks for updating the wiki Ayush! Definitely very helpful and
> > hopefully we can do it for other pages as well.
> >
> > Slack is a very useful tool but personally I don't have much time to
> > monitor yet another channel of communication. I don't know if we
> > should encourage people to start discussions there especially since
> > access is moderated and search archives are not openly available. I
> > would prefer to direct people to dev@ or user@ and not slack but this
> > is just my personal opinion.
> >
> > Best,
> > Stamatis
> >
> > On Fri, May 19, 2023 at 6:27 AM Ayush Saxena  wrote:
> > >
> > > Hi All,
> > > I recently observed that our Hive Committer guide is pretty outdated
> > > and mentions legacy ways of committing, but still has a lot of
> > > relevant information.
> > >
> > > After discussing with some friends offline, I have updated the doc.
> > > Feel free to share feedback or improvements.
> > >
> > > Committers to the project already have access to the wiki, so they
> > > can directly update it. If anyone else has any feedback, feel free to
> > > share it and someone amongst the committer group would be happy to get
> > > things updated.
> > >
> > > The Wiki page lies here:
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27362108
> > >
> > > -Ayush
>


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-05-16 Thread Attila Turoczy
+2. Who is currently working on the TPC-DS regression? Can I / we help them?

-Attila


On Tue, May 16, 2023 at 11:04 AM Stamatis Zampetakis 
wrote:

> I agree with Attila we should do our best to come out with the next GA
> soon. In order to do that we should address the TPCDS regressions that are
> already reported. It doesn't make much sense to give out a GA that cannot
> run the whole TPCDS suite without crashing or returning wrong results.
>
> If solving all the problems in a reasonable timeframe is not possible then
> I would suggest to cut another alpha or beta release.
>
> Best,
> Stamatis
>
> On Fri, May 12, 2023, 6:36 PM Attila Turoczy
> wrote:
>
> > Could we please give some attention to this topic? I strongly believe that
> > we should put in every effort to release Hive 4. The Hive community needs
> > to demonstrate that we are active and accomplishing exciting developments.
> > It is quite disheartening to note that our last major GA release was a
> > staggering 5 years ago on 18th May 2018! The significance of version 4.0
> > cannot be overstated, and we should definitely prioritize its promotion.
> >
> > -Attila
> >
> > On Tue, May 9, 2023 at 8:23 PM Kirti Ruge  wrote:
> >
> >> I see that a few tickets, like HIVE-26400 which is a major milestone, are
> >> resolved. Can we reevaluate the priorities of the other JIRAs so that we
> >> get clarity on GO/NO-GO for the 4.0.0 GA release and its timeline?
> >>
> >>
> >>
> >> Thanks,
> >> Kirti
> >>
> >> On Sat, Mar 25, 2023 at 3:27 PM Stamatis Zampetakis 
> >> wrote:
> >>
> >> > Regarding correctness, I think it makes sense to change default values
> >> > and possibly add a warning note when there's a known risk of wrong
> >> > results. Needless to say, we should try to fix as many issues as
> >> > possible; we still need volunteers to review open PRs.
> >> >
> >> > Performance regressions are trickier, but if we have the query plans
> >> > (CBO + full) along with logs (including task counters) for fast and
> >> > slow execution, we may be able to understand what happens. Don't
> >> > hesitate to create Jira tickets with this information if available.
> >> >
> >> > Lastly, regarding 4.0.0 blockers, I don't think we need a special
> >> > label. The built-in and widely used priority "blocker" seems enough to
> >> > capture the importance and urgency of a ticket.
> >> > Since I am the release manager for the next release, I will go over
> >> > tickets marked as blockers and reevaluate priorities if necessary.
> >> >
> >> > Best,
> >> > Stamatis
> >> >
> >> > On Thu, Mar 23, 2023, 10:27 AM Denys Kuzmenko 
> >> > wrote:
> >> >
> >> > > Thanks, Sungwoo for running the TPC-DS benchmark. Do we know if the
> >> > > same level of performance degradation was present in 4.0.0-alpha1?
> >> > >
> >> > > All: please use the `hive-4.0.0-must` label in a ticket if you think
> >> > > it's a show-stopper for the release.
> >> > >
> >> >
> >>
> >
>


Re: [DISCUSS] Disable JIRA worklog for GitHub PRs

2023-05-12 Thread Attila Turoczy
+1

On Fri, May 12, 2023 at 4:01 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi Stamatis,
> I am experiencing the same too, so +1 from me.
>
> Best regards,
> Alessandro
>
> On Fri, 12 May 2023 at 15:58, Stamatis Zampetakis 
> wrote:
>
> > Hello,
> >
> > Everything that happens in a GitHub PR creates a worklog entry under
> > the respective JIRA ticket.
> > For every worklog entry we receive a notification from j...@apache.org
> > when we are watching an issue. The worklog entry and email
> > notification usually appear messy.
> >
> > Moreover, if we are watching the GitHub PR we are going to get a
> > notification from notificati...@github.com which has the same content
> > as the JIRA worklog entry and is much more readable.
> >
> > Finally, the PR notification is also going to
> > iss...@hive.apache.org and git...@hive.apache.org so those who are
> > subscribed to these lists
> > will get the same notification multiple times.
> >
> > Personally, I never read the JIRA worklog notifications and I largely
> > prefer those from notificati...@github.com.
> >
> > How do you feel about disabling the worklog entries in JIRA coming
> > from GitHub PRs?
> >
> > For archiving purposes, the notifications already go to gitbox@ so we
> > don't lose anything from disabling the worklog entries. On the
> > contrary, I find that this would reduce the noise and redundancy in
> > our inboxes.
> >
> > Concretely this is what I have in mind in terms of change:
> > https://github.com/apache/hive/pull/4318
> >
> > Best,
> > Stamatis
> >
>


Hive is on Dockerhub

2023-05-12 Thread Attila Turoczy
Dear All,

I am thrilled to share some exciting news with you! The Hive project is now
available on DockerHub (https://hub.docker.com/r/apache/hive), making it
easier than ever to explore our latest and greatest features. By simply
executing two commands, you can dive right into the world of Hive and
unleash its full potential.

In the realm of big data, one of the most significant challenges has always
been the onboarding process. Trying out any data warehouse on the market
typically involves a lengthy and cumbersome procedure. Even popular
platforms have complex, time-consuming, and often expensive processes for
trying out big data and data warehousing solutions, even for sandboxes.

However, I'm delighted to inform you that Hive is here to break the ice
(Iceberg :-) ). Unlike its counterparts, Hive offers a unique data
warehousing experience that anyone can try out effortlessly, even on their
local computer. You can now play with our latest features, including the
revolutionary Iceberg, within a matter of minutes. Although this
"standalone cluster" is not designed for production use, it serves as a
sandbox environment where you can experiment with queries and explore the
capabilities of Hive on small datasets. The availability of Hive on
DockerHub opens up a world of possibilities for enthusiasts and
professionals alike. If you've ever had an interest in Hive, now is the
perfect time to give it a try. :) (I watched too many advertisements
recently ¯\_(ツ)_/¯)

To get started, simply follow these two straightforward commands:

Pull the latest image:

docker pull apache/hive:4.0.0-alpha-2

Start the container:

docker run -d -p 10000:10000 -p 10002:10002 \
  --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:4.0.0-alpha-2

Once you execute these commands, you'll gain immediate access to Hive and
its coolest features.

Huge thanks to the Hive open source community, especially to Ayush, Denys,
Simhadri, Stamatis, Zhihua D. (alphabetical order)

-Attila


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-05-12 Thread Attila Turoczy
Could we please give some attention to this topic? I strongly believe that
we should put in every effort to release Hive 4. The Hive community needs
to demonstrate that we are active and accomplishing exciting developments.
It is quite disheartening to note that our last major GA release was a
staggering 5 years ago on 18th May 2018! The significance of version 4.0
cannot be overstated, and we should definitely prioritize its promotion.

-Attila

On Tue, May 9, 2023 at 8:23 PM Kirti Ruge  wrote:

> I see that a few tickets, like HIVE-26400 which is a major milestone, are
> resolved. Can we reevaluate the priorities of the other JIRAs so that we
> get clarity on GO/NO-GO for the 4.0.0 GA release and its timeline?
>
>
>
> Thanks,
> Kirti
>
> On Sat, Mar 25, 2023 at 3:27 PM Stamatis Zampetakis 
> wrote:
>
> > Regarding correctness, I think it makes sense to change default values and
> > possibly add a warning note when there's a known risk of wrong results.
> > Needless to say, we should try to fix as many issues as possible; we
> > still need volunteers to review open PRs.
> >
> > Performance regressions are trickier, but if we have the query plans (CBO +
> > full) along with logs (including task counters) for fast and slow
> > execution, we may be able to understand what happens. Don't hesitate to
> > create Jira tickets with this information if available.
> >
> > Lastly, regarding 4.0.0 blockers, I don't think we need a special label.
> > The built-in and widely used priority "blocker" seems enough to capture the
> > importance and urgency of a ticket.
> > Since I am the release manager for the next release, I will go over
> > tickets marked as blockers and reevaluate priorities if necessary.
> >
> > Best,
> > Stamatis
> >
> > On Thu, Mar 23, 2023, 10:27 AM Denys Kuzmenko 
> > wrote:
> >
> > > Thanks, Sungwoo for running the TPC-DS benchmark. Do we know if the
> > > same level of performance degradation was present in 4.0.0-alpha1?
> > >
> > > All: please use the `hive-4.0.0-must` label in a ticket if you think
> > > it's a show-stopper for the release.
> > >
> >
>


Re: Kill the Pig 

2023-05-12 Thread Attila Turoczy
This decision must be made based on consensus. As we can see, there is
still usage within the community. Despite the personal challenges it
presents to me, I believe we should all agree to maintain support for
Apache Pig, and unfortunately, there won't be any bacon soon :) Thank you
to everyone who has shared their opinions. We will revisit this topic at a
later time.

-Attila

On Fri, Apr 28, 2023 at 4:22 PM Ugur Yardimci 
wrote:

> I  totally agree +1
>
> On Thu, 20 Apr 2023, 10:50 Attila Turoczy,  wrote:
>
>> Hi All,
>>
>> In Hive we have a pretty old component from 1972 and this is the Pig. Pig
>> was cool somewhere in 2008, but nowadays it does not have any value in the
>> big data world. Even the last small release of Pig was 6 years ago in 2017,
>> and the Pig community has pretty much died. Because this component is
>> obsolete, I would suggest removing it from Hive 4.0. Hive 3 will still
>> contain it, but I think this is the right time to remove those components
>> that are not valuable for the community.
>>
>> What do you think about it?
>>
>> PS: If nobody writes back, it means I can kill the pig (rof rof)
>> :)
>>
>> -Attila
>>
>


Kill the Pig 

2023-04-20 Thread Attila Turoczy
Hi All,

In Hive we have a pretty old component from 1972 and this is the Pig. Pig
was cool somewhere in 2008, but nowadays it does not have any value in the
big data world. Even the last small release of Pig was 6 years ago in 2017,
and the Pig community has pretty much died. Because this component is
obsolete, I would suggest removing it from Hive 4.0. Hive 3 will still
contain it, but I think this is the right time to remove those components
that are not valuable for the community.

What do you think about it?

PS: If nobody writes back, it means I can kill the pig (rof rof) :)

-Attila


Re: Introducing a DI framework in Hive?

2023-04-20 Thread Attila Turoczy
Cool! Can't wait for the first DI-specific commit and the review :)

On 2023. Apr 19., Wed at 14:24, Stamatis Zampetakis 
wrote:

> I think we all agree that DI can be beneficial in general.
>
> However, it's hard to say yes or no on something before having a
> concrete case to discuss; it doesn't have to be a PR but we need to
> work on a specific Hive use-case and list advantages/disadvantages of
> the proposal.
>
> Best,
> Stamatis
>
> On Mon, Apr 17, 2023 at 7:33 PM Laszlo Vegh 
> wrote:
> >
> > Hi all,
> >
> > Sorry for not answering so far; for some reason I did not receive your
> > answers in my gmail account. I'm happy to see that there's a conversation
> > around the topic, so let me add my opinion on your points.
> >
> > First of all, introducing a DI framework does not mean a large-scale
> > refactoring. A suitable module, or a well-bounded set of components, can
> > be chosen as the first candidate. It's also important that nobody will be
> > forced to utilise the DI container when writing features, or to redesign
> > existing code when it is being touched.
> > As for the aim: I've worked quite a lot with Java and .Net DI frameworks,
> > and my experience was that having a DI framework greatly reduces the
> > effort to write well organised and maintainable code. While well
> > organised code can be written without DI frameworks too, the lack of such
> > a framework makes it much easier to write poorly designed code (bad
> > scoping, lifecycle issues, visibility issues, etc). By well-organised I
> > mean:
> > Design patterns: DI containers make it easier to write code using the
> > well known design patterns. For example, you can implement factory,
> > wrapper, adapter, etc. patterns by simply using the offered features as
> > they are meant to be used.
> > Streamlined component initialisation: No more spaghetti/boilerplate
> > component init methods.
> > Well defined component scopes (lifecycle): DI frameworks support various
> > component scopes, which offer fine-grained control over component
> > lifecycle -> Singleton, one component per thread, one component per
> > request from the DI container, etc.
> > Organised and visible component/class dependencies: Through constructor
> > injection all the dependencies of a class are visible (unlike static
> > method calls). Using this approach it is impossible to create circular
> > dependencies, which lead to object initialisation issues and hacks. By
> > requiring all deps during object creation it's way easier to detect or
> > avoid unwanted dependencies. It also makes it easier to better organise
> > the code into packages and modules.
> > Enhanced testability: I have explained this earlier.
> > Well defined component visibility: No need for "union-all" context
> > objects. Instead of having context objects with references to all of the
> > components which may be required during the execution, each execution
> > step can obtain the necessary dependencies from the DI container. Also,
> > no more public static methods or class instances. In order to make some
> > component accessible from everywhere, there's no need to make it public
> > and static. DI frameworks also offer nested/sub contexts to limit/control
> > visibility.
> > My original mail was supposed to be a kickoff, to start talking about DI.
> > Before creating a PR with an example in Hive, I would like to have a
> > common agreement that we want to do this, and that there is no blocker
> > which prevents us from doing it. Once we have this agreement I can create
> > a working example and demonstrate how it will help us in the future.
> > Regarding the stability and performance issues: Of course those must be
> > addressed as well, but as Stamatis pointed out, Hive is an open source
> > project and everybody can pursue their own initiative in parallel to the
> > others'.
> >
> > In Java I have the most experience with Spring, so I would prefer
> > choosing it. It has become huge by now, but it's modular. We are not
> > forced to use all of the offered features; if we want a pure DI container
> > with some basic extensions, we would only need spring-core, spring-beans,
> > and spring-context. It has several extensions and supports tons of other
> > well known frameworks and/or technologies.
> >
> > Best regards,
> > Laszlo Vegh
>
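To make the component scopes from Laszlo's list concrete, here is a toy container with a singleton scope and a per-request (prototype) scope. It is a hypothetical sketch written for illustration, not Spring's API; Spring's rough equivalents are its singleton and prototype bean scopes, and a per-thread scope would need one more binding type that is omitted here. `Config` and `QueryContext` are made-up component names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public final class TinyContainer {
    /** SINGLETON: one shared instance; PROTOTYPE: a fresh instance per lookup. */
    public enum Scope { SINGLETON, PROTOTYPE }

    private static final class Binding<T> {
        final Supplier<T> factory;
        final Scope scope;
        T instance; // only used for SINGLETON bindings

        Binding(Supplier<T> factory, Scope scope) {
            this.factory = factory;
            this.scope = scope;
        }
    }

    private final Map<Class<?>, Binding<?>> bindings = new HashMap<>();

    public <T> void bind(Class<T> type, Scope scope, Supplier<T> factory) {
        bindings.put(type, new Binding<>(factory, scope));
    }

    @SuppressWarnings("unchecked")
    public synchronized <T> T get(Class<T> type) {
        Binding<T> b = (Binding<T>) bindings.get(type);
        if (b == null) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        if (b.scope == Scope.PROTOTYPE) {
            return b.factory.get();          // new object for every request
        }
        if (b.instance == null) {
            b.instance = b.factory.get();    // created lazily, then reused
        }
        return b.instance;
    }

    /** Hypothetical shared component (e.g. configuration). */
    public static final class Config {}

    /** Hypothetical per-request component; its dependency arrives via the constructor. */
    public static final class QueryContext {
        public final Config config;

        public QueryContext(Config config) {
            this.config = config;
        }
    }

    public static void main(String[] args) {
        TinyContainer c = new TinyContainer();
        c.bind(Config.class, Scope.SINGLETON, Config::new);
        c.bind(QueryContext.class, Scope.PROTOTYPE,
                () -> new QueryContext(c.get(Config.class)));

        QueryContext q1 = c.get(QueryContext.class);
        QueryContext q2 = c.get(QueryContext.class);
        System.out.println("distinct contexts: " + (q1 != q2));
        System.out.println("shared config:     " + (q1.config == q2.config));
    }
}
```

The lookup is `synchronized` so a singleton is created at most once even under concurrent access; the nested `get` call inside the `QueryContext` factory is safe because Java monitors are reentrant.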


Re: Introducing a DI framework in Hive?

2023-04-12 Thread Attila Turoczy
Hi Stamatis and Sungwoo,

I agree with several points. Hive has millions of LOC, which are here and
will stay with us in the same way; that is not a question. But we need to
think about the future of the project. There are no engineers in the world
who want to use old and legacy technologies; every engineer wants to use
cool stuff where He/She can learn new things, patterns, designs. If we do
not improve our codebase, it will become a legacy zombieland which won't be
touched with love and passion. *(Oh what a management bullshit - you can
tell :) )* But I truly think that if we introduce new principles it could
give us speed, motivation, and power to continue the innovation. As an
engineer I always want to use a modern approach, because this gives me more
excitement. I think that introducing DI to this type of project is hard and
challenging, and it gives excitement. I want to live in a world where Hive
is the leader of the new principles, stable and easy to use, and where the
onboarding experience is much, much faster and easier.

I don't wanna live in a world 

As you wrote, DI is powerful, and Hive does not use it because it became
widely used only after Hive had started. If we / you introduce it, it does
not mean we have to refactor every module with DI. But we can try to
identify some components where we would introduce it, and we could create
docs for others on how to use and implement it. Maybe just 1-2 components;
others will come later as we touch them, if it makes sense. We won't
remove every static utils class, because it would not make sense, but with
baby steps we could try to introduce it, and for new development we could
introduce a loosely coupled standard, where every dependency is more
lightweight and these components would also be easier to test. (Which
-could- improve the quality as well.)

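A small sketch of the "loosely coupled" style mentioned above, using plain constructor injection with no framework at all. `TableSource` and `TableReporter` are hypothetical names chosen for illustration: the component receives its dependency through the constructor instead of calling a static utility, so the dependency is visible in the signature and a test can pass in a lambda-based fake.

```java
import java.util.Arrays;
import java.util.List;

public final class ConstructorInjectionSketch {

    /** Hypothetical dependency: something that can list the tables of a database. */
    public interface TableSource {
        List<String> tableNames(String db);
    }

    /** Hypothetical component that depends on TableSource only through its constructor. */
    public static final class TableReporter {
        private final TableSource source;

        public TableReporter(TableSource source) {
            this.source = source;
        }

        public String report(String db) {
            return db + ": " + source.tableNames(db).size() + " table(s)";
        }
    }

    public static void main(String[] args) {
        // A test can hand in a fake instead of a metastore-backed implementation.
        TableSource fake = db -> Arrays.asList("a", "b");
        System.out.println(new TableReporter(fake).report("default"));
    }
}
```

A DI container, if one is adopted, would only take over the wiring in `main`; the component itself stays framework-free, which fits the gradual, component-by-component introduction described above.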

#2 The quality of 3.1.x vs 4.0.x is a somewhat different topic. I don't
think it has much connection to DI, but I think we should talk about the
root causes in a different thread. You had several good points. We - ALL
of us - should be more careful about this type of issue. It was the same in
the past; especially when Hive 3 was introduced, there were several similar
issues. When new groundbreaking changes come into the repository, this can
happen. Also, I think the "alpha" in 4.0.0-alpha signals that it is not yet
set in stone. But anyhow, you are right, we have to be more careful! But
let's start a different thread about it.


-Attila

On Wed, Apr 12, 2023 at 5:07 PM Sungwoo Park  wrote:

> Hello,
>
> I am not a committer, but I would like to add my opinion. At this stage of
> development, I think it is quite risky to switch to a DI framework for a
> couple of reasons.
>
> 1. A DI framework would have been a powerful tool if it had been
> incorporated into the project from the early stage. Now, however, Hive has
> way over 1 million lines of code and tens of thousands test cases, and my
> guess is that the overhead associated with introducing DI into Hive
> (whether gradually or globally at once) is very likely to outweigh the
> additional benefit, if any, of introducing DI, especially if we consider
> the stability of its development infrastructure.
>
> 2. Implementing new features, such as DI, in Hive can be an exciting
> sub-project and fun, but I think more pressing issues are to stabilize the
> current Hive code, although this is certainly less motivating and more
> boring. I hope that no new major features, such as DI, will be introduced
> until Hive becomes, say, as stable as Hive 3.1.
>
> For 2, I can give a few examples to substantiate my claim.
>
> 1) For the past few years, several new techniques for query compilation
> have been introduced. Unfortunately they were buggy and Hive started to
> return wrong results, on the assumption that Hive 3.1.2 was working
> correctly. (Yes, Hive 3.1.2 also has correctness bugs, but when tested
> against TPC-DS, Hive 3.1.2 returned the same results as other frameworks,
> so it can be used as a basis for comparison.) From our own testing, Hive
> 4.0.0-SNAPSHOT returns wrong results on several queries in TPC-DS, and this
> should be a major setback for Hive. If interested, please see [1] and [2].
>
> 2) Perhaps due to the same reason as in 1), Hive 4.0.0-SNAPSHOT is
> noticeably slower than Hive 3.1.2 on the TPC-DS benchmark. However, this is
> only from my own testing (using 10TB TPC-DS), and I hope that someone in
> the Hive team will try similar experiments to confirm/refute my claim.
>
> 3) Currently many q tests are run against MapReduce (which is not
> officially supported as far as I remember). However, some of these q tests
> fail when run against Tez. If Tez and LLAP are the new execution engines,
> these tests should be migrated as well.
>
> Sungwoo Park
>
> [1] https://issues.apache.org/jira/browse/HIVE-26654
> [2] https://issues.apache.org/jira/browse/HIVE-27226
>
> On Wed, Apr 12, 2023 at 10:12 PM 

Re: Will hive support storing all queries in the future?

2023-04-06 Thread Attila Turoczy
Hi Wish,

I personally don't think so. Of course it is a community decision, but from
a domain perspective, storing the SQL query history mainly belongs on the
client side. For example, HUE does it. If you have a custom application,
the application domain determines it, and it is the application's
responsibility. But I personally don't see the value in storing and
retrieving it (like pg_stats) on the Hive side.

-Attila

On 2023. Apr 6., Thu at 11:03, gzu...@163.com  wrote:

> Hi,
>
> I would like to ask if hive will support storing all sql queries in a
> centralized way, like pg_stat_statements in postgresql, which I think can
> be better used and integrated with external systems.
>
> Best Wish
>
>
>
> gzu...@163.com
>
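A minimal sketch of the client-side approach Attila describes: the application keeps its own bounded log of the statements it submits, so no server-side store similar to pg_stat_statements is required. `QueryHistory` is a hypothetical class, not part of Hive or HUE; a real client would also persist the entries and record them around its JDBC calls, which is omitted here.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public final class QueryHistory {
    /** One recorded statement with the time it was submitted. */
    public static final class Entry {
        public final Instant submittedAt;
        public final String sql;

        Entry(Instant submittedAt, String sql) {
            this.submittedAt = submittedAt;
            this.sql = sql;
        }
    }

    private final int capacity;
    private final Deque<Entry> entries = new ArrayDeque<>();

    public QueryHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records a statement, evicting the oldest one once capacity is reached. */
    public synchronized void record(String sql) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(new Entry(Instant.now(), sql));
    }

    /** Returns the recorded statements from oldest to newest. */
    public synchronized List<String> statements() {
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            out.add(e.sql);
        }
        return out;
    }

    public static void main(String[] args) {
        QueryHistory history = new QueryHistory(2);
        history.record("SELECT 1");
        history.record("SHOW TABLES");
        history.record("SELECT count(*) FROM t");
        System.out.println(history.statements()); // the oldest entry was evicted
    }
}
```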