Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2024-01-24 Thread Battula, Brahma Reddy
Hi Stamatis,

Is there any further update on the following, which we seem to have missed here? Thanks.


Regards,
Brahma.


From: Stamatis Zampetakis 
Date: Tuesday, August 1, 2023 at 17:05
To: dev@hive.apache.org 
Subject: Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal
Hello,

HIVE-27504 is now merged to master. Thanks everyone for the reviews!

I am going to prepare the release candidate for 4.0.0-beta-1 sometime this week.

Best,
Stamatis

On Thu, Jul 27, 2023 at 1:54 PM Battula, Brahma Reddy
 wrote:
>
> It looks like the following PR has been reviewed. Any chance of getting it
> merged and cutting the release?
>
> On 18/07/23, 2:39 PM, "Stamatis Zampetakis" wrote:
>
>
> HIVE-27504 still lacks reviews from committers.
>
>
> Note that I will not be able to work on the release from 22/07 to
> 30/07. If HIVE-27504 does not land in the next day or two the beta-1
> release might get delayed unless someone else picks up the RM role and
> cuts the RC.
>
>
> Best,
> Stamatis
>
>
> On Thu, Jul 13, 2023 at 6:33 PM Attila Turoczy wrote:
> >
> > Thanks for the update! Can't wait for the beta :)
> >
> > -Attila
> >
> > On Thu, Jul 13, 2023 at 5:19 PM Stamatis Zampetakis wrote:
> >
> > > Hey everyone,
> > >
> > > As you may have noticed there have been various tickets around LICENSE
> > > and NOTICE files popping up recently. I just logged HIVE-27504 [1]
> > > which hopefully addresses all remaining issues that were found while I
> > > was working with the RC. After this gets resolved we should be good to
> > > go for putting up the RC for vote.
> > >
> > > The structure and content of the LICENSE and NOTICE file are very
> > > important for Apache releases so I would encourage other members of
> > > the community (especially PMC) to review the latest changes and
> > > current status and raise new JIRA tickets if they discover some
> > > problems. I would like to avoid having last minute -1 votes due to
> > > that.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://issues.apache.org/jira/browse/HIVE-27504
> > >
> > > On Tue, Jun 20, 2023 at 11:09 PM Stamatis Zampetakis wrote:
> > > >
> > > > Hey team,
> > > >
> > > > Small heads up regarding the progress of the 4.0.0-beta-1 release.
> > > >
> > > > Most of the release steps went smoothly and I was able to get an
> > > > RC0 ready [1].
> > > >
> > > > However, I am afraid that our binary distribution does not comply
> > > > fully with the ASF Policy [2]. We bundle a lot of dependencies (jars)
> > > > within and I am not sure if we are fully covered in terms of licenses
> > > > and notice files. Thanks Ayush for reminding me to check the
> > > > binary-package-licenses directory [5].
> > > >
> > > > I am checking various resources such as [3, 4] to see what additional
> > > > steps we can take to be on the safe side and also looking for ways to
> > > > automate this so that we don't have to manually inspect the jars on
> > > > every release. I was playing a bit with license-maven-plugin [6] but I
> > > > am not yet completely happy with its output.
> > > >
> > > > The next few days will be a bit busy so most likely I will get back on
> > > > this during the weekend. If people have feedback or other ideas to
> > > > share please let me know.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1] https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/

Re: [Discuss] Enable Attachments for Hive mailing lists

2024-01-24 Thread Denys Kuzmenko
+1


Re: [Discuss] Enable Attachments for Hive mailing lists

2024-01-24 Thread Simhadri G
+1 from me.

It would be nice if we could attach design docs to the mail thread.

Thanks!
Simhadri G


On Tue, Jan 23, 2024 at 1:40 PM Stamatis Zampetakis 
wrote:

> +0
>
> I rarely open attachments from public mailing lists for security
> reasons (unless we are talking about known safe extensions).
>
> Moreover, I find it easier to glance through code if people share a
> link to a PR or code in GitHub than if I have to download and apply a
> patch locally.
>
> I understand that for some people this may be helpful so I am not
> opposing the change.
>
> Best,
> Stamatis
>
> On Mon, Jan 22, 2024 at 2:39 PM Attila Turoczy
>  wrote:
> >
> > +1 for me as well. We need it.
> >
> > -Attila
> >
> > On Mon, Jan 22, 2024 at 1:25 PM Ayush Saxena  wrote:
> >
> > > Hi All,
> > > As of now we don't allow attachments on the Hive mailing lists
> > > (apart from the security ML). This prevents us from attaching patches,
> > > design docs, or even screenshots of issues being reported on our
> > > mailing lists.
> > >
> > > A lot of projects allow that, and I feel we should enable it for our
> > > Hive mailing lists as well for a better dev experience.
> > >
> > > Let me know your thoughts!!!
> > >
> > > Obviously a +1 from me
> > >
> > > -Ayush
> > >
>


TABLESAMPLE with buckets not working, it does not prune input

2024-01-24 Thread Pau Tallada
Hi all,

We have a web platform in production[1] that uses Hive to facilitate access
to massive cosmological datasets.
When launched in 2016 over Hive 2.1.2 we used the TABLESAMPLE clause on
clustered tables to allow quick subsampling of the data.
However, we have been unable to get the same behaviour using Hive 3.1.2.

Tables are clustered following indications[2], but the queries always read
all the data in the table.
In fact, they read even more data (HDFS_BYTES_READ counter) when the
tablesample clause is used.

Example query on a very large table:

SELECT SUM(float_column) FROM huge_clustered_table
=>
23552 tasks
HDFS_BYTES_READ  = 44475888168 (44G)

SELECT SUM(float_column) FROM huge_clustered_table
TABLESAMPLE(BUCKET 1 OUT OF 1024)
=>
*23552 tasks*
*HDFS_BYTES_READ  = 58372075670 (58G)*

However, using block sampling:

SELECT SUM(float_column) FROM huge_clustered_table
TABLESAMPLE(0.1 PERCENT)
=>

*25 tasks*
*HDFS_BYTES_READ  = 45484944 (45M)*
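
To put the three counters above in proportion, here is a minimal Python sketch. The byte values are copied from the runs above; the variable names are only illustrative, and the "expected" figure assumes evenly sized buckets and working input pruning, which is an assumption rather than something measured.

```python
# HDFS_BYTES_READ counters copied from the three runs above.
full_scan_bytes = 44_475_888_168      # no sampling
bucket_sample_bytes = 58_372_075_670  # TABLESAMPLE(BUCKET 1 OUT OF 1024)
block_sample_bytes = 45_484_944       # TABLESAMPLE(0.1 PERCENT)

# Assumption: with effective pruning and evenly sized buckets, reading
# 1 bucket out of 1024 would touch roughly 1/1024 of the table.
expected_bucket_bytes = full_scan_bytes / 1024

# Ratios of each sampled run to the unsampled full scan.
bucket_ratio = bucket_sample_bytes / full_scan_bytes
block_ratio = block_sample_bytes / full_scan_bytes

print(f"bucket sample read {bucket_ratio:.2f}x the full scan")
print(f"block sample read {block_ratio:.3%} of the full scan")
print(f"a pruned bucket sample would read about {expected_bucket_bytes / 1e6:.1f} MB")
```

Under those assumptions, the bucket-sampled query reads more than the full scan instead of about a thousandth of it, while the block-sampled query reads close to the requested 0.1 percent.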

Please, any hint would be greatly appreciated!

[1] https://cosmohub.pic.es
[2] https://cwiki.apache.org/confluence/display/hive/languagemanual+sampling
-- 
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--