[jira] [Created] (HIVE-19867) Test and verify Concurrent INSERTS

2018-06-11 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19867:
-

 Summary: Test and verify Concurrent INSERTS  
 Key: HIVE-19867
 URL: https://issues.apache.org/jira/browse/HIVE-19867
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Steve Yeom
 Fix For: 4.0.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19866) improve cache purge

2018-06-11 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19866:
---

 Summary: improve cache purge
 Key: HIVE-19866
 URL: https://issues.apache.org/jira/browse/HIVE-19866
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


1) Memory needs to be accounted for.
2) LRFU eviction doesn't need to maintain state between individual removals.





[jira] [Created] (HIVE-19865) Full ACID table stats has wrong rawDataSize

2018-06-11 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19865:
-

 Summary: Full ACID table stats has wrong rawDataSize
 Key: HIVE-19865
 URL: https://issues.apache.org/jira/browse/HIVE-19865
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Steve Yeom
 Fix For: 4.0.0








Re: Getting started with Hive

2018-06-11 Thread Alan Gates
Take a look at DDLTask.describeTable.  That is the task run by the executor
to do a describe table.  The 'Hive db' argument is a handle to the
metastore client.  'DescTableDesc descTbl' contains the information from
the parser on what table to describe.
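
A rough sketch of the shape of that call, not Hive's actual code: `MetastoreHandle`, `DescribeRequest`, and `describe` below are hypothetical stand-ins for the 'Hive db' handle, `DescTableDesc`, and `DDLTask.describeTable`, and the field names are only examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are Hive classes.
class DescribeSketch {
    /** Plays the role of the metastore handle ('Hive db'): looks up table metadata by name. */
    interface MetastoreHandle {
        Map<String, String> getTableFields(String tableName);
    }

    /** Plays the role of DescTableDesc: what the parser extracted from the statement. */
    static final class DescribeRequest {
        final String tableName;
        DescribeRequest(String tableName) { this.tableName = tableName; }
    }

    /** Plays the role of the describe task: fetch via the handle, then format for output. */
    static String describe(MetastoreHandle db, DescribeRequest req) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : db.getTableFields(req.tableName).entrySet()) {
            out.append(e.getKey()).append(":\t").append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    /** Builds a one-table fake metastore and describes table "foo". */
    static String demo() {
        Map<String, String> foo = new LinkedHashMap<>();
        foo.put("Database", "default");
        foo.put("OwnerType", "USER");
        MetastoreHandle db = name -> {
            if (!name.equals("foo")) {
                throw new IllegalArgumentException("no such table: " + name);
            }
            return foo;
        };
        return describe(db, new DescribeRequest("foo"));
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In this toy version, an extra field would show up in the output only if the handle returned it, which is why the real lookup path is the place to start reading.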

Are you doing this just to learn the system or do you want to add more
metadata that will be returned in show and describe calls?  If this is
something you plan to contribute you should file a JIRA and discuss your
potential changes there, so that you can get feedback on the design, since
adding additional metadata tables to the system is a change others will
want to review.

Alan.



Re: Getting started with Hive

2018-06-11 Thread Sanchay Javeria
Thanks Alan! I think I should've phrased my question better. Essentially
I'm trying to return an extra field when a user describes a table. Say you
run `desc formatted foo`; we get some information back, like `table name`,
`database`, etc.

I'm trying to return extra information about the table `foo`. So, I made a
dummy SQL table where I, say, have the job name (fooBarJob) which populated
table foo. Now if you run `desc formatted foo`, you should get the jobName
along with other fields.

So basically, how exactly does a query like `desc` pull its metadata from
the backing RDBMS?

Also, thanks a lot for the easy-to-follow execution rundown you gave for
creating a table. Before contributing further, I feel a simple exercise
like this one can help me understand things clearly.

Thanks,
Sanchay



[jira] [Created] (HIVE-19864) Address TestTriggersWorkloadManager flakiness

2018-06-11 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-19864:


 Summary: Address TestTriggersWorkloadManager flakiness
 Key: HIVE-19864
 URL: https://issues.apache.org/jira/browse/HIVE-19864
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0, 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


TestTriggersWorkloadManager seems flaky; at times all of its test cases 
time out.





Review Request 67540: HIVE-19861 Fix temp table path generation for acid table export

2018-06-11 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67540/
---

Review request for hive and Eugene Koifman.


Bugs: HIVE-19861
https://issues.apache.org/jira/browse/HIVE-19861


Repository: hive-git


Description
---

Change DDLTask so temp tables do not get location generated.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e06949928d 
  
  ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 209fdfb287 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2e055aba4b 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 04292787a8 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 83490d2d53 


Diff: https://reviews.apache.org/r/67540/diff/1/


Testing
---


Thanks,

Jason Dere



[jira] [Created] (HIVE-19863) UNION query produce wrong results

2018-06-11 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-19863:
--

 Summary: UNION query produce wrong results
 Key: HIVE-19863
 URL: https://issues.apache.org/jira/browse/HIVE-19863
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


*Reproducer*
{code:sql}
SET hive.vectorized.execution.enabled=false;
set hive.map.aggr=false;

set hive.strict.checks.bucketing=false;
set hive.explain.user=true;

CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

ANALYZE TABLE src1 COMPUTE STATISTICS;

ANALYZE TABLE src1 COMPUTE STATISTICS FOR COLUMNS key,value;


CREATE TABLE src (key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" INTO TABLE src;

ANALYZE TABLE src COMPUTE STATISTICS;

ANALYZE TABLE src COMPUTE STATISTICS FOR COLUMNS key,value;

SELECT x.key, z.value, y.value
FROM src1 x JOIN src y ON (x.key = y.key) 
JOIN (select * from src1 union select * from src)z ON (x.value = z.value)
union
SELECT x.key, z.value, y.value
FROM src1 x JOIN src y ON (x.key = y.key) 
JOIN (select * from src1 union select * from src)z ON (x.value = z.value);
{code}

*Expected Result*
{code:sql}
128 val_128
146 val_146 val_146
150 val_150 val_150
213 val_213 val_213
224 val_224
238 val_238 val_238
255 val_255 val_255
273 val_273 val_273
278 val_278 val_278
311 val_311 val_311
369 val_369
401 val_401 val_401
406 val_406 val_406
66  val_66  val_66
98  val_98  val_98
{code}

*Actual Result*
{code:sql}
128
146 val_146
150 val_150
213 val_213
224
238 val_238
255 val_255
273 val_273
278 val_278
311 val_311
369
401 val_401
406 val_406
66  val_66
98  val_98
{code}

One whole column is missing from the result






[jira] [Created] (HIVE-19862) Postgres init script has a glitch around UNIQUE_DATABASE

2018-06-11 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-19862:
-

 Summary: Postgres init script has a glitch around UNIQUE_DATABASE
 Key: HIVE-19862
 URL: https://issues.apache.org/jira/browse/HIVE-19862
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Daniel Dai
Assignee: Daniel Dai


{code}
ALTER TABLE ONLY "DBS" ADD CONSTRAINT "UNIQUE_DATABASE" UNIQUE ("NAME");
{code}
The constraint should also include "CTLG_NAME".
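
To illustrate why the narrower constraint is a problem, here is a small Java sketch that models a UNIQUE constraint as a set of keys. It is not metastore code: the class, the method, and the catalog names `cat_a` and `cat_b` are made up. With the key on the name alone, a second database called `default` in a different catalog is rejected; with (catalog, name) as the key it is accepted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a UNIQUE constraint; not Hive metastore code.
class UniqueDatabaseSketch {
    /** Returns true if the row is accepted, i.e. its key was not already present. */
    static boolean insert(Set<List<String>> uniqueKeys, String catalog, String name,
                          boolean includeCatalog) {
        List<String> key = includeCatalog ? Arrays.asList(catalog, name) : Arrays.asList(name);
        return uniqueKeys.add(key);
    }

    public static void main(String[] args) {
        Set<List<String>> nameOnly = new HashSet<>();
        System.out.println(insert(nameOnly, "cat_a", "default", false)); // accepted
        System.out.println(insert(nameOnly, "cat_b", "default", false)); // rejected: name collides

        Set<List<String>> withCatalog = new HashSet<>();
        System.out.println(insert(withCatalog, "cat_a", "default", true)); // accepted
        System.out.println(insert(withCatalog, "cat_b", "default", true)); // accepted: catalogs differ
    }
}
```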





Re: Getting started with Hive

2018-06-11 Thread Sanchay Javeria
Hello,

Thank you. I've also CC'ed @hive.apache.org

I went through the dev docs on Hive and got an understanding of the
architecture and a high-level overview of how HiveQL query execution
proceeds. To get a better understanding, I decided to add a new field from
a SQL table when a user describes a table by tweaking the Hive metastore,
in addition to fields like "Database:", "OwnerType:", etc.

I added a new hook to obtain a connection to a SQL server and placed a
watcher under `startMetaStoreThreads()` in `HiveMetaStore.java`.
I then found `getTableMetaDataInformation()` under
`MetaDataFormatUtils.java` which populates the various fields like
"Database", "OwnerType" etc. by calling getters on the `Table` instance.

This led me to `api/Table.java`, auto-generated by the Thrift compiler,
which returns private instances for the getters above. However, I'm unable
to understand how these private variables in `metastore/api/Table.java` are
populated. In other words, when we create a new table in Hive, where
exactly is this metadata generated and populated so that it can be later
fetched when describing a table?

Please let me know if you need any further clarifications on the question!

On Mon, Jun 11, 2018 at 12:13 PM, Alan Gates  wrote:

> Yes, this is the place to ask dev questions.
>
> Alan.
>
> On Mon, Jun 11, 2018 at 12:10 PM Sanchay Javeria 
> wrote:
>
> > Hi fellow devs,
> >
> > I'm a computer science student at UIUC who just got started with Apache
> > Hive, I'd love to contribute more towards the open JIRA tickets.
> >
> > I had some questions if anyone could help :) I was wondering if this
> > mailing list is the right space to ask dev questions?
> >
> > Thank you,
> > Sanchay
> >
>


[jira] [Created] (HIVE-19861) Fix temp table path generation for acid table export

2018-06-11 Thread Jason Dere (JIRA)
Jason Dere created HIVE-19861:
-

 Summary: Fix temp table path generation for acid table export
 Key: HIVE-19861
 URL: https://issues.apache.org/jira/browse/HIVE-19861
 Project: Hive
  Issue Type: Bug
  Components: Import/Export, Transactions
Reporter: Jason Dere
Assignee: Jason Dere


Temp tables that are analyzed by the SemanticAnalyzer get their default 
location set to a location in the session directory. Export of Acid tables also 
creates temp tables, but this is done via a plan transformation, and the temp 
table creation never goes through the SemanticAnalyzer, meaning the location is 
not set. There is some other logic in DDLTask (which I am changing in 
HIVE-19837) which ends up automatically setting this path to the default table 
location in the warehouse directory. This should be fixed so that the path 
defaults to a location in the session directory, like with normal temp tables.
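
The intended defaulting rule can be sketched as follows. This is an illustration only: `defaultLocation`, its parameters, and the example paths are hypothetical and are not the DDLTask code being changed.

```java
// Hypothetical illustration of the defaulting rule; not the DDLTask implementation.
class TempTableLocationSketch {
    /**
     * Picks a location for a table. An explicitly set location wins; otherwise
     * temp tables default under the session directory and all other tables
     * under the warehouse directory.
     */
    static String defaultLocation(String explicitLocation, boolean isTemporary,
                                  String sessionDir, String warehouseDir, String tableName) {
        if (explicitLocation != null) {
            return explicitLocation; // already set, e.g. during semantic analysis
        }
        String base = isTemporary ? sessionDir : warehouseDir;
        return base + "/" + tableName;
    }

    public static void main(String[] args) {
        System.out.println(defaultLocation(null, true, "/tmp/session-1", "/warehouse", "export_tmp"));
        System.out.println(defaultLocation(null, false, "/tmp/session-1", "/warehouse", "t1"));
    }
}
```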

cc [~ekoifman]





[jira] [Created] (HIVE-19860) HiveServer2 ObjectInspectorFactory memory leak with cachedUnionStructObjectInspector

2018-06-11 Thread Rajkumar Singh (JIRA)
Rajkumar Singh created HIVE-19860:
-

 Summary: HiveServer2 ObjectInspectorFactory memory leak with 
cachedUnionStructObjectInspector
 Key: HIVE-19860
 URL: https://issues.apache.org/jira/browse/HIVE-19860
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 2.1.0
 Environment: hiveserver2 Interactive with LLAP.
Reporter: Rajkumar Singh


HiveServer2 starts seeing memory pressure once the 
cachedUnionStructObjectInspector cache starts growing.

[https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java#L345]

I did not see any eviction policy for cachedUnionStructObjectInspector, so we 
should implement some size or time-based eviction policy. 
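
As one possible shape for a size-based policy (a sketch, not a patch against ObjectInspectorFactory): a `LinkedHashMap` in access order with `removeEldestEntry` overridden drops the least recently used entry once a cap is exceeded. The class name and the cap of 2 are made up for the example, and the code is not thread-safe as written, so a shared cache would need synchronization or a concurrent cache implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded LRU cache; not thread-safe as written.
class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry once over the cap
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, String> cache = new BoundedCacheSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds the cap, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

A time-based policy would need a timestamp per entry and a periodic sweep, which this sketch does not attempt.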






[jira] [Created] (HIVE-19859) Inspect lock components for DBHiveLock while verifying whether transaction list is valid

2018-06-11 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19859:
--

 Summary: Inspect lock components for DBHiveLock while verifying 
whether transaction list is valid
 Key: HIVE-19859
 URL: https://issues.apache.org/jira/browse/HIVE-19859
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez








Re: Getting started with Hive

2018-06-11 Thread Alan Gates
First, my apologies, I thought your first question was sent to dev@hive,
which is the right list.  Hence I've removed dev@community from my reply.
If you haven't already you should subscribe to dev@hive.

I'm not 100% sure I understand your question, but here's a place to start.
If you do "create table foo..." in SQL that will eventually end up in
HiveMetaStore.create_table.  This handles checking the table and creating
any necessary directories, and then calls RawStore.createTable, which will
end up in ObjectStore.createTable.  This is where the values you sent in
createTable get written down to the RDBMS backing the metadata.  I'm not
sure that's the answer to the question you were asking or not.
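
A minimal sketch of that layering, with made-up types: `Handler` stands in for HiveMetaStore.create_table, `Store` for the RawStore interface, and `InMemoryStore` for ObjectStore (which really writes to the RDBMS; here a map takes its place). None of the signatures match Hive's.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the layers named above; not Hive's classes.
class CreateTableSketch {
    /** Storage abstraction, in the role of RawStore. */
    interface Store {
        void createTable(String name, Map<String, String> fields);
        Map<String, String> getTable(String name);
    }

    /** Concrete store, in the role of ObjectStore; a map replaces the backing RDBMS. */
    static final class InMemoryStore implements Store {
        private final Map<String, Map<String, String>> tables = new HashMap<>();

        public void createTable(String name, Map<String, String> fields) {
            tables.put(name, new HashMap<>(fields));
        }

        public Map<String, String> getTable(String name) {
            return tables.get(name);
        }
    }

    /** Request handler, in the role of create_table: validate, then delegate to the store. */
    static final class Handler {
        private final Store store;

        Handler(Store store) { this.store = store; }

        void createTable(String name, Map<String, String> fields) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("table name required");
            }
            if (store.getTable(name) != null) {
                throw new IllegalStateException("table exists: " + name);
            }
            store.createTable(name, fields); // persisted here; a later describe reads it back
        }
    }

    public static void main(String[] args) {
        Store store = new InMemoryStore();
        Handler handler = new Handler(store);
        Map<String, String> fields = new HashMap<>();
        fields.put("Database", "default");
        handler.createTable("foo", fields);
        System.out.println(store.getTable("foo"));
    }
}
```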

Alan.



[DISCUSS] Catalog feature in 3.x

2018-06-11 Thread Alan Gates
The base of the catalog feature made it into Hive 3, though it's not very
usable yet since it is only in the metastore.  I plan to keep working on
this feature to add it to the rest of Hive.  The (ever growing) set of
tasks for doing this is tracked in HIVE-18685.

I believe this should get pushed into branch-3 since the beginning of the
feature is in 3 and it will likely be a year or more before there's a Hive
4.

Since branch-3 currently has a number of test failures I won't start
pushing patches yet, but once it stabilizes I'd like to push this feature
into branch-3.  Any concerns?

Alan.


[jira] [Created] (HIVE-19858) make interface change from TEZ-3951 non-breaking

2018-06-11 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19858:
---

 Summary: make interface change from TEZ-3951 non-breaking
 Key: HIVE-19858
 URL: https://issues.apache.org/jira/browse/HIVE-19858
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








Re: May 2018 Hive User Group Meeting

2018-06-11 Thread Mich Talebzadeh
yes indeed I second that

Regards


Dr Mich Talebzadeh







RE: May 2018 Hive User Group Meeting

2018-06-11 Thread roberto.tardio
Thank you very much for sharing Sahil. I found the video of the event and the 
new features of Hive to be very interesting.

 

From: Sahil Takiar [mailto:takiar.sa...@gmail.com] 
Sent: Wednesday, May 23, 2018 23:08
To: u...@hive.apache.org
Cc: dev@hive.apache.org
Subject: Re: May 2018 Hive User Group Meeting

 

Wanted to thank everyone for attending the meetup a few weeks ago, and a huge 
thanks to all of our speakers! Apologies for the delay, but we finally have the 
recording uploaded to YouTube along with all the slides uploaded to 
SlideShare. Below are the links:

 

Recording: https://youtu.be/gwX3KpHa2j0

*   Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang - 
https://www.slideshare.net/sahiltakiar/hive-on-spark-at-uber-scale
*   Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar - 
https://www.slideshare.net/sahiltakiar/hive-ons3-performance-past-present-and-future
*   Dali: Data Access Layer at LinkedIn - Adwait Tumbde - 
https://www.slideshare.net/sahiltakiar/dali-data-access-layer
*   Parquet Vectorization in Hive - Vihang Karajgaonkar - 
https://www.slideshare.net/sahiltakiar/parquet-vectorization-in-hive
*   ORC Column Level Encryption - Owen O’Malley - 
https://www.slideshare.net/sahiltakiar/orc-column-encryption
*   Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon - 
https://www.slideshare.net/sahiltakiar/running-hive-at-scale-lyft
*   Materialized Views in Hive - Jesus Camacho Rodriguez - 
https://www.slideshare.net/sahiltakiar/accelerating-query-processing-with-materialized-views-in-apache-hive-98333641
*   Hive Metastore Caching - Daniel Dai - 
https://www.slideshare.net/sahiltakiar/hive-metastore-cache
*   Hive Metastore Separation - Alan Gates - 
https://www.slideshare.net/sahiltakiar/making-the-metastore-standalone
*   Customer Use Cases & Pain Points of (Big) Metadata - Rituparna Agrawal - 
https://www.slideshare.net/sahiltakiar/customer-use-cases-pain-points-of-big-metadata

If you have any issues accessing the links, feel free to reach out to me.

 

Looking forward to our next Hive Meetup!

 

On Mon, May 14, 2018 at 8:45 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

Hello,

 

Yes, the meetup was recorded. We are in the process of getting it uploaded to 
YouTube. Once it's publicly available I will send out the link on this email 
thread.

 

Thanks

 

--Sahil

 

On Mon, May 14, 2018 at 6:04 AM, <roberto.tar...@stratebi.com> wrote:

Hi,

 

If you recorded the meeting, please share the link. I could not follow it 
online because of the schedule (I live in Spain).

 

Kind Regards,

 

 

From: Luis Figueroa [mailto:lef...@outlook.com]
Sent: Wednesday, May 9, 2018 18:01
To: u...@hive.apache.org
Cc: dev@hive.apache.org
Subject: Re: May 2018 Hive User Group Meeting

 

Hey everyone,  

 

Was the meeting recorded by any chance? 

Luis


On May 8, 2018, at 5:31 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

Hey Everyone, 

 

Almost time for the meetup! The live stream can be viewed on this link: 
https://live.lifesizecloud.com/extension/2000992219?token=067078ac-a8df-45bc-b84c-4b371ecbc719
 

 ==en=Hive%20User%20Group%20Meetup

The stream won't be live until the meetup starts.

For those attending in person, there will be guest wifi:

Login: HiveMeetup
Password: ClouderaHive

 

On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

Hey Everyone, 

 

The meetup is only a day away! Here is a link to all the abstracts we have 
compiled thus far. Several of you have asked about event streaming and 
recordings. The meetup will be both streamed live and recorded. We will post 
the links on this thread and on the meetup link tomorrow closer to the start 
of the meetup.

 

The meetup will be at Cloudera HQ - 395 Page Mill Rd. If you have any trouble 
getting into the building, feel free to post on the meetup link.

 

Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

 

On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

Hey Everyone,

 

The agenda for the meetup has been set and I'm excited to say we have lots of 
interesting talks scheduled! Below is the final agenda; the full list of 
abstracts will be sent out soon. If you are planning to attend, please RSVP on 
the meetup link so we can get an accurate headcount of attendees 
(https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).


6:30 - 7:00 PM Networking and Refreshments 


[jira] [Created] (HIVE-19857) Set 3.1.0 for sys db version

2018-06-11 Thread Miklos Gergely (JIRA)
Miklos Gergely created HIVE-19857:
-

 Summary: Set 3.1.0 for sys db version
 Key: HIVE-19857
 URL: https://issues.apache.org/jira/browse/HIVE-19857
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 3.1.0








Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-11 Thread Marta Kuczora via Review Board


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > Looks good, a few nits below.

Thanks for looking into this review. I fixed/answered the issues. 
Please let me know if the patch looks ok, then I will upload it to the Jira to 
run the pre-commit tests.


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3227 (original), 3322 (patched)
> > 
> >
> > Is it possible to do it once in the constructor instead? I suspect that 
> > this is a non-trivial operation.

To be honest, I don't clearly see whether it would be worth moving this part 
to the constructor, and I am not sure what side effects it would have. In 
HIVE-15137, where this part was added to the code, the problem was that if two 
HiveCli instances were started with different users and both users added a 
partition, the owner of the partition directories was always the first user. 
Would moving this code to the constructor affect this use case? Would it still 
work correctly? I think this should be investigated. I am also not sure of the 
benefit of moving this code: the current user is fetched only once when 
creating a batch of partitions, and I don't see this as a very expensive call. 
If we want to move this, I would suggest investigating it and doing it in a 
separate Jira. What do you think?


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 3253 (patched)
> > 
> >
> > Can you clarify that "clean up" means removing the associated directory?

I fixed it accordingly.


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 3268 (patched)
> > 
> >
> > Please add a Javadoc here explaining what is checked by validation. 
> > Also it isn't obvious that validation has side effects (updating partsToAdd)

Added Javadoc


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3247 (original), 3343 (patched)
> > 
> >
> > addedPartitions is not defined here so it isn't obvious that it should 
> > be thread-safe. Is it possible to allocate and return addedPartitions here 
> > so that you guarantee the use of a thread-safe map? 
> > 
> > Another way you can do it is to collect added partitions in a thread-safe 
> > local map and then copy it to the resulting map once you are done with 
> > the concurrent part.

The createPartitionFolders method is called with a ConcurrentHashMap, so I 
thought that would do the trick. 
Returning the addedPartitions map would be complicated, as we have to 
return the newParts list as well. So I fixed this issue by introducing a local 
map and then copying the result to the addedPartitions map.
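
A minimal sketch of the local-map approach described above, for illustration 
only. The class name, method name, String keys and pool size are hypothetical 
stand-ins, not the actual HiveMetaStore code; the worker body only records the 
partition name where the real method would create the directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalMapThenCopySketch {

    // Hypothetical stand-in for the concurrent folder-creation step.
    // The caller's map may be any Map implementation, thread-safe or not.
    static void createFolders(List<String> partitionNames,
                              Map<String, Boolean> addedPartitions) throws Exception {
        // Only this local, thread-safe map is touched by the worker threads.
        Map<String, Boolean> local = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String name : partitionNames) {
                // The real code would create the partition directory here.
                futures.add(pool.submit(() -> { local.put(name, Boolean.TRUE); }));
            }
            // Wait for every task so the concurrent part is finished before copying.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        // Single-threaded copy into the caller's map.
        addedPartitions.putAll(local);
    }

    public static void main(String[] args) throws Exception {
        Map<String, Boolean> added = new HashMap<>();
        createFolders(List.of("p=1", "p=2", "p=3"), added);
        System.out.println(added.size() + " partitions recorded");
    }
}
```

With this shape the method no longer depends on the caller passing a 
thread-safe map, which was the concern raised in the review comment.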


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/#review203099
---


On June 11, 2018, 11:27 a.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7/
> ---
> 
> (Updated June 11, 2018, 11:27 a.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.
> 
> 
> Bugs: HIVE-19046
> https://issues.apache.org/jira/browse/HIVE-19046
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the code in these methods is identical. Refactored the shared 
> parts into common methods.
> 
> 
> Diffs
> -
> 
>   
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java b9f5fb8 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java bf559b4 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java 4f11a55 
> 
> 
> Diff: https://reviews.apache.org/r/7/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-11 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated June 11, 2018, 11:27 a.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Address review findings.


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is identical. Refactored the shared 
parts into common methods.


Diffs (updated)
-

  
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java b9f5fb8 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java bf559b4 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java 4f11a55 


Diff: https://reviews.apache.org/r/7/diff/6/

Changes: https://reviews.apache.org/r/7/diff/5-6/


Testing
---


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-19856) Fix TezPerfCliDriver/q64 stack overflow

2018-06-11 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19856:
---

 Summary: Fix TezPerfCliDriver/q64 stack overflow
 Key: HIVE-19856
 URL: https://issues.apache.org/jira/browse/HIVE-19856
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


From time to time this test fails with a stack overflow error; it seems pretty 
hard to reproduce.

{code}
Regression
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query64]

Failing for the past 1 build (Since Failed#11667 )
Took 2.5 sec.
Stacktrace
java.lang.StackOverflowError
at org.apache.calcite.runtime.FlatLists$Flat4List.equals(FlatLists.java:716)
at org.apache.calcite.runtime.FlatLists$Flat3List.equals(FlatLists.java:577)
{code}






[GitHub] hive pull request #371: HIVE-19853: Arrow serializer needs to create a TimeS...

2018-06-11 Thread pudidic
GitHub user pudidic opened a pull request:

https://github.com/apache/hive/pull/371

HIVE-19853: Arrow serializer needs to create a TimeStampMicroTZVector…

… instead of TimeStampMicroVector

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pudidic/hive HIVE-19853

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/371.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #371


commit f785b6d5603d94b126a9611b4a583e4803dd54f7
Author: Teddy Choi 
Date:   2018-06-11T09:29:57Z

HIVE-19853: Arrow serializer needs to create a TimeStampMicroTZVector 
instead of TimeStampMicroVector




---


[jira] [Created] (HIVE-19855) TestStatsUpdaterThread.testQueueingWithThreads fails often

2018-06-11 Thread Peter Vary (JIRA)
Peter Vary created HIVE-19855:
-

 Summary: TestStatsUpdaterThread.testQueueingWithThreads fails often
 Key: HIVE-19855
 URL: https://issues.apache.org/jira/browse/HIVE-19855
 Project: Hive
  Issue Type: Test
Reporter: Peter Vary


Looking at the test history linked below, it seems that 
TestStatsUpdaterThread.testQueueingWithThreads fails on almost every second run:

[https://builds.apache.org/job/PreCommit-HIVE-Build/11698/testReport/junit/org.apache.hadoop.hive.ql.stats/TestStatsUpdaterThread/testQueueingWithThreads/history/]

 

We should fix this





[jira] [Created] (HIVE-19854) investigate tez_smb_main.q result difference

2018-06-11 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19854:
---

 Summary: investigate tez_smb_main.q result difference
 Key: HIVE-19854
 URL: https://issues.apache.org/jira/browse/HIVE-19854
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


Found while working on HIVE-19824: 
tez_smb_main.q.out seems to give different results for the following query when 
the join is executed as a merge join:

{code}
select count(*) from tab_n11 a join tab_part_n12 b on a.key = b.key join src1 c 
on a.value = c.value
{code}

old result: 9
mergejoin result: 40
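
One way to decide which of the two counts is right would be to compute the 
expected count independently of either join implementation. The sketch below 
does that for a three-way inner join of the same shape, using made-up rows 
rather than the real tab_n11, tab_part_n12 and src1 data; the class and method 
names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

class JoinCountSketch {

    // How many times each non-null value occurs in a join column.
    static Map<String, Long> multiplicities(List<String> values) {
        return values.stream()
            .filter(Objects::nonNull)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Expected count(*) for: a join b on a.key = b.key join c on a.value = c.value.
    // Each row of a is a {key, value} pair.
    static long expectedCount(List<String[]> a, List<String> bKeys, List<String> cValues) {
        Map<String, Long> b = multiplicities(bKeys);
        Map<String, Long> c = multiplicities(cValues);
        long total = 0;
        for (String[] row : a) {
            // NULL never matches in an inner equi-join.
            if (row[0] == null || row[1] == null) {
                continue;
            }
            // Each a row contributes (matching b rows) * (matching c rows).
            total += b.getOrDefault(row[0], 0L) * c.getOrDefault(row[1], 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy rows, not the real test tables.
        List<String[]> a = List.of(
            new String[] {"1", "x"},
            new String[] {"1", "y"},
            new String[] {"2", "x"});
        System.out.println(expectedCount(a, List.of("1", "1", "3"), List.of("x", "x", "x", "z")));
    }
}
```

Feeding the actual contents of the three tables into such a reference count 
would show whether 9 or 40 is the correct answer.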







[jira] [Created] (HIVE-19853) Arrow serializer needs to create a TimeStampMicroTZVector instead of TimeStampMicroVector

2018-06-11 Thread Teddy Choi (JIRA)
Teddy Choi created HIVE-19853:
-

 Summary: Arrow serializer needs to create a TimeStampMicroTZVector 
instead of TimeStampMicroVector
 Key: HIVE-19853
 URL: https://issues.apache.org/jira/browse/HIVE-19853
 Project: Hive
  Issue Type: Bug
Reporter: Teddy Choi
Assignee: Teddy Choi


HIVE-19723 changed the timestamp unit from nanoseconds to microseconds in Arrow 
serialization. However, it needs to be microseconds with a time zone.
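
To illustrate the distinction, here is a small sketch using only java.time (it 
does not call the Arrow API). Epoch microseconds are well defined for an 
absolute instant, but a wall-clock timestamp without a zone maps to different 
epoch values depending on the zone assumed, which is why the zone has to travel 
with the values. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class EpochMicrosSketch {

    // Microseconds since the Unix epoch; sub-microsecond digits are dropped.
    static long toEpochMicros(Instant instant) {
        return Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
            instant.getNano() / 1_000L);
    }

    // A wall-clock value only becomes an epoch value once a zone is chosen.
    static long toEpochMicros(LocalDateTime wallClock, ZoneId zone) {
        return toEpochMicros(wallClock.atZone(zone).toInstant());
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2018, 6, 11, 9, 29, 57);
        long utc = toEpochMicros(wallClock, ZoneId.of("UTC"));
        long plus9 = toEpochMicros(wallClock, ZoneId.of("+09:00"));
        // Same wall-clock value, two different instants nine hours apart.
        System.out.println(utc - plus9);
    }
}
```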


