Re: Review Request 70256: HIVE-21480: Fixed flaky and broken test TestHiveMetaStore.testJDOPersistenceManagerCleanup

2019-03-21 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70256/#review213878
---




standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
Lines 3153-3155 (patched)


My concern here is that we are testing a different case with the patch.

Before the patch, we tested that opening, using, and closing a client leaves 
no lingering objects. After the patch, we test that closing the client removes 
exactly 1 object, which implies that getAllDatabases results in exactly 1 
object.

How can there be lingering objects when we create a new client? Is there a way 
to get rid of the lingering object, and then test the original use case?


- Peter Vary


On March 20, 2019, 6:43 p.m., Morio Ramdenbourg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70256/
> ---
> 
> (Updated March 20, 2019, 6:43 p.m.)
> 
> 
> Review request for hive, Adam Holley, Karthik Manamcheri, Karen Coppage, 
> Peter Vary, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This test was not correctly counting the number of objects in the
> PersistenceManager cache before and after HiveMetaStoreClient.close(). The
> getJDOPersistenceManagerCacheSize() internal helper method did not use the
> updated fields present in the metastore classes, and was consistently
> returning -1. Additionally, the test could be flaky because the object count
> before and after close() could differ depending on lingering objects from
> previous tests or setup.
> 
> Modified the helper method to use the new fields, and fixed the flakiness in
> this test.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
>  77e0c98265e7b561f2eb39536e3251dd92e9cab0 
> 
> 
> Diff: https://reviews.apache.org/r/70256/diff/1/
> 
> 
> Testing
> ---
> 
> Unit tests run
> 
> 
> Thanks,
> 
> Morio Ramdenbourg
> 
>



Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

2019-03-21 Thread Gabor Kaszab
Thanks for the quick feedback, Maciej and Shawn!

Maciej:
The concern about confusing users by supporting multiple datetime patterns is
a valid one. The cleanest way to introduce SQL:2016 patterns would be to drop
the existing pattern support (SimpleDateFormat in the case of Impala) and
replace it with the new approach. This, however, would break backwards
compatibility and existing user workflows that use the old patterns. So, in
order to introduce the patterns from the standard (to be in sync with RDBMSs
such as Oracle, PostgreSQL, and so on), the only way I see is to support both
approaches side by side. To reduce user confusion, I think the docs should
cover this topic well and clarify which pattern is used in which scenario.
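
To make the "two pattern languages side by side" point concrete, here is a minimal sketch that translates a small, assumed subset of SQL:2016-style tokens into Python strptime directives. This is not Impala's or Hive's implementation; the token table, the function names, and the "YYYY-MM-DD HH12:MI" pattern used in the usage note are illustrative assumptions. The FF token shows the fractional-seconds case raised in the thread.

```python
import re
from datetime import datetime

# Assumed subset of SQL:2016-style tokens mapped to Python strptime directives.
# Four-letter tokens are listed first so the regex tries them before the
# two-letter ones.
TOKEN_TO_DIRECTIVE = {
    "YYYY": "%Y",
    "HH24": "%H",
    "HH12": "%I",
    "MM": "%m",
    "DD": "%d",
    "MI": "%M",
    "SS": "%S",
    "FF": "%f",  # fractional seconds; Python accepts 1 to 6 digits here
}
_TOKEN_RE = re.compile("|".join(TOKEN_TO_DIRECTIVE))


def to_strptime(pattern: str) -> str:
    """Translate the assumed token subset into a strptime format string."""
    # Escape literal percent signs first, then swap each known token.
    escaped = pattern.replace("%", "%%")
    return _TOKEN_RE.sub(lambda m: TOKEN_TO_DIRECTIVE[m.group(0)], escaped)


def parse_timestamp(text: str, pattern: str) -> datetime:
    """Parse text using a pattern written with the assumed token subset."""
    return datetime.strptime(text, to_strptime(pattern))
```

For example, parse_timestamp("2018-01-02 09:15", "YYYY-MM-DD HH12:MI") goes through the translated format "%Y-%m-%d %I:%M". A real implementation would also need to decide how tokens that look alike in the old and new pattern languages are told apart, which is the documentation concern above.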

Cheers,
Gabor




On Wed, Mar 20, 2019 at 9:37 PM Shawn Weeks 
wrote:

> I’ve done some work on a to timestamp function for Hive, and one of the
> things I keep running into is that most datetime libraries don’t support
> fractional seconds in their format patterns, yet most RDBMSs do support
> fractional seconds. It tends to trip things up when you’re porting SQL over.
> If we’re going the cast-with-format way everywhere, I’d like it to support
> that.
>
> Thanks
> Shawn Weeks
>
> Sent from my iPhone
>
> > On Mar 20, 2019, at 4:53 AM, Gabor Kaszab 
> wrote:
> >
> > Hey Hive and Spark communities,
> > [dev@impala in cc]
> >
> > I'm working on an Impala improvement to introduce the FORMAT clause within
> > the CAST() operator and to implement ISO SQL:2016 datetime pattern support
> > for this new FORMAT clause:
> > https://issues.apache.org/jira/browse/IMPALA-4018
> >
> > One example of the new format:
> > SELECT(CAST("2018-01-02 09:15" as timestamp FORMAT "-MM-DD HH12:MI"));
> >
> > I have put together a document for my proposal of how to do this in Impala,
> > what patterns we plan to support to cover the SQL standard, and what
> > additional patterns we propose to support on top of the standard's
> > recommendation.
> >
> > https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/
> >
> > The reason I share this with the Hive and Spark communities is that I feel
> > it would be nice if these systems were in line with the Impala
> > implementation. So I'd like to involve these communities in the planning
> > phase of this task so that everyone can share their opinion about whether
> > this makes sense in the proposed form.
> > Eventually, I feel that each of these systems should have the SQL:2016
> > datetime format, and I think it would be nice to have it with a newly
> > introduced CAST(..FORMAT..) clause.
> >
> > I would like to ask members from both Hive and Spark to take a look at my
> > proposal and share their opinion from their own component's perspective. If
> > we get on the same page, I'll eventually open Jiras to cover this
> > improvement for each of the mentioned systems.
> >
> > Cheers,
> > Gabor
>


[jira] [Created] (HIVE-21485) Hive desc operation takes more than 100 seconds after upgrading from Hive 1.2.1 to 2.3.4

2019-03-21 Thread Qingxin Wu (JIRA)
Qingxin Wu created HIVE-21485:
-

 Summary: Hive desc operation takes more than 100 seconds after 
upgrading from Hive 1.2.1 to 2.3.4
 Key: HIVE-21485
 URL: https://issues.apache.org/jira/browse/HIVE-21485
 Project: Hive
  Issue Type: Bug
  Components: CLI, Hive
Affects Versions: 2.3.4
Reporter: Qingxin Wu


The Hive desc [formatted|extended] operation takes more than 100 seconds after 
upgrading from Hive 1.2.1 to 2.3.4. This is mainly caused by showing stats for 
partitioned tables, which was introduced by HIVE-16098, when the partitioned 
tables have a large number of partitions. In our case, the number of partitions 
is 187221.
{code:java}
hive> desc bus.kafka_data;
OK
id  string
...
d   map
stat_date   string
log_id  string

# Partition Information
# col_name  data_type   comment

stat_date   string
log_id  string
Time taken: 115.342 seconds, Fetched: 42 row(s)
{code}
The same operation executed in Hive 1.2.1 takes only about 2 seconds.
{code:java}
hive> desc bus.kafka_data;
OK
id  string
...
d   map
stat_date   string
log_id  string

# Partition Information
# col_name  data_type   comment

stat_date   string
log_id  string
Time taken: 2.037 seconds, Fetched: 42 row(s)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hive] Fokko opened a new pull request #576: FLINK-11992 Update Apache Parquet to 1.10.1

2019-03-21 Thread GitBox
Fokko opened a new pull request #576: FLINK-11992 Update Apache Parquet to 
1.10.1
URL: https://github.com/apache/hive/pull/576
 
 
   Fixes two bugs which were discovered in Apache Spark:
   https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1101


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HIVE-21486) FinalSelectOps is empty in lineage index if there is a script operator(transform)

2019-03-21 Thread Zihao Ye (JIRA)
Zihao Ye created HIVE-21486:
---

 Summary: FinalSelectOps is empty in lineage index if there is a 
script operator(transform)
 Key: HIVE-21486
 URL: https://issues.apache.org/jira/browse/HIVE-21486
 Project: Hive
  Issue Type: Bug
  Components: lineage
Affects Versions: 2.3.4, 2.1.1
Reporter: Zihao Ye


SQL pattern:

create table t1 as select transform(c1) using '/bin/python script.py' as (c2) 
from t2;

The lineage dependencies are correct, but the SelectOperator is not added to 
finalSelectOps in the lineage Index, so index.getDependencies(finalSelOp) 
returns null in this case.





[jira] [Created] (HIVE-21487) COMPLETED_COMPACTIONS table missing appropriate indexes

2019-03-21 Thread Todd Lipcon (JIRA)
Todd Lipcon created HIVE-21487:
--

 Summary: COMPLETED_COMPACTIONS table missing appropriate indexes
 Key: HIVE-21487
 URL: https://issues.apache.org/jira/browse/HIVE-21487
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Todd Lipcon


Looking at a MySQL install that an HMS on Hive 3.1 is pointed at, I see a 
constant stream of queries of the form:
{code}
select CC_STATE from COMPLETED_COMPACTIONS where CC_DATABASE = 
'tpcds_orc_exact_1000' and CC_TABLE = 'catalog_returns' and CC_PARTITION = 
'cr_returned_date_sk=2452851' and CC_STATE != 'a' order by CC_ID desc;
{code}

but the COMPLETED_COMPACTIONS table has no index. In this case it's resulting 
in a full table scan over 115k rows, which takes around 100ms.
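
As a rough illustration of the effect of such an index, the sketch below runs the same query shape against an in-memory SQLite database and compares the query plans before and after adding an index. Everything here is an assumption for demonstration purposes: SQLite stands in for MySQL, the table is a simplified stand-in with only the columns the query touches, and the index name CC_LOOKUP_IDX and its column choice are hypothetical, not the fix adopted by Hive.

```python
import sqlite3

QUERY = (
    "select CC_STATE from COMPLETED_COMPACTIONS "
    "where CC_DATABASE = ? and CC_TABLE = ? and CC_PARTITION = ? "
    "and CC_STATE != 'a' order by CC_ID desc"
)
PARAMS = ("tpcds_orc_exact_1000", "catalog_returns", "cr_returned_date_sk=2452851")


def query_plan(conn):
    """Return SQLite's plan for QUERY as one string (last column is the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified stand-in schema: only the columns the query touches.
conn.execute(
    "create table COMPLETED_COMPACTIONS ("
    "CC_ID integer, CC_DATABASE text, CC_TABLE text, "
    "CC_PARTITION text, CC_STATE text)"
)
plan_without_index = query_plan(conn)

# Hypothetical index name and column choice, for illustration only.
conn.execute(
    "create index CC_LOOKUP_IDX on COMPLETED_COMPACTIONS "
    "(CC_DATABASE, CC_TABLE, CC_PARTITION)"
)
plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of the whole table, while the second should report a search using the new index. On MySQL the equivalent check would be an EXPLAIN of the same query.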






Re: Review Request 70256: HIVE-21480: Fixed flaky and broken test TestHiveMetaStore.testJDOPersistenceManagerCleanup

2019-03-21 Thread Morio Ramdenbourg via Review Board


> On March 21, 2019, 10:18 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
> > Lines 3153-3155 (patched)
> > 
> >
> > My concern here is that we are testing a different case with the patch.
> > 
> > Before the patch, we tested that opening, using, and closing a client 
> > leaves no lingering objects. After the patch, we test that closing the 
> > client removes exactly 1 object, which implies that getAllDatabases results 
> > in exactly 1 object.
> > 
> > How can there be lingering objects when we create a new client? Is there a 
> > way to get rid of the lingering object, and then test the original use 
> > case?

My knowledge of the PersistenceManager code isn't that great, but before the 
new client is even created, the object count returned from 
getJDOPersistenceManagerCacheSize() is already 1. I believe this object comes 
from the HMS initializing the database schema. There seems to be a lingering 
object from that, at least when I run this test standalone, without the other 
tests in the class.
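
One possible way to keep the original open/use/close intent while tolerating an object that exists before the client is created is to assert on the difference from a baseline rather than on an absolute count. The sketch below only illustrates that idea: FakePersistenceCache and leaked_objects are hypothetical stand-ins, not Hive or DataNucleus code, and the real test would read the count through getJDOPersistenceManagerCacheSize().

```python
class FakePersistenceCache:
    """Hypothetical stand-in for the PersistenceManager cache; not Hive code."""

    def __init__(self, lingering=0):
        # Objects that already exist before any client is opened,
        # for example left over from schema initialization.
        self._objects = set(range(lingering))
        self._next_id = lingering

    def open_client(self):
        handle = self._next_id
        self._next_id += 1
        self._objects.add(handle)
        return handle

    def close_client(self, handle):
        self._objects.discard(handle)

    def size(self):
        return len(self._objects)


def leaked_objects(cache):
    """Open and close one client; return how many objects it left behind."""
    baseline = cache.size()        # count before the client exists
    handle = cache.open_client()   # the "use" step would go here
    cache.close_client(handle)
    return cache.size() - baseline
```

With this shape, the assertion is that leaked_objects(...) equals 0 whether the cache started empty or with one pre-existing object, so the test does not depend on what setup left behind.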


- Morio





Re: Review Request 70256: HIVE-21480: Fixed flaky and broken test TestHiveMetaStore.testJDOPersistenceManagerCleanup

2019-03-21 Thread Morio Ramdenbourg via Review Board


> On March 21, 2019, 10:18 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
> > Lines 3153-3155 (patched)
> > 
> >
> > My concern here is that we are testing a different case with the patch.
> > 
> > Before the patch, we tested that opening, using, and closing a client 
> > leaves no lingering objects. After the patch, we test that closing the 
> > client removes exactly 1 object, which implies that getAllDatabases results 
> > in exactly 1 object.
> > 
> > How can there be lingering objects when we create a new client? Is there a 
> > way to get rid of the lingering object, and then test the original use 
> > case?
> 
> Morio Ramdenbourg wrote:
> My knowledge of the PersistenceManager code isn't that great, but before 
> the new client is even created, the object count returned from 
> getJDOPersistenceManagerCacheSize() is already 1. I believe this object 
> comes from the HMS initializing the database schema. There seems to be a 
> lingering object from that, at least when I run this test standalone, 
> without the other tests in the class.

Do you know of other ways I can clear / wait for the PMF cache to empty itself?


- Morio





[GitHub] [hive] shawnweeks commented on issue #575: HIVE-21409 Add Jars to Session Conf ClassLoader

2019-03-21 Thread GitBox
shawnweeks commented on issue #575: HIVE-21409 Add Jars to Session Conf 
ClassLoader
URL: https://github.com/apache/hive/pull/575#issuecomment-475433844
 
 
   Closing this as this breaks Ranger. Will continue troubleshooting.




[GitHub] [hive] shawnweeks closed pull request #575: HIVE-21409 Add Jars to Session Conf ClassLoader

2019-03-21 Thread GitBox
shawnweeks closed pull request #575: HIVE-21409 Add Jars to Session Conf 
ClassLoader
URL: https://github.com/apache/hive/pull/575
 
 
   




[GitHub] [hive] TopGunViper opened a new pull request #577: HIVE-21485: Add flag to turn off fetching partition stats in DESCRIBE…

2019-03-21 Thread GitBox
TopGunViper opened a new pull request #577: HIVE-21485: Add flag to turn off 
fetching partition stats in DESCRIBE…
URL: https://github.com/apache/hive/pull/577
 
 
   ## What changes were proposed in this pull request?
   The Hive DESCRIBE [formatted|extended] operation takes more than 100 seconds 
after upgrading from Hive 1.2.1 to 2.3.4. This is mainly caused by showing 
stats for partitioned tables, which was introduced by 
[HIVE-16098](https://issues.apache.org/jira/browse/HIVE-16098), when the 
partitioned tables have a large number of partitions.
   
   So, could we add a flag that determines whether the 'DESCRIBE 
[EXTENDED|FORMATTED]' operation displays partitioned table stats?
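   
   A minimal sketch of the idea, under stated assumptions: the Partition type, the describe_table function, and the show_partition_stats flag are hypothetical names for illustration, not Hive classes or configuration keys. It only shows that the aggregation step, which touches every partition, is skipped when the flag is off.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    num_rows: int
    total_size: int


def describe_table(columns, partitions, show_partition_stats=True):
    """Return DESCRIBE-style lines; the flag is hypothetical, not a Hive setting."""
    lines = [f"{name}\t{dtype}" for name, dtype in columns]
    if show_partition_stats:
        # The expensive step: it reads stats from every partition, so its
        # cost grows with the partition count (187221 in the report).
        lines.append(f"numRows\t{sum(p.num_rows for p in partitions)}")
        lines.append(f"totalSize\t{sum(p.total_size for p in partitions)}")
    return lines
```

   Calling describe_table(columns, partitions, show_partition_stats=False) returns only the column lines and never iterates over the partitions.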
   
   ## How was this patch tested?
   Query Unit Test




[GitHub] [hive] rmsmani commented on issue #577: HIVE-21485: Add flag to turn off fetching partition stats in DESCRIBE…

2019-03-21 Thread GitBox
rmsmani commented on issue #577: HIVE-21485: Add flag to turn off fetching 
partition stats in DESCRIBE…
URL: https://github.com/apache/hive/pull/577#issuecomment-475480051
 
 
   1. Create a patch file (the diff between the latest commit on master and 
your proposed changes).
   2. Upload the patch file to the Jira card.
   3. Click the submit patch button.
   
   The automatic pre-build testing will then be triggered on Jenkins, and the 
results will be uploaded to the Jira card upon build completion.
   
   For more details, see how to contribute on the Hive wiki.




[GitHub] [hive] rmsmani commented on issue #576: FLINK-11992 Update Apache Parquet to 1.10.1

2019-03-21 Thread GitBox
rmsmani commented on issue #576: FLINK-11992 Update Apache Parquet to 1.10.1
URL: https://github.com/apache/hive/pull/576#issuecomment-475505501
 
 
   Hi @Fokko,
   What's the Hive Jira number for this?
   If a Jira is not available, please create a Jira card, then:
   1. Create a git diff patch.
   2. Upload the patch to Jira.
   3. Submit the patch.

