Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-20 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/#review143041
---




metastore/if/hive_metastore.thrift (line 548)


You should provide a default value here:
5: optional i32 numPartitions = -1


- Ashutosh Chauhan


On July 20, 2016, 10:16 p.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49965/
> ---
> 
> (Updated July 20, 2016, 10:16 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> 
> 
> Diffs
> -
> 
>   
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java d90085b
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggrStatsCacheIntegration.java 51d96dd
>   metastore/if/hive_metastore.thrift 4d92b73
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
>   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java b6fe502
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 5adfa02
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bbd47b8
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java c65c7a4
>   metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 1ea72a0
>   metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 3e6acc7
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsCache.java 6cd3a46
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsCacheWithBitVector.java e0c4094
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsExtrapolation.java f4e55ed
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 62918be
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ef0bb3d
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94
> 
> Diff: https://reviews.apache.org/r/49965/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-20 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/
---

(Updated July 20, 2016, 10:16 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
partitions leading to higher compile time


Diffs (updated)
-

  
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java d90085b
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggrStatsCacheIntegration.java 51d96dd
  metastore/if/hive_metastore.thrift 4d92b73
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java b6fe502
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 5adfa02
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bbd47b8
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java c65c7a4
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 1ea72a0
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 3e6acc7
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsCache.java 6cd3a46
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsCacheWithBitVector.java e0c4094
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsExtrapolation.java f4e55ed
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 62918be
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ef0bb3d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
  ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94

Diff: https://reviews.apache.org/r/49965/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-20 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/#review142949
---




metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java (line 1313)


This breaks extrapolation.


- Ashutosh Chauhan


On July 20, 2016, 6:15 a.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49965/
> ---
> 
> (Updated July 20, 2016, 6:15 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94
> 
> Diff: https://reviews.apache.org/r/49965/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-20 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/
---

(Updated July 20, 2016, 6:15 a.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
partitions leading to higher compile time


Diffs (updated)
-

  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
  ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94

Diff: https://reviews.apache.org/r/49965/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan


> On July 19, 2016, 9:40 p.m., Ashutosh Chauhan wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java, lines 1255-1258
> > 
> >
> > I don't think this null check is needed.

OK, I noticed that we won't hit this code if partNames is null.


> On July 19, 2016, 9:40 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java, line 310
> > 
> >
> > getPartsFound() should be set correctly by the server, which means the 
> > logic here can be improved.

In the case where partNames is null, doesn't that mean we have retrieved 
column stats for all the partitions, and hence should not set colState to 
State.PARTIAL?


- Hari Sankar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/#review142839
---


On July 19, 2016, 9:19 p.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49965/
> ---
> 
> (Updated July 19, 2016, 9:19 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94
> 
> Diff: https://reviews.apache.org/r/49965/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/#review142839
---




metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java (line 1200)


Can you add a comment here saying that partNames = null means all partitions?



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java (line 1212)


This needs to be revisited. Can you leave an appropriate TODO here?



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java (lines 1255 - 1258)


I don't think this null check is needed.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java (line 394)


Is this needed?



ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java (lines 69 - 71)


Is this needed?



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java (line 310)


getPartsFound() should be set correctly by the server, which means the 
logic here can be improved.


- Ashutosh Chauhan


On July 19, 2016, 9:19 p.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49965/
> ---
> 
> (Updated July 19, 2016, 9:19 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94
> 
> Diff: https://reviews.apache.org/r/49965/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/
---

(Updated July 19, 2016, 9:19 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Ran the failing CliDriver test suites and checked hive.log to confirm that 
the IN clause is removed in MetaStoreDirectSql when all partitions are found.
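
As a rough illustration of the change described above, the sketch below adds 
an IN clause only when an explicit list of partition names is supplied, and 
omits it when partNames is null (taken here to mean all partitions). The class 
name, the method name, the "and 1 = 0" handling of an empty list, and the 
quoted column name are assumptions made for this example; they are not taken 
from MetaStoreDirectSql.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionFilterSketch {

  // Hypothetical helper: returns a WHERE-clause fragment that restricts a
  // query to the given partitions. A null list is treated as "all
  // partitions", so no IN clause is generated at all.
  public static String partitionFilter(List<String> partNames) {
    if (partNames == null) {
      return "";
    }
    if (partNames.isEmpty()) {
      // No partitions requested: emit a predicate that matches nothing
      // instead of an invalid empty IN list.
      return " and 1 = 0";
    }
    // One bind parameter per requested partition name.
    String placeholders =
        String.join(", ", Collections.nCopies(partNames.size(), "?"));
    return " and \"PARTITION_NAME\" in (" + placeholders + ")";
  }

  public static void main(String[] args) {
    // All partitions: the fragment is empty, so the query carries no IN list.
    System.out.println("[" + partitionFilter(null) + "]");
    // Explicit subset: the fragment has one placeholder per name.
    System.out.println(
        partitionFilter(Arrays.asList("ds=2016-07-19", "ds=2016-07-20")));
  }
}
```

For a table with 1800+ partitions, the null case avoids sending a list of 
1800+ bind parameters to the backing database.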


Repository: hive-git


Description
---

Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
partitions leading to higher compile time


Diffs (updated)
-

  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 38c0eed
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 909d8eb
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
  ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94

Diff: https://reviews.apache.org/r/49965/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/#review142821
---




ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java (line 275)


This should be (partNames == null || partNames.size() > 0)
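
A minimal sketch of how that condition behaves, assuming that a null partNames 
denotes all partitions and an empty list denotes none. The class and method 
names are hypothetical and do not come from StatsUtils.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartNamesCheckSketch {

  // Hypothetical wrapper around the suggested condition: true when there is
  // work to do, either because every partition is selected (null) or because
  // at least one partition name is listed.
  public static boolean hasPartitionsToProcess(List<String> partNames) {
    return partNames == null || partNames.size() > 0;
  }

  public static void main(String[] args) {
    System.out.println(hasPartitionsToProcess(null));                            // true
    System.out.println(hasPartitionsToProcess(Collections.<String>emptyList())); // false
    System.out.println(hasPartitionsToProcess(Arrays.asList("ds=2016-07-19")));  // true
  }
}
```

A plain partNames.size() > 0 check would throw a NullPointerException for the 
null case, which is why the null test comes first.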


- Hari Sankar Sivarama Subramaniyan


On July 19, 2016, 8:01 p.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49965/
> ---
> 
> (Updated July 19, 2016, 8:01 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94
> 
> Diff: https://reviews.apache.org/r/49965/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/
---

(Updated July 19, 2016, 8:01 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
partitions leading to higher compile time


Diffs (updated)
-

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
  ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94

Diff: https://reviews.apache.org/r/49965/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



Re: Review Request 49965: HIVE-13995 Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49965/
---

(Updated July 19, 2016, 7:26 a.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
partitions leading to higher compile time


Diffs (updated)
-

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 9c900af
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 26e936e
  ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java da2e1e2
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java d8acf94

Diff: https://reviews.apache.org/r/49965/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan