[jira] [Created] (HIVE-12916) Cleanup metastore to put unnecessary parameters in EnvironmentContext

2016-01-22 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-12916:
--

 Summary: Cleanup metastore to put unnecessary parameters in 
EnvironmentContext
 Key: HIVE-12916
 URL: https://issues.apache.org/jira/browse/HIVE-12916
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


For example, the metastore API has both void alter_table(1:string dbname, 2:string tbl_name, 3:Table 
new_tbl) and void alter_table_with_environment_context(1:string dbname, 
2:string tbl_name, 3:Table new_tbl, 4:EnvironmentContext environment_context). 
We plan to keep only the latter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12915) Tez session pool has concurrency issues during init

2016-01-22 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-12915:
---

 Summary: Tez session pool has concurrency issues during init
 Key: HIVE-12915
 URL: https://issues.apache.org/jira/browse/HIVE-12915
 Project: Hive
  Issue Type: Bug
Reporter: Takahiko Saito
Assignee: Sergey Shelukhin








Re: Review Request 42190: HIVE-12478

2016-01-22 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42190/#review115960
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java 
(line 636)


NitPick: Doc seems unfinished



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java
 (line 112)


What about other operators, such as HiveFilter and HiveProject?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRulesRegistry.java
 (line 34)


Does this need to be a multimap?


- John Pullokkaran


On Jan. 21, 2016, 10:45 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42190/
> ---
> 
> (Updated Jan. 21, 2016, 10:45 p.m.)
> 
> 
> Review request for hive and John Pullokkaran.
> 
> 
> Bugs: HIVE-12478
> https://issues.apache.org/jira/browse/HIVE-12478
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve Hive/Calcite Transitive Predicate inference
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java 
> 4825a617876374085b6fac1192ba1531ec916bce 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveHepPlannerContext.java
>  ad79aeec2fbc0454ab1ccc608944752d01324dca 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveVolcanoPlannerContext.java
>  8859fc268666cef1be283a9179aa0beb7ef1bdeb 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java
>  d15d885d2348d666df069228a93d6c5f914c79df 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java
>  8610edc5ddc00d523610fb29f5e504c3e876a542 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java
>  27b1e76a104dc961cb4bce554602d90b3aa867e0 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveSemiJoin.java
>  35586768c2c2b81e4213495632e4457dd3d70443 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnion.java
>  8b57b3504c407b8a1e73d48ea240c4ec7558b327 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java
>  de880ce26f1e172288f700c8566fbe71f42af115 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
>  703c8c6dbdfa281443cbcf7b08de2266697da8a9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePreFilteringRule.java
>  d37fc0e08d5e41b29539a990e6638385c1135eec 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRulesRegistry.java
>  18a065e87e1ec266bf28b4ccfe10a1f863f847c2 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 3fefbd710c4bb81d5f746cd91889b532b0a6029f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/TestCBORuleFiredOnlyOnce.java
>  f1d8d1de00e9de7fa9ffea7d3aa2400e5073ac9c 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out 
> 7fc754d5712d5f05efc943b66d3c829d47312d19 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out 
> f13643e8db57cc0a85b2626c37437fd030f72029 
>   ql/src/test/results/clientpositive/annotate_stats_select.q.out 
> b158d8567f6cc02d990d175e93996239aba0c5ed 
>   ql/src/test/results/clientpositive/auto_join12.q.out 
> 8ef3664764d04f53f3685d8f66dc4a353776a488 
>   ql/src/test/results/clientpositive/auto_join16.q.out 
> c1da6d2968697d304311044d358f1af267dc6e60 
>   ql/src/test/results/clientpositive/auto_join_reordering_values.q.out 
> 59aa738c779d50a760e1b0d36e4ce83295b0d70f 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 
> d40b1655e27fa70efc8dbf0475c688f6e2b3608f 
>   ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out 
> cb87f761be7c58ecc6435bb0a9b0e96c46a36828 
>   ql/src/test/results/clientpositive/bucket_map_join_spark4.q.out 
> 4abdab53b5562dc129a2d9a73c63cf44d066c05e 
>   ql/src/test/results/clientpositive/bucketizedhiveinputformat.q.out 
> cfb95be72b32a354faeddc79dc52bc29c7593a2b 
>   ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out 
> a7ad04c7208f8e8486e79c1e749184e5b532a1fc 
>   ql/src/test/results/clientpositive/cast1.q.out 
> 48a0c14031ef38dd5b4df7efa718a4d6ce04bc94 
>   ql/src/test/results/clientpositive/cbo_const.q.out 
> adc5232a67b2243dd5f09acaf1f7c49baea5daad 
>   ql/src/test/results/clientpositive/cbo_rp_cross_product_check_2.q.out 
> f1707eb4146c55338500c1fcaf4ff7199750250f 
>   ql/src/test/results/clientpositive/cbo_rp_lineage2.q.out 
> 1b2a2ab1af5992753c37d053942ecb2ebf7

Re: Review Request 42487: Use bit vector to track NDV

2016-01-22 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42487/#review115948
---




metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
 (line 46)


We should measure the computational and space complexity of the bitwise operations and 
FastBitSet.

It would be good to get measurements for 1000, 1, 50 partitions.



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java (line 1563)


Why is this needed?
It seems like it's not used.


- John Pullokkaran


On Jan. 22, 2016, 7:09 p.m., pengcheng xiong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42487/
> ---
> 
> (Updated Jan. 22, 2016, 7:09 p.m.)
> 
> 
> Review request for hive, Alan Gates and John Pullokkaran.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-12763
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 9193f80 
>   metastore/if/hive_metastore.thrift 81837e6 
>   metastore/pom.xml a8e84a1 
>   
> metastore/src/gen/protobuf/gen-java/org/apache/hadoop/hive/metastore/hbase/HbaseMetastoreProto.java
>  39a7278 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h ce1d7da 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 0203b06 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/BinaryColumnStatsData.java
>  84e393c 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/BooleanColumnStatsData.java
>  6aa4668 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DateColumnStatsData.java
>  2ebb811 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DecimalColumnStatsData.java
>  720176a 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DoubleColumnStatsData.java
>  5d48b5d 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.java
>  2f41c5a 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StringColumnStatsData.java
>  bd8a922 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php 380e6d0 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py 409c247 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb a473611 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
>  PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseUtils.java 
> f4df2e2 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 
> 5ec60be 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/BinaryColumnStatsAggregator.java
>  bbd2c7b 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/BooleanColumnStatsAggregator.java
>  9047f68 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java
>  217b654 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java
>  a8dbc1f 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
>  ec25b31 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
>  71af0ac 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
>  15b8cf7 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
>  fe1a04c 
>   
> metastore/src/protobuf/org/apache/hadoop/hive/metastore/hbase/hbase_metastore_proto.proto
>  0d0ef89 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseStoreBitVector.java
>  PRE-CREATION 
>   ql/pom.xml 358cd2a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 7914471 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
> 1f30cbd 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java b4cf58f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
>  0e96f89 
>   ql/src/test/queries/clientpositive/tez_aggr_part_stats.q PRE-CREATION 
>   ql/src/test/results/clientpositive/char_udf1.q.java1.7.out bfed116 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out b7c9075 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 9685202 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out b57a862 
>   ql/src/test/results/clientpositive/compute_stats_decimal.q.out 35abb37 
>   ql/src/test/results/clientpositive/compute

Re: Review Request 42508: HIVE-12889: Support COUNT(DISTINCT) for partitioning query.

2016-01-22 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42508/#review115951
---




ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 
160)


Chatted with Aihua offline; following the example of the similar functions 
GenericUDAFMin/Max will be easier :)


- Szehon Ho


On Jan. 20, 2016, 5:05 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42508/
> ---
> 
> (Updated Jan. 20, 2016, 5:05 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-12889: Support COUNT(DISTINCT) for partitioning query.
> 
> 
> Diffs
> -
> 
>   data/files/windowing_distinct.txt PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlCountAggFunction.java
>  7937040 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlSumAggFunction.java
>  8f62970 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java
>  e2fbb4f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java
>  37249f9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3fefbd7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 15ca754 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b8510 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 15773e5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java 
> eaf112e 
>   ql/src/test/queries/clientpositive/windowing_distinct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/windowing_distinct.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42508/diff/
> 
> 
> Testing
> ---
> 
> Support count(distinct) over partitioning window. 
> 
> 1. Enabling the parser to properly parse such query "count(distinct) over 
> (partition by c1)";
> 2. ORDER BY and windowing frame won't work with the functions of distinct due 
> to performance concern and implementation requirement.
> 3. We insert the distinct fields into the order by list, so during counting, 
> we only need to compare the current row against the previous remembered row.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-22 Thread Ryan Blue

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/#review115731
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
(line 139)


One of the test cases listed in the scope doc covers a zone ID that is 
unknown, such as "Antarctica/South_Pole", and therefore isn't valid. In that case the table 
should not be readable because the correct zone offset can't be determined.

`TimeZone.getTimeZone(String)` will return GMT when the zone id isn't 
recognized, so I'm concerned that this will do the wrong thing and use GMT in 
some cases where the zone should be rejected.

See 
https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html#getTimeZone(java.lang.String)
You might need to make a set of available IDs using 
https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html#getAvailableIDs()
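A minimal sketch of that check, using only the JDK. The class and method names (`ZoneIdValidator`, `parseStrict`) are hypothetical and not part of the patch; the zone ID `Not/A_Zone` is just a made-up value that no JVM should recognize.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TimeZone;

// Hypothetical helper, not part of the patch under review.
final class ZoneIdValidator {
  // All zone IDs the running JVM knows about.
  private static final Set<String> KNOWN_IDS =
      new HashSet<>(Arrays.asList(TimeZone.getAvailableIDs()));

  /** Returns the zone for id, or throws instead of silently falling back to GMT. */
  static TimeZone parseStrict(String id) {
    if (id == null || !KNOWN_IDS.contains(id)) {
      throw new IllegalArgumentException("Unknown time zone ID: " + id);
    }
    return TimeZone.getTimeZone(id);
  }

  public static void main(String[] args) {
    // The lenient JDK call maps an unrecognized ID to GMT.
    System.out.println(TimeZone.getTimeZone("Not/A_Zone").getID());
    // The strict variant accepts known IDs and rejects everything else.
    System.out.println(parseStrict("UTC").getID());
  }
}
```

Note that custom IDs such as `GMT+05:00` are accepted by `TimeZone.getTimeZone` but are not listed by `getAvailableIDs()`, so a check like this would reject them unless they are handled separately.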



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
(line 168)


It looks like this removes support for 
`HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname`. I think we 
should maintain support for that configuration setting since customers may 
already be using it and we don't want to change behavior. If it is set to true, 
it should cause the calendar to use UTC (so no adjustment is made). However, 
the new table property should override the Hive setting. So I think the check 
should be in the Strings.isNullOrEmpty case.
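The precedence I have in mind could look roughly like the sketch below. `WriteZoneResolver` and `resolve` are hypothetical names, the table property and the Hive setting are passed in as plain values rather than read from a real configuration, and falling back to the JVM default zone when neither is set is my assumption.

```java
import java.util.TimeZone;

// Hypothetical helper, not part of the patch under review.
final class WriteZoneResolver {
  /**
   * tableZoneId: value of the new table property, may be null or empty.
   * skipConversion: value of the existing Hive skip-conversion setting.
   */
  static TimeZone resolve(String tableZoneId, boolean skipConversion) {
    if (tableZoneId == null || tableZoneId.isEmpty()) {
      // No table property: honor the existing Hive setting.
      // true means "no adjustment", which is modeled here as UTC.
      return skipConversion ? TimeZone.getTimeZone("UTC") : TimeZone.getDefault();
    }
    // Table property present: it overrides the Hive setting.
    return TimeZone.getTimeZone(tableZoneId);
  }

  public static void main(String[] args) {
    System.out.println(resolve(null, true).getID());
    System.out.println(resolve("America/Los_Angeles", true).getID());
  }
}
```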



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 (line 57)


Nit: We use UTC elsewhere. I believe that they are identical for our 
purposes, but I want to reduce confusion and would recommend UTC over GMT.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 (line 211)


The file metadata is also passed in [via 
InitContext](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L173).
 If the file has the write zone property set, then it should be validated 
against the configured write zone (if set). When present, the file's value 
should override the table value and there should be a warning if the table 
value doesn't match the file value. That covers cases where files are moved 
from one table to another.
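As a rough sketch of that rule: the file's value wins when present, and a mismatch with the table value is only logged. `FileZoneCheck` and `effectiveZone` are hypothetical names, and writing the warning to `System.err` stands in for whatever logger the real code would use.

```java
// Hypothetical helper, not part of the patch under review.
final class FileZoneCheck {
  /** Picks the zone ID to use for a file; either argument may be null or empty. */
  static String effectiveZone(String fileZone, String tableZone) {
    if (fileZone == null || fileZone.isEmpty()) {
      // The file carries no zone, so the table value (possibly null) applies.
      return tableZone;
    }
    if (tableZone != null && !tableZone.isEmpty() && !tableZone.equals(fileZone)) {
      // Mismatch, e.g. the file was moved from another table: warn, keep the file value.
      System.err.println("WARN: table zone " + tableZone
          + " does not match file zone " + fileZone + "; using the file zone");
    }
    return fileZone;
  }

  public static void main(String[] args) {
    System.out.println(effectiveZone("UTC", "America/New_York"));
    System.out.println(effectiveZone(null, "America/New_York"));
  }
}
```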



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 (line 161)


Shouldn't this use UTC, which will apply an offset of 0?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 (line 165)


When will the zone property be set before this method? Is this how the 
table property is passed in?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
(line 21)


I think calling this the default write zone is not quite correct. It is the 
default zone when the Hive config property is set, but the current zone is the 
default otherwise. What about calling this the PARQUET_INT96_NO_ADJUSTMENT_ZONE?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
(line 53)


This is a little odd. Normally, I would consider GMT/UTC to be the calendar 
that applies no adjustment. But what is happening in this code is that the UTC 
millisecond value is being constructed using values in that zone and then a new 
timestamp is created from it that is in the local zone. So local->local 
produces no adjustment, while GMT->local does.

I was thinking that GMT would always be used to create the milliseconds, 
then the calendar would be used to adjust the value by some amount. UTC by 0, 
EST by -5, and PST by -8.

I think it may be easier to have a method that does a single conversion to 
UTC in milliseconds. Then, adjust the value using a static offset like the one 
from `TimeZone.getOffset(utc_ms)`. That would result in a bit less work done 
here and I think would make the adjustment easier to reason about.
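To make that concrete, here is a minimal sketch that assumes the value has already been converted to UTC epoch milliseconds. `OffsetAdjust` and `adjust` are hypothetical names, and adding the offset (rather than subtracting it) is my assumption about the direction; the real code would need to pick the sign that matches how the values were written.

```java
import java.util.TimeZone;

// Hypothetical helper, not part of the patch under review.
final class OffsetAdjust {
  /** Shifts a UTC epoch-millisecond value by the zone's offset at that instant. */
  static long adjust(long utcMillis, TimeZone zone) {
    // getOffset(long) returns the offset from UTC in milliseconds, DST included.
    return utcMillis + zone.getOffset(utcMillis);
  }

  public static void main(String[] args) {
    long utcMillis = 0L; // 1970-01-01T00:00:00Z
    System.out.println(adjust(utcMillis, TimeZone.getTimeZone("UTC")));
    System.out.println(adjust(utcMillis, TimeZone.getTimeZone("GMT-08:00")));
  }
}
```

With this shape, UTC adjusts by 0 and a fixed GMT-08:00 zone by minus eight hours, which matches the "UTC by 0, PST by -8" picture above.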



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 (line 35)


What is the purpose of this property? Is this just a way to pass the time 
zone? Why not use the property used elsewhere?



ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
(line 227)


It looks like this tests that 

Review Request 42671: Improve dynamic partition loading

2016-01-22 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42671/
---

Review request for hive.


Bugs: HIVE-12897
https://issues.apache.org/jira/browse/HIVE-12897


Repository: hive-git


Description
---

Reduces number of calls to Metastore.


Diffs
-

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
eee7f1b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 3289cfc 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java efb50b2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 48105de 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 8a9411a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3fefbd7 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
1f30cbd 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSQRewriteSemanticAnalyzer.java
 2c2339a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 
c1e9ec1 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9a3708c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
5b4365c 
  ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java 95d5635 

Diff: https://reviews.apache.org/r/42671/diff/


Testing
---

Regression test suite.


Thanks,

Ashutosh Chauhan



[jira] [Created] (HIVE-12914) Vectorization: vectorized SEL_ shows up as OP_

2016-01-22 Thread Gopal V (JIRA)
Gopal V created HIVE-12914:
--

 Summary: Vectorization: vectorized SEL_ shows up as OP_
 Key: HIVE-12914
 URL: https://issues.apache.org/jira/browse/HIVE-12914
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Priority: Minor


The OP_* names in the following explain will have issues when applying 
RuleRegex walkers against it.

Please note that we do not run any rule-regex after the vectorizer runs, so 
this is a possible bug with no ability to trigger it.

{code}
-|<-Map 1 [SIMPLE_EDGE] vectorized
-   Reduce Output Operator [RS_6]
-  key expressions:_col0 (type: int), _col1 (type: string)
-  sort order:++
-  Statistics:Num rows: 10 Data size: 1704 Basic stats: 
COMPLETE Column stats: NONE
-  Select Operator [OP_5]
- outputColumnNames:["_col0","_col1"]
- Statistics:Num rows: 10 Data size: 1704 Basic stats: 
COMPLETE Column stats: NONE
- TableScan [TS_0]
-ACID table:true
-alias:acid_vectorized
-Statistics:Num rows: 10 Data size: 1704 Basic stats: 
COMPLETE Column stats: NONE
{code}





[jira] [Created] (HIVE-12913) Hive ptest is running tests on MR1 that must run only on MR2

2016-01-22 Thread JIRA
Sergio Peña created HIVE-12913:
--

 Summary: Hive ptest is running tests on MR1 that must run only on 
MR2
 Key: HIVE-12913
 URL: https://issues.apache.org/jira/browse/HIVE-12913
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Sergio Peña
Assignee: Sergio Peña


Some of our MR1 tests fail on some Jenkins jobs because tests that are supposed 
to run only on MR2 are also running on MR1, and no test results are generated.

We should skip those tests on MR1 so that they run only on MR2.





Re: Review Request 42487: Use bit vector to track NDV

2016-01-22 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42487/
---

(Updated Jan. 22, 2016, 7:09 p.m.)


Review request for hive, Alan Gates and John Pullokkaran.


Repository: hive-git


Description
---

HIVE-12763


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 9193f80 
  metastore/if/hive_metastore.thrift 81837e6 
  metastore/pom.xml a8e84a1 
  
metastore/src/gen/protobuf/gen-java/org/apache/hadoop/hive/metastore/hbase/HbaseMetastoreProto.java
 39a7278 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h ce1d7da 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 0203b06 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/BinaryColumnStatsData.java
 84e393c 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/BooleanColumnStatsData.java
 6aa4668 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DateColumnStatsData.java
 2ebb811 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DecimalColumnStatsData.java
 720176a 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DoubleColumnStatsData.java
 5d48b5d 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.java
 2f41c5a 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StringColumnStatsData.java
 bd8a922 
  metastore/src/gen/thrift/gen-php/metastore/Types.php 380e6d0 
  metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py 409c247 
  metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb a473611 
  
metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
 PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseUtils.java 
f4df2e2 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 
5ec60be 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/BinaryColumnStatsAggregator.java
 bbd2c7b 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/BooleanColumnStatsAggregator.java
 9047f68 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java
 217b654 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java
 a8dbc1f 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
 ec25b31 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
 71af0ac 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
 15b8cf7 
  
metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
 fe1a04c 
  
metastore/src/protobuf/org/apache/hadoop/hive/metastore/hbase/hbase_metastore_proto.proto
 0d0ef89 
  
metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseStoreBitVector.java
 PRE-CREATION 
  ql/pom.xml 358cd2a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 7914471 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
1f30cbd 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java b4cf58f 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 
0e96f89 
  ql/src/test/queries/clientpositive/tez_aggr_part_stats.q PRE-CREATION 
  ql/src/test/results/clientpositive/char_udf1.q.java1.7.out bfed116 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out b7c9075 
  ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 9685202 
  ql/src/test/results/clientpositive/compute_stats_date.q.out b57a862 
  ql/src/test/results/clientpositive/compute_stats_decimal.q.out 35abb37 
  ql/src/test/results/clientpositive/compute_stats_double.q.out f6b4052 
  ql/src/test/results/clientpositive/compute_stats_empty_table.q.out f76c760 
  ql/src/test/results/clientpositive/compute_stats_long.q.out 2c6171d 
  ql/src/test/results/clientpositive/compute_stats_string.q.out bdf9d85 
  ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out 
ae39d18 
  ql/src/test/results/clientpositive/tez/tez_aggr_part_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/varchar_udf1.q.java1.7.out 853bc4a 

Diff: https://reviews.apache.org/r/42487/diff/


Testing
---


Thanks,

pengcheng xiong



Re: Review Request 41377: HIVE-12528 don't start HS2 Tez sessions in a single thread

2016-01-22 Thread Siddharth Seth

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41377/#review115890
---

Ship it!


Ship It!

- Siddharth Seth


On Jan. 21, 2016, 12:27 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41377/
> ---
> 
> (Updated Jan. 21, 2016, 12:27 a.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see JIRA
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 32049eb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java e8864ae 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
> 3bfe35a 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
> a2791a1 
> 
> Diff: https://reviews.apache.org/r/41377/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-12912) Refactor setting internal configuration properties to a single place

2016-01-22 Thread Matt McCline (JIRA)
Matt McCline created HIVE-12912:
---

 Summary: Refactor setting internal configuration properties to a 
single place
 Key: HIVE-12912
 URL: https://issues.apache.org/jira/browse/HIVE-12912
 Project: Hive
  Issue Type: Bug
  Components: Hive, ORC
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical


Currently, filters and projections are set on the conf object in HiveInputFormat 
(see pushProjectionsAndFilters). The transactional table config is set 
in several places (FetchTask, SMBMapJoinOperator, MapredLocalTask, etc.). 
Schema evolution columns are set in Utilities. 

Although all the values are available in TableScanDesc, we are setting them in 
different places, which makes them harder to track. It would be better to do all of the 
following in a single place (maybe HiveInputFormat?):
1) Filters
2) Projection
3) Transactional table boolean
4) Schema evolution columns





Re: Review Request 42508: HIVE-12889: Support COUNT(DISTINCT) for partitioning query.

2016-01-22 Thread Aihua Xu


> On Jan. 22, 2016, 8:04 a.m., Szehon Ho wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java, 
> > line 161
> > 
> >
> > Why do we need to do ArrayUtils.isEquals as well as hash comparison?
> > 
> > And another suggestion, can we do something like 
> > ObjectInspectorUtils.compare like other UDAF's?

Actually, I had difficulty making a copy of the previous rows and doing a 
comparison. Also, for your second question, I tried all the data types and 
noticed that I need to make a copy for Text/LazyString, but I'm not 100% 
sure whether anything else needs a copy. Because of that, I added a hash comparison 
in addition to ArrayUtils.isEquals() for the comparison. In most cases 
(for all the Hive data types), the hash should be able to represent the data, 
but in general, different data can have the same hash. That's why, when 
the hashes are the same, I need to compare the data itself to tell whether they are the 
same or not. 

With the current hash code implementation for the Hive data types, I think hash 
comparison is probably enough, since the hash is generated from the characters of 
Text/String/map/..., or the numbers of int, double, etc. Of course, there is no 
guarantee of that. What do you think? 

I checked ObjectInspectorUtils.compare(). That is much better for comparison. 

Do you think hash comparison alone is enough? Any better ideas for making a copy?
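To illustrate the "hash first, then compare the data" idea, here is a small standalone sketch. It is not the patch code: `RowChangeDetector` is a hypothetical class, rows are plain Object arrays, the input is assumed to be already sorted on the distinct fields, and the shallow `clone()` is only safe because the elements here are immutable (reused mutable objects such as Text would need a real copy).

```java
import java.util.Arrays;

// Hypothetical illustration, not the GenericUDAFCount implementation.
final class RowChangeDetector {
  private Object[] previous;   // last distinct row seen
  private int previousHash;    // its cached hash
  private long distinctCount;

  /** Feeds the next row; rows must arrive sorted on the distinct fields. */
  void add(Object[] row) {
    int hash = Arrays.deepHashCode(row);
    // Cheap hash check first; fall back to a full comparison only on a hash match,
    // because different rows can share a hash.
    boolean same = previous != null
        && hash == previousHash
        && Arrays.deepEquals(previous, row);
    if (!same) {
      distinctCount++;
      previous = row.clone(); // shallow copy; assumes immutable elements
      previousHash = hash;
    }
  }

  long count() {
    return distinctCount;
  }

  public static void main(String[] args) {
    RowChangeDetector d = new RowChangeDetector();
    d.add(new Object[] {"Aa"});
    d.add(new Object[] {"Aa"});
    d.add(new Object[] {"BB"}); // same String hash as "Aa", but different data
    System.out.println(d.count());
  }
}
```

The strings "Aa" and "BB" have the same `hashCode()`, so a hash-only comparison would treat the third row as a repeat; the extra equality check is what keeps the count correct in that case.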


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42508/#review115804
---


On Jan. 20, 2016, 5:05 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42508/
> ---
> 
> (Updated Jan. 20, 2016, 5:05 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-12889: Support COUNT(DISTINCT) for partitioning query.
> 
> 
> Diffs
> -
> 
>   data/files/windowing_distinct.txt PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlCountAggFunction.java
>  7937040 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlSumAggFunction.java
>  8f62970 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java
>  e2fbb4f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java
>  37249f9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3fefbd7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 15ca754 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b8510 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 15773e5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java 
> eaf112e 
>   ql/src/test/queries/clientpositive/windowing_distinct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/windowing_distinct.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42508/diff/
> 
> 
> Testing
> ---
> 
> Support count(distinct) over partitioning window. 
> 
> 1. Enabling the parser to properly parse such query "count(distinct) over 
> (partition by c1)";
> 2. ORDER BY and windowing frame won't work with the functions of distinct due 
> to performance concern and implementation requirement.
> 3. We insert the distinct fields into the order by list, so during counting, 
> we only need to compare the current row against the previous remembered row.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 42626: CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver.vector_join_filters.q failure

2016-01-22 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42626/#review115832
---

Ship it!


Ship It!

- Jesús Camacho Rodríguez


On Jan. 22, 2016, 12:27 a.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42626/
> ---
> 
> (Updated Jan. 22, 2016, 12:27 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Please see the jira https://issues.apache.org/jira/browse/HIVE-12802
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java f677f68 
> 
> Diff: https://reviews.apache.org/r/42626/diff/
> 
> 
> Testing
> ---
> 
> Precommit runs
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



[jira] [Created] (HIVE-12911) PPD might get exercised even when flag is false if CBO is on

2016-01-22 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-12911:
--

 Summary: PPD might get exercised even when flag is false if CBO is on
 Key: HIVE-12911
 URL: https://issues.apache.org/jira/browse/HIVE-12911
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 2.0.0, 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Blocker


Introduced in HIVE-11865.





Re: Review Request 42134: More information to user on GetOperationStatus in Hive Server2 when query is still executing

2016-01-22 Thread Akshay Goyal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42134/
---

(Updated Jan. 22, 2016, 10:25 a.m.)


Review request for hive and Amareshwari Sriramadasu.


Changes
---

Added test cases for taskStatus.


Bugs: HIVE-4570
https://issues.apache.org/jira/browse/HIVE-4570


Repository: hive-git


Description
---

The Driver maintains a list of running and runnable tasks, but that information 
is not exposed externally; it is kept locally in the Driver's execute method. We 
can add Driver.getTaskStatuses() to return the status of all tasks (both running 
and runnable). Similarly, we can expose start and completion times for operations.

The proposed changes are:

struct TGetOperationStatusResp {
  1: required TStatus status
  2: optional TOperationState operationState

  // If operationState is ERROR_STATE, then the following fields may be set
  // sqlState as defined in the ISO/IEC CLI specification
  3: optional string sqlState

  // Internal error code
  4: optional i32 errorCode

  // Error message
  5: optional string errorMessage

  // List of statuses of sub tasks
  6: optional string taskStatus

  // When was the operation started
  7: optional i64 operationStarted
  // When was the operation completed
  8: optional i64 operationCompleted

}
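
As a rough illustration of the Driver.getTaskStatuses() idea described above, 
the sketch below keeps running and runnable task ids in one place and returns 
a read-only snapshot of both. It is not the code from this patch: the class 
name TaskStatusSketch, the State enum, the Status holder, and the addRunnable 
and launch methods are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the Hive Driver or TaskStatus classes from the patch.
class TaskStatusSketch {
  enum State { RUNNABLE, RUNNING }

  // Immutable snapshot of one task (field names are assumptions).
  static final class Status {
    final String taskId;
    final State state;

    Status(String taskId, State state) {
      this.taskId = taskId;
      this.state = state;
    }

    @Override
    public String toString() {
      return taskId + ":" + state;
    }
  }

  private final List<String> runnable = new ArrayList<>();
  private final List<String> running = new ArrayList<>();

  // Registers a task that is ready to run but has not been launched yet.
  synchronized void addRunnable(String taskId) {
    runnable.add(taskId);
  }

  // Moves a task from the runnable list to the running list, if present.
  synchronized void launch(String taskId) {
    if (runnable.remove(taskId)) {
      running.add(taskId);
    }
  }

  // Returns a copy, so a caller such as a status RPC never sees the live lists.
  synchronized List<Status> getTaskStatuses() {
    List<Status> out = new ArrayList<>();
    for (String id : running) {
      out.add(new Status(id, State.RUNNING));
    }
    for (String id : runnable) {
      out.add(new Status(id, State.RUNNABLE));
    }
    return Collections.unmodifiableList(out);
  }
}
```

Returning a snapshot rather than the internal lists is one way to let another 
thread poll for status while the query is still executing. How the result is 
serialized into the taskStatus string field is left out here.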


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 75187cf 
  ql/src/java/org/apache/hadoop/hive/ql/TaskStatus.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 40c89cb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java b184b4e 
  service-rpc/if/TCLIService.thrift 0aa9d13 
  service/src/java/org/apache/hive/service/cli/OperationStatus.java e45b828 
  service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java 8868ec1 
  service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 35b6c52 
  service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java 8db2e62 
  service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java d6f6280 
  service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java a09b39a 
  service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 740b851 
  service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java 2a0fec2 
  service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java f5a9771 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 113eddf 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java c8a69b9 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java 5f01165 
  service/src/test/org/apache/hive/service/cli/CLIServiceTest.java e78181a 

Diff: https://reviews.apache.org/r/42134/diff/


Testing
---


Thanks,

Akshay Goyal



Re: Review Request 42508: HIVE-12889: Support COUNT(DISTINCT) for partitioning query.

2016-01-22 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42508/#review115804
---


Overall, the logic makes sense; just some (maybe basic) questions below.


ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 160)


Why do we need ArrayUtils.isEquals in addition to the hash comparison?

Another suggestion: can we use something like ObjectInspectorUtils.compare, 
as other UDAFs do?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 171)


Just curious: how do you know that Text/LazyString are the only ones that need 
a copy? Are there other data types that also need it?


- Szehon Ho


On Jan. 20, 2016, 5:05 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42508/
> ---
> 
> (Updated Jan. 20, 2016, 5:05 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-12889: Support COUNT(DISTINCT) for partitioning query.
> 
> 
> Diffs
> -
> 
>   data/files/windowing_distinct.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlCountAggFunction.java 7937040 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlSumAggFunction.java 8f62970 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java e2fbb4f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 37249f9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3fefbd7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 15ca754 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b8510 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 15773e5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java eaf112e 
>   ql/src/test/queries/clientpositive/windowing_distinct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/windowing_distinct.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42508/diff/
> 
> 
> Testing
> ---
> 
> Support count(distinct) over a partitioning window. 
> 
> 1. Enable the parser to properly parse queries such as "count(distinct) over 
> (partition by c1)".
> 2. ORDER BY and windowing frames are not supported with distinct functions, 
> due to performance concerns and implementation requirements.
> 3. We insert the distinct fields into the ORDER BY list, so during counting 
> we only need to compare the current row against the previously remembered row.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>
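
Item 3 of the quoted testing notes relies on the rows of a partition arriving 
sorted on the distinct fields, so that equal rows are adjacent and only the 
previous row has to be remembered. A minimal sketch of that counting step 
follows. It is illustrative only and is not the GenericUDAFCount code under 
review: the class SortedDistinctCount and its countDistinct method are 
hypothetical, rows are modelled as plain Object[] arrays, and no special 
handling of NULL values is attempted.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only; not the GenericUDAFCount implementation from the patch.
class SortedDistinctCount {
  // Counts distinct rows in a partition whose rows are already sorted on the
  // distinct columns, so that equal rows sit next to each other.
  static long countDistinct(List<Object[]> sortedRows) {
    long count = 0;
    Object[] previous = null; // no row remembered yet
    for (Object[] row : sortedRows) {
      if (previous == null || !Arrays.equals(previous, row)) {
        count++;
        // Shallow copy of the row. Mutable values that the caller reuses
        // between rows would need a deep copy instead.
        previous = row.clone();
      }
    }
    return count;
  }
}
```

For example, the sorted rows ("a", 1), ("a", 1), ("a", 2), ("b", 2) give a 
count of 3. The shallow copy is the point where a question like the one about 
Text/LazyString comes in: if the engine reuses a mutable object for successive 
rows, the remembered row has to be copied more deeply than this sketch does.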