Re: Review Request 71811: Extract Compiler from Driver

2019-11-29 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/
---

(Updated Nov. 29, 2019, 9:47 p.m.)


Review request for hive and Zoltan Haindrich.


Bugs: HIVE-22526
https://issues.apache.org/jira/browse/HIVE-22526


Repository: hive-git


Description
---

The Driver class contains ~600 lines of code responsible for compiling the 
command. That means that a Plan needs to be created from the command String, 
and a transaction also needs to be started (in most cases). This is done by 
the compile function, which is itself really big and has a lot of sub-functions 
to help with the task. All this code should be put into a separate class, where 
it can do its job without getting mixed up with the other code in the Driver.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java bb41c15bb4 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
  ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 


Diff: https://reviews.apache.org/r/71811/diff/2/

Changes: https://reviews.apache.org/r/71811/diff/1-2/


Testing
---

All the tests are still running fine.


Thanks,

Miklos Gergely



Re: Review Request 71811: Extract Compiler from Driver

2019-11-29 Thread Miklos Gergely


> On Nov. 28, 2019, 12:39 p.m., Zoltan Haindrich wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
> > Lines 167 (patched)
> > 
> >
> > I don't feel this is closely connected to compilation; queryId could be 
> > assigned in the driver (follow-up?)

Moved queryId related stuff back to Driver.


- Miklos


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/#review218846
---





Re: Review Request 71811: Extract Compiler from Driver

2019-11-29 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/#review218856
---




ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 101 (patched)


Good point, let's put it into this patch. I've moved some functions into 
DriverUtils, so now both Compiler and Driver can access them. Some other 
function calls were put back into Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 106 (patched)


sure, done



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 125 (patched)


That was what I had in mind at first. Later I realized that the AST 
cannot be the return value of the parse function, as it may be modified later 
in the analyze function. Still, I agree with you: let's have the semantic 
analyzer and the query plan at least as local variables, returned by the 
analyze and createPlan functions. I also agree on returning the plan; it does 
make things cleaner.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 132 (patched)


Got rid of parseError and compileError as global variables; they are now 
local variables in compile. In cleanUp, compileError was not, and still is 
not, always true, as cleanUp is also invoked in the finally block when there 
was no error at all.
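A rough sketch of that pattern, under assumptions: CleanupSketch, its compile and cleanUp, and the recorded strings are all hypothetical and only illustrate local error flags combined with a finally block that runs on success as well as on failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the actual Compiler code from the patch.
final class CleanupSketch {
  // Records every cleanUp call so that the behaviour can be inspected.
  static final List<String> CLEANUPS = new ArrayList<>();

  static String compile(String command) {
    boolean parseError = false;
    boolean compileError = true; // assume failure until the end is reached
    try {
      if (command.trim().isEmpty()) {
        parseError = true;
        throw new IllegalArgumentException("empty command");
      }
      String plan = "plan:" + command;
      compileError = false;
      return plan;
    } finally {
      // Runs on both paths, so compileError is not always true here.
      cleanUp(compileError, parseError);
    }
  }

  static void cleanUp(boolean compileError, boolean parseError) {
    CLEANUPS.add("compileError=" + compileError + ", parseError=" + parseError);
  }
}
```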



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 161 (patched)


Moved back to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 165 (patched)


Moved back to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 167 (patched)


Moved back to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 188 (patched)


I've moved everything back to the driver except the processing of the query 
string itself; I believe that should probably still be here in the compiler. 
Let me know what you think.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 193 (patched)


Moved ensuring the context to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 498 (patched)


Agree, moved it there.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 565 (patched)


Moved it to Hive. Nice catch!


- Miklos Gergely





[jira] [Created] (HIVE-22565) Make calling alter_table unnecessary during inserts

2019-11-29 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created HIVE-22565:
--

 Summary: Make calling alter_table unnecessary during inserts
 Key: HIVE-22565
 URL: https://issues.apache.org/jira/browse/HIVE-22565
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Csaba Ringhofer


tl;dr: it would be good to set the table's writeId during commit to make the 
extra alter_table call unnecessary

This came up during the implementation of (insert_only) ACID inserts in Apache 
Impala.

The following description deals with the non-partitioned case, partitioned 
tables are a bit more complicated.

alter_table is called by Impala during inserts mainly to set stats to 
non-accurate:
- the table's writeId is set to the writeId of the insert
- remove table property column_stats_accurate

In the past we had the false assumption that setting the writeId is done 
automatically by committing the transaction. It would be nice to have a version 
of commit that actually does this - commits the transaction + changes the 
writeId/marks stats as inaccurate in a single atomic step.

The current state of alter_table + commit being non-atomic can lead to weird 
scenarios in parallel inserts (+ compute stats).

Impala calls alter_table before commit, so the calls to HMS during inserts look 
like this:
1. open new transaction
2. get shared lock on the table
3. get write id
... write the files ...
4. call alter_table to remove column_stats_accurate (this also sets writeId)
5. commit the transaction

So the following can occur with two parallel writes + a compute stats: 
1. txn 1 calls alter_table (sets to writeId of txn 1)
2. txn 2 calls alter_table (sets to writeId of txn 2)
3. txn 2 is committed
4. compute stats runs (gets validWriteList, reads the table, sets the stats 
with alter_table)
5. txn 1 is committed

The compute stats will have the writeId of txn 2 in its validWriteId list, so 
it will assume that it computed accurate stats. After step 5, the stats will be 
considered accurate while they do not contain the new rows from txn 1.
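The interleaving above can be replayed with a toy model. This is only a sketch under assumptions: StatsRaceSketch and its methods are hypothetical, it does not call the real HMS API, and "stats are considered accurate if the table's writeId is in the valid write list" is a simplification of the check described above. Each transaction is assumed to insert 10 rows.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the table state; not the real metastore API.
final class StatsRaceSketch {
  long tableWriteId;     // writeId last stored on the table by alter_table
  boolean statsAccurate; // whether readers would trust the stored stats
  long statsRowCount;    // row count recorded by the last compute stats
  long visibleRows;      // rows belonging to committed transactions
  final Set<Long> committedWriteIds = new HashSet<>();

  // An insert stamps its writeId on the table and marks stats inaccurate.
  void alterTable(long writeId) {
    tableWriteId = writeId;
    statsAccurate = false;
  }

  // Commit makes the rows of that transaction visible to readers.
  void commit(long writeId, long insertedRows) {
    committedWriteIds.add(writeId);
    visibleRows += insertedRows;
  }

  // Compute stats reads only committed data and trusts its result if the
  // table's current writeId is in its valid write list.
  void computeStats() {
    Set<Long> validWriteIds = new HashSet<>(committedWriteIds);
    statsRowCount = visibleRows;
    statsAccurate = validWriteIds.contains(tableWriteId);
  }

  static StatsRaceSketch runScenario() {
    StatsRaceSketch table = new StatsRaceSketch();
    table.alterTable(1);  // 1. txn 1 calls alter_table
    table.alterTable(2);  // 2. txn 2 calls alter_table
    table.commit(2, 10);  // 3. txn 2 is committed
    table.computeStats(); // 4. compute stats runs
    table.commit(1, 10);  // 5. txn 1 is committed
    return table;
  }
}
```

After runScenario() the model reports accurate stats with a row count of 10 while 20 rows are visible, which is the stale state described above.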

Another issue with frequent alter_table calls is that the effect of actual 
ALTER TABLE commands that use shared locks (I think SET TBLPROPERTIES does this 
in Hive) can be simply overwritten by alter_table calls from inserts that used 
a different cached version of the table. This is generally a problem if ALTER 
TABLE is called from different clients (without taking exclusive lock), but 
doing parallel DMLs is probably more common than doing parallel DDLs.

So issues can occur even if clients use the API correctly. Another problem is 
that the hard-to-use API may lead to buggy client implementations that can 
easily mess things up for other components too.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71761: HIVE-22489

2019-11-29 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71761/
---

(Updated Nov. 29, 2019, 3:03 p.m.)


Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-22489
https://issues.apache.org/jira/browse/HIVE-22489


Repository: hive-git


Description (updated)
---

Reduce Sink operator orders nulls first
===
1. Set the default null sort order from the Hive config when creating the 
Reduce Sink Desc.
2. Hash join uses 
`org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite` 
for serializing keys. For bigtable keys, ascending order with nulls first was 
always hardcoded. This patch changes this behaviour to use 
`Operator.getConf().TableDesc.getProperties()` (in this case `MapJoinOperator`) 
to set up the ordering in `BinarySortableSerializeWrite`.
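As a reminder of what the ordering flags mean for keys, here is a small sketch using plain java.util comparators. It does not use BinarySortableSerializeWrite or any Hive class; NullOrderSketch and sortKeys are hypothetical names, and the two booleans stand in for whatever the table properties supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: plain JDK comparators, no Hive serialization involved.
final class NullOrderSketch {
  static List<Integer> sortKeys(List<Integer> keys, boolean ascending, boolean nullsFirst) {
    Comparator<Integer> base = ascending
        ? Comparator.<Integer>naturalOrder()
        : Comparator.<Integer>reverseOrder();
    // The null placement is chosen independently of the sort direction.
    Comparator<Integer> withNulls = nullsFirst
        ? Comparator.nullsFirst(base)
        : Comparator.nullsLast(base);
    List<Integer> sorted = new ArrayList<>(keys);
    sorted.sort(withNulls);
    return sorted;
  }

  public static void main(String[] args) {
    List<Integer> keys = Arrays.asList(3, null, 1);
    System.out.println(sortKeys(keys, true, true));
    System.out.println(sortKeys(keys, true, false));
  }
}
```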


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyMultiKeyOperator.java
 f587517b08 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerMultiKeyOperator.java
 cdee3fd957 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinLeftSemiMultiKeyOperator.java
 e5d9fdae19 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinOuterMultiKeyOperator.java
 29c531bd51 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedCreateHashTable.java
 21c355cb42 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongCommon.java
 de1ee15c3b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java
 42573f0898 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMultiSet.java
 829a03737d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashSet.java
 18e1435019 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringCommon.java
 da0e8365b1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMap.java
 6c4d8a81d1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMultiSet.java
 a6b754c7eb 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashSet.java
 fdcd83dde7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 5c409e4573 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java 
a50ad78e8f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
 0f95d7788c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
89b55001f0 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/opconventer/HiveGBOpConvUtil.java
 46ddffd4fa 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/opconventer/HiveOpConverterUtils.java
 9cc1712f45 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 ac5caa6135 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 60bfba826d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 2314f49631 
  ql/src/java/org/apache/hadoop/hive/ql/util/NullOrdering.java 46ff329981 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java dd70524948 
  ql/src/test/queries/clientpositive/hashjoin.q PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_5a.q.out 9e2606f7d9 
  ql/src/test/results/clientpositive/autoColumnStats_8.q.out 90039f828e 
  ql/src/test/results/clientpositive/auto_join_reordering_values.q.out 
d9c7720da5 
  ql/src/test/results/clientpositive/bucket3.q.out d418750071 
  ql/src/test/results/clientpositive/cbo_rp_outer_join_ppr.q.out 605f1aec22 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 3e2557455b 
  ql/src/test/results/clientpositive/cte_1.q.out 8a621cf872 
  ql/src/test/results/clientpositive/distinct_groupby.q.out d9e3cf9eff 
  ql/src/test/results/clientpositive/filter_aggr.q.out a4fe9405f9 
  ql/src/test/results/clientpositive/filter_join_breaktask.q.out ec5d4a14e1 
  ql/src/test/results/clientpositive/filter_union.q.out d2c167df7a 
  ql/src/test/results/clientpositive/groupby_grouping_sets_limit.q.out 
d0eaf46d86 
  ql/src/test/results/clientpositive/groupby_map_ppr.q.out 66988ee04a 
  ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct.q.out 
9f2a587ada 
  ql/src/test/results/clientpositive/groupby_ppr.q.out d84c649e2b 
  ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out 
db358d9a53 
  ql/src/test/results/clientpositive/groupby_rollup_empty.q.out 1fcdf15976 
  

Re: Review Request 71761: HIVE-22489

2019-11-29 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71761/
---

(Updated Nov. 29, 2019, 2:46 p.m.)


Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-22489
https://issues.apache.org/jira/browse/HIVE-22489


Repository: hive-git


Description
---

Reduce Sink operator orders nulls first


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyMultiKeyOperator.java
 f587517b08 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerMultiKeyOperator.java
 cdee3fd957 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinLeftSemiMultiKeyOperator.java
 e5d9fdae19 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinOuterMultiKeyOperator.java
 29c531bd51 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedCreateHashTable.java
 21c355cb42 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongCommon.java
 de1ee15c3b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java
 42573f0898 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMultiSet.java
 829a03737d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashSet.java
 18e1435019 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringCommon.java
 da0e8365b1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMap.java
 6c4d8a81d1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMultiSet.java
 a6b754c7eb 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashSet.java
 fdcd83dde7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 5c409e4573 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java 
a50ad78e8f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
 0f95d7788c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
89b55001f0 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/opconventer/HiveGBOpConvUtil.java
 46ddffd4fa 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/opconventer/HiveOpConverterUtils.java
 9cc1712f45 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 ac5caa6135 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 60bfba826d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 2314f49631 
  ql/src/java/org/apache/hadoop/hive/ql/util/NullOrdering.java 46ff329981 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java dd70524948 
  ql/src/test/queries/clientpositive/hashjoin.q PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_5a.q.out 9e2606f7d9 
  ql/src/test/results/clientpositive/autoColumnStats_8.q.out 90039f828e 
  ql/src/test/results/clientpositive/auto_join_reordering_values.q.out 
d9c7720da5 
  ql/src/test/results/clientpositive/bucket3.q.out d418750071 
  ql/src/test/results/clientpositive/cbo_rp_outer_join_ppr.q.out 605f1aec22 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 3e2557455b 
  ql/src/test/results/clientpositive/cte_1.q.out 8a621cf872 
  ql/src/test/results/clientpositive/distinct_groupby.q.out d9e3cf9eff 
  ql/src/test/results/clientpositive/filter_aggr.q.out a4fe9405f9 
  ql/src/test/results/clientpositive/filter_join_breaktask.q.out ec5d4a14e1 
  ql/src/test/results/clientpositive/filter_union.q.out d2c167df7a 
  ql/src/test/results/clientpositive/groupby_grouping_sets_limit.q.out 
d0eaf46d86 
  ql/src/test/results/clientpositive/groupby_map_ppr.q.out 66988ee04a 
  ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct.q.out 
9f2a587ada 
  ql/src/test/results/clientpositive/groupby_ppr.q.out d84c649e2b 
  ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out 
db358d9a53 
  ql/src/test/results/clientpositive/groupby_rollup_empty.q.out 1fcdf15976 
  ql/src/test/results/clientpositive/groupby_sort_1_23.q.out 13074dd1de 
  ql/src/test/results/clientpositive/groupby_sort_6.q.out eee2ffa8e8 
  ql/src/test/results/clientpositive/groupby_sort_skew_1_23.q.out 96e4b9d140 
  ql/src/test/results/clientpositive/groupingset_high_columns.q.out 3456719f7e 
  ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out 
8fa79f708b 
  ql/src/test/results/clientpositive/join17.q.out 57c702e5de 
  ql/src/test/results/clientpositive/join35.q.out 5f91f28ad6 
  

Re: Review Request 71821: HIVE-22544

2019-11-29 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71821/
---

(Updated Nov. 29, 2019, 1:41 p.m.)


Review request for hive and Jesús Camacho Rodríguez.


Bugs: HIVE-22544
https://issues.apache.org/jira/browse/HIVE-22544


Repository: hive-git


Description
---

Disable null sort order at user level


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 77c8300e32 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java dc45800d74 
  ql/src/test/results/clientpositive/llap/constprog_dpp.q.out 11af918604 
  ql/src/test/results/clientpositive/llap/constprog_semijoin.q.out 90dd54b78e 
  ql/src/test/results/clientpositive/llap/cte_mat_3.q.out 2433e99dbb 
  ql/src/test/results/clientpositive/llap/cte_mat_4.q.out 52ed4a78b3 
  ql/src/test/results/clientpositive/llap/cte_mat_5.q.out fac092947a 
  ql/src/test/results/clientpositive/llap/deleteAnalyze.q.out a095fcc0ae 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out 
4b9035e5b9 
  ql/src/test/results/clientpositive/llap/empty_join.q.out b19ce14c78 
  ql/src/test/results/clientpositive/llap/estimate_pkfk_filtered_fk.q.out 
b74a20f305 
  ql/src/test/results/clientpositive/llap/estimate_pkfk_nocond.q.out 5a58b13b8c 
  ql/src/test/results/clientpositive/llap/estimate_pkfk_push.q.out 2af0fb5ee3 
  ql/src/test/results/clientpositive/llap/explainanalyze_2.q.out f5b2e4b294 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 735296f814 
  ql/src/test/results/clientpositive/llap/explainuser_2.q.out b90fb55fd1 
  ql/src/test/results/clientpositive/llap/explainuser_4.q.out 3504d17875 
  ql/src/test/results/clientpositive/llap/groupby_groupingset_bug.q.out 
a8c1d56ef7 
  ql/src/test/results/clientpositive/llap/reopt_dpp.q.out 141b2b617d 
  ql/src/test/results/clientpositive/llap/retry_failure_reorder.q.out 
9e1c249ab7 
  ql/src/test/results/clientpositive/llap/retry_failure_stat_changes.q.out 
204fa8f711 
  ql/src/test/results/clientpositive/llap/runtime_stats_hs2.q.out f88aa718f8 
  ql/src/test/results/clientpositive/llap/runtime_stats_merge.q.out a11e81f1bc 
  ql/src/test/results/clientpositive/llap/windowing_gby.q.out 0888fd979e 
  ql/src/test/results/clientpositive/perf/tez/constraints/mv_query44.q.out 
6a67ca6823 
  ql/src/test/results/clientpositive/perf/tez/constraints/query1.q.out 
211e92ecf4 
  ql/src/test/results/clientpositive/perf/tez/constraints/query10.q.out 
7ad55b5c5f 
  ql/src/test/results/clientpositive/perf/tez/constraints/query11.q.out 
2deb5827a5 
  ql/src/test/results/clientpositive/perf/tez/constraints/query12.q.out 
035c908ec3 
  ql/src/test/results/clientpositive/perf/tez/constraints/query13.q.out 
b27552f901 
  ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out 
9b07ceb3a7 
  ql/src/test/results/clientpositive/perf/tez/constraints/query15.q.out 
7c0e6cf8cd 
  ql/src/test/results/clientpositive/perf/tez/constraints/query16.q.out 
377307f6e3 
  ql/src/test/results/clientpositive/perf/tez/constraints/query17.q.out 
44fd104a9b 
  ql/src/test/results/clientpositive/perf/tez/constraints/query18.q.out 
b65bebc22b 
  ql/src/test/results/clientpositive/perf/tez/constraints/query19.q.out 
d313b21add 
  ql/src/test/results/clientpositive/perf/tez/constraints/query2.q.out 
e1308a315a 
  ql/src/test/results/clientpositive/perf/tez/constraints/query20.q.out 
a10979641a 
  ql/src/test/results/clientpositive/perf/tez/constraints/query21.q.out 
d944adec04 
  ql/src/test/results/clientpositive/perf/tez/constraints/query22.q.out 
0e1f13e65e 
  ql/src/test/results/clientpositive/perf/tez/constraints/query23.q.out 
761369c88b 
  ql/src/test/results/clientpositive/perf/tez/constraints/query24.q.out 
8b4fea7d80 
  ql/src/test/results/clientpositive/perf/tez/constraints/query25.q.out 
1ca55ca290 
  ql/src/test/results/clientpositive/perf/tez/constraints/query26.q.out 
90a56acafc 
  ql/src/test/results/clientpositive/perf/tez/constraints/query27.q.out 
58b2b86cdf 
  ql/src/test/results/clientpositive/perf/tez/constraints/query28.q.out 
9634957a52 
  ql/src/test/results/clientpositive/perf/tez/constraints/query29.q.out 
073ae52f1f 
  ql/src/test/results/clientpositive/perf/tez/constraints/query3.q.out 
50beff8772 
  ql/src/test/results/clientpositive/perf/tez/constraints/query30.q.out 
6aa98f2bdd 
  ql/src/test/results/clientpositive/perf/tez/constraints/query31.q.out 
3f0e054d0c 
  ql/src/test/results/clientpositive/perf/tez/constraints/query32.q.out 
f3f381a26e 
  ql/src/test/results/clientpositive/perf/tez/constraints/query33.q.out 
fbab5700b7 
  ql/src/test/results/clientpositive/perf/tez/constraints/query34.q.out 
8f730c8f3a 
  ql/src/test/results/clientpositive/perf/tez/constraints/query35.q.out 
4ca9d3cb61 
  ql/src/test/results/clientpositive/perf/tez/constraints/query36.q.out 
d8fa6f4f5a 
  

Re: Review Request 71820: HIVE-20150

2019-11-29 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71820/
---

(Updated Nov. 29, 2019, 1:27 p.m.)


Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-20150
https://issues.apache.org/jira/browse/HIVE-20150


Repository: hive-git


Description
---

TopNKey pushdown

1. Apply patch: 
https://issues.apache.org/jira/secure/attachment/12941630/HIVE-20150.11.patch
2. TopNKey introduction depends only on Reduce Sink with topn property >= 0
3. Implement TopNKey operator pushdown through: projection, group by, reduce 
sink, left outer join, other topnkey
4. Add sort order and null sort order direction check when determining if the 
topnkey op can be pushed
5. Implement handling of cases when the topnkey op and the parent op have only 
a common key prefix.
6. Turn off topnkey optimization by default
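Items 4 and 5 can be illustrated with a small sketch. This is not the CommonKeyPrefix class from the patch: CommonKeyPrefixSketch, Key and commonPrefixLength are hypothetical, and the '+'/'-' and 'a'/'z' characters for sort order and null order are an assumed encoding used only in this example. Two keys count as equal only if expression, sort order and null order all match.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the CommonKeyPrefix implementation from the patch.
final class CommonKeyPrefixSketch {
  static final class Key {
    final String expr;
    final char order;     // assumed encoding: '+' ascending, '-' descending
    final char nullOrder; // assumed encoding: 'a' nulls first, 'z' nulls last

    Key(String expr, char order, char nullOrder) {
      this.expr = expr;
      this.order = order;
      this.nullOrder = nullOrder;
    }

    boolean matches(Key other) {
      return expr.equals(other.expr) && order == other.order && nullOrder == other.nullOrder;
    }
  }

  // Number of leading keys shared by the topnkey op and its parent op.
  static int commonPrefixLength(List<Key> topNKeys, List<Key> parentKeys) {
    int limit = Math.min(topNKeys.size(), parentKeys.size());
    int length = 0;
    while (length < limit && topNKeys.get(length).matches(parentKeys.get(length))) {
      length++;
    }
    return length;
  }

  public static void main(String[] args) {
    List<Key> topN = Arrays.asList(new Key("a", '+', 'a'), new Key("b", '+', 'a'));
    List<Key> parent = Arrays.asList(new Key("a", '+', 'a'), new Key("b", '-', 'a'));
    System.out.println(commonPrefixLength(topN, parent));
  }
}
```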


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4393a2825e 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java 
0d6cf3c755 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java bf58bd8bb8 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestCommonKeyPrefix.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/topnkey.q 057b6a45ba 
  ql/src/test/queries/clientpositive/vector_topnkey.q 85c5880cd6 
  ql/src/test/results/clientpositive/llap/bucket_groupby.q.out 0c051c926b 
  ql/src/test/results/clientpositive/llap/check_constraint.q.out 9f2c9a1cd0 
  ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 
b6d210becf 
  ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out 
9343e078b7 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 735296f814 
  ql/src/test/results/clientpositive/llap/explainuser_2.q.out b90fb55fd1 
  ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out 
545cce75a9 
  ql/src/test/results/clientpositive/llap/filter_union.q.out 0df77762a0 
  ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 3fdd77d802 
  ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out efa8c38d7c 
  ql/src/test/results/clientpositive/llap/llap_decimal64_reader.q.out 
ffe5f6fb22 
  ql/src/test/results/clientpositive/llap/offset_limit.q.out 23f2de46e5 
  ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 
4ecb7bc46d 
  ql/src/test/results/clientpositive/llap/orc_struct_type_vectorization.q.out 
0eac389eb7 
  
ql/src/test/results/clientpositive/llap/parquet_complex_types_vectorization.q.out
 4362fb6f2e 
  ql/src/test/results/clientpositive/llap/parquet_map_type_vectorization.q.out 
24468c9a1b 
  
ql/src/test/results/clientpositive/llap/parquet_struct_type_vectorization.q.out 
45890a1890 
  ql/src/test/results/clientpositive/llap/semijoin_reddedup.q.out 0e9723b8f3 
  ql/src/test/results/clientpositive/llap/subquery_ALL.q.out d910c1a79d 
  ql/src/test/results/clientpositive/llap/subquery_ANY.q.out 91472d631e 
  ql/src/test/results/clientpositive/llap/topnkey.q.out 1e77587f82 
  ql/src/test/results/clientpositive/llap/vector_cast_constant.q.out cc2dc47280 
  ql/src/test/results/clientpositive/llap/vector_char_2.q.out f7e76e5a8b 
  
ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_limit.q.out
 6fd15e7101 
  ql/src/test/results/clientpositive/llap/vector_groupby_reduce.q.out 
d6325982e3 
  ql/src/test/results/clientpositive/llap/vector_mr_diff_schema_alias.q.out 
4d417b9c3d 
  ql/src/test/results/clientpositive/llap/vector_reduce_groupby_decimal.q.out 
97a211cfc6 
  ql/src/test/results/clientpositive/llap/vector_string_concat.q.out a8019be7aa 
  ql/src/test/results/clientpositive/llap/vector_topnkey.q.out c140bdfd37 
  ql/src/test/results/clientpositive/llap/vectorization_limit.q.out 7326adf522 
  ql/src/test/results/clientpositive/perf/tez/cbo_query14.q.out e9308cd709 
  ql/src/test/results/clientpositive/perf/tez/cbo_query77.q.out 02caf99f7d 
  ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query14.q.out 
43e1b2b5c2 
  ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query77.q.out 
2f75361df1 
  ql/src/test/results/clientpositive/perf/tez/constraints/query10.q.out 
7ad55b5c5f 
  ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out 
9b07ceb3a7 
  ql/src/test/results/clientpositive/perf/tez/constraints/query15.q.out 
7c0e6cf8cd 
  ql/src/test/results/clientpositive/perf/tez/constraints/query17.q.out 
44fd104a9b 
  ql/src/test/results/clientpositive/perf/tez/constraints/query25.q.out 
1ca55ca290 
  ql/src/test/results/clientpositive/perf/tez/constraints/query26.q.out 
90a56acafc 
  

Re: Review Request 71844: HIVE-22554: ACID: Wait timeout for blocking compaction should be configurable

2019-11-29 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71844/#review218855
---


Ship it!




Ship It!

- Peter Vary


On Nov. 28, 2019, 1:49 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71844/
> ---
> 
> (Updated Nov. 28, 2019, 1:49 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22554: ACID: Wait timeout for blocking compaction should be configurable
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 4393a2825e1f465781fc07a6678ebaa2bab906bd 
>   
> ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/AlterTableCompactOperation.java
>  fd0ae3a3df731aa690d024dfdbf89f7754ca2a41 
> 
> 
> Diff: https://reviews.apache.org/r/71844/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: ORC: duplicate record - rowid meaning ?

2019-11-29 Thread Peter Vary
Hi David,

Not entirely sure what you are doing here :), my guess is that you are trying 
to write ACID tables outside of Hive. Am I right? What is the exact use-case? 
There might be better solutions out there than writing the files by hand.

As for your question below: Yes, the files should be ordered by the 
(originalTransaction, bucket, rowId) triple, otherwise you will get wrong results.
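A minimal sketch of that ordering, assuming a hypothetical RecordId holder (this is not Hive's own record identifier class or an ORC writer). It uses the two delete records from delta_0011368 quoted below, which are written with rowId 1 before rowId 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for the three ordering fields; not a Hive class.
final class AcidOrderSketch {
  static final class RecordId {
    final long originalTransaction;
    final int bucket;
    final long rowId;

    RecordId(long originalTransaction, int bucket, long rowId) {
      this.originalTransaction = originalTransaction;
      this.bucket = bucket;
      this.rowId = rowId;
    }
  }

  // Order by originalTransaction, then bucket, then rowId.
  static final Comparator<RecordId> ORDER =
      Comparator.comparingLong((RecordId r) -> r.originalTransaction)
          .thenComparingInt(r -> r.bucket)
          .thenComparingLong(r -> r.rowId);

  static boolean isSorted(List<RecordId> ids) {
    for (int i = 1; i < ids.size(); i++) {
      if (ORDER.compare(ids.get(i - 1), ids.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  static List<RecordId> sorted(List<RecordId> ids) {
    List<RecordId> copy = new ArrayList<>(ids);
    copy.sort(ORDER);
    return copy;
  }

  public static void main(String[] args) {
    // The delete records from delta_0011368 in the order they were written.
    List<RecordId> deletes = Arrays.asList(
        new RecordId(11365, 3, 1), new RecordId(11365, 3, 0));
    System.out.println(isSorted(deletes));
    System.out.println(isSorted(sorted(deletes)));
  }
}
```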

Thanks,
Peter

> On Nov 19, 2019, at 13:30, David Morin wrote:
> 
> Here are more details about the ORC content and the fact that we have duplicate rows:
> 
> /delta_0011365_0011365_/bucket_3
> 
> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}
> 
> 
> /delta_0011368_0011368_/bucket_3
> 
> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}
> 
> /delta_0011369_0011369_/bucket_3
> 
> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}
> 
> +-------------------------------------------------+-------+--+
> |                     row__id                     |  cle  |
> +-------------------------------------------------+-------+--+
> | {"transactionid":11367,"bucketid":0,"rowid":0}  | 5209  |
> | {"transactionid":11369,"bucketid":0,"rowid":0}  | 5211  |
> | {"transactionid":11369,"bucketid":1,"rowid":0}  | 5210  |
> | {"transactionid":11369,"bucketid":2,"rowid":0}  | 5214  |
> | {"transactionid":11369,"bucketid":2,"rowid":1}  | 5215  |
> | {"transactionid":11365,"bucketid":3,"rowid":0}  | 5218  |
> | {"transactionid":11365,"bucketid":3,"rowid":1}  | 5216  |
> | {"transactionid":11369,"bucketid":3,"rowid":1}  | 5216  |
> | {"transactionid":11369,"bucketid":3,"rowid":0}  | 5218  |
> | {"transactionid":11369,"bucketid":4,"rowid":0}  | 5217  |
> | {"transactionid":11369,"bucketid":4,"rowid":1}  | 5213  |
> | {"transactionid":11369,"bucketid":7,"rowid":0}  | 5212  |
> +-------------------------------------------------+-------+--+
> 
> As you can see, we have duplicate rows for the "cle" values 5216 and 5218.
> Do we have to keep the rowids ordered? This is the only difference I have 
> noticed in my tests with beeline.
> 
> Thanks
> 
> 
> 
> On Tue, Nov. 19, 2019 at 00:18, David Morin wrote:
> Hello,
> 
> I'm trying to understand the purpose of the rowid column inside an ORC delta file:
> {"transactionid":11359,"bucketid":5,"rowid":0}
> ORC view: 
> {"operation":0,"originalTransaction":11359,"bucket":5,"rowId":0,"currentTransaction":11359,"row":...}
> I use HDP 2.6 => Hive 2
> 
> If I want INSERT / DELETE / INSERT to be idempotent, do we have to keep the 
> same rowid? When the rowid changes during the second INSERT, I get a 
> duplicate row. I assumed I could create a new rowid for the new transaction 
> during the second INSERT, but that seems to generate duplicate records.
> 
> Regards,
> David
> 
> 
> 



Review Request 71845: ConcurrentModificationException in TriggerValidatorRunnable stops trigger processing

2019-11-29 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71845/
---

Review request for hive, Laszlo Bodor, prasanthj, and Slim Bouguerra.


Bugs: HIVE-22502
https://issues.apache.org/jira/browse/HIVE-22502


Repository: hive-git


Description
---

A ConcurrentModificationException was thrown from the main loop of 
TriggerValidatorRunnable.


The ConcurrentModificationException happened because another thread (from 
TezSessionPoolManager) updated the sessions list while the 
TriggerValidatorRunnable was iterating over it.

The sessions list is updated by TezSessionPoolManager when opening or closing a 
session. These operations are synchronized but the iteration in 
TriggerValidatorRunnable is not.

The TriggerValidatorRunnable is executed frequently (it is scheduled at a 500ms 
rate by default), so I was reluctant to put the whole iteration into a 
synchronized block. Opening and closing a session happen infrequently, so I 
decided to make a copy of the sessions list before passing it to the 
TriggerValidatorRunnable. Let me know if you think otherwise.
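
The copy-before-iterate idea can be sketched roughly as follows. This is a 
simplified illustration and not the actual patch: SessionSnapshotSketch and its 
methods are hypothetical names, and String stands in for the real session type.

```java
import java.util.ArrayList;
import java.util.List;

class SessionSnapshotSketch {
    // Shared list, only touched while holding this object's monitor.
    private final List<String> sessions = new ArrayList<>();

    synchronized void open(String session) {
        sessions.add(session);
    }

    synchronized void close(String session) {
        sessions.remove(session);
    }

    // The copy is made under the same lock as the updates, so the
    // caller receives a list that no other thread will modify.
    synchronized List<String> snapshot() {
        return new ArrayList<>(sessions);
    }

    public static void main(String[] args) {
        SessionSnapshotSketch pool = new SessionSnapshotSketch();
        pool.open("s1");
        pool.open("s2");

        // The validator iterates over the snapshot without holding the lock.
        List<String> view = pool.snapshot();
        for (String session : view) {
            // Closing sessions here changes the live list, not the snapshot,
            // so no ConcurrentModificationException is thrown.
            pool.close(session);
        }
        System.out.println(view.size() + " " + pool.snapshot().size());
    }
}
```

The trade-off is one short list copy per open or close instead of holding a 
lock for the whole validation pass every 500ms.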


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
7c0a1fe120b 


Diff: https://reviews.apache.org/r/71845/diff/1/


Testing
---

qtest


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22564) IO exception in Hive Spark Remote Client sub-module

2019-11-29 Thread AK97 (Jira)
AK97 created HIVE-22564:
---

 Summary: IO exception in Hive Spark Remote Client sub-module
 Key: HIVE-22564
 URL: https://issues.apache.org/jira/browse/HIVE-22564
 Project: Hive
  Issue Type: Test
Affects Versions: 3.1.2
 Environment: os: rhel:7.6 
 architecture: ppc64le
Reporter: AK97


I have been trying to build Apache Hive on rhel_7.6/ppc64le. The build 
passes, but a test then fails with the following error:

[ERROR] testServerPort(org.apache.hive.spark.client.rpc.TestRpc) Time elapsed: 0.021 s <<< ERROR!

java.io.IOException: Incorrect RPC server port configuration for HiveServer2

I am on branch release-3.1.2-rc0.

I would like some help understanding the cause of this failure. I am running 
the build on a high-end VM with good connectivity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)