Review Request 71932: HIVE-22652

2019-12-19 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71932/
---

Review request for hive and Jesús Camacho Rodríguez.


Bugs: HIVE-22652
https://issues.apache.org/jira/browse/HIVE-22652


Repository: hive-git


Description
---

HIVE-22652: TopNKey push through Group by with Grouping sets

Enable TopNKey (TNK) operator pushdown through GROUP BY with GROUPING SETS by 
removing the check that blocked the pushdown when the GBY operator has GROUPING SETS.
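For context, the TNK operator's filtering behaviour can be sketched generically (illustrative code, not Hive's actual TopNKeyOperator):

```java
import java.util.TreeSet;

// Illustrative sketch: a top-n-key filter keeps a bounded set of the n
// smallest keys seen so far and forwards a row only if its key can
// still be among the top n.
final class TopNKeyFilter<K extends Comparable<K>> {
    private final int n;
    private final TreeSet<K> topKeys = new TreeSet<>();

    TopNKeyFilter(int n) {
        this.n = n;
    }

    // Returns true if a row with this key may still be in the top n.
    boolean canForward(K key) {
        if (topKeys.size() < n) {
            topKeys.add(key);
            return true;
        }
        if (key.compareTo(topKeys.last()) > 0) {
            return false;          // worse than every key kept so far
        }
        topKeys.add(key);
        if (topKeys.size() > n) {
            topKeys.pollLast();    // evict the current worst key
        }
        return true;
    }
}
```

Pushing such a filter below a group-by only drops rows whose grouping keys cannot reach the final top n, which is why the ordering checks in the processor matter.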


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
 c79c371a8b 
  ql/src/test/queries/clientpositive/topnkey_grouping_sets.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/topnkey_grouping_sets.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out 
65d3faa20f 
  ql/src/test/results/clientpositive/perf/tez/constraints/query27.q.out 
e1a48eaeea 
  ql/src/test/results/clientpositive/perf/tez/constraints/query5.q.out 
13288d28b4 
  ql/src/test/results/clientpositive/perf/tez/constraints/query77.q.out 
c2758b7033 
  ql/src/test/results/clientpositive/perf/tez/constraints/query80.q.out 
72a54928c2 
  ql/src/test/results/clientpositive/perf/tez/query14.q.out 00bc4cb026 
  ql/src/test/results/clientpositive/perf/tez/query27.q.out 774c0fd192 
  ql/src/test/results/clientpositive/perf/tez/query5.q.out 03980ac2c0 
  ql/src/test/results/clientpositive/perf/tez/query77.q.out fcfc5a33bc 
  ql/src/test/results/clientpositive/perf/tez/query80.q.out 3020b58781 


Diff: https://reviews.apache.org/r/71932/diff/1/


Testing
---

- New q test: topnkey_grouping_sets.q
- Run `src/test/queries/clientpositive/perf/query*.q` tests with 
TestTezPerfCliDriver and TestTezPerfConstraintsCliDriver


Thanks,

Krisztian Kasa



[jira] [Created] (HIVE-22664) Error while waiting for Remote Spark Driver to connect back to HiveServer2

2019-12-19 Thread jiama (Jira)
jiama created HIVE-22664:


 Summary:  Error while waiting for Remote Spark Driver to connect 
back to HiveServer2
 Key: HIVE-22664
 URL: https://issues.apache.org/jira/browse/HIVE-22664
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.1
 Environment: CDH6.2 hiveOnSpark
Reporter: jiama


2019-12-19 11:33:56,441 ERROR org.apache.hive.service.cli.operation.Operation: 
[HiveServer2-Background-Pool: Thread-67]: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: 
FAILED: Execution Error, return code 30041 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client 
for Spark session b0b83870-9e62-4802-9b86-deab128d44ec_0: 
java.lang.RuntimeException: spark-submit process failed with exit code 1 and 
error ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71820: HIVE-20150

2019-12-19 Thread Ramesh Kumar Thangarajan


> On Dec. 17, 2019, 10:01 p.m., Ramesh Kumar Thangarajan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
> > Lines 4341 (patched)
> > 
> >
> > What is the bug in Decimal64 to Decimal conversion? 
> > 
> > Do we need a call to the function 
> > fixDecimalDataTypePhysicalVariations()? Because I see the function 
> > getVectorExpressionsUpConvertDecimal64() is calling 
> > wrapWithDecimal64ToDecimalConversion() on all the child expressions.
> 
> Krisztian Kasa wrote:
> Please consider the following query: 
> ```
> CREATE TABLE decimal_test_small_n0 STORED AS ORC AS SELECT cdouble, CAST 
> (((cdouble*22.1)/37) AS DECIMAL(10,3)) AS cdecimal1, CAST (((cdouble*9.3)/13) 
> AS DECIMAL(7,2)) AS cdecimal2 FROM alltypesorc;
> 
> SELECT cdecimal1 - (2*cdecimal2) as c2 FROM decimal_test_small_n0
> ORDER BY c2
> LIMIT 10;
> ```
> 
> With given keyColumns: GenericUDFOPMinus(Column[cdecimal1], 
> GenericUDFOPMultiply(Const decimal(1,0) 2, Column[cdecimal2])) 
> ```
> keyExpressions = 
> vContext.getVectorExpressionsUpConvertDecimal64(keyColumns);
> ```
> will produce keyExpressions: 
> DecimalColSubtractDecimalColumn(col 4:decimal(10,3), col 
> 5:decimal(9,2)/DECIMAL_64)
> (children: 
> ConvertDecimal64ToDecimal(col 1:decimal(10,3)/DECIMAL_64) -> 
> 4:decimal(10,3), 
> Decimal64ScalarMultiplyDecimal64ColumnUnscaled(decimal64Val 2, 
> decimalVal 2, col 2:decimal(7,2)/DECIMAL_64) -> 5:decimal(9,2)/DECIMAL_64
> ) -> 6:decimal(11,3)
> 
> So the 2nd child of DecimalColSubtractDecimalColumn is not converted from 
> decimal64 to decimal and I got a 
> ```
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DecimalColSubtractDecimalColumn.evaluate(DecimalColSubtractDecimalColumn.java:69)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorTopNKeyOperator.process(VectorTopNKeyOperator.java:101)
> ```
> 
> Checking the code of `getVectorExpressionsUpConvertDecimal64` I found 
> that it calls `getVectorExpressions` and, for each result expression, if that 
> expression's output type is decimal64 then it wraps it with a conversion. But 
> it does not check the children expressions.
> 
> Probably we don't need both function calls; please let me check in a 
> follow-up patch whether `fixDecimalDataTypePhysicalVariations` is enough.

Makes sense! Thanks for the explanation
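The missing recursion can be sketched generically; the classes and method names below are illustrative, not Hive's actual Vectorizer API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified expression tree standing in for vector expressions.
final class Expr {
    final String name;
    final String outputType;   // e.g. "decimal" or "decimal64"
    final List<Expr> children;
    Expr(String name, String outputType, Expr... children) {
        this.name = name;
        this.outputType = outputType;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
}

final class Decimal64Fixer {
    // Rewrites the tree bottom-up, wrapping every decimal64-producing
    // child whose parent expects plain decimal inputs. Wrapping only
    // the root expression misses children like the multiply above.
    static Expr fix(Expr e) {
        for (int i = 0; i < e.children.size(); i++) {
            Expr child = fix(e.children.get(i));
            if (child.outputType.equals("decimal64") && expectsDecimal(e)) {
                child = new Expr("ConvertDecimal64ToDecimal", "decimal", child);
            }
            e.children.set(i, child);
        }
        return e;
    }

    // Assumption for this sketch: the subtract works on plain decimals.
    static boolean expectsDecimal(Expr e) {
        return e.name.equals("DecimalColSubtractDecimalColumn");
    }
}
```

With the query from the discussion, only the multiply child produces decimal64, and the recursion is what gets it wrapped before the subtract evaluates.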


- Ramesh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71820/#review219047
---


On Dec. 14, 2019, 10:31 a.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71820/
> ---
> 
> (Updated Dec. 14, 2019, 10:31 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-20150
> https://issues.apache.org/jira/browse/HIVE-20150
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TopNKey pushdown
> 
> 1. Apply patch: 
> https://issues.apache.org/jira/secure/attachment/12941630/HIVE-20150.11.patch
> 2. TopNKey introduction depends only on a Reduce Sink with topn property >= 0
> 3. Implement TopNKey operator pushdown through: projection, group by, reduce 
> sink, left outer join, other TopNKey
> 4. Add sort order and null sort order direction checks when determining if the 
> TopNKey op can be pushed
> 5. Implement handling of cases when the TopNKey op and the parent op have only 
> a common key prefix.
> 6. Fix key object inspectors in non-vectorized mode 
> 7. Fix decimal64-to-decimal cast issues when creating VectorExpressions of 
> keyExpressions during TopNKey vectorization
> 
> 
> Diffs
> -
> 
>   kudu-handler/src/test/results/positive/kudu_complex_queries.q.out 
> 1324b27f8e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java bbbde7978b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java 
> 0d6cf3c755 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 6876787e11 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/CommonKeyPrefix.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 5c7a64c950 
>   
> ql/src/test/org/apache/hadoop/hive/ql/optimizer/topnkey/TestCommonKeyPrefix.java
>  PRE-CREATION 
>   
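The common key prefix matching from point 5 of the description can be sketched generically (simplified types, not the actual CommonKeyPrefix implementation):

```java
import java.util.List;

// Illustrative key spec: a TopNKey op can only be pushed through its
// parent for the leading key columns that agree in column name, sort
// order, and null sort order.
final class KeySpec {
    final String column;
    final char order;      // '+' ascending, '-' descending
    final char nullOrder;  // 'a' nulls first, 'z' nulls last
    KeySpec(String column, char order, char nullOrder) {
        this.column = column;
        this.order = order;
        this.nullOrder = nullOrder;
    }
    boolean matches(KeySpec other) {
        return column.equals(other.column)
            && order == other.order
            && nullOrder == other.nullOrder;
    }
}

final class CommonKeyPrefixSketch {
    // Length of the longest common leading prefix of the two key specs;
    // 0 means the pushdown cannot happen at all.
    static int length(List<KeySpec> topNKeys, List<KeySpec> parentKeys) {
        int limit = Math.min(topNKeys.size(), parentKeys.size());
        int i = 0;
        while (i < limit && topNKeys.get(i).matches(parentKeys.get(i))) {
            i++;
        }
        return i;
    }
}
```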

Dataloss on skewjoin for FullAcid table.

2019-12-19 Thread Aditya Shah
I am trying to test skewjoin and writing the result into a FullAcid table.
I see incorrect results for it.

Steps to reproduce:

It can be reproduced with the following qtest (similar to one done for mm
tables in mm_all.q) in the master branch.

--! qt:dataset:src1
--! qt:dataset:src
-- MASK_LINEAGE
set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;
set hive.optimize.metadataonly=false;
CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC
tblproperties ("transactional"="true");
FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE
skewjoin_acid SELECT src1.key, src2.value;
select count(distinct key) from skewjoin_acid;
drop table skewjoin_acid;


The expected count was 309, but I got 173.

On looking into it, I found that a skewjoin produces two delta directories: one
is delta_x_x, with x being the correct write id, and another is
delta_000_000 (called d0 hereafter). The d0 is unexpected and
cannot be read. Some observations:
1) The FileSinkDesc carries the write id, which the FSOP then uses to build the
correct paths.
2) This write id is set while acquiring the lock in the driver.
3) In case of skew, the skewed data was written in the same job, and a new job
(which perhaps does the map joins) is launched by the running job with its own
tasks. The FileSinkDesc is created again for each of these tasks, and we lose
the write id there.

My questions about correcting this: is my understanding correct? If so, I
couldn't figure out where to set this write id in the FileSinkDesc for each
task. Or is there another way around it?

I have created a JIRA for the same: HIVE-22636.

Thanks,
Aditya


Re: Review Request 71761: HIVE-22489

2019-12-19 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71761/
---

(Updated Dec. 19, 2019, 11:35 a.m.)


Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-22489
https://issues.apache.org/jira/browse/HIVE-22489


Repository: hive-git


Description
---

Reduce Sink operator orders nulls first
===
1. Set the default null sort order from the Hive config when creating the Reduce 
Sink Desc.
2. Hash join uses 
`org.apache.hadoop.hive.serde2.binarysortable.fast.BinarySortableSerializeWrite`
 or `BinarySortableDeserializeRead` for serializing keys. For big-table keys, 
ascending order with nulls first was hardcoded. This patch changes this 
behaviour to use `Operator.getConf().TableDesc.getProperties()` (in this case 
`MapJoinOperator`) to set up the ordering in `BinarySortableSerializeWrite`.
3. Use the null ordering set in ReduceRecordSource at the Reduce phase when 
comparing keys in `CommonMergeJoinOperator` (this is the null ordering of the 
child Reduce Sink operators).
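A rough sketch of why the hardcoded nulls-first ordering matters for binary-sortable keys (illustrative code, not the real BinarySortableSerializeWrite API): serialized keys are compared as raw bytes, so the null marker byte has to be chosen to match the configured null ordering, or the byte order disagrees with the row order.

```java
// Illustrative writer: one int key per serialized byte array. A marker
// byte encodes nullness; its value decides where nulls sort.
final class SortableKeyWriter {
    private final boolean nullsFirst;

    SortableKeyWriter(boolean nullsFirst) {
        this.nullsFirst = nullsFirst;
    }

    byte[] write(Integer value) {
        if (value == null) {
            // 0 sorts below every non-null marker (1); 2 sorts above.
            return new byte[] { nullsFirst ? (byte) 0 : (byte) 2 };
        }
        // Flip the sign bit so unsigned byte comparison matches signed
        // integer order; big-endian so earlier bytes are more significant.
        int v = value ^ 0x80000000;
        return new byte[] { 1,
            (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v };
    }

    // Unsigned lexicographic byte comparison, as a hash/merge join would do.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

If the writer and the Reduce Sink disagree on the marker, a null key lands on the wrong side of the sorted stream, which is the kind of mismatch the patch avoids by reading the ordering from the table properties.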


Diffs (updated)
-

  accumulo-handler/src/test/results/positive/accumulo_queries.q.out 7c552621f2 
  contrib/src/test/results/clientpositive/udaf_example_group_concat.q.out 
6846720d95 
  hbase-handler/src/test/results/positive/hbase_queries.q.out a32ef81a7b 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 e997fa65cf 
  kudu-handler/src/test/results/positive/kudu_complex_queries.q.out 1324b27f8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java 
3974627a24 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 
72446afeda 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
 2380d936f2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyMultiKeyOperator.java
 f587517b08 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerMultiKeyOperator.java
 cdee3fd957 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinLeftSemiMultiKeyOperator.java
 e5d9fdae19 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinOuterMultiKeyOperator.java
 29c531bd51 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashMap.java
 a4cda921a5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashMultiSet.java
 43f093d906 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashSet.java
 8dce5b82d3 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashTable.java
 a35401d9b2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringCommon.java
 1b108a8c14 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashMap.java
 446feb2526 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashMultiSet.java
 c28ef9be2b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastStringHashSet.java
 17bd5fda93 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java
 4ab8902a3f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedCreateHashTable.java
 21c355cb42 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongCommon.java
 de1ee15c3b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java
 42573f0898 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMultiSet.java
 829a03737d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashSet.java
 18e1435019 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringCommon.java
 da0e8365b1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMap.java
 6c4d8a81d1 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashMultiSet.java
 a6b754c7eb 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedStringHashSet.java
 fdcd83dde7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 5c409e4573 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java 
a50ad78e8f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
 0f95d7788c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
89b55001f0