[ 
https://issues.apache.org/jira/browse/HIVE-21170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Suller reassigned HIVE-21170:
----------------------------------

    Assignee:     (was: Ivan Suller)

> Wrong (no) results of cross-product query executed on LLAP 
> -----------------------------------------------------------
>
>                 Key: HIVE-21170
>                 URL: https://issues.apache.org/jira/browse/HIVE-21170
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>         Environment: Hive distribution: HDP 3.1.0. 
> LLAP execution engine in version:
> {code:java}
> $ beeline --version
> Hive 3.1.0.3.1.0.0-78
> Git 
> git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
>  -r 56673b027117d8cb3400675b1680a4d992360808
> Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
> From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc{code}
>  
>            Reporter: Krzysztof Zarzycki
>            Priority: Critical
>
> *In the environment:*
> {code:java}
>  $ beeline --version
>  Hive 3.1.0.3.1.0.0-78
>  Git 
> git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
>  -r 56673b027117d8cb3400675b1680a4d992360808
>  Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
>  From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc
> {code}
> *On LLAP execution engine*, the following query gives *wrong results*:
> {code:java}
> -- prepare test data
>  set hive.query.results.cache.enabled=false;
>  create table test1 (id int);
>  insert into test1 values (1),(2),(3);
> -- query
>  select * from test1 t1 cross join test1 t2;
> -- or any other cross join query, that involve 1 or more tables.{code}
> *Query result:*
> {code:java}
> 0: jdbc:hive2://hostname:> select t1.* from test1 t1 cross join test1 t2;
>  INFO : Compiling 
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6): 
> select t1.* from test1 t1 cross join test1 t2
>  INFO : Warning: Shuffle Join MERGEJOIN[9][tables = [$hdt$_0, $hdt$_1]] in 
> Stage 'Reducer 2' is a cross product
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t1.id, 
> type:int, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6); 
> Time taken: 0.229 seconds
>  INFO : Executing 
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6): 
> select t1.* from test1 t1 cross join test1 t2
>  INFO : Query ID = hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6
>  INFO : Total jobs = 1
>  INFO : Launching Job 1 out of 1
>  INFO : Starting task [Stage-1:MAPRED] in parallel
>  
> ----------------------------------------------------------------------------------------------
>  VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> ----------------------------------------------------------------------------------------------
>  Map 1 .......... llap SUCCEEDED 1 1 0 0 0 0
>  Map 3 .......... llap SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 llap SUCCEEDED 0 0 0 0 0 0
>  
> ----------------------------------------------------------------------------------------------
>  VERTICES: 02/03 [==========================>>] 100% ELAPSED TIME: 1.92 s
>  
> ----------------------------------------------------------------------------------------------
>  INFO : Completed executing 
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6); 
> Time taken: 2.006 seconds
>  INFO : OK
>  +--------+
> |t1.id|
> +--------+
>  +--------+
>  No rows selected (2.284 seconds)
> {code}
> *Expected result:*
> {code:java}
> +--------+
> |t1.id|
> +--------+
> |3|
> |3|
> |3|
> |2|
> |2|
> |2|
> |1|
> |1|
> |1|
> +--------+
>  9 rows selected
> {code}
> *What worked as a workaround:*
> {code:java}
> set hive.tez.cartesian-product.enabled=false; (default true)
> {code}
> Then query gave a correct result.
> *Difference in execution plans :*
>  * With {{set hive.tez.cartesian-product.enabled=true;}}:
> {code:java}
> +----------------------------------------------------+
> |Explain|
> +----------------------------------------------------+
> |Plan optimized by CBO.|
> | |
> |Vertex dependency in root stage|
> |Reducer 2 <- Map 1 (XPROD_EDGE), Map 3 (XPROD_EDGE)|
> | |
> |Stage-0|
> |Fetch Operator|
> |limit:-1|
> |Stage-1|
> |Reducer 2 llap|
> |File Output Operator [FS_8]|
> |Merge Join Operator [MERGEJOIN_9] (rows=9 width=8)|
> |Conds:(Inner),Output:["_col0","_col1"]|
> |<-Map 1 [XPROD_EDGE] vectorized, llap|
> |XPROD_EDGE [RS_11]|
> |Select Operator [SEL_10] (rows=3 width=4)|
> |Output:["_col0"]|
> |TableScan [TS_0] (rows=3 width=4)|
> |zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]|
> |<-Map 3 [XPROD_EDGE] vectorized, llap|
> |XPROD_EDGE [RS_13]|
> |Select Operator [SEL_12] (rows=3 width=4)|
> |Output:["_col0"]|
> |TableScan [TS_2] (rows=3 width=4)|
> |zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]|
> | |
> +----------------------------------------------------+
> {code}
>  * With {{set hive.tez.cartesian-product.enabled=false;}}:
> {code:java}
> +----------------------------------------------------+
> |Explain|
> +----------------------------------------------------+
> |Plan optimized by CBO.|
> | |
> |Vertex dependency in root stage|
> |Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 3 (CUSTOM_SIMPLE_EDGE)|
> | |
> |Stage-0|
> |Fetch Operator|
> |limit:-1|
> |Stage-1|
> |Reducer 2 llap|
> |File Output Operator [FS_8]|
> |Merge Join Operator [MERGEJOIN_9] (rows=9 width=8)|
> |Conds:(Inner),Output:["_col0","_col1"]|
> |<-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap|
> |PARTITION_ONLY_SHUFFLE [RS_11]|
> |Select Operator [SEL_10] (rows=3 width=4)|
> |Output:["_col0"]|
> |TableScan [TS_0] (rows=3 width=4)|
> |zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]|
> |<-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap|
> |PARTITION_ONLY_SHUFFLE [RS_13]|
> |Select Operator [SEL_12] (rows=3 width=4)|
> |Output:["_col0"]|
> |TableScan [TS_2] (rows=3 width=4)|
> |zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]|
> | |
> +----------------------------------------------------+
> {code}
> So I suspect some bug in XPROD_EDGE component executed on LLAP.
> *Note 1: Same query on Hive on Tez works as expected.*
> *Note 2: What I also tried but didn't help*:
> {code:java}
> set hive.vectorized.execution.enabled=false; (default true)
> {code}
> That didn't change the result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to