[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-19 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106575#comment-15106575
 ] 

Chengxiang Li commented on HIVE-12736:
--

Besides, during test, i found TestSparkNegativeCliDriver run in MR mode 
actually, i would create another JIRA to track it.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch, HIVE-12736.5-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> 

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108052#comment-15108052
 ] 

Xuefu Zhang commented on HIVE-12736:


I also tried memcheck.q, and it passed locally for me too. It doesn't seem 
related to the patch regardless.

As to the patch, it looks good to me. However, I do know much about mapjoin 
with hint, not sure why groupby and union cannot exist before mapjoin. If you 
have some explanation, that will help.

+1 for the patch.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch, HIVE-12736.5-spark.patch, 
> HIVE-12736.5-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> 

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106714#comment-15106714
 ] 

Hive QA commented on HIVE-12736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12783072/HIVE-12736.5-spark.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9870 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_memcheck
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1036/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1036/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12783072 - PreCommit-HIVE-SPARK-Build

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch, HIVE-12736.5-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> 

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105449#comment-15105449
 ] 

Xuefu Zhang commented on HIVE-12736:


Hi [~chengxiang li], thanks for the explanation. That makes sense. The patch 
looks good. However, could you check if the test failures are related? 
Specifically I tried join29.q, the test pass w/o your patch. You can also refer 
to HIVE-9774, which has recent runs. Thanks.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t)*/ * from staff 

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106218#comment-15106218
 ] 

Hive QA commented on HIVE-12736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782975/HIVE-12736.4-spark.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9868 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1035/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1035/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1035/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782975 - PreCommit-HIVE-SPARK-Build

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104138#comment-15104138
 ] 

Xuefu Zhang commented on HIVE-12736:


Hi [~chengxiang li], Sorry for being late in reviewing this. The patch looks 
good, but patch #2 has a change in ReduceSinkOperator. Is that intentional? It 
seems changing the return value from "false" to "true" (inherited from Operator 
class).

Secondly, can we incorporate the test case provided in the JIRA description? 
Let's forget about it if it's too hard.

Thanks.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> 

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-17 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104080#comment-15104080
 ] 

Chengxiang Li commented on HIVE-12736:
--

[~xuefuz], would you help to review this patch?

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch
>
>
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> I have two questions
> 1.Why result of hive on spark not include the following record?
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> 2.Why there are two different ways of dealing same query?
> explain 1:
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> explain 2:
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> OK
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   DagName: 

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093864#comment-15093864
 ] 

Hive QA commented on HIVE-12736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781765/HIVE-12736.1-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 63 failed/errored test(s), 9851 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_non_string_partition.q-delete_where_non_partitioned.q-auto_sortmerge_join_16.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join27
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join36
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join37
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join38
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join39
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_7

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094439#comment-15094439
 ] 

Hive QA commented on HIVE-12736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781785/HIVE-12736.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9836 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestParseNegative - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join29
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1027/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1027/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1027/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12781785 - PreCommit-HIVE-SPARK-Build

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch
>
>
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> I have two questions
> 1.Why result of hive on spark not include the following record?
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> 2.Why there are two different ways of dealing same query?
> explain 1:
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-07 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088702#comment-15088702
 ] 

Chengxiang Li commented on HIVE-12736:
--

I would work on this issue.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
>
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> I have two questions
> 1.Why result of hive on spark not include the following record?
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> 2.Why there are two different ways of dealing same query?
> explain 1:
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> explain 2:
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> OK
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
>   Vertices:
> Map 1 
> Map