[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106575#comment-15106575 ]

Chengxiang Li commented on HIVE-12736:
--

Besides, during testing I found that TestSparkNegativeCliDriver actually runs in MR mode. I will create another JIRA to track it.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.2.1
> Reporter: JoneZhang
> Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch, HIVE-12736.5-spark.patch
>
>
> {code}
> select * from staff;
> 1 jone22 1
> 2 lucy21 1
> 3 hmm 22 2
> 4 james 24 3
> 5 xiaoliu 23 3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
> s.id=t.id;
> 1 jone22 1 1 201510210908
> 2 lucy21 1 2 201509080234
> 2 lucy21 1 2 201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions:
> 1. Why does the result of Hive on Spark not include the following records?
> {code}
> 1 jone22 1 1 test
> 2 lucy21 1 2 test
> 2 lucy21 1 2 test
> {code}
> 2. Why are there two different ways of handling the same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: trade
>                   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: id (type: int), date_ (type: string)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 12 Data size: 96 Basic stats: COMPLETE Column stats: NONE
>                       table:
>                           input format: org.apache.hadoop.mapred.TextInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: trade
>                   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: id (type: int), 'test' (type: string)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 12 Data size: 96 Basic stats: COMPLETE Column stats: NONE
>                       table:
>                           input format: org.apache.hadoop.mapred.TextInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
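For reference, the result the reporter expects can be reproduced outside Hive. The sketch below is plain Python, not Hive code. It uses the rows shown in the description, keeps everything after the id in each staff row as an opaque string (an assumption, made to avoid guessing how the remaining columns split), and evaluates the union all followed by the inner join on id.

```python
# Rows copied from the issue description.
# Assumption: only the leading id matters for the join, so the rest of
# each staff row is kept as one opaque string.
staff = [
    (1, "jone22 1"),
    (2, "lucy21 1"),
    (3, "hmm 22 2"),
    (4, "james 24 3"),
    (5, "xiaoliu 23 3"),
]
trade = [
    (1, "201510210908"),
    (2, "201509080234"),
    (2, "201509080235"),
]

# select id,date_ from trade union all select id,"test" from trade
union_all = trade + [(trade_id, "test") for trade_id, _ in trade]

# staff s join (...) t on s.id = t.id
joined = [
    (staff_id, rest, trade_id, value)
    for staff_id, rest in staff
    for trade_id, value in union_all
    if staff_id == trade_id
]

# Rows that come from the second branch of the union.
from_test_branch = [row for row in joined if row[3] == "test"]

for row in joined:
    print(row)
print(len(joined), "rows expected,", len(from_test_branch), "from the 'test' branch")
```

Six rows come out of the join. The three that carry "test" are the ones absent from the Hive on Spark output quoted in the description.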
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108052#comment-15108052 ]

Xuefu Zhang commented on HIVE-12736:

I also tried memcheck.q, and it passed locally for me too. It doesn't seem related to the patch regardless.

As to the patch, it looks good to me. However, I don't know much about mapjoin with hints, and I'm not sure why groupby and union cannot exist before a mapjoin. If you have an explanation, that would help.

+1 for the patch.
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106714#comment-15106714 ]

Hive QA commented on HIVE-12736:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12783072/HIVE-12736.5-spark.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9870 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_memcheck
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1036/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1036/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12783072 - PreCommit-HIVE-SPARK-Build
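One way to judge whether the failures in this run are related to the patch is to compare them with the failures Hive QA reported for earlier attachments in this thread (builds 1027 and 1035). The Python sketch below does that with plain set operations. The test names are shortened by hand for readability, and recurrence across builds is only a heuristic, not proof that a failure is unrelated.

```python
# Failed tests per PreCommit-HIVE-SPARK-Build run, copied from the Hive QA
# comments on this issue. Names are abbreviated by hand (an editorial
# shortening, not the fully qualified names).
failed = {
    1027: {  # HIVE-12736.2-spark.patch
        "TestHWISessionManager",
        "TestParseNegative",
        "TestNegativeCliDriver.authorization_uri_import",
        "TestNegativeCliDriver.join29",
        "TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping",
        "TestPigHBaseStorageHandler",
        "TestSSL.testSSLVersion",
    },
    1035: {  # HIVE-12736.4-spark.patch
        "TestHWISessionManager",
        "TestCliDriver.semijoin",
        "TestCliDriver.vector_leftsemi_mapjoin",
        "TestNegativeCliDriver.authorization_uri_import",
        "TestSparkCliDriver.semijoin",
        "TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping",
        "TestSSL.testSSLVersion",
        "TestHs2Metrics.testMetrics",
    },
    1036: {  # HIVE-12736.5-spark.patch
        "TestHWISessionManager",
        "TestCliDriver.mapjoin_memcheck",
        "TestNegativeCliDriver.authorization_uri_import",
        "TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping",
        "TestSSL.testSSLVersion",
    },
}

# Failures present in every run: likely candidates for pre-existing breakage.
recurring = set.intersection(*failed.values())

# Failures seen only in the latest run: worth re-running locally.
new_in_latest = failed[1036] - failed[1035] - failed[1027]

print("recurring:", sorted(recurring))
print("only in build 1036:", sorted(new_in_latest))
```

Four failures recur in all three builds, which leaves mapjoin_memcheck as the only failure specific to build 1036.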
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105449#comment-15105449 ]

Xuefu Zhang commented on HIVE-12736:

Hi [~chengxiang li], thanks for the explanation. That makes sense. The patch looks good. However, could you check whether the test failures are related? Specifically, I tried join29.q, and the test passes without your patch. You can also refer to HIVE-9774, which has recent runs. Thanks.
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106218#comment-15106218 ]

Hive QA commented on HIVE-12736:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782975/HIVE-12736.4-spark.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9868 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1035/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1035/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1035/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12782975 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104138#comment-15104138 ]

Xuefu Zhang commented on HIVE-12736:

Hi [~chengxiang li],

Sorry for being late in reviewing this. The patch looks good, but patch #2 has a change in ReduceSinkOperator. Is that intentional? It seems to change the return value from "false" to "true" (the value inherited from the Operator class).

Secondly, can we incorporate the test case provided in the JIRA description? Let's forget about it if it's too hard.

Thanks.
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104080#comment-15104080 ]

Chengxiang Li commented on HIVE-12736:
--

[~xuefuz], would you help review this patch?

> explain 2:
> set hive.execution.engine=spark;
> set spark.master=local;
> explain
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
> s.id=t.id;
> OK
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       DagName:
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093864#comment-15093864 ]

Hive QA commented on HIVE-12736:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781765/HIVE-12736.1-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 63 failed/errored test(s), 9851 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_non_string_partition.q-delete_where_non_partitioned.q-auto_sortmerge_join_16.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin_negative3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join27
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join36
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join37
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join38
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join39
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_7
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094439#comment-15094439 ]

Hive QA commented on HIVE-12736:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781785/HIVE-12736.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9836 tests executed

*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestParseNegative - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join29
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1027/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1027/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1027/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12781785 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same
[ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088702#comment-15088702 ]

Chengxiang Li commented on HIVE-12736:

I will work on this issue.