[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

Xuefu Zhang (JIRA) Sun, 17 Jan 2016 20:11:13 -0800

    [ https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104138#comment-15104138 ]


Xuefu Zhang commented on HIVE-12736:
------------------------------------

Hi [~chengxiang li], sorry for the late review. The patch looks good, but
patch #2 includes a change in ReduceSinkOperator. Is that intentional? It
seems to change the return value from "false" to "true" (inherited from the
Operator class).

Secondly, can we incorporate the test case provided in the JIRA description? 
Let's forget about it if it's too hard.

Thanks.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12736
>                 URL: https://issues.apache.org/jira/browse/HIVE-12736
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: JoneZhang
>            Assignee: Chengxiang Li
>         Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch
>
>
> {code}
> select  * from staff;
> 1     jone    22      1
> 2     lucy    21      1
> 3     hmm     22      2
> 4     james   24      3
> 5     xiaoliu 23      3
> select id,date_ from trade union all select id,"test" from trade ;
> 1     201510210908
> 2     201509080234
> 2     201509080235
> 1     test
> 2     test
> 2     test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1     jone    22      1       1       201510210908
> 2     lucy    21      1       2       201509080234
> 2     lucy    21      1       2       201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions:
> 1. Why does the Hive on Spark result not include the following records?
> {code}
> 1     jone    22      1       1       test
> 2     lucy    21      1       2       test
> 2     lucy    21      1       2       test
> {code}
> 2. Why is the same query handled in two different ways?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: trade
>                   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: id (type: int), date_ (type: string)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 12 Data size: 96 Basic stats: COMPLETE Column stats: NONE
>                       table:
>                           input format: org.apache.hadoop.mapred.TextInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: trade
>                   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: id (type: int), 'test' (type: string)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 12 Data size: 96 Basic stats: COMPLETE Column stats: NONE
>                       table:
>                           input format: org.apache.hadoop.mapred.TextInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> OK
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: trade
>                   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: id is not null (type: boolean)
>                     Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: id (type: int), date_ (type: string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                       Spark HashTable Sink Operator
>                         keys:
>                           0 id (type: int)
>                           1 _col0 (type: int)
>             Local Work:
>               Map Reduce Local Work
>   Stage: Stage-1
>     Spark
>       DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2
>       Vertices:
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: s
>                   Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: id is not null (type: boolean)
>                     Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       keys:
>                         0 id (type: int)
>                         1 _col0 (type: int)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col7, _col8
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 6 Data size: 52 Basic stats: COMPLETE Column stats: NONE
>                       Select Operator
>                         expressions: _col0 (type: int), _col1 (type: string), _col2 (type: int), _col3 (type: int), _col7 (type: int), _col8 (type: string)
>                         outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
>                         Statistics: Num rows: 6 Data size: 52 Basic stats: COMPLETE Column stats: NONE
>                         File Output Operator
>                           compressed: false
>                           Statistics: Num rows: 6 Data size: 52 Basic stats: COMPLETE Column stats: NONE
>                           table:
>                               input format: org.apache.hadoop.mapred.TextInputFormat
>                               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             Local Work:
>               Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> I can't find any information about the union "test" branch in explain 2.
> Some properties in hive-site.xml are:
> {code}
> <property>
> <name>hive.ignore.mapjoin.hint</name>
> <value>false</value>
> </property>
> <property>
> <name>hive.auto.convert.join</name>
> <value>true</value>
> </property>
> <property>
> <name>hive.auto.convert.join.noconditionaltask</name>
> <value>true</value>
> </property>
> {code}
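
For reference, the result the reporter expects can be reproduced outside Hive. The following is a minimal Python sketch, not part of the patch or of any Hive test: it simulates the UNION ALL subquery and the inner join on id over the rows shown in the description. The function names are hypothetical, and the tuple layouts simply follow the column order of the sample output.

```python
# Rows copied from the JIRA description. Only `id` and `date_` are named
# there; the remaining staff columns are kept in the order they were printed.
staff = [
    (1, "jone", 22, 1),
    (2, "lucy", 21, 1),
    (3, "hmm", 22, 2),
    (4, "james", 24, 3),
    (5, "xiaoliu", 23, 3),
]
trade = [
    (1, "201510210908"),
    (2, "201509080234"),
    (2, "201509080235"),
]


def union_all_subquery(trade_rows):
    """Mimic: select id,date_ from trade union all select id,"test" from trade."""
    first_branch = list(trade_rows)
    second_branch = [(trade_id, "test") for trade_id, _ in trade_rows]
    return first_branch + second_branch


def inner_join_on_id(left_rows, right_rows):
    """Mimic: ... from staff s join (...) t on s.id=t.id (first column is id)."""
    return [s + t for s in left_rows for t in right_rows if s[0] == t[0]]


expected = inner_join_on_id(staff, union_all_subquery(trade))

for row in expected:
    print("\t".join(str(value) for value in row))
```

Under these assumptions the join yields six rows: the three that Hive on Spark returned plus the three "test" rows listed in question 1. That matches the observation that explain 2 contains no vertex for the second UNION ALL branch.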



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
