[ 
https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692721#comment-15692721
 ] 

Rui Li commented on HIVE-15239:
-------------------------------

Thanks [~xuefuz] for the suggestions.

1. I'm not sure I follow your point. The code is in the {{compareWork}} method, 
which checks whether two works are equivalent. The patch adds a special check 
for MapWork. If that check fails, we don't have to compare the operators.
2. OK, I'll move the null checks into the compare methods. Some of them need to 
stay in the caller though, otherwise we'll get NPEs.
3. I thought about overriding the equals method of each corresponding class, 
but I'm not sure how to override the hashCode methods accordingly.
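
For reference on point 3, a minimal sketch of one common way to keep hashCode consistent with equals: compute both from the same set of fields. The class {{DescSketch}} and its fields are hypothetical stand-ins, not the real Hive descriptor classes, so this only illustrates the pattern.

```java
import java.util.Objects;

// Hypothetical stand-in for a descriptor class; the field names are
// illustrative assumptions and are not taken from Hive.
final class DescSketch {
    private final String tableName;
    private final String inputFormat;

    DescSketch(String tableName, String inputFormat) {
        this.tableName = tableName;
        this.inputFormat = inputFormat;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DescSketch)) {
            return false;
        }
        DescSketch other = (DescSketch) o;
        // Objects.equals is null-safe, so no separate null checks are needed.
        return Objects.equals(tableName, other.tableName)
            && Objects.equals(inputFormat, other.inputFormat);
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields that equals compares, so equal objects
        // always produce equal hash codes.
        return Objects.hash(tableName, inputFormat);
    }
}
```

One caveat with this pattern: if the fields are mutable and an object is modified while it sits in a HashMap or HashSet, its hash code changes and lookups can fail.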

The fields used in the comparison are the same as those used in the clone 
method of each class, so I think the comparison is exhaustive. Actually, I'm 
not sure it's necessary to go this far. We can simply compare the paths to 
solve the example problem in this JIRA - different paths mean the MapWorks are 
for different tables/partitions. I don't know if it's ever possible for two 
MapWorks to point to the same path but have different PartitionDesc.
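
The simpler path-only check could look roughly like the sketch below. It assumes each work's input paths have already been collected into a set of strings; {{MapWorkPathSketch}} and {{samePaths}} are hypothetical names and this is not the code in the patch.

```java
import java.util.Set;

// Hypothetical helper, not part of Hive: treats two works as candidates for
// combining only when they read exactly the same set of input paths.
final class MapWorkPathSketch {
    private MapWorkPathSketch() {
    }

    static boolean samePaths(Set<String> first, Set<String> second) {
        // Handle nulls here so callers do not hit NPEs.
        if (first == null || second == null) {
            return first == second;
        }
        // Set equality ignores iteration order.
        return first.equals(second);
    }
}
```

With the tables from this JIRA, the scans of a1 and a2 would read different directories, so such a check would return false and the two works would not be combined.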

> hive on spark combine equivalentwork get wrong result because of  tablescan 
> operation compare
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15239
>                 URL: https://issues.apache.org/jira/browse/HIVE-15239
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.0, 2.1.0
>            Reporter: wangwenli
>            Assignee: Rui Li
>         Attachments: HIVE-15239.1.patch
>
>
> env: hive on spark engine
> reproduce step:
> {code}
> create table a1(KEHHAO string, START_DT string) partitioned by (END_DT 
> string);
> create table a2(KEHHAO string, START_DT string) partitioned by (END_DT 
> string);
> alter table a1 add partition(END_DT='20161020');
> alter table a1 add partition(END_DT='20161021');
> insert into table a1 partition(END_DT='20161020') 
> values('2000721360','20161001');
> SELECT T1.KEHHAO,COUNT(1) FROM ( 
> SELECT KEHHAO FROM a1 T 
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND 
> T.END_DT-1 
> UNION ALL 
> SELECT KEHHAO FROM a2 T
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND 
> T.END_DT-1 
> ) T1 
> GROUP BY T1.KEHHAO 
> HAVING COUNT(1)>1; 
> +-------------+------+--+
> |  t1.kehhao  | _c1  |
> +-------------+------+--+
> | 2000721360  | 2    |
> +-------------+------+--+
> {code}
> the expected result is no records



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
