[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104730#comment-14104730 ]
Na Yang commented on HIVE-7810:
-------------------------------

Hi Chao,

For (1), the union optimization changes the operator tree and the task tree so that the outer union is removed from the task tree. The move task is then expected to do the union work, merging the inner union result and the other map result into the destination location.

For (2), I have fixed GraphTran in HIVE-7767, but that patch has not been committed to the branch yet. That is why you see only two MapWork instead of three in the dependency graph. Please wait until that patch is committed. The behavior I reported in this JIRA is from my local build with the HIVE-7767 patch applied.

Thanks,
Na

> Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7810
>                 URL: https://issues.apache.org/jira/browse/HIVE-7810
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Na Yang
>
> Insert overwrite table query has strange behavior when
> set hive.optimize.union.remove=true
> set hive.mapred.supports.subdirectories=true;
> We expect the following two sets of queries return the same set of data result, but they do not.
> 1)
> {noformat}
> insert overwrite table outputTbl1
> SELECT * FROM
> (
>   select key, 1 as values from inputTbl1
>   union all
>   select * FROM (
>     SELECT key, count(1) as values from inputTbl1 group by key
>     UNION ALL
>     SELECT key, 2 as values from inputTbl1
>   ) a
> )b;
> select * from outputTbl1 order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1 1
> 1 2
> 2 1
> 2 2
> 3 1
> 3 2
> 7 1
> 7 2
> 8 2
> 8 2
> 8 2
> {noformat}
> 2)
> {noformat}
> SELECT * FROM
> (
>   select key, 1 as values from inputTbl1
>   union all
>   select * FROM (
>     SELECT key, count(1) as values from inputTbl1 group by key
>     UNION ALL
>     SELECT key, 2 as values from inputTbl1
>   ) a
> )b order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1 1
> 1 1
> 1 2
> 2 1
> 2 1
> 2 2
> 3 1
> 3 1
> 3 2
> 7 1
> 7 1
> 7 2
> 8 1
> 8 1
> 8 2
> 8 2
> 8 2
> {noformat}
> Some data is missing in the first set of query result.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
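
One way to see exactly which rows are missing is to treat each result listing as a multiset and subtract one from the other. The sketch below is not part of the original report. It hard-codes the two result listings from the issue description, and the names insert_result, select_result, and missing_rows are made up for illustration.

```python
from collections import Counter

# Rows read back from outputTbl1 after the insert overwrite (query 1).
insert_result = [
    (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
    (7, 1), (7, 2), (8, 2), (8, 2), (8, 2),
]

# Rows returned by the plain SELECT (query 2).
select_result = [
    (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2),
    (3, 1), (3, 1), (3, 2), (7, 1), (7, 1), (7, 2),
    (8, 1), (8, 1), (8, 2), (8, 2), (8, 2),
]


def missing_rows(expected, actual):
    """Return rows of `expected` that `actual` lacks, keeping duplicates."""
    # Counter subtraction keeps only positive counts, so a row that appears
    # twice in `expected` and once in `actual` is reported once.
    return sorted((Counter(expected) - Counter(actual)).elements())


missing = missing_rows(select_result, insert_result)
for key, values in missing:
    print(key, values)
```

With these listings the difference is six rows, and every one of them has values = 1. That would be consistent with the rows of one union branch not reaching the destination, although the listings alone do not prove which branch they came from.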