[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103146#comment-14103146 ]
Na Yang commented on HIVE-7767:
-------------------------------
After looking into this issue, I found the cause.
With hive.optimize.union.remove=true, the optimizer removes the union operator
from the operator tree, which leaves two separate graphs in the Spark
transformation graph. The current GraphTran execute API does not handle
multiple graphs properly. We need to change the execute implementation in
GraphTran.java so that it handles multiple transformation graphs. I will upload
a patch shortly after HIVE-7717's patch is committed.
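
To illustrate the idea, here is a minimal, self-contained sketch of executing a
transformation graph that has more than one root. It is not the actual
GraphTran code: the class MultiGraphSketch, the Tran node type, and the
execute/countByKey helpers are hypothetical names made up for this example, and
plain Java lists stand in for Spark RDDs. The input keys are an assumption
chosen to match the counts reported in the issue below. The point is only that
execute has to visit every root and combine the output of every leaf, rather
than following a single graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MultiGraphSketch {

    // Hypothetical stand-in for one transformation node (not Hive's real class).
    static final class Tran {
        final String name;
        final Function<List<String>, List<String>> fn;
        final List<Tran> children = new ArrayList<>();

        Tran(String name, Function<List<String>, List<String>> fn) {
            this.name = name;
            this.fn = fn;
        }
    }

    // Runs every root graph and concatenates the output of all leaf nodes.
    static List<String> execute(List<Tran> roots, List<String> input) {
        List<String> out = new ArrayList<>();
        for (Tran root : roots) {
            run(root, input, out);
        }
        return out;
    }

    // Applies one node, then either collects its rows (leaf) or recurses.
    private static void run(Tran tran, List<String> in, List<String> out) {
        List<String> result = tran.fn.apply(in);
        if (tran.children.isEmpty()) {
            out.addAll(result);
            return;
        }
        for (Tran child : tran.children) {
            run(child, result, out);
        }
    }

    // Equivalent of "SELECT key, count(1) ... GROUP BY key" on a list of keys.
    static List<String> countByKey(List<String> keys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        List<String> rows = new ArrayList<>();
        counts.forEach((key, count) -> rows.add(key + "\t" + count));
        return rows;
    }

    public static void main(String[] args) {
        // Assumed input keys; chosen so that key 8 appears twice.
        List<String> keys = Arrays.asList("1", "2", "3", "7", "8", "8");

        // With the union operator removed, the two branches are separate graphs.
        Tran left = new Tran("groupBy-1", MultiGraphSketch::countByKey);
        Tran right = new Tran("groupBy-2", MultiGraphSketch::countByKey);

        List<String> rows = execute(Arrays.asList(left, right), keys);
        rows.forEach(System.out::println);
    }
}
```

In this sketch, passing both roots to execute yields the rows of both branches,
while passing only one root yields half of them, which is the symptom described
in the issue.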
> hive.optimize.union.remove does not work properly [Spark Branch]
> ----------------------------------------------------------------
>
> Key: HIVE-7767
> URL: https://issues.apache.org/jira/browse/HIVE-7767
> Project: Hive
> Issue Type: Sub-task
> Reporter: Na Yang
> Assignee: Na Yang
>
> Turning on the hive.optimize.union.remove property generates a wrong UNION ALL
> result.
> For example:
> {noformat}
> create table inputTbl1(key string, val string) stored as textfile;
> load data local inpath '../../data/files/T1.txt' into table inputTbl1;
> SELECT *
> FROM (
> SELECT key, count(1) as values from inputTbl1 group by key
> UNION ALL
> SELECT key, count(1) as values from inputTbl1 group by key
> ) a;
> {noformat}
> When hive.optimize.union.remove is turned on, the query result is:
> {noformat}
> 1 1
> 2 1
> 3 1
> 7 1
> 8 2
> {noformat}
> When hive.optimize.union.remove is turned off, the query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}
> The expected query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}