[ https://issues.apache.org/jira/browse/HIVE-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478346#comment-13478346 ]
Kevin Wilfong commented on HIVE-3544:
-------------------------------------

I tracked the problem down to two causes.

1) In genUnionPlan, when preparing the ColumnInfo objects used to generate the RowResolver for the Union operator, the code actually changes the ColumnInfo objects of the left operator's RowResolver so that they have the "common class" as their type. This causes the column to be serialized incorrectly in the intermediate FileSink operator between map reduce jobs (as was the case when the left subquery of the union involved a join).

2) The common class for a column of the Union operator was determined once at compile time and again later at run time, using different functions which could return different classes (for instance, when the type on one side was a double and on the other it was a string). As a result, the union operator returned objects with a different type from what the RowResolver specified, causing serialization errors/failures.

To fix 1) I added the ability to clone a ColumnInfo, and in the SemanticAnalyzer the left operator's ColumnInfo objects are now cloned before being modified.

To fix 2) I added Select operators between the input operators and the union operator. These select operators cast the input columns to the types determined at compile time if they do not match; otherwise they simply forward the value. The conversion in the union operator is now only needed to alter the type of the ObjectInspector, not the type of the column.
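As a rough illustration of the two fixes described above (this is not the actual patch), the following Java sketch shows the idea of copying column metadata before retyping it, and of casting a value only when its type differs from the type chosen at compile time. The `ColumnInfo` class here is a hypothetical, simplified stand-in with only a name and a type name; `unionColumns` and `castIfNeeded` are illustrative helpers and are not part of Hive's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionColumnSketch {

    /** Hypothetical, simplified stand-in for column metadata: a name plus a type name. */
    static final class ColumnInfo {
        private final String name;
        private String type;

        ColumnInfo(String name, String type) {
            this.name = name;
            this.type = type;
        }

        /** Copy constructor: plays the role of the "clone" ability from fix 1. */
        ColumnInfo(ColumnInfo other) {
            this(other.name, other.type);
        }

        String getName() { return name; }
        String getType() { return type; }
        void setType(String type) { this.type = type; }
    }

    /**
     * Fix 1 (sketch): build the union's columns from the left input's columns
     * by copying each one first, so the left input keeps its original types.
     */
    static List<ColumnInfo> unionColumns(List<ColumnInfo> left, List<String> commonTypes) {
        List<ColumnInfo> result = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            ColumnInfo copy = new ColumnInfo(left.get(i)); // clone before modifying
            copy.setType(commonTypes.get(i));
            result.add(copy);
        }
        return result;
    }

    /**
     * Fix 2 (sketch): cast a value to the compile-time type only when the
     * input type differs; otherwise forward the value unchanged.
     * Only the two conversions needed for this illustration are handled.
     */
    static Object castIfNeeded(Object value, String fromType, String toType) {
        if (fromType.equals(toType)) {
            return value; // types already match: simply forward
        }
        if (toType.equals("double") && value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (toType.equals("string")) {
            return String.valueOf(value);
        }
        throw new IllegalArgumentException("unsupported cast: " + fromType + " -> " + toType);
    }

    public static void main(String[] args) {
        List<ColumnInfo> left = Arrays.asList(new ColumnInfo("key", "bigint"));
        List<ColumnInfo> union = unionColumns(left, Arrays.asList("double"));

        // The left input's metadata is untouched; only the copy is retyped.
        System.out.println("left type:  " + left.get(0).getType());
        System.out.println("union type: " + union.get(0).getType());
        System.out.println("cast value: " + castIfNeeded(238L, "bigint", "double"));
    }
}
```

The point of the copy in `unionColumns` is that any operator reading the left input's metadata, such as an intermediate file sink, still sees the type the left subquery really produces.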
> union involving double column with a map join subquery will fail or give wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3544
>                 URL: https://issues.apache.org/jira/browse/HIVE-3544
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3581.1.patch.txt
>
>
> The following query fails:
> select * from (select cast(a.key as bigint) as key from src a join src b on a.key = b.key union all select cast(key as double) as key from src)a
> The following query gives wrong results:
> select * from (select cast(a.key as bigint) as key, cast(b.key as double) as value from src a join src b on a.key = b.key union all select cast(key as double) as key, cast(key as string) as value from src)a
> But the following query runs fine:
> select * from (select cast(a.key as bigint) as key from src a union all select cast(key as double) as key from src)a