[ 
https://issues.apache.org/jira/browse/HIVE-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478346#comment-13478346
 ] 

Kevin Wilfong commented on HIVE-3544:
-------------------------------------

I tracked the problem down to two causes.

1) In genUnionPlan, when preparing the ColumnInfo objects used to generate 
the RowResolver for the Union operator, the code modified the ColumnInfo 
objects of the left operator's RowResolver in place, setting their type to 
the "common class". This caused rows to be serialized incorrectly in the 
intermediate FileSink operator between map reduce jobs (as was the case when 
the left subquery of the union involved a join).

2) The common class for a column of the Union operator was determined once 
at compile time and again at run time, using different functions that could 
return different classes (for instance, when the type on one side was a 
double and on the other a string). As a result, the union operator could 
return objects whose type differed from what the RowResolver specified, 
causing serialization errors or failures.

To fix 1) I added the ability to clone a ColumnInfo, and in the 
SemanticAnalyzer the left operator's ColumnInfo objects are now cloned before 
being modified.
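
As a rough illustration of fix 1), and not the actual patch: the sketch below 
uses made-up stand-ins (ColInfoSketch for Hive's ColumnInfo, UnionCloneSketch 
for the relevant part of genUnionPlan) to show why overwriting the type on a 
shared object leaks into the left operator's schema, and how copying first 
avoids that. All names and the string-based types are assumptions for the 
example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's ColumnInfo; only the fields needed here.
class ColInfoSketch {
    final String name;
    String type;

    ColInfoSketch(String name, String type) {
        this.name = name;
        this.type = type;
    }

    // Copy constructor: plays the role of the "clone" ability from fix 1).
    ColInfoSketch(ColInfoSketch other) {
        this(other.name, other.type);
    }
}

class UnionCloneSketch {
    // Buggy variant: reuses the left side's objects and overwrites their type.
    static List<ColInfoSketch> unionColumnsShared(List<ColInfoSketch> left,
                                                  String commonType) {
        List<ColInfoSketch> out = new ArrayList<>();
        for (ColInfoSketch c : left) {
            c.type = commonType; // also changes what the left operator sees
            out.add(c);
        }
        return out;
    }

    // Fixed variant: copy first, then modify only the copy.
    static List<ColInfoSketch> unionColumnsCloned(List<ColInfoSketch> left,
                                                  String commonType) {
        List<ColInfoSketch> out = new ArrayList<>();
        for (ColInfoSketch c : left) {
            ColInfoSketch copy = new ColInfoSketch(c);
            copy.type = commonType;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<ColInfoSketch> left = new ArrayList<>();
        left.add(new ColInfoSketch("key", "bigint"));
        List<ColInfoSketch> union = unionColumnsCloned(left, "double");
        // The left schema keeps its own type; only the union schema changes.
        System.out.println(left.get(0).type + " " + union.get(0).type);
        // prints "bigint double"
    }
}
```

With unionColumnsShared the same call would leave the left column typed as 
"double" as well, which corresponds to the state that led to the bad 
serialization in the intermediate FileSink.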

To fix 2) I added Select operators between the input operators and the union 
operator. These Select operators cast the input columns to the types 
determined at compile time if they do not match; otherwise they simply 
forward the value. Now the conversion in the union operator is only needed 
to alter the type of the ObjectInspector, not the type of the column.
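
The cast-or-forward behaviour of those Select operators can be sketched 
roughly as follows. This is a simplified illustration, not Hive code: 
UnionCastSketch and castOrForward are hypothetical names, types are plain 
strings, and only casts to "double" and "string" are handled. Real Hive 
casts also behave differently on bad input (this sketch would throw a 
NumberFormatException on a non-numeric string).

```java
import java.util.function.Function;

class UnionCastSketch {
    // Builds the per-column function a hypothetical select step would apply:
    // forward the value unchanged when the input type already matches the
    // type chosen at compile time, otherwise cast to that type.
    static Function<Object, Object> castOrForward(String inputType,
                                                  String targetType) {
        if (inputType.equals(targetType)) {
            return v -> v; // types match: simply forward
        }
        if (targetType.equals("double")) {
            return v -> v == null ? null : Double.valueOf(v.toString());
        }
        if (targetType.equals("string")) {
            return v -> v == null ? null : v.toString();
        }
        throw new IllegalArgumentException("unsupported cast to " + targetType);
    }

    public static void main(String[] args) {
        // Left branch produces bigint, right branch produces double;
        // the compile-time type for the union column is assumed to be double.
        Function<Object, Object> left = castOrForward("bigint", "double");
        Function<Object, Object> right = castOrForward("double", "double");
        System.out.println(left.apply(238L));  // prints "238.0"
        System.out.println(right.apply(238.0)); // prints "238.0"
    }
}
```

Because both branches already hand the union values of the compile-time 
type, the union step itself no longer has to pick a type at run time.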
                
> union involving double column with a map join subquery will fail or give 
> wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3544
>                 URL: https://issues.apache.org/jira/browse/HIVE-3544
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3581.1.patch.txt
>
>
> The following query fails:
> select * from (select cast(a.key as bigint) as key from src a join src b on 
> a.key = b.key union all select cast(key as double) as key from src)a
> The following query gives wrong results:
> select * from (select cast(a.key as bigint) as key, cast(b.key as double) as 
> value from src a join src b on a.key = b.key union all select cast(key as 
> double) as key, cast(key as string) as value from src)a
> But the following query runs fine:
> select * from (select cast(a.key as bigint) as key from src a union all 
> select cast(key as double) as key from src)a

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
