[ 
https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HIVE-8312:
--------------------------
    Description: 
Suppose we have a query as follows.
"
SELECT ....
FROM A LEFT SEMI JOIN B
ON (A.col1 = B.col2)
WHERE ...
"
If A.col1 is of STRING type, but B.col2 is of BIGINT, or DOUBLE,
Hive finds the common compatible NUMERIC type (here is DOUBLE) for both cols 
and do implicit type conversion.

However, this implicit conversion from STRING to DOUBLE could produce NULL 
values, which could further
generate unexpected results, like skew.

Why do we always convert to NUMERIC type when both columns are in different 
type groups?  Do we expect the corresponding exceptions happen?

  was:
Suppose we have a query as follows.
"
SELECT ....
FROM A LEFT SEMI JOIN B
ON (A.col1 = B.col2)
WHERE ...
"
If A.col1 is of STRING type, but B.col2 is of BIGINT, or DOUBLE,
Hive finds the common compatible NUMERIC type (here is DOUBLE) for both cols 
and do implicit type conversion.

However, this implicit conversion from STRING to DOUBLE could produce NULL 
values, which could further
generate unexpected results, like skew.

Why do we always convert to NUMERIC type? Any rationale here? If not expected, 
how should we handle it?


> Implicit type conversion on Join keys
> -------------------------------------
>
>                 Key: HIVE-8312
>                 URL: https://issues.apache.org/jira/browse/HIVE-8312
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Lin Liu
>
> Suppose we have a query as follows.
> "
> SELECT ....
> FROM A LEFT SEMI JOIN B
> ON (A.col1 = B.col2)
> WHERE ...
> "
> If A.col1 is of STRING type, but B.col2 is of BIGINT, or DOUBLE,
> Hive finds the common compatible NUMERIC type (here is DOUBLE) for both cols 
> and do implicit type conversion.
> However, this implicit conversion from STRING to DOUBLE could produce NULL 
> values, which could further
> generate unexpected results, like skew.
> Why do we always convert to NUMERIC type when both columns are in different 
> type groups?  Do we expect the corresponding exceptions happen?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to