[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319171#comment-15319171
 ] 

Sergey Shelukhin commented on HIVE-13957:
-----------------------------------------

Can you elaborate? The problem is the difference between the approaches, not 
the type of the cast per se.

Normal IN derives the common type for the column and constants (exactly where 
in the pipeline doesn't really matter), and casts both columns and constants to 
that whenever needed.
Whereas vectorization always tries to convert the constants to the column type, 
the reason being (I suppose) that the specializations for IN all have a 
particular column type in mind. I am not actually very familiar with these and 
whether it would be easy to incorporate a cast; I assume the cast of the column 
would need to come earlier than the specialized IN (i.e. specialized IN should 
already be able to utilize values of the correct type straight out of the VRB), 
which would require the vectorizer to modify the plan above the IN. Or 
something like that.
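
To make the difference concrete, here is a minimal toy sketch in Python (not 
Hive code). It assumes a decimal column compared against string constants, and 
it assumes the common type is string, as described for the non-vectorized IN. 
The function names and sample values are hypothetical.

```python
from decimal import Decimal


def in_common_type(column_values, constants):
    # Toy model of the non-vectorized path: both sides are brought to a
    # common type (assumed to be string here), then compared.
    constant_set = set(constants)
    return [str(value) in constant_set for value in column_values]


def in_column_type(column_values, constants):
    # Toy model of the vectorized path: the constants are converted to the
    # column type (decimal), and the column is left untouched.
    constant_set = {Decimal(constant) for constant in constants}
    return [value in constant_set for value in column_values]


column = [Decimal("1.50"), Decimal("2")]
constants = ["1.5", "2"]

common_type_result = in_common_type(column, constants)
column_type_result = in_column_type(column, constants)
```

With these inputs the two functions disagree on the first row: "1.50" and "1.5" 
differ as strings but are equal as decimals. How Hive actually renders a 
decimal as a string is not modeled here.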

We could do that, however, as far as I see, it's not the solution we want, 
because of the following.
First, in case of decimal-string, this issue can produce incorrect results, so 
we want a simple fix for that, which the above isn't.
From the long term perspective, I'd say we need to prohibit implicit casts in 
this case (I opened a separate JIRA) AND/OR change the non-vectorized pipeline 
rather than the vectorized one, because casting a decimal column to string in 
this case (what the non-vectorized IN does) is not the intuitively logical 
thing for the user and may produce unexpected results.

With the latter in mind, we /could/ fix the proximate issue in vectorized code 
(cast to decimal(38,38) that ends up converting all reasonable values to null), 
e.g. constrain the precision and scale to the column type (potentially +2/+1 
for NOT, although the enforcement will probably convert the values that don't 
fit to NULL), assuming the values are trimmed, since more should never be 
needed. But that's still inconsistent with normal IN, and we should probably do 
it later. 
Actually, come to think of it, this might also be broken for other UDFs, where 
constraining is not as easy or at least is different (e.g. between needs more 
than strict equality, and with arithmetic ops, if this problem applies, the 
only way would be to derive the maximum values from the value list). I can also 
file a separate JIRA for that...
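
As an illustration of the decimal(38,38) point above, here is a rough Python 
sketch of a precision/scale check. It is not Hive's implementation; the 
function name, the rounding mode, and the rule "integer digits must not exceed 
precision minus scale" are assumptions made for the example.

```python
from decimal import Context, Decimal, ROUND_HALF_UP


def enforce_precision_scale(value, precision, scale):
    """Round value to `scale` fractional digits; return None if it does not fit."""
    max_int_digits = precision - scale
    # Use a context wide enough that quantize() itself cannot overflow.
    working_digits = max(value.adjusted() + 1, 0) + scale + 2
    ctx = Context(prec=working_digits, rounding=ROUND_HALF_UP)
    rounded = value.quantize(Decimal(1).scaleb(-scale), context=ctx)
    parts = rounded.as_tuple()
    # Digits left of the decimal point after rounding.
    int_digits = max(len(parts.digits) + parts.exponent, 0)
    return rounded if int_digits <= max_int_digits else None


# decimal(38,38) leaves no room for integer digits, so 1.5 does not fit.
too_wide = enforce_precision_scale(Decimal("1.5"), 38, 38)
# A value below 1 still fits decimal(38,38).
fraction = enforce_precision_scale(Decimal("0.5"), 38, 38)
# Constraining to a hypothetical column type decimal(10,2) keeps 1.5.
constrained = enforce_precision_scale(Decimal("1.5"), 10, 2)
```

Under these assumptions, any constant with a nonzero integer part becomes None 
(standing in for NULL) at decimal(38,38), while the same constant survives when 
the target is constrained to the column's precision and scale.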


> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-13957
>                 URL: https://issues.apache.org/jira/browse/HIVE-13957
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
