[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274497#comment-17274497
 ] 

Zhihua Deng edited comment on HIVE-24666 at 1/29/21, 3:08 PM:
--------------------------------------------------------------

Correct me if I'm wrong...
 Before the patch, for queries like 
{code:java}
select count (distinct cint) from alltypesorc where cast(cint as boolean);{code}
the filter *_cast(cint as boolean)_* would be converted to 
_"SelectColumnIsTrue(col 13:boolean)(children: 
VectorUDFAdaptor(UDFToBoolean(cint)) -> 13:boolean)"_. This happens because the 
vectorized expressions of UDFToBoolean are all in projection mode, so an adaptor 
is used here.
  
 For _cint1 or cint2_, SelectColumnIsTrue may not be needed, but that makes 
things a bit difficult: when the *or* expression cannot be converted to a 
vectorized filter, we would have to add a cast to its children and try again. 
For example:
{code:java}
select count (distinct cint) from alltypesorc where cfloat or cint; {code}
the filter _*cfloat or cint*_ would be converted into _"SelectColumnIsTrue(col 
13:boolean)(children: VectorUDFAdaptor((cfloat or cint)) -> 13:boolean)"_. A 
ClassCastException is thrown at runtime because cfloat and cint are not of 
boolean type. There may be other cases that cannot be converted to the 
vectorized filter of GenericUDFOPOr and that lead to the same problem.
  
 For these cases, a simple and useful approach is to insert a boolean type 
conversion when generating the expression. This avoids the ClassCastException 
at runtime, allows such filters to be converted to vectorized filters, and also 
benefits queries running in non-vectorized mode.
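
To illustrate the proposed conversion, the second query can be written by hand with the casts made explicit. This is only a sketch of what an inserted boolean conversion would amount to, not the actual output of the patch, and the resulting vectorized plan has not been verified here:
{code:sql}
-- Hand-written equivalent (assumption): each non-boolean operand of OR
-- is wrapped in an explicit cast, so OR only sees boolean children.
select count (distinct cint) from alltypesorc
where cast(cfloat as boolean) or cast(cint as boolean);
{code}
With boolean children, the *or* would presumably no longer run into the ClassCastException described above.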



> Vectorized UDFToBoolean may unable to filter rows if input is string
> --------------------------------------------------------------------
>
>                 Key: HIVE-24666
>                 URL: https://issues.apache.org/jira/browse/HIVE-24666
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-24666.2.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a cast to boolean is used in a where condition, the filter fails to 
> filter rows under vectorized execution. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'), ('off', 'valoff'), ('no', 'valno'), ('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> input of the cast is a string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



