[ 
https://issues.apache.org/jira/browse/HIVE-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340871#comment-16340871
 ] 

Ke Jia commented on HIVE-18524:
-------------------------------

[~mmccline], Thanks for your findings.

if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)

Here "(length(name) > 10)" is the condition, expr1, and 
"CAST( name AS BINARY)" is the branch expression, expr2.
 * HIVE-17139 evaluates the condition expr1 first, then evaluates expr2 only 
for the rows that satisfy the condition and skips it for the rows that do 
not. Here the output of expr2 is a BytesColumnVector, and the entries of its 
vector array are initially null. So for every row where expr2 is skipped, the 
entry in the vector stays null.
 * VectorUDFAdaptor reads the expression results of the batch row by row in 
its setResult method. For rows where expr2 was skipped, the vector entry is 
null, so VectorUDFAdaptor#setResult throws the NPE.
 * [~mmccline], when the genericUDF is a GenericUDFIf, should setResult read 
only the value of the expression that the condition selects, to avoid the 
unnecessary work? I will upload a patch later. Thanks.
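
The failure mode described above can be shown in miniature without Hive. The 
following is only a sketch under assumptions: BytesColumn, evaluateThenBranch, 
readUnconditionally and readIfSelected are hypothetical names made up for 
illustration, not Hive classes and not the patch itself. It only assumes that 
a bytes column keeps one byte[] slot per row that stays null until written, 
and that the reader copies each slot with Arrays.copyOfRange, as in the stack 
trace quoted below.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SkippedBranchSketch {

    // Hypothetical stand-in for a bytes column: one slot per row, null until written.
    static final class BytesColumn {
        final byte[][] vector;
        final int[] start;
        final int[] length;

        BytesColumn(int size) {
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
        }

        void set(int row, byte[] value) {
            vector[row] = value;
            start[row] = 0;
            length[row] = value.length;
        }
    }

    // Evaluates the branch expression only for rows where the condition holds;
    // skipped rows keep their initial null slot.
    static BytesColumn evaluateThenBranch(String[] names, boolean[] condition) {
        BytesColumn out = new BytesColumn(names.length);
        for (int row = 0; row < names.length; row++) {
            if (condition[row]) {
                out.set(row, names[row].getBytes(StandardCharsets.UTF_8));
            }
        }
        return out;
    }

    // Reads a row regardless of the condition; throws NullPointerException
    // when the slot was never written.
    static byte[] readUnconditionally(BytesColumn col, int row) {
        return Arrays.copyOfRange(col.vector[row], col.start[row],
                col.start[row] + col.length[row]);
    }

    // Guarded variant: touches the slot only when the condition selected this row.
    static byte[] readIfSelected(BytesColumn col, boolean[] condition, int row) {
        return condition[row] ? readUnconditionally(col, row) : null;
    }

    public static void main(String[] args) {
        String[] names = {"alexander hamilton", "bob"};
        boolean[] condition = new boolean[names.length];
        for (int row = 0; row < names.length; row++) {
            condition[row] = names[row].length() > 10;
        }
        BytesColumn col = evaluateThenBranch(names, condition);
        for (int row = 0; row < names.length; row++) {
            byte[] guarded = readIfSelected(col, condition, row);
            System.out.println("row " + row + " guarded read: "
                    + (guarded == null ? "null" : new String(guarded, StandardCharsets.UTF_8)));
            try {
                readUnconditionally(col, row);
                System.out.println("row " + row + " unconditional read: ok");
            } catch (NullPointerException e) {
                System.out.println("row " + row + " unconditional read: NullPointerException");
            }
        }
    }
}
```

The guarded read corresponds to the question in the last bullet: if the reader 
consults the condition first, it never touches the slots that the skipped 
evaluation left unset.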

> Vectorization: Execution failure related to non-standard embedding of 
> IfExprConditionalFilter inside VectorUDFAdaptor (Revert HIVE-17139)
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18524
>                 URL: https://issues.apache.org/jira/browse/HIVE-18524
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18524.01.patch
>
>
> {noformat}
> insert overwrite table insert_10_1
>     select cast(gpa as float),
>            age,
>            IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
>            IF(LENGTH(name)>10,cast(name as binary),NULL)
>     from studentnull10k
> vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]
> ExprNodeDescs:
>     UDFToFloat(gpa) (type: float),
>     age (type: int),
>     if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp),
>     if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)
> selectExpressions:
>     VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null))
>         (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) -> 5:timestamp,
>     VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null))
>         (children: LongColGreaterLongScalar(col 4:int, val 10)(children: StringLength(col 0:string) -> 4:int) -> 6:boolean,
>         VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
> {noformat}
> *// Notice there is no vector expression shown for the last IF stmt.*  It has 
> been magically embedded inside the VectorUDFAdaptor object...
> Execution results in this call stack.
> {nocode}
> Caused by: java.lang.NullPointerException
>       at java.util.Arrays.copyOfRange(Arrays.java:3521)
>       at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
>       at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
>       at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
>       at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
>       at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>       ... 22 more
> {nocode}
> Change is due to:
> HIVE-17139: Conditional expressions optimization: skip the expression 
> evaluation if the condition is not satisfied for vectorization engine. (Jia 
> Ke, reviewed by Ferdinand Xu)
> Embedding a raw vector expression outside of VectorizationContext is quite 
> non-standard and evidently buggy.
> [~Ferd] [~Ke Jia] I am inclined to revert this change.  Comments?  CC: 
> [~ashutoshc] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
