[ https://issues.apache.org/jira/browse/HIVE-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375323#comment-16375323 ]
Ke Jia commented on HIVE-18524: ------------------------------- [~mmccline]: {quote}Before we add any more vectorization features like this one I'd like to see a better testing framework. HIVE-18622 was a huge fix to many vector expressions that were handling NULLs incorrectly. In particular, IfExprColumnNull and IfExprNullColumn had bugs. By better framework, I like to see all the different UDFs generated with expressions for both row- and vector- and have data will NULLs generated and driven through those expressions and the results compared. I think a framework like this would have found the bug that resulted in the revert. And, also would have found many of the problems fixed by HIVE-18622. {quote}Now, we have passed all the origin qtest and the new added qtest I can think of . If you have comprehensive test cases, Please tell me, thanks. {quote}Vector expressions need to be created in the VectorizationContext class and not as a special case in VectorUDFAdaptor. VectorizationContext instantiates vector expressions so they have proper TypeInfo and can be displayed with EXPLAIN VECTORIZATION. {quote} I will try to move the creation of Vector expression code in VectorUDFAdaptor to VectorizationContext . {quote}What I don't understand is why an IF expr with computed THEN and/or ELSE values isn't just another vector expression. I may be missing something. I certainly see it is more sophisticated in that you want to avoid executing any THEN expressions whose IF expr row value is false and similarly avoid executing any ELSE expressions whose IF expr is true. {quote}Case When Else expression is translated to If expression in HIVE-16731. If I have wrong understanding your question, Please tell me, thanks. {quote}Usually, rather than add special cases to existing vector expressions we use GenVectorCode to generate with templates new class variations. We try to avoid complicated base classes that have lots of decision logic. That was my impression when I read the code a while ago. I could be wrong but even though it may seem redudant we generally have vectorization code variation just focus on the specific variation they are executing. {quote}The special case only focus on the row- expression (VectorUDFAdaptor.java) to solve the GenericUDFIf case and for the vector- expression, I think it can apply to all the generally case. If I have wrong understanding, Please tell me. > Vectorization: Execution failure related to non-standard embedding of > IfExprConditionalFilter inside VectorUDFAdaptor (Revert HIVE-17139) > ----------------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-18524 > URL: https://issues.apache.org/jira/browse/HIVE-18524 > Project: Hive > Issue Type: Bug > Components: Hive > Affects Versions: 3.0.0 > Reporter: Matt McCline > Assignee: Matt McCline > Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-18524.01.patch, HIVE-18524.02.patch > > > {noformat} > insert overwrite table insert_10_1 > select cast(gpa as float), > age, > IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL), > IF(LENGTH(name)>10,cast(name as binary),NULL) > from studentnull10k > vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double] > ExprNodeDescs: > UDFToFloat(gpa) (type: float), > age (type: int), > if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp), > if((length(name) > 10), CAST( name AS BINARY), null) (type: binary) > selectExpressions: > VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null)) > (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) > -> 5:timestamp, > VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null)) > (children: LongColGreaterLongScalar(col 4:int, val 10)(children: > StringLength(col 0:string) -> 4:int) -> 6:boolean, > VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary > {noformat} > *// Notice there is no vector expression shown for the last IF stmt.* It has > been magically embedded inside the VectorUDFAdaptor object... > Execution results in this call stack. > {nocode} > Caused by: java.lang.NullPointerException > at java.util.Arrays.copyOfRange(Arrays.java:3521) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343) > at > org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) > at > org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211) > at > org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145) > ... 22 more > {nocode} > Change is due to: > HIVE-17139: Conditional expressions optimization: skip the expression > evaluation if the condition is not satisfied for vectorization engine. (Jia > Ke, reviewed by Ferdinand Xu) > Embedding a raw vector expression outside of VectorizationContext is quite > non-standard and evidently buggy. > [~Ferd] [~Ke Jia] I am inclined to revert this change. Comments? CC: > [~ashutoshc] [~hagleitn] -- This message was sent by Atlassian JIRA (v7.6.3#76005)