[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685995#comment-17685995 ]
Apache Spark commented on SPARK-42384:
--------------------------------------

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/39945

> Mask function's generated code does not handle null input
> ---------------------------------------------------------
>
>                 Key: SPARK-42384
>                 URL: https://issues.apache.org/jira/browse/SPARK-42384
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> Example:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (null),
> ('AbCD123-@$#')
> as data(col1);
> cache table v1;
> select mask(col1) from v1;
> {noformat}
> This query results in a {{NullPointerException}}:
> {noformat}
> 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.lang.NullPointerException
>       at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> {noformat}
> The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of
> whether {{Mask.transformInput}} returns null or not. The
> {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null
> pointer.
> {noformat}
> /* 031 */     boolean isNull_1 = i.isNullAt(0);
> /* 032 */     UTF8String value_1 = isNull_1 ?
> /* 033 */     null : (i.getUTF8String(0));
> /* 034 */
> /* 035 */
> /* 036 */
> /* 037 */
> /* 038 */     UTF8String value_0 = null;
> /* 039 */     value_0 = org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) references[3] /* literal */));;
> /* 040 */     if (false) {
> /* 041 */       mutableStateArray_0[0].setNullAt(0);
> /* 042 */     } else {
> /* 043 */       mutableStateArray_0[0].write(0, value_0);
> /* 044 */     }
> /* 045 */     return (mutableStateArray_0[0].getRow());
> /* 046 */   }
> {noformat}
> The bug is not exercised by a literal null input value, since there appears
> to be some optimization that simply replaces the entire function call with a
> null literal:
> {noformat}
> spark-sql> explain SELECT mask(NULL);
> == Physical Plan ==
> *(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
> +- *(1) Scan OneRowRelation[]
> Time taken: 0.026 seconds, Fetched 1 row(s)
> spark-sql> SELECT mask(NULL);
> NULL
> Time taken: 0.042 seconds, Fetched 1 row(s)
> spark-sql>
> {noformat}
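
The failure mode in the generated code above (a hard-coded `if (false)` null branch followed by an unconditional `write`) can be reproduced outside Spark. The following is a minimal sketch, not Spark code and not the change made in the linked pull request. `NullGuardSketch`, `StrictWriter`, `maskOrNull`, `writeUnguarded` and `writeGuarded` are hypothetical names invented for illustration. `maskOrNull` only imitates the default replacement characters visible in the plan output (`X`, `x`, `n`, other characters unchanged) and works on Java `char`s, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Spark classes.
final class NullGuardSketch {

    // Stand-in for a writer whose write() does not accept null, similar in
    // spirit to the UTF8String write described in the report.
    static final class StrictWriter {
        final List<String> slots = new ArrayList<>();

        void write(String value) {
            if (value == null) {
                throw new NullPointerException("write() does not accept null");
            }
            slots.add(value);
        }

        void setNull() {
            slots.add(null);
        }
    }

    // Simplified, assumed transform: null in, null out; otherwise
    // uppercase -> 'X', lowercase -> 'x', digit -> 'n', anything else unchanged.
    static String maskOrNull(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('X');
            } else if (Character.isLowerCase(c)) {
                sb.append('x');
            } else if (Character.isDigit(c)) {
                sb.append('n');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Same shape as the reported code: the result is written without a null check.
    static void writeUnguarded(StrictWriter writer, String input) {
        String value = maskOrNull(input);
        writer.write(value);
    }

    // Guarded variant: the null branch depends on the actual result.
    static void writeGuarded(StrictWriter writer, String input) {
        String value = maskOrNull(input);
        if (value == null) {
            writer.setNull();
        } else {
            writer.write(value);
        }
    }

    public static void main(String[] args) {
        StrictWriter guarded = new StrictWriter();
        writeGuarded(guarded, null);
        writeGuarded(guarded, "AbCD123-@$#");
        System.out.println(guarded.slots);

        try {
            writeUnguarded(new StrictWriter(), null);
        } catch (NullPointerException e) {
            System.out.println("unguarded write failed: " + e.getMessage());
        }
    }
}
```

In this sketch the guarded path records a null slot for the null row and a masked string for the second row, while the unguarded path throws on the null row, which matches the behaviour the report describes for the cached view.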