[ https://issues.apache.org/jira/browse/SPARK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-21351:
-----------------------------------

    Assignee: Takeshi Yamamuro

> Update nullability based on children's output in optimized logical plan
> -----------------------------------------------------------------------
>
>                 Key: SPARK-21351
>                 URL: https://issues.apache.org/jira/browse/SPARK-21351
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> On master, optimized plans do not reflect the nullability change that a `Filter`
> can introduce when its condition implies `IsNotNull`.
> As a result, unnecessary code is generated for NULL checks. For example:
> {code}
> scala> val df = Seq((Some(1), Some(2))).toDF("a", "b")
> scala> val bIsNotNull = df.where($"b" =!= 2).select($"b")
> scala> val targetQuery = bIsNotNull.distinct
> scala> targetQuery.queryExecution.optimizedPlan.output(0).nullable
> res5: Boolean = true
> scala> import org.apache.spark.sql.execution.debug._
> scala> targetQuery.debugCodegen
> Found 2 WholeStageCodegen subtrees.
> == Subtree 1 / 2 ==
> *HashAggregate(keys=[b#19], functions=[], output=[b#19])
> +- Exchange hashpartitioning(b#19, 200)
>    +- *HashAggregate(keys=[b#19], functions=[], output=[b#19])
>       +- *Project [_2#16 AS b#19]
>          +- *Filter isnotnull(_2#16)
>             +- LocalTableScan [_1#15, _2#16]
> Generated code:
> ...
> /* 124 */   protected void processNext() throws java.io.IOException {
> ...
> /* 132 */     // output the result
> /* 133 */
> /* 134 */     while (agg_mapIter.next()) {
> /* 135 */       wholestagecodegen_numOutputRows.add(1);
> /* 136 */       UnsafeRow agg_aggKey = (UnsafeRow) agg_mapIter.getKey();
> /* 137 */       UnsafeRow agg_aggBuffer = (UnsafeRow) agg_mapIter.getValue();
> /* 138 */
> /* 139 */       boolean agg_isNull4 = agg_aggKey.isNullAt(0);
> /* 140 */       int agg_value4 = agg_isNull4 ? -1 : (agg_aggKey.getInt(0));
> /* 141 */       agg_rowWriter1.zeroOutNullBytes();
> /* 142 */
>                 // We don't need this NULL check because NULL is filtered out by `$"b" =!= 2`
> /* 143 */       if (agg_isNull4) {
> /* 144 */         agg_rowWriter1.setNullAt(0);
> /* 145 */       } else {
> /* 146 */         agg_rowWriter1.write(0, agg_value4);
> /* 147 */       }
> /* 148 */       append(agg_result1);
> /* 149 */
> /* 150 */       if (shouldStop()) return;
> /* 151 */     }
> /* 152 */
> /* 153 */     agg_mapIter.close();
> /* 154 */     if (agg_sorter == null) {
> /* 155 */       agg_hashMap.free();
> /* 156 */     }
> /* 157 */   }
> /* 158 */
> /* 159 */ }
> {code}
> In line 143, the NULL check is unnecessary because NULLs have already been filtered out
> by `$"b" =!= 2`.
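
The idea in the issue title, updating a parent's nullability from its children's output, can be sketched with a small toy model. This is not Spark's Catalyst code: `Attr`, `Scan`, `FilterNotNull`, `Project`, and `updateNullability` are hypothetical names made up for illustration, and the sketch assumes attributes can be matched by name alone.

```scala
// Toy model only: none of these types are Spark classes (all names are hypothetical).
object NullabilitySketch {
  final case class Attr(name: String, nullable: Boolean)

  sealed trait Plan { def output: Seq[Attr] }

  final case class Scan(output: Seq[Attr]) extends Plan

  // A filter that drops rows where any of the listed columns is NULL,
  // so those columns are non-nullable in its output.
  final case class FilterNotNull(notNullCols: Set[String], child: Plan) extends Plan {
    def output: Seq[Attr] = child.output.map { a =>
      if (notNullCols.contains(a.name)) a.copy(nullable = false) else a
    }
  }

  // A projection that stores its own copies of attributes, which can go stale
  // when the child's nullability changes.
  final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
    def output: Seq[Attr] = projectList
  }

  // Bottom-up pass: refresh each projected attribute's nullability
  // from the (already refreshed) child's output.
  def updateNullability(plan: Plan): Plan = plan match {
    case s: Scan => s
    case FilterNotNull(cols, child) => FilterNotNull(cols, updateNullability(child))
    case Project(list, child) =>
      val newChild = updateNullability(child)
      val fromChild = newChild.output.map(a => a.name -> a.nullable).toMap
      Project(list.map(a => a.copy(nullable = fromChild.getOrElse(a.name, a.nullable))), newChild)
  }

  def main(args: Array[String]): Unit = {
    val scan = Scan(Seq(Attr("a", nullable = true), Attr("b", nullable = true)))
    val stale = Project(Seq(Attr("b", nullable = true)), FilterNotNull(Set("b"), scan))
    val fixed = updateNullability(stale)
    println(stale.output.head.nullable) // true
    println(fixed.output.head.nullable) // false
  }
}
```

In this toy, the stale `Project` still reports `b` as nullable even though its child filter guarantees non-null values. After the pass, a downstream consumer could skip the NULL check, which is the kind of saving the issue describes for line 143.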