[ https://issues.apache.org/jira/browse/SPARK-28478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28478:
----------------------------------
    Component/s:     (was: Optimizer)

Optimizer rule to remove unnecessary explicit null checks for null-intolerant expressions (e.g. if(x is null, x, f(x)))
-----------------------------------------------------------------------------------------------------------------------

Key: SPARK-28478
URL: https://issues.apache.org/jira/browse/SPARK-28478
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Josh Rosen
Priority: Major

I ran across a family of expressions like
{code:java}
if(x is null, x, substring(x, 0, 1024)){code}
or
{code:java}
when($"x".isNull, $"x").otherwise(substring($"x", 0, 1024)){code}
that were written this way because the query author was unsure whether {{substring}} would return {{null}} when its input string argument is null.

This explicit null handling is unnecessary and adds bloat to the generated code, especially if it's done via a {{CASE}} expression (which compiles down to a {{do-while}} loop).

In another case, I saw a query compiler that automatically generated this type of code.

It would be cool if Spark could automatically optimize such queries to remove these redundant null checks. Here's a sketch of what such a rule might look like (assuming that SPARK-28477 has been implemented, so we only need to worry about the {{IF}} case):
* In the pattern match, check the following three conditions, in this order (to benefit from short-circuiting):
** The {{IF}} condition is an explicit null check of a column {{c}}.
** The {{true}} expression returns either {{c}} or {{null}}.
** The {{false}} expression is a _null-intolerant_ expression with {{c}} as a _direct_ child.
* If these conditions match, replace the entire {{If}} with the {{false}} branch's expression.
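The three checks in the rule sketch can be illustrated with a small, self-contained Scala script. This is a hedged toy model, not Spark code. The `Expr` tree and the names `Col`, `StrLit`, `IntLit`, `NullLit`, `IsNull`, `If`, `Substring`, `Coalesce` and `RemoveRedundantNullChecks` are hypothetical stand-ins for Catalyst's expression classes. In this model, only `Substring` is treated as null-intolerant. The rule rewrites bottom-up and replaces `If(IsNull(c), c-or-null, f)` with `f` when `c` is a direct child of a null-intolerant `f`.

```scala
// Toy expression tree (hypothetical; these are NOT Spark Catalyst's classes).
sealed trait Expr
case class Col(name: String) extends Expr
case class StrLit(value: String) extends Expr
case class IntLit(value: Int) extends Expr
case object NullLit extends Expr
case class IsNull(child: Expr) extends Expr
case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr
// Treated as null-intolerant: returns null whenever any argument is null.
case class Substring(str: Expr, pos: Expr, len: Expr) extends Expr
// Null-tolerant: can return non-null even when some argument is null.
case class Coalesce(children: Seq[Expr]) extends Expr

object RemoveRedundantNullChecks {
  // Condition 2: the true branch returns either c itself or a null literal.
  private def returnsColOrNull(ifTrue: Expr, c: Col): Boolean =
    ifTrue == c || ifTrue == NullLit

  // Condition 3: the false branch is null-intolerant and has c as a DIRECT child.
  private def hasDirectNullIntolerantUse(ifFalse: Expr, c: Col): Boolean =
    ifFalse match {
      case Substring(s, p, l) => Seq(s, p, l).contains(c)
      case _                  => false // unknown or null-tolerant: keep the If
    }

  // Condition 1 is the pattern IsNull(c: Col); the guard then checks 2 and 3,
  // so the cheaper tests short-circuit before the more specific ones.
  private val rule: PartialFunction[Expr, Expr] = {
    case If(IsNull(c: Col), ifTrue, ifFalse)
        if returnsColOrNull(ifTrue, c) && hasDirectNullIntolerantUse(ifFalse, c) =>
      ifFalse
  }

  // Rebuild a node with f applied to each of its children.
  private def mapChildren(e: Expr)(f: Expr => Expr): Expr = e match {
    case IsNull(c)          => IsNull(f(c))
    case If(c, t, fl)       => If(f(c), f(t), f(fl))
    case Substring(s, p, l) => Substring(f(s), f(p), f(l))
    case Coalesce(cs)       => Coalesce(cs.map(f))
    case leaf               => leaf
  }

  // Bottom-up rewrite: children first, then try the rule on the rebuilt node.
  def apply(e: Expr): Expr = {
    val rewritten = mapChildren(e)(child => apply(child))
    rule.lift(rewritten).getOrElse(rewritten)
  }
}
```

The direct-child condition is what keeps the rewrite safe. In `substring(coalesce(x, 'a'), 0, 1024)`, the child of the null-intolerant expression is `coalesce(...)`, which is non-null when `x` is null, so the toy rule leaves that `If` unchanged. A real optimizer rule would work on Catalyst's own expression classes; this sketch does not model that API.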
--
This message was sent by Atlassian Jira (v8.3.4#803005)