[
https://issues.apache.org/jira/browse/SPARK-42492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-42492:
------------------------------------
Assignee: Apache Spark
> Add new function filter_value
> -----------------------------
>
> Key: SPARK-42492
> URL: https://issues.apache.org/jira/browse/SPARK-42492
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.3.2
> Reporter: Adam Binford
> Assignee: Apache Spark
> Priority: Major
>
> Doing data validation in Spark can lead to a lot of extra evaluations of
> expressions. This is because conditionally evaluated expressions aren't
> candidates for subexpression elimination. For example a simple expression
> such asĀ
> {{when(validate(col), col)}}
> to only keep col if it matches some condition, will lead to col being
> evaluated twice. And if call itself is made up of a series of expensive
> expressions itself, like regular expression checks, this can lead to a lot of
> wasted computation time.
> The initial attempt to resolve this was
> https://issues.apache.org/jira/browse/SPARK-35564, adding support for
> subexpression elimination to conditionally evaluated expressions. However I
> have not been able to get that merged, so this is an alternative (though I
> believe that is still useful on top of this).
> We can add a new higher order function "filter_value" that takes the column
> you want to validate as an argument, and then a function that runs a lambda
> expression returning a boolean on whether to keep that column or not. It
> would have the same semantics as the above when expression, except it would
> guarantee to only evaluate the initial column once.
> An alternative would be to implement a real definition for the NullIf
> expression, but that would only support exact equals checks and not any
> generic condition.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]