Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19720

No, a query with a `coalesce` with many/complex parameters will hit this problem. A query with a lot of small `coalesce`s will not have the problem.

For `AtLeastNNonNulls` the fix would be safe to backport, because no class-level variables are defined, but for `coalesce` it is safer to fix it only with SPARK-18016. In particular, the ongoing PR will solve the issue. The same is true for all the other similar PRs.

Maybe what we can do to backport this to branch-2.2 is to do the splitting and define class-level variables only after a threshold number of parameters is met; otherwise we go on with the previous code generation (without splitting). In this way we don't introduce any regression. Or maybe we can backport to 2.2 only those fixes which do not introduce class-level variables, like the one for `AtLeastNNonNulls`.

Actually, I think the most important of all these fixes is the one for `AtLeastNNonNulls`, because it is used to drop rows containing all nulls, and before this PR that fails on datasets with a lot of columns. All the other functions are less likely to receive a huge number of parameters, even though this may happen and we should support it.
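The threshold idea above could be sketched roughly as follows. This is a minimal illustration, not Spark's actual `CodegenContext` API: the method names, the `InternalRow` call-site shape, and the threshold value are all assumptions. It shows the two paths: below the threshold, keep the previous single-body generation (no class-level state); above it, split each snippet into a private helper method and emit call sites.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: split generated evaluation code into helper methods
// (which requires class-level state for shared values) only when the
// parameter count exceeds a cutoff; otherwise keep the old inline path.
// All names and the threshold are illustrative, not Spark's codegen API.
public class SplitSketch {
    static final int SPLIT_THRESHOLD = 100; // hypothetical cutoff

    // Returns the main method body; helper method declarations, if any,
    // are appended to `helpers` by the splitting path.
    static String generate(List<String> snippets, List<String> helpers) {
        if (snippets.size() <= SPLIT_THRESHOLD) {
            // Previous behaviour: inline everything, no class-level variables,
            // so this path is safe to backport.
            return String.join("\n", snippets);
        }
        // Splitting path: one private helper per snippet, called in order,
        // keeping each generated method well under the 64KB bytecode limit.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < snippets.size(); i++) {
            helpers.add("private void eval_" + i + "(InternalRow row) { "
                    + snippets.get(i) + " }");
            body.append("eval_").append(i).append("(row);\n");
        }
        return body.toString();
    }
}
```

With this shape, a backport would behave identically to the old generator for every query below the threshold, which is what avoids the regression risk discussed above.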