Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19720
  
    No, only a query with a single `coalesce` that has many/complex parameters 
will hit this problem. A query with a lot of small `coalesce` calls will not.
    For `AtLeastNNonNulls` the fix would be safe to backport, because it 
defines no class-level variables, but for `coalesce` it is safer to fix it only 
together with SPARK-18016; in particular, the ongoing PR will solve the issue 
there.
    The same holds for all the other similar PRs.
    Maybe what we can do to backport this to branch-2.2 is to perform the 
splitting, and define class-level variables, only once a threshold on the 
number of parameters is met; otherwise we keep the previous code generation 
(without splitting). That way we introduce no regression.
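    The threshold idea can be sketched as below. This is a hedged illustration, 
not Spark's actual `CodegenContext` API: the snippet names (`SPLIT_THRESHOLD`, 
`evalChild`, `evalPart`) are hypothetical, and the real constraints are the 
JVM's 64KB method-size limit and the constant-pool growth that SPARK-18016 
addresses.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: split the generated code into helper methods (which forces
// the running value into a class-level variable they can share) only once the
// parameter count crosses a threshold; below it, keep the old inline
// generation so small expressions see no regression.
public class SplitThresholdSketch {
    // Hypothetical threshold chosen for illustration only.
    static final int SPLIT_THRESHOLD = 100;

    /** Returns the generated helper methods; empty when no split happens. */
    static List<String> generate(int numParams) {
        List<String> helpers = new ArrayList<>();
        if (numParams <= SPLIT_THRESHOLD) {
            return helpers;  // old path: one inline block, no class-level state
        }
        // New path: group the per-parameter snippets into helper methods.
        for (int start = 0; start < numParams; start += SPLIT_THRESHOLD) {
            StringBuilder body = new StringBuilder();
            int end = Math.min(start + SPLIT_THRESHOLD, numParams);
            for (int i = start; i < end; i++) {
                body.append("value = evalChild").append(i).append("(value);\n");
            }
            helpers.add("private void evalPart" + (start / SPLIT_THRESHOLD)
                + "() {\n" + body + "}");
        }
        return helpers;
    }

    public static void main(String[] args) {
        System.out.println(generate(10).size());   // small coalesce: no split
        System.out.println(generate(250).size());  // 250 params: 3 helpers
    }
}
```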
    Or maybe we can backport to 2.2 only those fixes which do not introduce 
class-level variables, like the one for `AtLeastNNonNulls`.
    Actually I think the most important of all these fixes is indeed 
`AtLeastNNonNulls`, because it is used to drop rows containing only nulls, and 
before this PR it fails on datasets with a lot of columns. All the other 
functions are less likely to receive a huge number of parameters, although that 
may happen and we should support it.
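    For context, the semantics in question can be sketched as below. This is a 
hedged illustration of the predicate's behavior only, not Spark's actual 
expression class: the method name and row representation are made up, but the 
rule, keep a row iff it has at least `n` non-null values, is what dropping 
all-null rows builds on, and it is the per-column null check over thousands of 
columns that blew past codegen limits before this PR.

```java
// Hedged sketch of AtLeastNNonNulls semantics: with n = 1, a row survives
// only if at least one value is non-null, i.e. entirely-null rows are dropped.
public class AtLeastNNonNullsSketch {
    static boolean atLeastNNonNulls(int n, Object[] row) {
        int nonNulls = 0;
        for (Object v : row) {
            if (v != null && ++nonNulls >= n) {
                return true;  // short-circuit once enough non-nulls are seen
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Object[] wide = new Object[5000];  // a very wide, all-null row
        System.out.println(atLeastNNonNulls(1, wide));  // dropped
        wide[0] = "x";
        System.out.println(atLeastNNonNulls(1, wide));  // kept
    }
}
```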

