Davies Liu created SPARK-3554:
---------------------------------

             Summary: handle large dataset in closure of PySpark
                 Key: SPARK-3554
                 URL: https://issues.apache.org/jira/browse/SPARK-3554
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: Davies Liu


Sometimes a large dataset is referenced inside a closure and the user forgets
to broadcast it, so the serialized command becomes huge.
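To illustrate how captured data inflates the serialized command, here is a stdlib-only sketch. PySpark serializes closures with cloudpickle, but plain pickle cannot serialize lambdas, so this assumes a bound method `big_table.get` as a stand-in for a closure that captures `big_table` (a hypothetical dataset).

```python
import pickle

# Hypothetical large lookup table a user might reference inside an RDD function.
big_table = {i: str(i) * 10 for i in range(100_000)}

# A bound method holds a reference to its dict, much as a closure holds the
# variables it captures; pickling the method therefore pickles the whole dict.
lookup = big_table.get
closure_bytes = len(pickle.dumps(lookup))

# The same kind of function over an empty dict shows the fixed overhead.
empty_bytes = len(pickle.dumps({}.get))

print(f"with captured table: {closure_bytes} bytes, without: {empty_bytes} bytes")
```

In PySpark the usual remedy is to broadcast the table once with `sc.broadcast(big_table)` and read it through the broadcast variable's `.value` inside the function, so the command carries a small reference rather than the table itself.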

py4j cannot handle large objects efficiently, so we should compress the
serialized command and use a broadcast for it if it is huge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
