Davies Liu created SPARK-3554:
---------------------------------

Summary: handle large dataset in closure of PySpark
Key: SPARK-3554
URL: https://issues.apache.org/jira/browse/SPARK-3554
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: Davies Liu
Sometimes a large dataset is referenced inside a closure and the user forgets to wrap it in a broadcast variable, so the serialized command becomes very large. Because py4j cannot transfer large objects efficiently, we should compress the serialized command and, if it is still too large, send it through a broadcast instead.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
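A minimal Python sketch of the proposed fallback (compress first, then broadcast only if the result is still too large), under stated assumptions: `BROADCAST_THRESHOLD`, `BroadcastRegistry`, `prepare_command` and `load_command` are hypothetical names, the 1 MiB cutoff is arbitrary, and an in-memory dict stands in for Spark's broadcast machinery. It uses the standard `pickle` and `zlib` modules rather than PySpark's own serializers, and `functools.partial` stands in for a closure because plain `pickle` cannot serialize lambdas (PySpark relies on cloudpickle for that).

```python
import functools
import operator
import os
import pickle
import zlib

# Hypothetical cutoff (bytes) for the compressed command; chosen only for
# illustration, not taken from Spark.
BROADCAST_THRESHOLD = 1 << 20  # 1 MiB


class BroadcastRegistry:
    """Hypothetical in-memory stand-in for a broadcast mechanism: each payload
    is stored once and referred to by a small integer id."""

    def __init__(self):
        self._store = {}
        self._next_id = 0

    def put(self, payload: bytes) -> int:
        bid = self._next_id
        self._store[bid] = payload
        self._next_id += 1
        return bid

    def get(self, bid: int) -> bytes:
        return self._store[bid]


def prepare_command(command, registry, threshold=BROADCAST_THRESHOLD):
    """Pickle and compress a command. If the compressed bytes still exceed
    `threshold`, store them in the registry and return only the id, so the
    message sent over the gateway stays small."""
    compressed = zlib.compress(pickle.dumps(command))
    if len(compressed) > threshold:
        return ("broadcast", registry.put(compressed))
    return ("inline", compressed)


def load_command(envelope, registry):
    """Inverse of prepare_command: fetch from the registry if needed,
    then decompress and unpickle."""
    kind, body = envelope
    if kind == "broadcast":
        body = registry.get(body)
    return pickle.loads(zlib.decompress(body))


if __name__ == "__main__":
    registry = BroadcastRegistry()

    # A "closure" capturing a small lookup table: shipped inline.
    small = functools.partial(operator.getitem, {"a": 1})
    env = prepare_command(small, registry)
    print(env[0], load_command(env, registry)("a"))

    # A "closure" capturing ~2 MiB of random (incompressible) bytes: it still
    # exceeds the threshold after compression, so only an id is shipped.
    big = functools.partial(operator.getitem, {"blob": os.urandom(2 << 20)})
    env = prepare_command(big, registry)
    print(env[0], len(load_command(env, registry)("blob")))
```

Compressing first keeps the common case (a moderately large but repetitive dataset) on the direct path, and only payloads that remain large pay the cost of a broadcast. On the user side, the usual workaround is to broadcast the data explicitly with `sc.broadcast(data)` and read it through `.value` inside the function, so the closure captures only a small broadcast handle.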