[ https://issues.apache.org/jira/browse/SPARK-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556680#comment-15556680 ]

holdenk commented on SPARK-1425:
--------------------------------

Is this still an issue, or do we have a repro case for it? The current framed 
serializer seems to write an object out only once it's fully pickled, although 
we are still using the same pipe for both error messages and data.
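For reference, the framed write pattern described above amounts to pickling the
object completely before touching the pipe, then writing a length prefix followed
by the payload. A minimal Python sketch of that idea (not PySpark's actual
FramedSerializer; the helper name write_with_length is illustrative only):

    import pickle
    import struct

    def write_with_length(obj, stream):
        # Pickle the object completely before touching the stream, so a
        # failure during serialization cannot leave a half-written frame
        # behind on the pipe.
        data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        # Length-prefix the fully-pickled payload (4-byte big-endian int),
        # then write the bytes in one go.
        stream.write(struct.pack(">i", len(data)))
        stream.write(data)

Even with that pattern, error text written to the same pipe can still land where
the reader expects a length prefix, which is the remaining concern.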

> PySpark can crash Executors if worker.py fails while serializing data
> ---------------------------------------------------------------------
>
>                 Key: SPARK-1425
>                 URL: https://issues.apache.org/jira/browse/SPARK-1425
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0
>            Reporter: Matei Zaharia
>
> The PythonRDD code that talks to the worker will keep calling 
> stream.readInt() and allocating an array of that size. Unfortunately, if the 
> worker gives it corrupted data, it will attempt to allocate a huge array and 
> get an OutOfMemoryError. It would be better to use a different stream to give 
> feedback, *or* only write an object out to the stream once it's been properly 
> pickled to bytes or to a string.
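To make the failure mode concrete, below is a minimal Python sketch of the read
side of that length-prefixed protocol. The real reader is the JVM-side PythonRDD
code in Scala; this sketch only illustrates the protocol shape, and the function
name read_next_object is made up for illustration:

    import struct

    def read_next_object(stream):
        # Read the 4-byte big-endian length prefix the worker is expected
        # to have written before the pickled payload.
        (length,) = struct.unpack(">i", stream.read(4))
        # The length is trusted blindly. If worker.py died mid-serialization
        # and wrote an error message or a partial pickle on the same pipe,
        # these four bytes are garbage, so the allocation below can be
        # arbitrarily large (or negative) -- the analogue of the JVM-side
        # OutOfMemoryError described in the report.
        buf = bytearray(length)
        if stream.readinto(buf) != length:
            raise EOFError("stream ended before the full frame arrived")
        return bytes(buf)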


