[ https://issues.apache.org/jira/browse/SPARK-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-27629.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 24521
[https://github.com/apache/spark/pull/24521]

> Prevent Unpickler from intervening each unpickling
> --------------------------------------------------
>
>                 Key: SPARK-27629
>                 URL: https://issues.apache.org/jira/browse/SPARK-27629
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In SPARK-27612, a correctness issue was reported: when protocol 4 is used
> to pickle Python objects, the unpickled objects come back wrong. A
> temporary fix was proposed that avoids the highest pickle protocol.
> It was found that Opcodes.MEMOIZE appears in the opcode stream under
> protocol 4 and is suspected to cause this issue.
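> A quick way to confirm which opcodes each protocol emits is the standard
> library's pickletools module. A minimal sketch, with an illustrative
> payload whose shared sub-object forces memo use:
>
> {code:python}
> import pickle
> import pickletools
>
> # A shared sub-object forces the pickler to memoize it.
> shared = ["x"]
> payload = (shared, shared)
>
> for proto in (2, 4):
>     names = {op.name for op, _, _ in pickletools.genops(pickle.dumps(payload, proto))}
>     # Protocol 2 stores into the memo with BINPUT; protocol 4 uses MEMOIZE.
>     # Both retrieve the second reference with BINGET.
>     print(proto, sorted(names & {"BINPUT", "MEMOIZE", "BINGET"}))
> {code}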
> A deeper dive found that Opcodes.MEMOIZE stores objects into an internal
> map of the Unpickler object. We use a single Unpickler object to unpickle
> serialized Python bytes. If the map is not cleared, the stored objects
> interfere with the next round of unpickling.
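> Concretely, MEMOIZE stores at index len(memo), while the matching BINGET
> carries an index the pickler computed against its own fresh memo, so stale
> entries shift where new objects land. A sketch of the mechanics in plain
> Python (the memo dict stands in for Pyrolite's internal map):
>
> {code:python}
> # The memo dict stands in for the Unpickler's internal object map.
> memo = {}
>
> # Payload 1: its MEMOIZE stores at len(memo) == 0, so its BINGET 0 is correct.
> memo[len(memo)] = "object-from-payload-1"
>
> # Payload 2, unpickled with the same uncleared memo: MEMOIZE now lands at
> # index 1, but this payload's BINGET indices were computed by a fresh
> # pickler and still start at 0.
> memo[len(memo)] = "object-from-payload-2"
>
> print(memo[0])  # object-from-payload-1 -- payload 2 gets payload 1's object
> {code}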
> We have two options (option 2 is sketched in Python after this list):
> 1. Continue to reuse the Unpickler, but call its close() after each
> unpickling.
> 2. Do not reuse the Unpickler; create a new Unpickler object for each
> unpickling.
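> As an illustration only, here is a Python analog of option 2; the actual
> fix targets Pyrolite's Unpickler on the JVM side, and the names
> unpickle_each and batches are hypothetical:
>
> {code:python}
> import io
> import pickle
>
> def unpickle_each(payloads):
>     """Unpickle independently pickled byte strings, one fresh Unpickler each."""
>     for data in payloads:
>         # A new Unpickler means a fresh memo map: nothing leaks across payloads.
>         yield pickle.Unpickler(io.BytesIO(data)).load()
>
> batches = [pickle.dumps(obj, 4) for obj in ([1, 2], {"a": 3})]
> print(list(unpickle_each(batches)))  # [[1, 2], {'a': 3}]
> {code}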
> Note: this issue arises because the internal object map in Pyrolite is not
> cleared after the STOP opcode. If we use protocol 4 to pickle Python
> objects, the MEMOIZE opcode stores objects in that map. We need to clear
> the map so that the next unpickling starts from an empty one. For now, we
> can clear the map manually.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
