[ https://issues.apache.org/jira/browse/SPARK-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-27629.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 24521
[https://github.com/apache/spark/pull/24521]

> Prevent Unpickler from intervening each unpickling
> --------------------------------------------------
>
>                 Key: SPARK-27629
>                 URL: https://issues.apache.org/jira/browse/SPARK-27629
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
>
> A correctness issue was reported in SPARK-27612: when protocol 4 is used
> to pickle Python objects, the unpickled objects were wrong. A temporary
> fix was proposed that avoids using the highest protocol.
> Opcodes.MEMOIZE was found among the opcodes emitted under protocol 4 and
> is suspected to be the cause of this issue.
> A deeper dive found that Opcodes.MEMOIZE stores objects in the internal
> map of the Unpickler object. We use a single Unpickler object to unpickle
> serialized Python bytes. If the map is not cleared, the stored objects
> interfere with the next round of unpickling.
> We have two options:
> 1. Continue to reuse the Unpickler, but call its close method after each
> unpickling.
> 2. Do not reuse the Unpickler; create a new Unpickler object for each
> unpickling.
> Note: This issue arises because the internal object map in Pyrolite is
> not cleared after opcode STOP. If protocol 4 is used to pickle Python
> objects, opcode MEMOIZE stores objects in the map. We need to clear the
> map so that the next unpickling starts from a clean map. For now, we can
> clear the map manually.
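
The mechanism described above can be illustrated with a small Python sketch. It is not Pyrolite or Spark code. The first part uses the standard pickletools module to show that protocol 4 emits MEMOIZE while protocol 2 does not. The second part uses a hypothetical ToyMemo class (an assumption made purely for illustration) to model an object map in which MEMOIZE writes to the slot equal to the current map size, which is how the pickle protocol defines that opcode. The helper names opcode_names and unpickle_round are likewise invented here.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """Return the names of the opcodes used to pickle obj at a given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


class ToyMemo:
    """Hypothetical stand-in for an unpickler's internal object map."""

    def __init__(self):
        self.memo = {}

    def memoize(self, obj):
        # MEMOIZE carries no index: the slot is the current size of the map.
        self.memo[len(self.memo)] = obj

    def get(self, index):
        # Lookup by the index the pickler assigned within the current pickle.
        return self.memo[index]

    def close(self):
        # Models clearing the map between unpickling rounds.
        self.memo.clear()


def unpickle_round(toy, value, clear_after):
    """Simulate one round: memoize a value, then read back slot 0."""
    toy.memoize(value)
    result = toy.get(0)  # the pickler numbered this round's first object 0
    if clear_after:
        toy.close()
    return result


sample = ["x", "y"]
names_v2 = opcode_names(sample, 2)
names_v4 = opcode_names(sample, 4)

# Reused map that is never cleared: the second round reads a stale object.
reused = ToyMemo()
stale = [unpickle_round(reused, v, clear_after=False) for v in ("first", "second")]

# Reused map cleared after every round (option 1 in the description).
cleared = ToyMemo()
fresh = [unpickle_round(cleared, v, clear_after=True) for v in ("first", "second")]

print("MEMOIZE" in names_v2, "MEMOIZE" in names_v4)  # False True
print(stale)  # ['first', 'first']
print(fresh)  # ['first', 'second']
```

In this toy model the uncleared map returns the object from the first round when the second round looks up slot 0, which matches the kind of wrong result the description attributes to the reused Unpickler. Creating a new ToyMemo per round (option 2) would give the same result as clearing it.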