[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

superbobry Fri, 28 Sep 2018 04:57:39 -0700

Github user superbobry commented on the issue:

    https://github.com/apache/spark/pull/21157
  
    > so this change would introduce a pretty big regression?
    
    The change does introduce a regression, because some namedtuples will become
    unpicklable. However, it makes pickling in PySpark more predictable and robust
    (see the linked blog posts for details).
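
A minimal illustration of which namedtuples plain `pickle` rejects. It assumes the standard library pickle with no PySpark patching. Pickle stores a class by reference (module plus name), so a namedtuple class created inside a function cannot be found again, while a module-level one can. `make_local_point`, `try_pickle` and `Pair` are hypothetical names used only for this sketch.

```python
import pickle
from collections import namedtuple


def make_local_point(x, y):
    # The class is created in a function's local scope, so it is never bound
    # to a module-level name that pickle could look up.
    Point = namedtuple("Point", ["x", "y"])
    return Point(x, y)


def try_pickle(obj):
    """Return True if `obj` survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError):
        # Raised when the instance's class cannot be located by module + name.
        return False


# Hypothetical module-level namedtuple: reachable by name, so it round-trips
# when this file is run as a script.
Pair = namedtuple("Pair", ["a", "b"])

print(try_pickle(make_local_point(1, 2)))  # False
print(try_pickle(Pair(1, 2)))
```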
    
    Side note: `mllib.fpm` contains another case of a non-picklable namedtuple,
    a "nested" one (the session below is Python 2):
    
    ```python
    >>> from collections import namedtuple
    >>> class A:
    ...     class B(namedtuple("B", [])): pass
    ...
    >>> import pickle
    >>> pickle.loads(pickle.dumps(A.B))
    Traceback (most recent call last):
      [...]
    pickle.PicklingError: Can't pickle <class '__main__.B'>: it's not found as __main__.B
    ```
    
    It is picklable with `_hijack_namedtuple` enabled, because the namedtuple
    class is recreated during unpickling. However, I think that "nested" classes
    are an anti-pattern in Python, because the outer class is not a proper
    namespace the way locals/globals/... are:
    
    ```python
    >>> A.B
    <class '__main__.B'>
    ```
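
The idea of recreating the namedtuple class during unpickling can be sketched as follows. This is not PySpark's code: `make_picklable`, `_restore_namedtuple` and `_REBUILT_CLASSES` are hypothetical names, and the sketch assumes the standard library pickle, which calls a class's overridden `__reduce__`. Each instance is stored as a module-level restore function plus plain data (class name, field names, values), so its original class never has to be looked up.

```python
import collections
import pickle

# Hypothetical cache: one rebuilt class per (name, fields) pair, so repeated
# unpickling in the same process reuses the same class object.
_REBUILT_CLASSES = {}


def _restore_namedtuple(name, fields, values):
    """Hypothetical helper: rebuild the namedtuple class, then the instance."""
    key = (name, fields)
    cls = _REBUILT_CLASSES.get(key)
    if cls is None:
        # The class is recreated from its name and field names alone; the
        # original class object is never looked up.
        cls = make_picklable(collections.namedtuple(name, fields))
        _REBUILT_CLASSES[key] = cls
    return cls(*values)


def make_picklable(cls):
    """Hypothetical helper: pickle instances of a namedtuple class by value."""
    name = cls.__name__
    fields = cls._fields

    def __reduce__(self):
        # Store only a module-level function plus plain data, so the payload
        # does not depend on where `cls` was defined.
        return (_restore_namedtuple, (name, fields, tuple(self)))

    cls.__reduce__ = __reduce__
    return cls


def local_point(x, y):
    # A class created inside a function: not importable by name.
    Point = make_picklable(collections.namedtuple("Point", ["x", "y"]))
    return Point(x, y)


p = local_point(1, 2)
restored = pickle.loads(pickle.dumps(p))
print(restored)  # Point(x=1, y=2)
print(type(restored) is type(p))  # False: the class was rebuilt on load
```

In this sketch the restored object is an instance of a freshly built class, not the original one. A subclass such as the nested `A.B` would come back as a plain rebuilt `B`, losing any methods defined on the subclass.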


