GitHub user superbobry commented on the issue:

    https://github.com/apache/spark/pull/21157

> so this change would introduce a pretty big regression?

The change does introduce a regression, as some namedtuples will become unpicklable. However, it makes pickling in PySpark more predictable and robust (see the linked blog posts for details).

Sidenote: `mllib.fpm` contains another case of a non-picklable namedtuple, a "nested" one:

```python
>>> from collections import namedtuple
>>> class A:
...     class B(namedtuple("B", [])): pass
...
>>> import pickle
>>> pickle.loads(pickle.dumps(A.B))
Traceback (most recent call last):
[...]
pickle.PicklingError: Can't pickle <class '__main__.B'>: it's not found as __main__.B
```

It is picklable with `_hijack_namedtuple` enabled, because the namedtuple class is recreated during unpickling. However, I think that "nested" classes are an anti-pattern in Python, because the outer class is not a proper namespace (like locals/globals/...):

```python
>>> A.B
<class '__main__.B'>
```
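
To illustrate the "recreated during unpickling" idea mentioned above, here is a minimal sketch of the general technique. It is not PySpark's actual code: `restore_namedtuple`, `make_picklable`, `Outer`, and `Point` are hypothetical names made up for this example. The instance is reduced to a plain function call plus the class name, field names, and values, so unpickling never has to look the class up by its import path:

```python
import collections
import pickle

# Cache of recreated classes, keyed by (name, fields). Hypothetical helper state.
_class_cache = {}


def restore_namedtuple(name, fields, values):
    """Rebuild a namedtuple instance from its class name, field names, and values."""
    key = (name, fields)
    cls = _class_cache.get(key)
    if cls is None:
        # The class is created from scratch instead of being imported by name.
        cls = collections.namedtuple(name, fields)
        _class_cache[key] = cls
    return cls(*values)


def make_picklable(cls):
    """Patch a namedtuple class so its instances pickle via restore_namedtuple."""
    name = cls.__name__
    fields = cls._fields

    def __reduce__(self):
        return (restore_namedtuple, (name, fields, tuple(self)))

    cls.__reduce__ = __reduce__
    return cls


class Outer:
    # A "nested" namedtuple subclass, similar in shape to the case above.
    @make_picklable
    class Point(collections.namedtuple("Point", ["x", "y"])):
        pass


original = Outer.Point(1, 2)
payload = pickle.dumps(original)
restored = pickle.loads(payload)

print(restored)                         # field values survive the round trip
print(type(restored) is Outer.Point)    # the class itself is a fresh namedtuple
```

Note the trade-off in this sketch: the restored object is an instance of a freshly created plain namedtuple, not of `Outer.Point`, so anything defined on the subclass (methods, docstrings) is not carried over.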