GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/18734

    [WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickle again

    ## What changes were proposed in this pull request?
    
    Based on https://github.com/apache/spark/pull/18282 by @rgbkrk, this PR 
attempts to update to the current released cloudpickle and minimize the 
difference between Spark's cloudpickle and "stock" cloudpickle, with the goal of 
eventually using stock cloudpickle.
    
    Some notable changes:
    * Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
    * Support recursive functions inside closures (cloudpipe/cloudpickle#89, 
cloudpipe/cloudpickle#90)
    * Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
    * Assume modules with __file__ attribute are not dynamic 
(cloudpipe/cloudpickle#85)
    * Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
    * Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
    * Add ability to pickle dynamically created modules 
(cloudpipe/cloudpickle#52)
    * Support method descriptor (cloudpipe/cloudpickle#46)
    * No more pickling of closed files, was broken on Python 3 
(cloudpipe/cloudpickle#32)
    * **Remove non-standard __transient__ check (cloudpipe/cloudpickle#110)** 
-- we don't use this internally and have no tests or documentation for it, but 
downstream code may use __transient__ even though it has never been part of 
the API. If we merge this, we should include a note about it in the release 
notes.
    * Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
    * BUG: Fix crash when pickling dynamic class cycles. 
(cloudpipe/cloudpickle#102)
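
For downstream code affected by the __transient__ removal, one possible migration is the standard pickle protocol hooks. The sketch below assumes the removed check simply dropped the attributes named in a class-level `__transient__` list before pickling; that reading of the old behavior is an assumption, and the `Connection` class and its `_socket` attribute are hypothetical names used only for illustration.

```python
import pickle


class Connection:
    """Hypothetical class that previously set ``__transient__ = ['_socket']``."""

    def __init__(self, host):
        self.host = host
        self._socket = object()  # stand-in for a resource that must not be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the attribute that should not travel.
        state = self.__dict__.copy()
        state.pop("_socket", None)
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes and reset the dropped one.
        self.__dict__.update(state)
        self._socket = None


# Round-trip through the standard pickle module.
restored = pickle.loads(pickle.dumps(Connection("example.org")))
```

Because `__getstate__` and `__setstate__` belong to the standard pickle protocol, this approach should not depend on which cloudpickle version is bundled.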
    
    
    ## How was this patch tested?
    
    Existing PySpark unit tests, plus the unit tests from the cloudpickle 
project run on their own.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark holden-rgbkrk-cloudpickle-upgrades

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18734
    
----
commit b84222b6ea660ce5ec1aedfee50e297b436eb824
Author: Kyle Kelley <rgb...@gmail.com>
Date:   2017-06-12T22:19:29Z

    [SPARK-21070][PYSPARK] Upgrade cloudpickle
    
    This brings in fixes and upgrades from the 
[cloudpickle](https://github.com/cloudpipe/cloudpickle) module, notably:
    
    * Import submodules accessed by pickled functions 
(https://github.com/cloudpipe/cloudpickle/pull/80)
    * Support recursive functions inside closures 
(https://github.com/cloudpipe/cloudpickle/pull/89, 
https://github.com/cloudpipe/cloudpickle/pull/90)
    * Fix ResourceWarnings and DeprecationWarnings 
(https://github.com/cloudpipe/cloudpickle/pull/88)
    * Assume modules with __file__ attribute are not dynamic 
(https://github.com/cloudpipe/cloudpickle/pull/85)
    * Make cloudpickle Python 3.6 compatible 
(https://github.com/cloudpipe/cloudpickle/pull/72)
    * Allow pickling of builtin methods 
(https://github.com/cloudpipe/cloudpickle/pull/57)
    * Add ability to pickle dynamically created modules 
(https://github.com/cloudpipe/cloudpickle/pull/52)
    * Support method descriptor 
(https://github.com/cloudpipe/cloudpickle/pull/46)
    * No more pickling of closed files, was broken on Python 3 
(https://github.com/cloudpipe/cloudpickle/pull/32)

commit 6d5e5cf412e11d4b8b30f7d46c15405edcb4cb05
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-24T19:58:32Z

    Merge branch 'master' into holden-rgbkrk-cloudpickle-upgrades

commit 1cfd38f73da328bcf58ab32228ace4ff59bc26d2
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-24T20:01:46Z

    Copy over support work weakset, dynamic classess, and remove __transient__ 
support from PR#110

commit f8ff2da5c093bf20a65df1869ecd3def3fbac2c5
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:02:16Z

    Test fix

commit cff6bfb83d04ee29d660a300b08f6a99dd636cf0
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:02:24Z

    Revert "Copy over support work weakset, dynamic classess, and remove 
__transient__ support from PR#110"
    
    This reverts commit 1cfd38f73da328bcf58ab32228ace4ff59bc26d2.

commit 195cd21ece2036df88a95d2fdc7dfd29c5681efa
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:07:13Z

    Fixed named tuple issue

commit 74880eabf218b4e9f29a25583442ee8d3b6bb0a6
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:09:30Z

    Try and move the fix for namedtuple and re-enable the rest of the useful 
cloudpickle fixes.
    
    Revert "Revert "Copy over support work weakset, dynamic classess, and 
remove __transient__ support from PR#110""
    
    This reverts commit cff6bfb83d04ee29d660a300b08f6a99dd636cf0.

commit 9a0f9b4b9958c5fed6d2c84d725cb03a7be7d41e
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T22:12:24Z

    Re-enable our custom exception message

commit 09cf41eb3e75e92cc9914e675fb2cb2f99290d38
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-25T23:49:57Z

    Save and restore the module info functions

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
