Re: [PYTHON][DISCUSS] Moving to cloudpickle and or Py4J as a dependencies?

2017-02-14 Thread Maciej Szymkiewicz
I don't have any strong views, so just to highlight possible issues: * Based on various issues I've seen, a substantial number of users depend on system-wide Python installations. As far as I am aware, neither Py4J nor cloudpickle is present in the standard system

Re: [PYTHON][DISCUSS] Moving to cloudpickle and or Py4J as a dependencies?

2017-02-13 Thread Holden Karau
It's a good question. Py4J seems to have been updated 5 times in 2016 and is a bit involved (from a review point of view verifying the zip file contents is somewhat tedious). cloudpickle is a bit difficult to tell since we can have changes to cloudpickle which aren't correctly tagged as

Re: [PYTHON][DISCUSS] Moving to cloudpickle and or Py4J as a dependencies?

2017-02-13 Thread Reynold Xin
With any dependency update (or refactoring of existing code), I always ask this question: what's the benefit? In this case it looks like the benefit is to reduce the effort spent on backports. Do you know how often we needed to do those? On Tue, Feb 14, 2017 at 12:01 AM, Holden Karau

[PYTHON][DISCUSS] Moving to cloudpickle and or Py4J as a dependencies?

2017-02-13 Thread Holden Karau
Hi PySpark Developers, Cloudpickle is a core part of PySpark, and was originally copied from (and improved upon) picloud. Since then other projects have found cloudpickle useful, and a fork of cloudpickle was created and is now maintained as its own