[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Closing this PR to continue the discussion in the new one.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
@HyukjinKwon done in #23008.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
I meant to use


https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L583

Instead of 


https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L561



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> I think people do defined NamedTuples in Notebooks, so I'm going to stick 
with -1.

@holdenk I understand your point, but there is still something we can do 
without breaking existing code relying on namedtuple serialization. Option 1: 
switch to cloudpickle as suggested by @HyukjinKwon. Option 2: #21180. What 
would be your choice between the two? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
@HyukjinKwon do you mean change the default serializer to cloudpickle and 
remove _hack_namedtuple?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Adding @gatorsmile and @cloud-fan as well since this might be potentially 
breaking changes for 3.0 release (it affects RDD operation only with namedtuple 
in certain case tho)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
And you can also run profiler to show the performance effect. See 
https://github.com/apache/spark/pull/19246#discussion_r139874732 to run the 
profile


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
You can just replace it to CloudPickler, remove changes at tests, and push 
that commit here to show no case is broken


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Yea, so to avoid to break, we could change the default pickler to 
CloudPickler or document this workaround. @superbobry, can you check if the 
case can be preserved if we use CloudPickler instead?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-21 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
`cloudpickle` does indeed support pickling namedtuples. Maybe the way to go 
is to remove the patch advertise `cloudpickle` serializer for projects relying 
on the old behaviour. Wdyt @holdenk? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
The workaround is to use CloudPickler btw. Technically we many cases that 
normal pickler does not support. This one specific case (namedtuple) was 
allowed by this weird hack.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-20 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> If removing the hack entirely is going to brake named tuples defined in 
the repl I'm a -1 on that change. 

Yes, but it might be OK for two reasons: people rarely define namedtuples 
in the REPL (hypothesis); and non-namedtuple classes do not work in the REPL 
even with the hack.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-19 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21157
  
If removing the hack entirely is going to brake named tuples defined in the 
repl I'm a -1 on that change. While we certainly are more free to make breaking 
API changes in a majour version release we still have to think through the 
scope of the change we're going to be pushing onto users and that's pretty 
large.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
To keep the current behaviour without the workaround above (using 
CloudPickler), the weird fix is required 
(https://github.com/apache/spark/pull/21180) where some private methods should 
be used. I also gave few quick tries but looks not quite easy to fix.

It is a hack to remove and looks difficult to remove without any behaviour 
change but still now a rough workaround looked possible (CloudPickler) so I 
inclined to get rid of it at Spark 3.0 for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
To keep the current behaviour without the workaround above (using 
CloudPickler), the weird fix is required 
(https://github.com/apache/spark/pull/21180) where some private methods should 
be used. I also gave few quick tries but looks not quite easy to fix.

It is a hack to remove and looks difficult to remove. Now a rough 
workaround was possible (CloudPickler) so I inclined to get rid of it at Spark 
3.0 for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
> Ok it looks like it was @HyukjinKwon who suggested that we remove this 
hack in general rather than the partial work around can I get your thoughts on 
why? It seems like the partial work around would give us the best of both 
worlds (e.g. we don't break peoples existing Spark code and we handle Python 
tuples better).

Sorry for the late response. Yes, I spent some time to take a look for this 
named tuple hack, and my impression was that we should have not added such 
fixes to only allow named tuple pickling. The named tuple hack was introduced 
for both cloudpickle (SQL path) and normal pickle path, if I am not mistaken.

Cloudpickle at PySpark side now supports it so workaround to allow the 
cases above should be to use CloudPickler when it's possible. I think PySpark 
API exposes this pickler (see the `SparkContext`'s `__init__`) (@superbobry, 
mind if I ask to document this workaround and add a test (and see if it really 
works?)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-13 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Yes, it will break IPython notebooks as well. I wonder how often people 
actually defined namedtuples in a notebook? 

Emitting a warning is a less extreme option, yes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21157
  
I mean, we could warn if we are doing the hijacking and not break peoples 
pipelines?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21157
  
But that would break both ipython notebooks and repl right? Pretty 
significant breaking change.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Yes, that is correct. That is why I think hijacking behaviour should be 
removed. It silently slows down the job and does not notify the user that a 
trivial change such as making the namedtuple importable could result in a 
speedup.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Nope, the job I was referring to is not open source; but I guess the 
speedup is easy to justify: much less payload and faster deserialization:

```
>>> from collections import namedtuple
>>> Stats = namedtuple("Stats", ["sample_mean", "sample_variance"])
>>> import pickle
>>> len(pickle.dumps(Stats(42, 42)))
31
>>> len(pickle.dumps(("Stats", Stats._fields, (42, 42
68
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21157
  
Do you have the code for demonstrating the 2x speed up @superbobry ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21157
  
Ok it looks like it was @HyukjinKwon who suggested that we remove this hack 
in general rather than the partial work around can I get your thoughts on why? 
It seems like the partial work around would give us the best of both worlds 
(e.g. we don't break peoples existing Spark code and we handle Python tuples 
better).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-04 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> Is it possible to keep the current hack for things which can't be 
pickled, but remove the hack in the situation where the namedtuple is well 
behaved and it could be pickled directly by cloudpickle?

@holdenk yes, this has been proposed in #21180, and later rejected in 
favour of this one.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-04 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21157
  
Is it possible to keep the current hack for things which can't be pickled, 
but remove the hack in the situation where the namedtuple is well behaved and 
it could be pickled directly by cloudpickle? That way we don't have a 
functionality regression but we also improve handling of named tuples more 
generally. Even if so, it would probably be best to wait for 3.0 since this is 
a pretty core change in terms of PySpark.

Before you put in the work though let's see if that the consensus approach 
(if possible).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96815/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96815 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96815/testReport)**
 for PR 21157 at commit 
[`2addefb`](https://github.com/apache/spark/commit/2addefb9ae2ec12f30e0e939f7782b521efccdd0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96815/testReport)**
 for PR 21157 at commit 
[`2addefb`](https://github.com/apache/spark/commit/2addefb9ae2ec12f30e0e939f7782b521efccdd0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96813/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96813 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96813/testReport)**
 for PR 21157 at commit 
[`9c69047`](https://github.com/apache/spark/commit/9c690474986be63e41eed7aba59399c5a154b72e).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96813/testReport)**
 for PR 21157 at commit 
[`9c69047`](https://github.com/apache/spark/commit/9c690474986be63e41eed7aba59399c5a154b72e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-01 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
@rxin the ones mentioned as 1 and 2 in the PR description:

https://superbobry.github.io/pyspark-silently-breaks-your-namedtuples.html

https://superbobry.github.io/tensorflowonspark-or-the-namedtuple-patch-strikes-again.html


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21157
  
@superbobry which blog were you referring to?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96760/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96760 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96760/testReport)**
 for PR 21157 at commit 
[`765362e`](https://github.com/apache/spark/commit/765362efb1f40fde3820704fee1628c90d6c0566).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96760 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96760/testReport)**
 for PR 21157 at commit 
[`765362e`](https://github.com/apache/spark/commit/765362efb1f40fde3820704fee1628c90d6c0566).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96758/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96758 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96758/testReport)**
 for PR 21157 at commit 
[`65dc266`](https://github.com/apache/spark/commit/65dc2662287c64fdd26e7d46f5da5354e4e9eff6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96758/testReport)**
 for PR 21157 at commit 
[`65dc266`](https://github.com/apache/spark/commit/65dc2662287c64fdd26e7d46f5da5354e4e9eff6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96754 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96754/testReport)**
 for PR 21157 at commit 
[`ac6c7bc`](https://github.com/apache/spark/commit/ac6c7bce70b1929d54162090a0cb5df0cf53041a).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96754/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96754/testReport)**
 for PR 21157 at commit 
[`ac6c7bc`](https://github.com/apache/spark/commit/ac6c7bce70b1929d54162090a0cb5df0cf53041a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> so this change would introduce a pretty big regression?

The change does introduce a regression as some namedtuples will become 
unpicklable. However, it makes pickling in PySpark more predictable and robust 
(see the linked blog posts for details).

Sidenote: `mllib.fpm` contains another case of non-picklable namedtuple -- 
"nested":

```python
>>> class A:
... class B(namedtuple("B", [])): pass
...
>>> import pickle
>>> pickle.loads(pickle.dumps(A.B))
Traceback (most recent call last):
  [...]
pickle.PicklingError: Can't pickle : it's not found as 
__main__.B
```

It is pickleable with `_hijack_namedtuple` enabled, because the namedtuple 
class is recreated during unpickling. However, I think that "nested" classes 
are an anti-pattern in Python, because the outer class is not a proper 
namespace (like locals/globals/...):

```python
>>> A.B

```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21157
  
so this change would introduce a pretty big regression?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96706/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96706 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96706/testReport)**
 for PR 21157 at commit 
[`7f2ad87`](https://github.com/apache/spark/commit/7f2ad870dc82e058a76403099e2fc98d9f62e899).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> So, real downside of removing this now is we disallow global scope 
namedtuple.

Importable namedtuples and their subclasses could still be used inside an 
RDD. Only the namedtuples defined in the REPL would fail to pickle once this PR 
is merged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Woah. Okay. Let me add some guys interested in this again (@felixcheung 
looks already here) - @ueshin, @BryanCutler, @holdenk amd @JoshRosen 

Additionally @rxin too. Here's my understanding:

Reynold, here's what's going on: this is about the namedtuple hack removal 
we added a long long while ago. This hack isn't now super crucial since 
cloudpickle can handle this by its own without this hack. If we remove this, in 
case of normal RDD operations, that named tuple should be defined in local 
scope. If they are defined in global scope, it fails to pickle in the normal 
pickle (not cloudpickle which SQL code path uses).

1. So, real downside of removing this now is we disallow global scope 
namedtuple. 

2. actual advantage of this is, that we can get rid of weird behaviours by 
this hack. For instance, see the PR description (both links 
https://superbobry.github.io/pyspark-silently-breaks-your-namedtuples.html and 
https://superbobry.github.io/tensorflowonspark-or-the-namedtuple-patch-strikes-again.html).

@superbobry, wanna add some more words?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #96706 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96706/testReport)**
 for PR 21157 at commit 
[`7f2ad87`](https://github.com/apache/spark/commit/7f2ad870dc82e058a76403099e2fc98d9f62e899).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Reopened and rebased to be merged into the 3.X branch. See discussion in 
#21180. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Closing in favour of #21180.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21157
  
agree we should avoid removing test code


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Please go ahead if there's another approach to avoid to remove but fix it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
One improvement we can make is change the patch to bypass namedtuples which 
are importable. This would resolve the issues with namedtuples coming from 
third-party libraries. I can open a new PR doing this, wdyt?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Yea, my point is that it breaks other codes without a warning at all. We 
already have the copy of cloudpickle. The best should be a deduplicated fix for 
it, shouldn't it?

I am still solid -1 on the complete removal for Spark 2.x. We should find 
another way first for now. Removing out is the last resort.

I would consider the complete removal in Spark 3.x after having sufficient 
discussions.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89883/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #89883 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89883/testReport)**
 for PR 21157 at commit 
[`c67ce29`](https://github.com/apache/spark/commit/c67ce29a3279073812070e6ff4bb2e2624961b36).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #89883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89883/testReport)**
 for PR 21157 at commit 
[`c67ce29`](https://github.com/apache/spark/commit/c67ce29a3279073812070e6ff4bb2e2624961b36).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Yes, we can backport some of the cloudpickle code to make the patch less 
fragile. This would be a nontrivial change in an already complex code, but I'd 
be happy to sketch this if there's a consensus on the ML

Also, note that even without the patch it is possible to have an RDD of 
namedtuples as long as the namedtuple classes are defined inside an importable 
module, i.e. NOT inside a function/REPL.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89878/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #89878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89878/testReport)**
 for PR 21157 at commit 
[`67c4f67`](https://github.com/apache/spark/commit/67c4f6707aab670b9cde5a3afa34fda3abbbf46d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
I don't like the hack too but the complete removal just basically means we 
are going to drop namedtuple supports in RDD without, for example, any 
deprecation warnings. Spark is being super conservative and this's going to 
break compatibility. So, I was thinking we could do this for Spark 3.0. We 
already start to talk about this.

This should probably be something we should discuss in the mailing list 
since it's a breaking change. One thing clear is that the complete removal 
should target 3.0.0.

For now, yea, `cloudpickle` solves it but probably it's less performant.

Logically, `cloudpickle` [fixed 
it](https://github.com/cloudpipe/cloudpickle/pull/113) (our cloudpickle copy is 
matched to 
[0.4.3](https://github.com/cloudpipe/cloudpickle/releases/tag/v0.4.3)) and we 
can take after the fix with the normal pickle side, aren't we?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #89878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89878/testReport)**
 for PR 21157 at commit 
[`67c4f67`](https://github.com/apache/spark/commit/67c4f6707aab670b9cde5a3afa34fda3abbbf46d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> Does the test even pass?

The tests should pass module the tests specifically checking the behaviour 
being removed. I think the failing RDD test is in this group as well. 

> Why don't we try to fix it rather than removing out? 

I might be overly pessimistic but I don't see how we can make the patch 
work in all cases without making the implementation more magical, and as a 
result, producing even more confusing error messages when things go wrong. 
Consider, for instance, a widespread pattern

```python
class Foo(namedtuple("Foo", [])):
def foo(self):
return 42
```

If the outer `Foo` class does not explicitly customize pickling, it would 
use the "fallback" implementation added by `_hijack_namedtuple`, which only 
knows about the inner namedtuple class. Therefore, confusingly enough 
`issubclass(pickle.loads(foo), Foo)` is False (as detailed in [2]).

What can we do about this? We somehow need to serialize the **full 
definition** of the outer `Foo` class alongside every instance. Maybe this can 
be done by recursively pickling the class `__name__`, `__bases__` and 
`__dict__`, but `__dict__` could have some other hard-to-pickle objects like 
user-defined methods. Should we serialize these in the deconstructed form as 
well? These are tough questions, and I think they are better left outside the 
scope of PySpark. 

That said, I think an alternative to completely removing the patch might be 
deprecating it, and advertizing `cloudpickle` for workloads using namedtuples 
(or even making it the default?). I've played with `cloudpickle` a little bit, 
and it seems to solve the aforementioned issues in a consistent manner. The 
price, however, is the added overhead:

```python
>>> len(pickle.dumps(Foo()))
23
>>> len(cloudpickle.dumps(Foo()))
3538
```

or, even more extreme,

```python
>>> class A: pass
...
>>> len(cloudpickle.dumps(A()))
177
```

What do you think?

[2]: 
https://superbobry.github.io/tensorflowonspark-or-the-namedtuple-patch-strikes-again.html


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Let's think about other ways to fix them until 3.0.0. I think the complete 
removal is the last resort we could consider for 3.0.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Solid -1 if it breaks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89865/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #89865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89865/testReport)**
 for PR 21157 at commit 
[`eadc0c8`](https://github.com/apache/spark/commit/eadc0c8af853a57ee80f5e80fb708451931eedc0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21157
  
**[Test build #89865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89865/testReport)**
 for PR 21157 at commit 
[`eadc0c8`](https://github.com/apache/spark/commit/eadc0c8af853a57ee80f5e80fb708451931eedc0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Why don't we try to fix it rather than removing out? Does the test even 
pass?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21157
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org