[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-25 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-214439666
  
Can we support dill directly and have a flag to choose between the two 
serializers? cloudpickle could be the default one.
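A minimal sketch of what such a flag could look like. This is illustrative only, not Spark's actual API: the registry and `get_function_serializer` are invented names, and stdlib `pickle` stands in for both the cloudpickle and dill backends so the sketch is self-contained.

```python
import pickle

# Hypothetical registry mapping a serializer name to a (dumps, loads) pair.
# In a real implementation these would be cloudpickle and dill; stdlib
# pickle is used here only as a stand-in.
SERIALIZERS = {
    "cloudpickle": (pickle.dumps, pickle.loads),  # proposed default
    "dill": (pickle.dumps, pickle.loads),         # opt-in alternative
}

def get_function_serializer(name="cloudpickle"):
    """Look up the (dumps, loads) pair for the requested serializer."""
    try:
        return SERIALIZERS[name]
    except KeyError:
        raise ValueError("unknown serializer: %r" % (name,))
```

A caller would then do `dumps, loads = get_function_serializer("dill")` and use those two callables for function round-trips.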


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-25 Thread njwhite
Github user njwhite commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-214210492
  
@davies I'm using this to switch to the "dill" serializer, since it can pickle 
more things (and allows more fine-grained control) than the cloudpickle 
serializer. What about making dill the default for functions?





[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-24 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-214147661
  
@njwhite We still use PickleSerializer to deserialize the functions, which 
means the serializer MUST be compatible with pickle. I'm not sure making it 
configurable would be really helpful (it's not a good API surface).

If you really need this in your case, I think there are many ways to hack 
it in Python.
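The compatibility constraint above can be seen with stdlib pickle alone: plain pickle serializes functions by reference, so a lambda cannot be pickled at all, which is why PySpark uses cloudpickle for closures in the first place, and why any replacement (such as dill) still has to emit a byte stream that a pickle-based deserializer can read on the worker. A small stdlib-only illustration (`try_pickle` is a helper invented for this sketch):

```python
import pickle

def try_pickle(obj):
    """Return True if stdlib pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

Here `try_pickle([1, 2, 3])` succeeds, but `try_pickle(lambda x: x + 1)` fails, since pickle cannot look the lambda up by name.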





[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-24 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-214104132
  
If we do end up adding this, we would probably want to add a test that uses a 
custom serializer (but maybe don't rush to do this, since it's not yet clear 
whether we want to expose it).





[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-24 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-214103364
  
Is this functionality we want to add? cc @davies ?





[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-213586086
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-22 Thread njwhite
GitHub user njwhite opened a pull request:

https://github.com/apache/spark/pull/12620

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

## What changes were proposed in this pull request?

Store the serializer used to serialize RDD transformation functions on the
SparkContext, defaulting to CloudPickleSerializer if none is given. Allow the
user to supply a different serializer when first constructing the
SparkContext.
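The change described above can be sketched roughly as follows. This is not the actual PR diff: `DemoSparkContext` and `PickleFunctionSerializer` are invented names, and stdlib `pickle` stands in for CloudPickleSerializer so the sketch runs on its own.

```python
import pickle

class PickleFunctionSerializer:
    """Stand-in for CloudPickleSerializer, backed by stdlib pickle."""

    def dumps(self, obj):
        return pickle.dumps(obj)

    def loads(self, data):
        return pickle.loads(data)

class DemoSparkContext:
    """Toy context illustrating the proposed constructor parameter."""

    def __init__(self, lambda_serializer=None):
        # Store the function serializer on the context, defaulting to the
        # pickle-based one when the caller does not supply an alternative.
        self.lambda_serializer = lambda_serializer or PickleFunctionSerializer()
```

RDD transformations would then call `self.lambda_serializer.dumps(func)` instead of hard-coding one serializer, and a user could pass, say, a dill-backed serializer object at construction time.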

## How was this patch tested?

Unit tests and manual integration tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/njwhite/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12620


commit 0a3a7c168c6b262671a14c02c16aec3207ce9ee0
Author: Nick White 
Date:   2016-04-22T20:53:20Z

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

Store the serializer that we should use to serialize RDD transformation
functions on the SparkContext, defaulting to a CloudPickleSerializer if not
given. Allow a user to change this serializer when first constructing the
SparkContext.



