[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-214439666 Can we support dill directly and have a flag to choose between the two serializers? cloud-pickler could be the default one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user njwhite commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-214210492 @davies I'm using this to use the "dill" serializer, as it can pickle more things (and allows more fine-grained control) than the cloud-pickle serializer. What about making that the default for functions?
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-214147661 @njwhite We still use PickleSerializer to deserialize the functions, which means the serializer MUST be compatible with Pickle. I'm not sure making it configurable will be really helpful (it is not a good API interface). If you really want to hack it for your case, I think there are many ways to do that in Python.
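The compatibility constraint described above can be illustrated with a minimal sketch that uses only the standard library. All class and function names here (`StdPickleFunctionSerializer`, `BrokenSerializer`, `is_pickle_compatible`) are hypothetical and are not part of PySpark or of this pull request; the sketch only assumes that the receiving side calls plain `pickle.loads` on whatever the function serializer produced.

```python
import pickle


class StdPickleFunctionSerializer(object):
    """Hypothetical function serializer that writes ordinary pickle bytes."""

    def dumps(self, obj):
        return pickle.dumps(obj, protocol=2)


class BrokenSerializer(object):
    """Hypothetical serializer whose output is NOT a pickle stream."""

    def dumps(self, obj):
        return b"not a pickle"


def is_pickle_compatible(serializer, obj):
    """Return True if plain pickle.loads can read what serializer.dumps wrote."""
    try:
        payload = serializer.dumps(obj)
        pickle.loads(payload)
        return True
    except Exception:
        # Either serialization failed or the payload is not a valid pickle.
        return False
```

A check like this could back the kind of custom-serializer test mentioned elsewhere in the thread: a replacement serializer is only usable if its output round-trips through `pickle.loads`.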
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-214104132 If we do end up adding this, we would probably want to add a test that uses a custom serializer (but maybe don't rush to do this, since I think whether we want to expose this is not yet clear).
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-214103364 Is this functionality we want to add? cc @davies ?
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-213586086 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
GitHub user njwhite opened a pull request: https://github.com/apache/spark/pull/12620

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

## What changes were proposed in this pull request?

Store the serializer that we should use to serialize RDD transformation functions on the SparkContext, defaulting to a CloudPickleSerializer if not given. Allow a user to change this serializer when first constructing the SparkContext.

## How was this patch tested?

Unit tests and manual integration tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/njwhite/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12620.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12620

commit 0a3a7c168c6b262671a14c02c16aec3207ce9ee0
Author: Nick White
Date: 2016-04-22T20:53:20Z

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

Store the serializer that we should use to serialize RDD transformation functions on the SparkContext, defaulting to a CloudPickleSerializer if not given. Allow a user to change this serializer when first constructing the SparkContext.
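The shape of the proposed change can be sketched roughly as follows. This is not the code from the patch: `MiniContext`, `DefaultFunctionSerializer`, `RecordingSerializer` and the `function_serializer` parameter are hypothetical stand-ins, and the standard `pickle` module is used in place of CloudPickleSerializer so that the example runs without Spark installed.

```python
import pickle


class DefaultFunctionSerializer(object):
    """Hypothetical default; stands in for CloudPickleSerializer here."""

    def dumps(self, func):
        return pickle.dumps(func, protocol=2)


class RecordingSerializer(DefaultFunctionSerializer):
    """Hypothetical user-supplied serializer that counts how often it is used."""

    def __init__(self):
        self.calls = 0

    def dumps(self, func):
        self.calls += 1
        return DefaultFunctionSerializer.dumps(self, func)


class MiniContext(object):
    """Hypothetical stand-in for a context object that owns the serializer."""

    def __init__(self, function_serializer=None):
        # Fall back to the default when the caller does not supply one.
        if function_serializer is None:
            function_serializer = DefaultFunctionSerializer()
        self.function_serializer = function_serializer

    def serialize_function(self, func):
        # Every transformation function goes through the configured serializer.
        return self.function_serializer.dumps(func)
```

In this sketch the serializer is fixed at construction time, which mirrors the description's "when first constructing the SparkContext"; whether the real patch exposes it the same way is not shown in this message.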