[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19929 Could you change the JIRA number to https://issues.apache.org/jira/browse/SPARK-22901 ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 @gatorsmile, yes, the reason the seed doesn't work lies in the way Python UDFs are executed: a new Python process is created for each partition to evaluate the Python UDF. Thus the seed is set only on the driver, not in the process where the UDF is executed. This can be easily confirmed:

```
>>> from pyspark.sql.functions import udf
>>> import os
>>> pid_udf = udf(lambda: str(os.getpid()))
>>> spark.range(2).select(pid_udf()).show()
+----------+
|<lambda>()|
+----------+
|      4132|
|      4130|
+----------+
>>> os.getpid()
4070
```

Therefore there is no easy way to set the seed. If I set it inside the UDF, the UDF would become deterministic. So I think the best option is the current test.
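The seeding behaviour described above can be reproduced without Spark. The following is only a minimal sketch, assuming that a fresh interpreter started with `subprocess` is a fair stand-in for the separate Python process that evaluates a UDF; `WORKER_SNIPPET` and `run_in_new_process` are hypothetical names used for illustration and are not part of PySpark.

```python
import os
import random
import subprocess
import sys

# Code executed by the stand-in "worker" process (an assumption of this sketch):
# seed only if a seed is passed on the command line, then report pid and one draw.
WORKER_SNIPPET = """
import os, random, sys
if len(sys.argv) > 1:
    random.seed(int(sys.argv[1]))
print(os.getpid(), random.random())
"""


def run_in_new_process(seed=None):
    """Run WORKER_SNIPPET in a fresh interpreter and return (pid, random draw)."""
    cmd = [sys.executable, "-c", WORKER_SNIPPET]
    if seed is not None:
        cmd.append(str(seed))
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    pid, value = out.split()
    return int(pid), float(value)


# Seeding here affects only the current ("driver") process.
random.seed(42)
driver_value = random.random()

# The new process never sees the driver's seed, so its draw is unrelated.
worker_pid, unseeded_value = run_in_new_process()

# Seeding inside the new process reproduces the driver's draw exactly,
# which is why seeding inside a UDF would make it deterministic.
_, seeded_value = run_in_new_process(seed=42)

print("driver pid:", os.getpid(), "worker pid:", worker_pid)
print("driver draw:", driver_value)
print("unseeded worker draw:", unseeded_value)
print("seeded worker draw:", seeded_value)
```

The worker reports a different pid from the driver, as in the `pid_udf` output above, and only the draw seeded inside the worker matches the driver's value.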
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19929 Any update on https://github.com/apache/spark/pull/19929/files/cc309b0ce2496365afd8c602c282e3d84aeed940#r158579661?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85359/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85359/testReport)** for PR 19929 at commit [`47801c7`](https://github.com/apache/spark/commit/47801c7dc532aa9a19d59cdef1fe021c61a0b2c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85359/testReport)** for PR 19929 at commit [`47801c7`](https://github.com/apache/spark/commit/47801c7dc532aa9a19d59cdef1fe021c61a0b2c8).
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85358/testReport)** for PR 19929 at commit [`a40ba73`](https://github.com/apache/spark/commit/a40ba7384db1030b6facb14b741349da09562d1f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85358/ Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85358/testReport)** for PR 19929 at commit [`a40ba73`](https://github.com/apache/spark/commit/a40ba7384db1030b6facb14b741349da09562d1f).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85337/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85337/testReport)** for PR 19929 at commit [`462a92a`](https://github.com/apache/spark/commit/462a92a4237deb63d0b7128ff3585bb7595692fe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85337/testReport)** for PR 19929 at commit [`462a92a`](https://github.com/apache/spark/commit/462a92a4237deb63d0b7128ff3585bb7595692fe).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85322/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85322/testReport)** for PR 19929 at commit [`cc309b0`](https://github.com/apache/spark/commit/cc309b0ce2496365afd8c602c282e3d84aeed940). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85322/testReport)** for PR 19929 at commit [`cc309b0`](https://github.com/apache/spark/commit/cc309b0ce2496365afd8c602c282e3d84aeed940).
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 Jenkins, retest this please
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85316/ Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85316/testReport)** for PR 19929 at commit [`cc309b0`](https://github.com/apache/spark/commit/cc309b0ce2496365afd8c602c282e3d84aeed940). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85316/testReport)** for PR 19929 at commit [`cc309b0`](https://github.com/apache/spark/commit/cc309b0ce2496365afd8c602c282e3d84aeed940).
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 thank you @cloud-fan, changed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85309/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85309/testReport)** for PR 19929 at commit [`187ff9a`](https://github.com/apache/spark/commit/187ff9a22edecdca582893b2fd836a343972f68b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19929 `UDFRegistration.registerFunction` needs a minor update for the log
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 @gatorsmile I added the test, but I didn't understand what needs to be updated in `registerPython`. Could you please explain? Thanks.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #85309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85309/testReport)** for PR 19929 at commit [`187ff9a`](https://github.com/apache/spark/commit/187ff9a22edecdca582893b2fd836a343972f68b).
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19929 We need test cases. Manual tests are not enough. I will try to review this tomorrow. Thanks!
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 kindly ping @cloud-fan @gatorsmile @HyukjinKwon @zero323
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 cc @cloud-fan @HyukjinKwon @zero323 maybe you can help review this too, thanks.
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 @gatorsmile sorry, I saw that you did the patch for Scala UDFs. Could you help review this, please? Thanks.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84660/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #84660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84660/testReport)** for PR 19929 at commit [`6187d5a`](https://github.com/apache/spark/commit/6187d5a0df7c409a49cd636eb74dea9323044c6b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #84660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84660/testReport)** for PR 19929 at commit [`6187d5a`](https://github.com/apache/spark/commit/6187d5a0df7c409a49cd636eb74dea9323044c6b).