[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-23 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/8833 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-22 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-142423117 Based on some extended offline discussion / debate and code-review, we've decided to merge #8835 instead of this fix. The basic approaches in both patches are the

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-21 Thread justinuang
Github user justinuang commented on a diff in the pull request: https://github.com/apache/spark/pull/8833#discussion_r40048503 --- Diff: python/pyspark/sql/functions.py --- @@ -1414,7 +1414,7 @@ def __init__(self, func, returnType, name=None): def _create_judf(self, name):

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-21 Thread justinuang
Github user justinuang commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-142161491 lgtm! So this avoids deadlock by getting rid of the blocking queue (duh!) and then assumes the OS buffer will rate limit how much gets buffered on the writer side?

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-20 Thread justinuang
Github user justinuang commented on a diff in the pull request: https://github.com/apache/spark/pull/8833#discussion_r39933648 --- Diff: python/pyspark/sql/functions.py --- @@ -1414,7 +1414,7 @@ def __init__(self, func, returnType, name=None): def _create_judf(self, name):

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-20 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8833#discussion_r39935553 --- Diff: python/pyspark/sql/functions.py --- @@ -1414,7 +1414,7 @@ def __init__(self, func, returnType, name=None): def _create_judf(self, name):

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141701004 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141700988 [Test build #42714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42714/console) for PR 8833 at commit

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141701003 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141708146 [Test build #1775 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1775/consoleFull) for PR 8833 at commit

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141712761 [Test build #1775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1775/console) for PR 8833 at commit

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8833#discussion_r39917194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUDFs.scala --- @@ -338,7 +338,11 @@ case class BatchPythonEvaluation(udf: PythonUDF,

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8833#discussion_r39921623 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUDFs.scala --- @@ -338,7 +338,11 @@ case class BatchPythonEvaluation(udf: PythonUDF,

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141691813 Merged build triggered.

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141691826 Merged build started.

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8833#issuecomment-141693095 [Test build #42714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42714/consoleFull) for PR 8833 at commit

[GitHub] spark pull request: [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Pyt...

2015-09-18 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/8833 [SPARK-10685] [SPARK-8632] [SQL] [PYSPARK] Python UDF should only compute the upstream once This PR buffers the rows from upstream into a Queue, then zips them with the results from the Python UDF,
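The buffering idea in the PR description can be sketched in plain Python. This is only an illustration of the general technique, not the code from the pull request (the real change is in Scala, in `pythonUDFs.scala`). The names `run_udf_in_batches` and `evaluate_with_buffer`, and the `batch_size` parameter, are hypothetical and do not come from Spark.

```python
from collections import deque


def run_udf_in_batches(rows, udf, batch_size):
    """Hypothetical stand-in for a worker that consumes rows in batches."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            for r in batch:
                yield udf(r)
            batch = []
    # Flush the final, possibly partial, batch.
    for r in batch:
        yield udf(r)


def evaluate_with_buffer(upstream, udf, batch_size=2):
    """Pair each upstream row with its UDF result, reading upstream once."""
    buffered = deque()  # rows already handed to the worker, awaiting results

    def feed():
        # Remember every row as it is sent to the worker so that it does
        # not have to be recomputed when the result comes back.
        for row in upstream:
            buffered.append(row)
            yield row

    for result in run_udf_in_batches(feed(), udf, batch_size):
        # Results arrive in input order, so the oldest buffered row matches.
        yield buffered.popleft(), result
```

Because the worker may read several rows before it emits the first result, the queue can hold more than one row at a time. Each upstream row is still produced exactly once, which is the property the PR title asks for.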