[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82101/ Test FAILed.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82101/testReport)** for PR 19321 at commit

[GitHub] spark issue #19329: [SPARK-22110][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19329 **[Test build #82105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82105/testReport)** for PR 19329 at commit

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Build finished. Test FAILed.

[GitHub] spark issue #19329: [SPARK-22110][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19329 ok to test

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82102/ Test FAILed.

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #82102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82102/consoleFull)** for PR 9207 at commit

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626955 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -51,10 +51,12 @@ case class

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -51,10 +51,12 @@ case class

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19325 Most of it looks pretty good. The only main question I have is about the empty partition issue.

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626184 --- Diff: python/pyspark/sql/tests.py --- @@ -3344,6 +3342,22 @@ def test_vectorized_udf_wrong_return_type(self):

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626154 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -51,10 +51,12 @@ case class

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626051 --- Diff: python/pyspark/worker.py --- @@ -80,14 +77,12 @@ def wrap_pandas_udf(f, return_type): arrow_return_type = toArrowType(return_type)

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140625992 --- Diff: python/pyspark/sql/tests.py --- @@ -3344,6 +3342,22 @@ def test_vectorized_udf_wrong_return_type(self):

[GitHub] spark issue #19329: [SPARK-22110][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19329 Can one of the admins verify this patch?

[GitHub] spark pull request #19329: [SPARK-22110][SQL][Documentation] Add usage and i...

2017-09-22 Thread kevinyu98
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/19329 [SPARK-22110][SQL][Documentation] Add usage and improve documentation with arguments and examples for trim function ## What changes were proposed in this pull request? This PR proposes

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #82104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82104/testReport)** for PR 13599 at commit

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19328 Why did you close it? You can just edit the PR title.

[GitHub] spark pull request #19328: [SPARK-22088][SQL][Documentation] Add usage and i...

2017-09-22 Thread kevinyu98
Github user kevinyu98 closed the pull request at: https://github.com/apache/spark/pull/19328

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread kevinyu98
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/19328 I am so sorry, I made a mistake with the JIRA number. I created a new JIRA, SPARK-22110, but used the wrong number here. Let me close this PR and then put in the correct JIRA number.

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 Oh, I get your point. This is different from `RDD.aggregate`: it directly returns a Map and avoids shuffling. It seems useful when numKeys is small. But I checked the final `reduce` step, it

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test FAILed.

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82100/ Test FAILed.

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #82100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82100/testReport)** for PR 19222 at commit

[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...

2017-09-22 Thread ArtRand
Github user ArtRand commented on the issue: https://github.com/apache/spark/pull/13143 Hello @devaraj-kavali. Yes. I've been playing around with this because it's inconvenient to clean up ZK whenever you uninstall/reinstall the Dispatcher. The problem is that the only signal of a

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19320 **[Test build #82103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82103/testReport)** for PR 19320 at commit

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19320 ok to test

[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...

2017-09-22 Thread ArtRand
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r140625136 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCredentialRenewer.scala --- @@ -0,0 +1,150 @@ +/* +

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #82102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82102/consoleFull)** for PR 9207 at commit

[GitHub] spark issue #19326: [SPARK-22107] Change as to alias in python quickstart

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19326 How does it relate to https://github.com/apache/spark/pull/19283?

[GitHub] spark pull request #19303: [SPARK-22085][CORE]When the application has no co...

2017-09-22 Thread 10110346
Github user 10110346 closed the pull request at: https://github.com/apache/spark/pull/19303

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82098/ Test PASSed.

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19328 Merged build finished. Test PASSed.

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19328 **[Test build #82098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82098/testReport)** for PR 19328 at commit

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82101/testReport)** for PR 19321 at commit

[GitHub] spark issue #19295: [SPARK-22080][SQL] Adds support for allowing user to add...

2017-09-22 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19295 ping @cloud-fan @gatorsmile

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82099/ Test FAILed.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82099/testReport)** for PR 19321 at commit

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Merged build finished. Test FAILed.

[GitHub] spark issue #19295: [SPARK-22080][SQL] Adds support for allowing user to add...

2017-09-22 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19295 ok to test

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19325 To me, I am willing to merge this one soon.

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140622826 --- Diff: python/pyspark/serializers.py --- @@ -246,15 +243,9 @@ def cast_series(s, t): def loads(self, obj): """

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623893 --- Diff: python/pyspark/worker.py --- @@ -80,14 +77,12 @@ def wrap_pandas_udf(f, return_type): arrow_return_type = toArrowType(return_type)

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623560 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140624294 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623030 --- Diff: python/pyspark/sql/tests.py --- @@ -3256,11 +3256,9 @@ def test_vectorized_udf_null_string(self): def

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623282 --- Diff: python/pyspark/sql/tests.py --- @@ -3308,12 +3306,12 @@ def test_vectorized_udf_invalid_length(self): from pyspark.sql.functions

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623236 --- Diff: python/pyspark/sql/tests.py --- @@ -3308,12 +3306,12 @@ def test_vectorized_udf_invalid_length(self): from pyspark.sql.functions

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140624191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -28,12 +28,18 @@ class SparkOptimizer(

[GitHub] spark issue #19295: [SPARK-22080][SQL] Adds support for allowing user to add...

2017-09-22 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19295 test this please

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140624028 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -28,12 +28,18 @@ class SparkOptimizer(

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140623965 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala --- @@ -44,11 +44,14 @@ class ExperimentalMethods private[sql]() {

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140623955 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -28,12 +28,18 @@ class SparkOptimizer(

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140624052 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala --- @@ -78,8 +82,14 @@ class SQLContextSuite extends SparkFunSuite with

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread ConeyLiu
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19317 @jiangxb1987, @WeichenXu123, thanks for reviewing. This change is inspired by the `TODO List`. You can see the following code snippet: ```scala // TODO: Calling aggregateByKey and

[GitHub] spark issue #18936: [SPARK-21688][ML][MLLIB] make native BLAS the first choi...

2017-09-22 Thread VinceShieh
Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/18936 Hi Sean, sorry for the late reply. Yeah, we actually do have some performance data on F2J vs. OpenBLAS. It seems there is no performance gain from OpenBLAS, not even at the unit-test level. We are

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 Yes. I guess the perf gain comes from the fact that this PR uses a local hashmap, which can use unlimited memory, whereas the current Spark aggregation impl automatically spills the local hashmap when it exceeds a threshold.

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread VinceShieh
Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/19317 Nice catch, thanks. The perf gain is truly narrow. I believe this impl just tried to align with the impl of `reduceByKeyLocally`. @ConeyLiu maybe we should revisit the code, along with

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #82100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82100/testReport)** for PR 19222 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 Jenkins, retest this please

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82097/ Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #82097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82097/testReport)** for PR 16578 at commit

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140622457 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 And I have to point out that your impl has a high risk of causing OOM. The current impl automatically spills when the local hashmap grows too large and can take advantage of Spark's automatic memory management

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140622410 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python function if

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19328 Yea .. let's open a separate JIRA.

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya Yeah, for the perf gap I only focus on `mean`, which can take advantage of codegen.

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-22 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19278

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Yeah, I think that fix should work for the strategy `Imputer.mean`, because `Imputer.mean` now aggregates many columns at once, which can produce generated aggregation code that is too large. For the

[GitHub] spark issue #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidationSpli...

2017-09-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19278 LGTM. Merging with master. Thanks @WeichenXu123 for the fix and for testing for backwards compatibility!

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19278#discussion_r140621871 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala --- @@ -160,11 +160,13 @@ class TrainValidationSplitSuite

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-09-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r140621444 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19286 ping @gatorsmile

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19328 I think you should create a new JIRA, instead of using SPARK-22088 which is for the wrong style issue.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82099/testReport)** for PR 19321 at commit

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19325 Merged build finished. Test FAILed.

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19325 **[Test build #82096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82096/testReport)** for PR 19325 at commit

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19325 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82096/ Test FAILed.

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19328 **[Test build #82098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82098/testReport)** for PR 19328 at commit

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19328 ok to test

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19328 Can one of the admins verify this patch?

[GitHub] spark pull request #19328: [SPARK-22088][SQL][Documentation] Add usage and i...

2017-09-22 Thread kevinyu98
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/19328 [SPARK-22088][SQL][Documentation] Add usage and improve documentation with arguments and examples for trim function ## What changes were proposed in this pull request? This PR proposes

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140605142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -146,7 +146,13 @@ case class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140608418 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140615004 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140616618 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -216,22 +229,51 @@ case class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140614146 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140605854 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -324,17 +367,34 @@ case class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140617927 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -216,22 +229,51 @@ case class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140614325 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140608463 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140612077 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -329,6 +392,27 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140607924 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140611178 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -329,6 +392,27 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140612202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140605588 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -324,17 +367,34 @@ case class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140614816 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140617841 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -216,22 +229,51 @@ case class
