[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/20561 lgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537 **[Test build #87290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)** for PR 20537 at commit

[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test PASSed.

[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/776/

[GitHub] spark pull request #20520: [SPARK-23344][PYTHON][ML] Add distanceMeasure par...

2018-02-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20520

[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20537 @HyukjinKwon no worries. Rebased.

[GitHub] spark issue #20520: [SPARK-23344][PYTHON][ML] Add distanceMeasure param to K...

2018-02-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20520 Merged to master

[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...

2018-02-10 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r167403542 --- Diff: project/SparkBuild.scala --- @@ -792,7 +792,6 @@ object TestSettings { "JAVA_HOME" ->

[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

2018-02-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20559

[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20537 Sorry, @icexelloss. Mind resolving the conflict?

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20559 Merged to master and branch-2.3.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87289/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit

[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20519 Hm .. yea but we can't simply flush and ignore the stdout specifically from sitecustomize unless we define a kind of an additional protocol like this because we can't simply distinguish if the

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87288/ Test PASSed.

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20560 Merged build finished. Test PASSed.

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20560 **[Test build #87288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87288/testReport)** for PR 20560 at commit

[GitHub] spark pull request #20569: Branch 2.2

2018-02-10 Thread MohammedLayeeq
GitHub user MohammedLayeeq opened a pull request: https://github.com/apache/spark/pull/20569 Branch 2.2 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this

[GitHub] spark pull request #20569: Branch 2.2

2018-02-10 Thread MohammedLayeeq
Github user MohammedLayeeq closed the pull request at: https://github.com/apache/spark/pull/20569

[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20568 Can one of the admins verify this patch?

[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-10 Thread mrkm4ntr
GitHub user mrkm4ntr opened a pull request: https://github.com/apache/spark/pull/20568 [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations ## What changes were proposed in this pull request? Murmur3 hash generates a different value from the

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/775/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20566#discussion_r167398469 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20560 @gatorsmile thanks for your comment. I moved it to a separate rule and added more tests. As per the added value of this rule, I see 3 main points: 1. Let's imagine that a user exposes

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/774/

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20560 Merged build finished. Test PASSed.

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20560 **[Test build #87288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87288/testReport)** for PR 20560 at commit

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Merged build finished. Test PASSed.

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87281/ Test PASSed.

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test PASSed.

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87282/ Test PASSed.

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20531 **[Test build #87281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87281/testReport)** for PR 20531 at commit

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561 **[Test build #87282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87282/testReport)** for PR 20561 at commit

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Merged build finished. Test PASSed.

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20559 **[Test build #87287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87287/testReport)** for PR 20559 at commit

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87287/ Test PASSed.

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20559 **[Test build #87287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87287/testReport)** for PR 20559 at commit

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Merged build finished. Test PASSed.

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/773/

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20559 retest this please

[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...

2018-02-10 Thread akonopko
Github user akonopko commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r167395493 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala --- @@ -22,6 +22,7 @@ import java.lang.{

[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...

2018-02-10 Thread akonopko
Github user akonopko commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r167395482 --- Diff: external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -539,6 +456,58 @@ class

[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...

2018-02-10 Thread akonopko
Github user akonopko commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r167395492 --- Diff: external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -21,6 +21,7 @@ import java.io.File

[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...

2018-02-10 Thread akonopko
Github user akonopko commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r167395487 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala --- @@ -687,6 +618,51 @@ class

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Merged build finished. Test FAILed.

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87286/ Test FAILed.

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20559 **[Test build #87286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87286/testReport)** for PR 20559 at commit

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Merged build finished. Test PASSed.

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87284/ Test PASSed.

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20567 **[Test build #87284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87284/testReport)** for PR 20567 at commit

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20559 **[Test build #87286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87286/testReport)** for PR 20559 at commit

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/772/

[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20559 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87285/ Test FAILed.

[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

2018-02-10 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20559#discussion_r167394718 --- Diff: python/pyspark/sql/types.py --- @@ -1766,15 +1781,13 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit

[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

2018-02-10 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20559#discussion_r167394720 --- Diff: python/pyspark/sql/tests.py --- @@ -2867,6 +2867,35 @@ def test_create_dataframe_required_pandas_not_found(self):

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 Yeah, IMHO, when the user loads a model from old version into new version to run, I think it is reasonable to run it with current default value because the param is not explicitly set and should use

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20566 @viirya that's a good question. Honestly my idea is that if the user doesn't set a value, he/she doesn't care about it, so it is good to use the new version default IMHO. But it is also true that

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 Seems it happened to fix this case too: ```python spark.conf.set("spark.sql.execution.arrow.enabled", "false") df = spark.createDataFrame([[bytearray("a")]]) df.toPandas()

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/771/

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 cc @ueshin, @BryanCutler and @icexelloss, could you take a look please when you have some time?

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/770/

[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20566#discussion_r167394494 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {

[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20567#discussion_r167394432 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1941,12 +1941,24 @@ def toPandas(self): timezone = None if

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fall back to Arrow o...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20567 **[Test build #87284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87284/testReport)** for PR 20567 at commit

[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20566#discussion_r167394422 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {

[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20567#discussion_r167394424 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1955,38 +1967,34 @@ def toPandas(self): return

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 @mgaido91 I also considered the issue of changed default values across versions. I'm not sure which is more reasonable, using old version's default value or using current version's default value.

[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fall back to ...

2018-02-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20567 [SPARK-23380][PYTHON] Make toPandas fall back to Arrow optimization disabled when schema is not supported in Arrow optimization ## What changes were proposed in this pull request?

[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20566#discussion_r167394216 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87283/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/769/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-10 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r167393891 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -931,7 +925,8 @@ private[spark] object RandomForest extends Logging {

[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-10 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20566 [SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug ## What changes were proposed in this pull request? Since 2.3, `Bucketizer` supports multiple input/output columns.

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/768/

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test PASSed.

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561 **[Test build #87282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87282/testReport)** for PR 20561 at commit

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20561 retest this please

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20531 **[Test build #87281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87281/testReport)** for PR 20531 at commit

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Merged build finished. Test PASSed.

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/767/

[GitHub] spark pull request #20407: [SPARK-23124][SQL] Allow to disable BroadcastNest...

2018-02-10 Thread mgaido91
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/20407

[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20531 retest this please

[GitHub] spark pull request #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFa...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20397#discussion_r167393259 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamSourceV2.scala --- @@ -139,15 +139,15 @@ class

[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20397 I am careful to say this out .. but let me leave my +0 for https://github.com/apache/spark/pull/20397#issuecomment-361345426. One option might be a similar name with `DataWriterFactory` but

[GitHub] spark pull request #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFa...

2018-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20397#discussion_r167392663 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/RateSourceV2Suite.scala --- @@ -78,7 +78,7 @@ class RateSourceV2Suite

[GitHub] spark pull request #20407: [SPARK-23124][SQL] Allow to disable BroadcastNest...

2018-02-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20407#discussion_r167392901 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -156,6 +156,15 @@ object SQLConf { .booleanConf
