[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20416 **[Test build #86727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86727/testReport)** for PR 20416 at commit

[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20369 **[Test build #86735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86735/testReport)** for PR 20369 at commit

[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20396 **[Test build #86731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86731/testReport)** for PR 20396 at commit

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20409 **[Test build #86729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86729/testReport)** for PR 20409 at commit

[GitHub] spark issue #20375: [SPARK-23199][SQL]improved Removes repetition from group...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20375 **[Test build #86732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86732/testReport)** for PR 20375 at commit

[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20416 **[Test build #86727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86727/testReport)** for PR 20416 at commit

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #86734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86734/testReport)** for PR 20146 at commit

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20403 **[Test build #86730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86730/testReport)** for PR 20403 at commit

[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20369 **[Test build #86733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86733/testReport)** for PR 20369 at commit

[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20414 **[Test build #86728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86728/testReport)** for PR 20414 at commit

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20415 Can one of the admins verify this patch?

[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20369 Retest this please.

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/308/

[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20396 Merged build finished. Test PASSed.

[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20396 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/307/

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20403 Merged build finished. Test PASSed.

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20409 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/305/

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20403 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/306/

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20409 Merged build finished. Test PASSed.

[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20414 Merged build finished. Test PASSed.

[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20414 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/304/

[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20416 Merged build finished. Test PASSed.

[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20416 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/303/

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20403 ok to test

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20403 retest this please

[GitHub] spark issue #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVectorizer

2018-01-27 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20367 LGTM, thanks

[GitHub] spark issue #20375: [SPARK-23199][SQL]improved Removes repetition from group...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20375 retest this please

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275764 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,48 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164275737 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275714 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275722 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275712 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275697 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164275706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark issue #20372: [SPARK-23249] Improved block merging logic for partition...

2018-01-27 Thread glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 Created https://issues.apache.org/jira/browse/SPARK-23249

[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20396 **[Test build #4080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4080/testReport)** for PR 20396 at commit

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164273743 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20396 **[Test build #4080 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4080/testReport)** for PR 20396 at commit

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164273656 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20416 cc @yanboliang, @ueshin, @viirya and @MLnick, could you guys check if it makes sense to you when you are available?

[GitHub] spark pull request #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module d...

2018-01-27 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20416 [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrings to the top in PySpark examples ## What changes were proposed in this pull request? This PR proposes to relocate the

[GitHub] spark pull request #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkS...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20404#discussion_r164269656 --- Diff: python/pyspark/sql/session.py --- @@ -225,6 +225,7 @@ def __init__(self, sparkContext, jsparkSession=None): if

[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-01-27 Thread ssaavedra
Github user ssaavedra commented on the issue: https://github.com/apache/spark/pull/20383 spark-integration was created much later. I originally opened this as https://github.com/apache-spark-on-k8s/spark/pull/516 last September. However, the integration tests repo exists since

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20409 I was hesitant because it accesses a lot of internal instances via the JVM, as I wrote in the PR description. Let me just use it as a test.

[GitHub] spark pull request #20410: [SPARK-23234][ML][PYSPARK] Remove setting default...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20410#discussion_r164267635 --- Diff: python/pyspark/ml/wrapper.py --- @@ -118,10 +118,9 @@ def _transfer_params_to_java(self): """ Transforms the embedded

[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-27 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164267184 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164267076 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164267079 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164267103 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164267128 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164267057 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164267118 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -155,24 +182,47 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-27 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164267100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */
