[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r184760717 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +365,20 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r184763088 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala --- @@ -269,6 +269,21 @@ class GBTRegressionModel private[ml

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183866393 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183864852 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865224 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865609 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183864177 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183864721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865745 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183863701 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865387 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

spark git commit: [SPARK-23990][ML] Instruments logging improvements - ML regression package

2018-04-24 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 83013752e -> 379bffa05 [SPARK-23990][ML] Instruments logging improvements - ML regression package ## What changes were proposed in this pull request? Instruments logging improvements - ML regression package I add an `OptionalInstrument`

[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21078 LGTM Merging with master Thanks @WeichenXu123 ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

spark git commit: [SPARK-23455][ML] Default Params in ML should be saved separately in metadata

2018-04-24 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master ce7ba2e98 -> 83013752e [SPARK-23455][ML] Default Params in ML should be saved separately in metadata ## What changes were proposed in this pull request? We save ML's user-supplied params and default params as one entity in metadata.

spark git commit: [SPARK-23975][ML] Allow Clustering to take Arrays of Double as input features

2018-04-24 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 55c4ca88a -> 2a24c481d [SPARK-23975][ML] Allow Clustering to take Arrays of Double as input features ## What changes were proposed in this pull request? - Multiple possible input types is added in validateAndTransformSchema() and

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183796459 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -86,13 +86,23 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183797106 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala --- @@ -27,28 +26,38 @@ import org.apache.spark.sql.types.{ArrayType

[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20319#discussion_r183565892 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -20,13 +20,14 @@ package org.apache.spark.ml.clustering

[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20319#discussion_r183565968 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -65,10 +66,12 @@ class BisectingKMeansSuite

[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20319#discussion_r182180564 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/Encoders.scala --- @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20319#discussion_r183566557 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -102,17 +105,14 @@ class BisectingKMeansSuite

[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-04-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20633 Sorry for the pause in review. LGTM Merging with master @dbtsai I'm going to merge this since I'm worried it will collect more conflicts, but let's discuss more if needed

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183555957 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -86,13 +86,23 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183556811 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183558105 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -199,6 +201,47 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183558056 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -199,6 +201,47 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183557655 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183556424 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r183514966 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -157,34 +161,55 @@ private[spark] object Instrumentation

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182925491 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -144,8 +168,23 @@ class KMeansModel private[ml] ( // TODO

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182924819 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -120,11 +123,32 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182925299 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -305,15 +344,45 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182924903 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -120,11 +123,32 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182925210 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -120,11 +123,32 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...

2018-04-19 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21081 @WeichenXu123 A generic vector class would be interesting, but that would be a big project, way out of scope of this PR. You could bring it up if that person on the dev list sends a SPIP about

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182834828 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -150,3 +156,35 @@ private[spark] object Instrumentation

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182836091 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -378,18 +378,24 @@ class LinearRegression @Since("

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182835834 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -326,7 +326,7 @@ class LinearRegression @Since("

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182833013 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -150,3 +156,35 @@ private[spark] object Instrumentation

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182833153 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -150,3 +156,35 @@ private[spark] object Instrumentation

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182833420 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -150,3 +156,35 @@ private[spark] object Instrumentation

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21078#discussion_r182833731 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -150,3 +156,35 @@ private[spark] object Instrumentation

spark git commit: [SPARK-15784][ML] Add Power Iteration Clustering to spark.ml

2018-04-19 Thread jkbradley
K. Bradley <jos...@databricks.com> Closes #21090 from jkbradley/wangmiao1981-pic. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a471880a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a471880a Diff: http://git-wip-us.apach

[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-19 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21090 Thanks for reviewing this and for the LGTM @wangmiao1981 ! I'll merge with master now, with you as the primary author

[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...

2018-04-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21090#discussion_r182606716 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...

2018-04-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21081 I hope we can apply it to other algs too. @ludatabricks is doing some refactoring which should make that easier, but we're not going for a completely general approach right away. I

[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21078 Thanks for thinking through the optional logging issue! I responded in the JIRA to preserve the design discussion there: https://issues.apache.org/jira/browse/SPARK-23990

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182269723 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,7 +128,21 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182269644 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,7 +128,21 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182216309 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182216415 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -90,7 +90,12 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182215434 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,7 +128,21 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182217722 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,7 +128,21 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r182215639 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,7 +128,21 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 OK sorry to push @wangmiao1981 ! I just want to make sure this gets in before I no longer have bandwidth for it. If you have the time, would you mind checking the updates I made in the new PR

[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21090 @wangmiao1981 and @WeichenXu123 would you mind taking a look? Thanks!

[GitHub] spark issue #21090: [SPARK-15784][ML] Add Power Iteration Clustering to spar...

2018-04-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21090 **To review this PR**: This was copied from https://github.com/apache/spark/pull/15770 with the following changes: * Addressed comments in original PR (See my review comments

[GitHub] spark pull request #21090: [SPARK-15784][ML] Add Power Iteration Clustering ...

2018-04-17 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21090 [SPARK-15784][ML] Add Power Iteration Clustering to spark.ml ## What changes were proposed in this pull request? This PR adds PowerIterationClustering as a Transformer to spark.ml

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-04-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20319 Reviewing now!

spark git commit: [SPARK-21741][ML][PYSPARK] Python API for DataFrame-based multivariate summarizer

2018-04-17 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master f39e82ce1 -> 1ca3c50fe [SPARK-21741][ML][PYSPARK] Python API for DataFrame-based multivariate summarizer ## What changes were proposed in this pull request? Python API for DataFrame-based multivariate summarizer. ## How was this patch

[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20695 LGTM Thanks for the PR! Merging with master

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 I don't mind; I'll take it. But I'll mark @wangmiao1981 as the main contributor for the PR. Would you mind closing this issue @wangmiao1981 and I'll reopen a new PR under the same JIRA

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181847061 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -144,7 +156,12 @@ class KMeansModel private[ml] ( // TODO

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181846789 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,8 +128,15 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181841713 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -194,6 +195,34 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181847695 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -312,6 +329,8 @@ class KMeans @Since("1.5.0") (

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181841557 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -194,6 +195,34 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181840894 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -194,6 +195,34 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181841503 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -194,6 +195,34 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181847784 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -123,8 +128,15 @@ class KMeansModel private[ml] ( @Since("

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r181840765 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -194,6 +195,34 @@ class KMeansSuite extends SparkFunSuite

spark git commit: [SPARK-21088][ML] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API

2018-04-16 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 5003736ad -> 04614820e [SPARK-21088][ML] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API ## What changes were proposed in this pull request? Add python API for collecting sub-models during

[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2018-04-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19627 LGTM Merging with master Thanks!

[GitHub] spark issue #7652: [SPARK-9312] [ML] Added max confidence factor to OneVsRes...

2018-04-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/7652 Hi, sorry I let this PR get stale. This should be resolved now by https://github.com/apache/spark/pull/21044 so would you mind closing this issue @badriub ? Thanks though

spark git commit: [SPARK-9312][ML] Add RawPrediction, numClasses, and numFeatures for OneVsRestModel

2018-04-16 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 083cf2235 -> 5003736ad [SPARK-9312][ML] Add RawPrediction, numClasses, and numFeatures for OneVsRestModel add RawPrediction as output column add numClasses and numFeatures to OneVsRestModel ## What changes were proposed in this pull

[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

2018-04-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21044 LGTM Merging with master Thanks!!

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r181802586 --- Diff: python/pyspark/ml/stat.py --- @@ -195,6 +197,195 @@ def test(dataset, sampleCol, distName, *params

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181731252 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest { * Java

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181288716 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -146,6 +152,10 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181288736 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181288725 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181288721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181288710 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -138,6 +138,12 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r181263309 --- Diff: python/pyspark/ml/stat.py --- @@ -195,6 +197,185 @@ def test(dataset, sampleCol, distName, *params

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r181259361 --- Diff: python/pyspark/ml/stat.py --- @@ -195,6 +197,185 @@ def test(dataset, sampleCol, distName, *params

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r181259181 --- Diff: python/pyspark/ml/stat.py --- @@ -195,6 +197,185 @@ def test(dataset, sampleCol, distName, *params

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r181259536 --- Diff: python/pyspark/ml/stat.py --- @@ -195,6 +197,185 @@ def test(dataset, sampleCol, distName, *params

spark git commit: [SPARK-23751][FOLLOW-UP] fix build for scala-2.12

2018-04-12 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 0b19122d4 -> 0f93b91a7 [SPARK-23751][FOLLOW-UP] fix build for scala-2.12 ## What changes were proposed in this pull request? fix build for scala-2.12 ## How was this patch tested? Manual. Author: WeichenXu

[GitHub] spark issue #21051: [SPARK-23751][FOLLOW-UP] fix build for scala-2.12

2018-04-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21051 LGTM Merging with master Thanks @WeichenXu123 ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181230965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest { * Java

spark git commit: typo rawPredicition changed to rawPrediction

2018-04-11 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 75a183071 -> 9d960de08 typo rawPredicition changed to rawPrediction MultilayerPerceptronClassifier had 4 occurrences ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch

spark git commit: typo rawPredicition changed to rawPrediction

2018-04-11 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.3 acfc156df -> 03a4dfd69 typo rawPredicition changed to rawPrediction MultilayerPerceptronClassifier had 4 occurrences ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this

[GitHub] spark issue #21030: typo rawPredicition changed to rawPrediction

2018-04-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21030 Actually, forget the JIRA; I'll just merge it with master and branch-2.3 as is.

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r180920806 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,14 +205,18 @@ final class OneVsRestModel private[ml

[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

2018-04-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21044 Thanks for the PR! Quick high-level comment: We'll need to have rawPredictionCol be optional. If it's not set or is an empty string, then it should not be added to the output DataFrame

[GitHub] spark pull request #19627: [SPARK-21088][ML] CrossValidator, TrainValidation...

2018-04-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19627#discussion_r180874011 --- Diff: python/pyspark/ml/tuning.py --- @@ -194,7 +195,8 @@ def _to_java_impl(self): return java_estimator, java_epms, java_evaluator

[GitHub] spark pull request #19627: [SPARK-21088][ML] CrossValidator, TrainValidation...

2018-04-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19627#discussion_r180862523 --- Diff: python/pyspark/ml/tuning.py --- @@ -194,7 +195,8 @@ def _to_java_impl(self): return java_estimator, java_epms, java_evaluator

[GitHub] spark pull request #19627: [SPARK-21088][ML] CrossValidator, TrainValidation...

2018-04-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19627#discussion_r180876558 --- Diff: python/pyspark/ml/tests.py --- @@ -1018,6 +1018,48 @@ def test_parallel_evaluation(self): cvParallelModel = cv.fit(dataset

[GitHub] spark pull request #19627: [SPARK-21088][ML] CrossValidator, TrainValidation...

2018-04-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19627#discussion_r180610868 --- Diff: python/pyspark/ml/tests.py --- @@ -1186,6 +1228,38 @@ def test_parallel_evaluation(self): tvsParallelModel = tvs.fit(dataset

[GitHub] spark pull request #19627: [SPARK-21088][ML] CrossValidator, TrainValidation...

2018-04-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19627#discussion_r180611695 --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py --- @@ -157,6 +157,8 @@ def get$Name(self): "TypeConverters.
