[GitHub] spark pull request #21042: [SPARK-22883][ML] ML test for StructuredStreaming...

2018-04-11 Thread jkbradley
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/21042 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

spark git commit: [SPARK-22883][ML] ML test for StructuredStreaming: spark.ml.feature, I-M

2018-04-11 Thread jkbradley
1042 from jkbradley/SPARK-22883-part2-2.3backport. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/acfc156d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/acfc156d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

[GitHub] spark issue #21042: [SPARK-22883][ML] ML test for StructuredStreaming: spark...

2018-04-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21042 Since this had an LGTM for the original PR, has no changes, and tests pass, I'll merge this with branch-2.3

[GitHub] spark issue #21042: [SPARK-22883][ML] ML test for StructuredStreaming: spark...

2018-04-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21042 I haven't made any changes from the PR which was merged into master in https://github.com/apache/spark/pull/20964

[GitHub] spark pull request #21042: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-11 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21042 [SPARK-22883] ML test for StructuredStreaming: spark.ml.feature, I-M This backports https://github.com/apache/spark/pull/20964 to branch-2.3. ## What changes were proposed in this pull

[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....

2018-04-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20964 Merging with master

spark git commit: [SPARK-22883] ML test for StructuredStreaming: spark.ml.feature, I-M

2018-04-11 Thread jkbradley
ter * Interaction * MaxAbsScaler * MinHashLSH * MinMaxScaler * NGram ## How was this patch tested? It is a bunch of tests! Author: Joseph K. Bradley <jos...@databricks.com> Closes #20964 from jkbradley/SPARK-22883-part2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20964 I rebased off of master because of the merge warning in the last tests. I did not have to resolve any conflicts. I'll merge this once tests pass

spark git commit: [SPARK-23944][ML] Add the set method for the two LSHModel

2018-04-10 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 4f1e8b9bb -> 7c7570d46 [SPARK-23944][ML] Add the set method for the two LSHModel ## What changes were proposed in this pull request? Add two set methods for LSHModel in LSH.scala, BucketedRandomProjectionLSH.scala, and MinHashLSH.scala

[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21015 LGTM Merging with master Thanks @ludatabricks !

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 @wangmiao1981 Do let me know if you're too busy now to resume this; I know it's been a long time. Thanks

[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20964 Thanks! I'll rerun tests since they are stale and merge after they pass.

spark git commit: [SPARK-23871][ML][PYTHON] add python api for VectorAssembler handleInvalid

2018-04-10 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master adb222b95 -> 4f1e8b9bb [SPARK-23871][ML][PYTHON] add python api for VectorAssembler handleInvalid ## What changes were proposed in this pull request? add python api for VectorAssembler handleInvalid ## How was this patch tested? Add

[GitHub] spark issue #21003: [SPARK-23871][ML][PYTHON]add python api for VectorAssemb...

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21003 Merging with master Thanks @huaxingao !

[GitHub] spark issue #21003: [SPARK-23871][ML][PYTHON]add python api for VectorAssemb...

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21003 LGTM But would you mind making one fix in the doc here and in Scala? "Param for how to handle invalid data (NULL values)" should actually read "Param for how to handle inv

[GitHub] spark issue #21030: typo rawPredicition changed to rawPrediction

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21030 LGTM pending tests Thanks for finding & fixing this! Would you mind creating a JIRA and linking it to https://issues.apache.org/jira/browse/SPARK-21856 ? I'd like a tracking

[GitHub] spark issue #21030: typo rawPredicition changed to rawPrediction

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21030 ok to test

spark git commit: [SPARK-23751][ML][PYSPARK] Kolmogorov-Smirnoff test Python API in pyspark.ml

2018-04-10 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master e17965891 -> adb222b95 [SPARK-23751][ML][PYSPARK] Kolmogorov-Smirnoff test Python API in pyspark.ml ## What changes were proposed in this pull request? Kolmogorov-Smirnoff test Python API in `pyspark.ml` **Note** API with `CDF` is a

[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20904 LGTM Thanks for the PR! Merging with master

[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...

2018-04-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21015 add to whitelist

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r180228711 --- Diff: python/pyspark/ml/stat.py --- @@ -127,13 +113,86 @@ class Correlation(object): def corr(dataset, column, method="pe

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r180245120 --- Diff: python/pyspark/ml/stat.py --- @@ -127,13 +113,86 @@ class Correlation(object): def corr(dataset, column, method="pe

spark git commit: [SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes

2018-04-09 Thread jkbradley
ide val rootNode: ClassificationNode class DecisionTreeRegressionModel override val rootNode: RegressionNode ``` Closes #17466 ## How was this patch tested? UT will be added soon. Author: WeichenXu <weichen...@databricks.com> Author: jkbradley <joseph.kurata.brad...@gmail.com> Clo

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20786 LGTM Merging with master Thanks @WeichenXu123 !

[GitHub] spark issue #18227: [SPARK-21005][ML] Fix VectorIndexerModel does not prepar...

2018-04-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18227 Just commented on the JIRA about this issue: https://issues.apache.org/jira/browse/SPARK-21005

[GitHub] spark issue #20825: add impurity stats in tree leaf node debug string

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20825 I actually would prefer not to merge this change since it could blow up the size of the strings printed for some classification tasks with large numbers of labels. If people want to debug

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20319 @smurakozi Thanks for the PR! I have bandwidth to review this now. Do you have time to rebase this to fix the merge conflicts

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179824556 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -102,10 +102,11 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179831482 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179833156 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179830986 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179832593 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,7 +81,7 @@ object KolmogorovSmirnovTest { * Java

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179824228 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -59,7 +59,7 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179832114 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _

spark git commit: [SPARK-23859][ML] Initial PR for Instrumentation improvements: UUID and logging levels

2018-04-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master c926acf71 -> d23a805f9 [SPARK-23859][ML] Initial PR for Instrumentation improvements: UUID and logging levels ## What changes were proposed in this pull request? Initial PR for Instrumentation improvements: UUID and logging levels. This

[GitHub] spark issue #20982: [SPARK-23859][ML] Initial PR for Instrumentation improve...

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20982 LGTM Merging with master Thanks @WeichenXu123 and @MrBago !

[GitHub] spark issue #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20837 We can close this issue now that it's been replaced by https://github.com/apache/spark/pull/20982

[GitHub] spark issue #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20994 LGTM (assuming it builds with 2.12 now?)

spark git commit: [SPARK-23870][ML] Forward RFormula handleInvalid Param to VectorAssembler to handle invalid values in non-string columns

2018-04-05 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 4807d381b -> f2ac08795 [SPARK-23870][ML] Forward RFormula handleInvalid Param to VectorAssembler to handle invalid values in non-string columns ## What changes were proposed in this pull request? `handleInvalid` Param was forwarded to

[GitHub] spark issue #20970: [SPARK-23870][ML] Forward RFormula handleInvalid Param t...

2018-04-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20970 LGTM Merging with master Thanks @yogeshg !

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178988503 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178984276 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178991306 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178992899 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178988149 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178991834 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178987751 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178987675 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178987121 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r178983843 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,216 @@ +/* + * Licensed

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 Just pinged @yanboliang on JIRA about me taking over shepherding this. It will need at least one update: change Since versions from 2.3.0 to 2.4.0. Sorry for the long wait @wangmiao1981

[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178956696 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader

[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178955058 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable

[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20633#discussion_r178955105 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable

[GitHub] spark issue #20970: [SPARK-23562][ML] Forward RFormula handleInvalid Param t...

2018-04-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20970 Side note: In general, when there is a JIRA with subtasks, it's nice to send all PRs against subtasks, rather than against the parent task. This can be good when we think of future subtasks

[GitHub] spark pull request #20970: [SPARK-23562][ML] Forward RFormula handleInvalid ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20970#discussion_r178938488 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -592,4 +593,26 @@ class RFormulaSuite extends MLTest

[GitHub] spark pull request #20970: [SPARK-23562][ML] Forward RFormula handleInvalid ...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20970#discussion_r178938415 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -592,4 +593,26 @@ class RFormulaSuite extends MLTest

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20837#discussion_r178905201 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -517,6 +517,9 @@ class LogisticRegression @Since

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20786 Thanks for the updates! I had a spacing typo in the fromOld() style fix I wrote above, and there are some traits which still need to be sealed. Hope you don't mind, but I sent a PR

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178896682 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala --- @@ -167,4 +166,20 @@ class MinHashLSHSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178896432 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala --- @@ -76,6 +75,28 @@ class ImputerSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178894229 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala --- @@ -48,8 +46,8 @@ class MinMaxScalerSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178893936 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NGramSuite.scala --- @@ -84,7 +84,7 @@ class NGramSuite extends MLTest

spark git commit: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-04-02 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 28ea4e314 -> a1351828d [SPARK-23690][ML] Add handleinvalid to VectorAssembler ## What changes were proposed in this pull request? Introduce `handleInvalid` parameter in `VectorAssembler` that can take in `"keep", "skip", "error"`
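As a rough illustration of the three `handleInvalid` modes named in the commit message above, here is a plain-Python sketch. It models the semantics only and is not Spark code: the function `assemble`, the parameter name `handle_invalid`, and the use of `None` to stand for an invalid value are assumptions made for this example, and the exact behavior of `VectorAssembler` is defined by the patch itself.

```python
import math


def assemble(rows, handle_invalid="error"):
    """Concatenate each row's values into one feature list.

    Hypothetical model: a None entry stands in for an invalid (null) input.
    """
    if handle_invalid not in ("keep", "skip", "error"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")
    assembled = []
    for row in rows:
        has_invalid = any(v is None for v in row)
        if has_invalid and handle_invalid == "error":
            # "error": fail as soon as an invalid value is seen
            raise ValueError(f"invalid value in row {row!r}")
        if has_invalid and handle_invalid == "skip":
            # "skip": drop the whole row
            continue
        # "keep" (or a fully valid row): retain the row, NaN marks invalid entries
        assembled.append([math.nan if v is None else float(v) for v in row])
    return assembled


rows = [(1, 2.0), (None, 3.0)]
print(assemble(rows, "skip"))  # [[1.0, 2.0]]
print(assemble(rows, "keep"))  # second row is kept, with NaN in the first slot
```

The sketch treats "error" as the default because failing loudly is the conservative choice; whether that matches the default chosen in the patch is not stated in the snippet above.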

[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-04-02 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20829 LGTM Merging with master Thanks @yogeshg for the PR and @WeichenXu123 for taking a look!

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-02 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/20964 [SPARK-22883] ML test for StructuredStreaming: spark.ml.feature, I-M ## What changes were proposed in this pull request? Adds structured streaming tests using testTransformer

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178676396 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178620282 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178620200 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178620104 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178620142 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178619977 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-04-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r178619893 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178202438 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,85 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178202596 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,85 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178204351 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,85 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178202260 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,73 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178202685 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,85 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178204775 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -17,9 +17,11 @@ package org.apache.spark.ml.tree +import

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r178202528 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,85 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177542325 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177543971 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177500915 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -17,26 +17,32 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177542280 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177505970 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +55,64 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177503206 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +55,64 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177543316 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177543289 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177501836 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +53,57 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177547373 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177559339 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -147,4 +159,88 @@ class VectorAssemblerSuite

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177558064 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -18,56 +18,68 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177543627 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177559587 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -147,4 +159,88 @@ class VectorAssemblerSuite

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177547735 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -136,34 +181,106 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177504904 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +55,64 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r177560225 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -147,4 +159,88 @@ class VectorAssemblerSuite

spark git commit: [MINOR] Fix Java lint from new JavaKolmogorovSmirnovTestSuite

2018-03-21 Thread jkbradley
ion of JavaKolmogorovSmirnovTestSuite Author: Joseph K. Bradley <jos...@databricks.com> Closes #20875 from jkbradley/kstest-lint-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a091ee67 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

[GitHub] spark issue #20875: [MINOR] Fix Java lint from new JavaKolmogorovSmirnovTest...

2018-03-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20875 Merging with master. Thanks for checking, @adrian-ionescu!

[GitHub] spark issue #20875: [MINOR] Fix Java lint from new JavaKolmogorovSmirnovTest...

2018-03-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20875 CC @WeichenXu123 @MrBago @adrian-ionescu
