[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...

2018-09-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/22271 test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...

2018-09-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/22271 LGTM I tested this locally and confirmed it fixes the serialization issue. Thank you @shahidki31 ! Merging with master after fresh tests finish

[GitHub] spark issue #22228: [SPARK-25124][ML]VectorSizeHint setSize and getSize don'...

2018-08-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/8 Awesome, thank you! LGTM Merging with branch-2.3

[GitHub] spark issue #22136: [SPARK-25124][ML]VectorSizeHint setSize and getSize don'...

2018-08-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/22136 Well, this merged successfully with master but not with 2.3; it seemed to pull in code from another PR, strangely. Would you mind sending a backport PR against branch-2.3? Thank you

[GitHub] spark issue #22136: [SPARK-25124][ML]VectorSizeHint setSize and getSize don'...

2018-08-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/22136 LGTM Merging with master. I'll try to backport it to 2.3 too. Thanks!

[GitHub] spark pull request #22136: [SPARK-25124][ML]VectorSizeHint setSize and getSi...

2018-08-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/22136#discussion_r212483950 --- Diff: python/pyspark/ml/tests.py --- @@ -844,6 +844,28 @@ def test_string_indexer_from_labels(self): .select

[GitHub] spark pull request #22136: [SPARK-25124][ML]VectorSizeHint setSize and getSi...

2018-08-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/22136#discussion_r212069129 --- Diff: python/pyspark/ml/tests.py --- @@ -844,6 +844,28 @@ def test_string_indexer_from_labels(self): .select

[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/22139 LGTM I'll merge this with master Thanks @MrBago !

[GitHub] spark issue #21799: [SPARK-24852][ML] Update spark.ml to use Instrumentation...

2018-07-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21799 LGTM Merging with master Thanks @MrBago !

[GitHub] spark issue #21719: [SPARK-24747][ML] Make Instrumentation class more flexib...

2018-07-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21719 LGTM Merging with master Thanks @MrBago !

[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...

2018-07-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21719#discussion_r202800969 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -488,9 +488,10 @@ class LogisticRegression @Since

[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...

2018-07-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21719#discussion_r202805710 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -19,45 +19,60 @@ package org.apache.spark.ml.util import

[GitHub] spark issue #21719: [SPARK-24747][ML] Make Instrumentation class more flexib...

2018-07-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21719 jenkins test this please

[GitHub] spark issue #20442: [SPARK-23265][ML]Update multi-column error handling logi...

2018-06-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20442 @huaxingao Thanks for this follow-up! I realized that https://github.com/apache/spark/pull/19715 introduced a breaking change which we missed in Spark 2.3 QA: In Spark 2.2, a user could set

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21129 I made a JIRA for the Python part of this: https://issues.apache.org/jira/browse/SPARK-24333

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21129 LGTM Merging with master Thanks!

[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21163 jenkins test this please

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20319 Done! Here it is: https://github.com/apache/spark/pull/21358 @smurakozi Could you please close this issue and help review the new PR if you have time? Thanks

[GitHub] spark pull request #21358: [SPARK-22884][ML] ML tests for StructuredStreamin...

2018-05-17 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21358 [SPARK-22884][ML] ML tests for StructuredStreaming: spark.ml.clustering ## What changes were proposed in this pull request? Converting clustering tests to also check code with structured

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20319 I'm going to take this over to get this done, but @smurakozi you'll be the primary author. I'll link the PR here in a minute

[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21163 LGTM pending fresh tests Thanks!

[GitHub] spark issue #21344: [SPARK-24114] Add instrumentation to FPGrowth.

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21344 Whoops! This was merged linked to the wrong JIRA. It should have been: https://issues.apache.org/jira/browse/SPARK-24310 for the record

[GitHub] spark issue #21344: [SPARK-24114] Add instrumentation to FPGrowth.

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21344 LGTM It'll be nice when we move the implementation to spark.ml in the future so that we can log more info, but this is good for now. Thanks @MrBago and @ludatabricks ! Merging

[GitHub] spark issue #21344: [SPARK-24114] Add instrumentation to FPGrowth.

2018-05-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21344 I'll take a look now

[GitHub] spark pull request #21347: [SPARK-24290][ML] add support for Array input for...

2018-05-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21347#discussion_r189057734 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -125,6 +125,19 @@ private[spark] class Instrumentation[E

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r189052646 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -460,18 +461,37 @@ private[ml] trait RandomForestRegressorParams

[GitHub] spark pull request #21090: [SPARK-24026][ML] Add Power Iteration Clustering ...

2018-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21090#discussion_r188813735 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r188813405 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r188813297 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21183 Thanks for checking this manually. Since the test sometimes fails, then let's leave it. LGTM Merging with master Thanks @ludatabricks and @mengxr

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21153 OK thanks @viirya ! Merging with master

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r187129517 --- Diff: python/pyspark/ml/util.py --- @@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None

[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-05-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r188081089 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -473,7 +475,8 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-05-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21153 Would you mind rebasing this off of the upstream master branch? I'm having trouble running the tests for this PR locally

[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-10 Thread jkbradley
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/21274

[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

2018-05-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21274 This issue actually brings up a problem with the Transformer approach for PIC. Just commented more here: https://issues.apache.org/jira/browse/SPARK-15784 Thank you for pushing back

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-05-10 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21119 I think we messed up with the original PIC API. Could you please check out my comment here https://issues.apache.org/jira/browse/SPARK-15784 ? If others agree, I'll revert the Scala API and we

[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r187217859 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -622,11 +623,11 @@ object LocalLDAModel extends MLReadable[LocalLDAModel

[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r187216371 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -252,6 +252,15 @@ class LDASuite extends SparkFunSuite

[GitHub] spark issue #21265: [SPARK-24146][PySpark][ML] spark.ml parity for sequentia...

2018-05-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21265 Let's wait on this until we make the decision in the last thread in https://github.com/apache/spark/pull/20973

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r187216080 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r187187330 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="items",

[GitHub] spark issue #21203: [SPARK-24131][PySpark] Add majorMinorVersion API to PySp...

2018-05-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21203 It's OK but would you mind fixing it @viirya before we use it in https://github.com/apache/spark/pull/21153 ? Thanks

[GitHub] spark issue #21097: [SPARK-14682][ML] Provide evaluateEachIteration method o...

2018-05-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21097 LGTM Merging with master Thanks @WeichenXu123 ! Would you mind creating & linking a JIRA for the Python API up

[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

2018-05-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21274 test this please

[GitHub] spark issue #21272: [MINOR][ML][DOC] Improved Naive Bayes user guide explana...

2018-05-09 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21272 Thanks for the LGTM! Merging with master

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r187114172 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -460,18 +461,29 @@ private[ml] trait RandomForestRegressorParams

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r187113953 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -460,18 +461,29 @@ private[ml] trait RandomForestRegressorParams

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r187112582 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -460,18 +461,29 @@ private[ml] trait RandomForestRegressorParams

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r187115704 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala --- @@ -278,6 +279,7 @@ class BisectingKMeans @Since("

[GitHub] spark issue #21270: [SPARK-24213][ML]Power Iteration Clustering in SparkML t...

2018-05-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21270 Thanks for the patch! I just commented on https://issues.apache.org/jira/browse/SPARK-24213 though and would like to replace this with https://github.com/apache/spark/pull/21274 Could you

[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-08 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21274 [SPARK-24213][ML] Fix for Int id type for PowerIterationClustering in spark.ml ## What changes were proposed in this pull request? PIC in spark.ml has tests for "id" type I

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-05-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r186865863 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -367,11 +367,31 @@ class GBTClassifierSuite extends

[GitHub] spark issue #21272: [MINOR][ML][DOC] Improved Naive Bayes user guide explana...

2018-05-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21272 ![screen shot 2018-05-08 at 2 03 13 pm](https://user-images.githubusercontent.com/5084283/39783013-a1650846-52c8-11e8-8f15-42b93dd51168.png

[GitHub] spark pull request #21272: [MINOR][ML][DOC] Improved Naive Bayes user guide ...

2018-05-08 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/21272 [MINOR][ML][DOC] Improved Naive Bayes user guide explanation ## What changes were proposed in this pull request? This copies the material from the spark.mllib user guide page for Naive

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r186572477 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -497,6 +498,9 @@ private[ml] trait GBTParams extends TreeEnsembleParams

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r186572374 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +366,50 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r186573136 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -146,20 +146,40 @@ class GBTClassifier @Since("

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r186569928 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +366,50 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-05-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r186572853 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -146,20 +146,40 @@ class GBTClassifier @Since("

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-05-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r186565756 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala --- @@ -269,6 +269,20 @@ class GBTRegressionModel private[ml

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20973 Merging with master

[GitHub] spark issue #20261: [SPARK-22885][ML][TEST] ML test for StructuredStreaming:...

2018-05-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20261 Merging with master Thanks @WeichenXu123 !

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-05-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/13493 Merging with master Thanks all! @zjffdu Did you want to backport this to branch-2.3 too?

[GitHub] spark pull request #21163: [SPARK-24097][ML] Instruments improvements - Rand...

2018-05-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21163#discussion_r185628460 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -22,12 +22,11 @@ import java.io.IOException import

[GitHub] spark issue #21163: [SPARK-24097][ML] Instruments improvements - RandomFores...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21163 Regarding logging on executors, are you OK with the proposed plan here? https://issues.apache.org/jira/browse/SPARK-23686 Basically, we'll keep using regular Logging on executors, rather than

[GitHub] spark issue #20235: [Spark-22887][ML][TESTS][WIP] ML test for StructuredStre...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20235 ok to test

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/13493 LGTM pending fresh tests Sorry for the delay!

[GitHub] spark issue #21185: [SPARK-23894][CORE][SQL] Defensively clear ActiveSession...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21185 There have been several of these R tests. May be from flakiness with CRAN; testing locally now (since I didn't see any recent bad commits in R

[GitHub] spark pull request #21204: [SPARK-24132][ML]Expand instrumentation for class...

2018-05-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21204#discussion_r185325228 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -97,9 +97,10 @@ class DecisionTreeClassifier

[GitHub] spark pull request #21204: [SPARK-24132][ML]Expand instrumentation for class...

2018-05-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21204#discussion_r185325179 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -97,9 +97,10 @@ class DecisionTreeClassifier

[GitHub] spark pull request #21204: [SPARK-24132][ML]Expand instrumentation for class...

2018-05-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21204#discussion_r185324974 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala --- @@ -103,7 +103,10 @@ abstract class Classifier[ * @throws

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20973 Rerunning tests in case the R CRAN failure was from flakiness

[GitHub] spark issue #21195: [Spark-23975][ML] Add support of array input for all clu...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21195 Rerunning tests in case the R CRAN failure was from flakiness

[GitHub] spark issue #20261: [SPARK-22885][ML][TEST] ML test for StructuredStreaming:...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20261 LGTM Will merge after fresh tests Thanks @WeichenXu123 and @smurakozi !

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20319 @smurakozi Do you have time to update this? I did a full review, though it now has a small merge conflict. Thanks

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20973 LGTM pending jenkins tests

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r185262870 --- Diff: python/pyspark/util.py --- @@ -61,6 +62,26 @@ def _get_argspec(f): return argspec +def majorMinorVersion(version

[GitHub] spark issue #21195: [Spark-23975][ML] Add support of array input for all clu...

2018-05-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21195 add to whitelist

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r185131834 --- Diff: python/pyspark/util.py --- @@ -61,6 +62,26 @@ def _get_argspec(f): return argspec +def majorMinorVersion(version

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r185058005 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -44,26 +43,37 @@ object PrefixSpan { * * @param dataset

[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-04-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r184753197 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -473,7 +475,8 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-04-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r185049467 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -605,14 +609,16 @@ private[clustering] object

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-04-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21129#discussion_r184768987 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -95,7 +95,9 @@ private[shared] object

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r184760717 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +365,20 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r184763088 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala --- @@ -269,6 +269,21 @@ class GBTRegressionModel private[ml

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183866393 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183864852 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865224 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865609 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183864177 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183864721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865745 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183863701 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r183865387 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21078 LGTM Merging with master Thanks @WeichenXu123 !

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183796459 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -86,13 +86,23 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21081#discussion_r183797106 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala --- @@ -27,28 +26,38 @@ import org.apache.spark.sql.types.{ArrayType

[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20319#discussion_r183565892 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -20,13 +20,14 @@ package org.apache.spark.ml.clustering

[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-04-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20319#discussion_r183565968 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -65,10 +66,12 @@ class BisectingKMeansSuite
