[GitHub] spark issue #20389: [SPARK-23205][ML] Update ImageSchema.readImages to corre...

2018-01-24 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/20389 Thanks for the reviews @srowen, @dongjoon-hyun! Would it make sense to merge this before Spark 2.3 is released & if so would one of you be able to do

[GitHub] spark pull request #20389: [SPARK-23205][ML] Update ImageSchema.readImages t...

2018-01-24 Thread smurching
GitHub user smurching opened a pull request: https://github.com/apache/spark/pull/20389 [SPARK-23205][ML] Update ImageSchema.readImages to correctly set alpha values for four-channel images ## What changes were proposed in this pull request? When parsing raw image data

[GitHub] spark pull request #19758: [SPARK-3162][MLlib] Local Tree Training Pt 1: Ref...

2017-11-30 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19758#discussion_r154284858 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeSplitUtilsSuite.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19753: [SPARK-22521][ML] VectorIndexerModel support handle unse...

2017-11-15 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19753 This LGTM, @jkbradley would you be able to give this a look? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-15 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19753#discussion_r151315757 --- Diff: python/pyspark/ml/feature.py --- @@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, Ja

[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-15 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19753#discussion_r151298582 --- Diff: python/pyspark/ml/feature.py --- @@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, Ja

[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-15 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19753#discussion_r151298765 --- Diff: python/pyspark/ml/feature.py --- @@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, Ja

[GitHub] spark pull request #19758: [SPARK-3162][MLlib] Local Tree Training Pt 1: Ref...

2017-11-15 Thread smurching
GitHub user smurching opened a pull request: https://github.com/apache/spark/pull/19758 [SPARK-3162][MLlib] Local Tree Training Pt 1: Refactor RandomForest.scala into utility classes ## What changes were proposed in this pull request? Breaks up #19433 to help unblock

[GitHub] spark issue #19753: [SPARK-22521][ML] VectorIndexerModel support handle unse...

2017-11-15 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19753 Looking at this now, thanks @WeichenXu123! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2017-11-14 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r151020618 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala --- @@ -19,14 +19,16 @@ package

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2017-11-14 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r151020666 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala --- @@ -19,15 +19,16 @@ package org.apache.spark.ml.regression

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-11-14 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r151019591 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -852,6 +662,41 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-11-14 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r151017375 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/SplitUtils.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-11-14 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r151011913 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -627,221 +621,37 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-11-14 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r151011879 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/InformationGainStats.scala --- @@ -112,7 +113,7 @@ private[spark] object ImpurityStats

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-13 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19666 Discussed with @jkbradley, I'll split up #19433 so that the parts of it that'd potentially conflict with this PR (refactoring RandomForest.scala into utility classes) can be merged first

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-06 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149238531 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -976,6 +930,44 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-06 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149238185 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -741,17 +678,43 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-06 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149237212 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -741,17 +678,43 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-06 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149242575 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -976,6 +930,44 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-06 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149238123 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -741,17 +678,43 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2017-11-06 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19666#discussion_r149241680 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -976,6 +930,44 @@ private[spark] object RandomForest extends

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-11-06 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 jenkins retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2017-11-02 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r148709163 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -202,6 +202,15 @@ class LinearSVCSuite extends

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2017-11-02 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r148708939 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala --- @@ -165,6 +165,35 @@ class NaiveBayesSuite extends

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2017-11-02 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r148709125 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala --- @@ -165,6 +165,35 @@ class NaiveBayesSuite extends

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-11-02 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 jenkins retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-26 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 Made a few updates, here’s a quick summary/what I’d propose moving forward: Right now: * Shared row indices for all (categorical & continuous) features are stored &

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-26 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r147307553 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTree.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2017-10-25 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r146986798 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -267,6 +268,24 @@ class

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2017-10-25 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19381 Looking at this now! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-25 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r146981434 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTree.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-24 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 @WeichenXu123 Thanks for the comments! I'll respond inline: > In your doc, you said "Specifically, we only need to store sufficient stats for each bin of a single feature, as

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-24 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r146731101 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/TrainingInfo.scala --- @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-24 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r146731083 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTree.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-24 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r146731070 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/AggUpdateUtils.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-24 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r146731072 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/FeatureVector.scala --- @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-12 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 Sorry, realized I conflated feature subsampling and `subsampleWeights` (instance weights for training examples). IMO feature subsampling can be added in a follow-up PR, but `subsampleWeights

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-12 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 Thanks for the comments! - Yep, feature subsampling is necessary for using local tree training in distributed training. I was thinking of adding subsampling in a follow-up PR. You're

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-11 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 The failing SparkR test (which compares `RandomForest` predictions to hardcoded values) fails not due to a correctness issue but (AFAICT) because of an implementation change in best-split

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-09 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 The failing tests (in `DecisionTreeSuite`) fail because we've historically handled a) splits that have 0 gain differently from b) splits that fail to achieve user

[GitHub] spark issue #19433: [SPARK-3162] [MLlib][WIP] Add local tree training for de...

2017-10-09 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 Thanks! I'll remove the WIP. To clear things up for the future, I'd thought [WIP] was the appropriate tag for a PR that's ready for review but not ready to be merged (based on https

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib][WIP] Add local tree training...

2017-10-09 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r143398990 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTreeUtils.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19433: [SPARK-3162] [MLlib][WIP] Add local tree training for de...

2017-10-04 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 @WeichenXu123 would you be able to take an initial look at this? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib][WIP] Add local tree training...

2017-10-04 Thread smurching
GitHub user smurching opened a pull request: https://github.com/apache/spark/pull/19433 [SPARK-3162] [MLlib][WIP] Add local tree training for decision tree regressors ## What changes were proposed in this pull request? WIP, DO NOT MERGE ### Overview This PR

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-19 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19278#discussion_r139816308 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -399,14 +399,17 @@ private[ml] object DefaultParamsReader

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-19 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19278#discussion_r139809836 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -399,14 +399,17 @@ private[ml] object DefaultParamsReader

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-19 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19278#discussion_r139817121 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -17,6 +17,7 @@ package org.apache.spark.ml.tuning

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-18 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r139586600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala --- @@ -276,12 +315,32 @@ object TrainValidationSplitModel extends

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-18 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r139578700 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala --- @@ -276,12 +315,32 @@ object TrainValidationSplitModel extends

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-18 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r139573779 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -261,17 +290,40 @@ class CrossValidatorModel private[ml

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-18 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r139556318 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -82,7 +82,10 @@ private[shared] object

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-18 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r139568979 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -237,12 +251,17 @@ object CrossValidator extends MLReadable

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-18 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19208#discussion_r139557219 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -117,6 +123,12 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-13 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138712707 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -483,24 +488,24 @@ class LogisticRegression @Since

[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-12 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19106 @sethah I haven't heard of anybody hitting this issue in practice, but it did seem best to ensure that valid probability distributions would be produced regardless of input. There was some

[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19186 Note: This PR follows up on the work/discussions in [https://github.com/apache/spark/pull/17014](https://github.com/apache/spark/pull/17014

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138139729 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -300,20 +300,23 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138136774 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -483,24 +488,17 @@ class LogisticRegression @Since

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138137893 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -163,9 +165,7 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138139091 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -82,7 +82,8 @@ private[shared] object

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138140113 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/IsotonicRegression.scala --- @@ -165,8 +170,7 @@ class IsotonicRegression @Since("

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19186#discussion_r138139539 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -444,13 +444,13 @@ class

[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-11 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19107 @jkbradley would you be able to give this a look? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-11 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19106 This looks good to me! @srowen would you be able to give it another look? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19106: [SPARK-21770][ML] ProbabilisticClassificationMode...

2017-09-11 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19106#discussion_r137988907 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala --- @@ -245,6 +245,13 @@ private[ml] object

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-10 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/17014 Hi @zhengruifeng, thanks for your work on this! Now that we're introducing a new handlePersistence parameter (a new public API), it'd be good to track work in a separate JIRA/PR

[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-07 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19107 Sorry for the delay, this looks good to me -- thanks @WeichenXu123! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-08-31 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/17014 @WeichenXu123 That approach sounds reasonable to me. My main thought (& this might be obvious) is on the implementation level -- as long as we implement this by ad

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-30 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r136152805 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -91,4 +94,60 @@ object

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r135655220 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -91,4 +94,54 @@ object

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r135653663 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -91,4 +94,54 @@ object

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r135656421 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -91,4 +94,54 @@ object

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r135653421 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -18,7 +18,10 @@ package

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r135653044 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -262,6 +262,9 @@ class

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread smurching
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19065#discussion_r135653479 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -18,7 +18,10 @@ package

[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2017-01-23 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/14872 No worries, apologies for being busy on my end -- I'll leave the branch up & try to contribute in other ways when I have the time! --- If your project is set up for it, you can r

[GitHub] spark pull request #14872: [SPARK-3162][MLlib][WIP] Add local tree training ...

2017-01-23 Thread smurching
Github user smurching closed the pull request at: https://github.com/apache/spark/pull/14872 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2017-01-23 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/14872 Hi, I've stopped working on this PR - I can go ahead and close it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #14872: [SPARK-3162][MLlib][WIP] Add local tree training ...

2016-09-27 Thread smurching
GitHub user smurching reopened a pull request: https://github.com/apache/spark/pull/14872 [SPARK-3162][MLlib][WIP] Add local tree training for decision tree regressors ## What changes were proposed in this pull request? Based on [Yggdrasil](https://github.com/fabuzaid21

[GitHub] spark pull request #14872: [SPARK-3162][MLlib][WIP] Add local tree training ...

2016-09-02 Thread smurching
Github user smurching closed the pull request at: https://github.com/apache/spark/pull/14872 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request #14872: Add local tree training for decision tree regress...

2016-08-29 Thread smurching
GitHub user smurching opened a pull request: https://github.com/apache/spark/pull/14872 Add local tree training for decision tree regressors ## What changes were proposed in this pull request? Based on [Yggdrasil](https://github.com/fabuzaid21/yggdrasil), added local

[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...

2016-06-24 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/13881 Does it make sense to only perform instrumentation-related computations (i.e. updating the max/min nodes per group) if the instrumentation argument to RandomForest.run (instr) is not None

[GitHub] spark pull request #13881: [SPARK-3723] [MLlib] Adding instrumentation to ra...

2016-06-23 Thread smurching
GitHub user smurching opened a pull request: https://github.com/apache/spark/pull/13881 [SPARK-3723] [MLlib] Adding instrumentation to random forests ## What changes were proposed in this pull request? In RandomForest.run(), added instrumentation for the number of node