[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19122 @BryanCutler Do you have more comments? I can check it out now but don't want to review at the same time

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-09-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r140030582 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +462,46 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-09-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r139832997 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -503,17 +518,15 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-09-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r139832967 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -503,17 +518,15 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-09-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18924 Taking a look --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139807816 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -998,6 +1047,172 @@ class LinearRegressionSuite

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139765053 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -69,19 +69,57 @@ private[regression] trait

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139772375 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -471,6 +574,15 @@ object LinearRegression extends

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139765068 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -220,12 +283,12 @@ class LinearRegression @Since("

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139764932 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139765034 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -69,19 +69,57 @@ private[regression] trait

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r139764922 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-09-19 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19020 I'll check this out now --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19186 This has ended up being more complex than we envisioned. It would be valuable to describe the design succinctly so that people can debate it on JIRA. Could you please describe your solution

[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19186 @zhengruifeng Can you please update the PR description so it describes the actual functionality being added

[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19106 These are fair arguments. I guess it makes sense to throw an exception; that's fine with me. --- - To unsubscribe, e-mail

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-09-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16774 @WeichenXu123 Thanks for finding that bug! Can you please separate out your bugfix? It's good to get fixes in, rather than attaching them to PRs which may require discussion, so that we make

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19208 Synced offline: I hadn't looked carefully and seen the 2 issues had been merged. @WeichenXu123 said he will split the work in 2, adding one parameter first

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19208 CC @hhbyyh and @MLnick Does this look reasonable to you? And @hhbyyh would you want to split off a new JIRA for your original solution of adding an option to dump models to disk

spark git commit: [SPARK-18608][ML] Fix double caching

2017-09-12 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.2 63098dc31 -> b606dc177 [SPARK-18608][ML] Fix double caching ## What changes were proposed in this pull request? `df.rdd.getStorageLevel` => `df.storageLevel` using cmd `find . -name '*.scala' | xargs -i bash -c 'egrep -in

spark git commit: [SPARK-18608][ML] Fix double caching

2017-09-12 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master b9b54b1c8 -> c5f9b89dd [SPARK-18608][ML] Fix double caching ## What changes were proposed in this pull request? `df.rdd.getStorageLevel` => `df.storageLevel` using cmd `find . -name '*.scala' | xargs -i bash -c 'egrep -in

[GitHub] spark issue #19197: [SPARK-18608][ML] Fix double caching

2017-09-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19197 having trouble merging...will try again soon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19197: [SPARK-18608][ML] Fix double caching

2017-09-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19197 LGTM Merging with master and branch-2.2 Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

spark git commit: [SPARK-21027][ML][PYTHON] Added tunable parallelism to one vs. rest in both Scala mllib and Pyspark

2017-09-12 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 515910e9b -> 720c94fe7 [SPARK-21027][ML][PYTHON] Added tunable parallelism to one vs. rest in both Scala mllib and Pyspark # What changes were proposed in this pull request? Added tunable parallelism to the pyspark implementation of one

[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19110 LGTM Merging with master Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19107 @WeichenXu123 I just commented on https://issues.apache.org/jira/browse/SPARK-18608 to clarify our efforts here. Can you please either retarget this for SPARK-18608 and update it, or ask

[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19110 Other than that 1 item, this looks ready --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19110: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-09-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19110#discussion_r138180599 --- Diff: python/pyspark/ml/param/shared.py --- @@ -608,6 +608,30 @@ def getAggregationDepth(self): return self.getOrDefault

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-09-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16774 I think https://github.com/apache/spark/pull/19110 is ready to merge now. @BryanCutler @WeichenXu123 do you have a preference for which gets merged first

[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19110 LGTM Pinging on https://github.com/apache/spark/pull/16774 first to coordinate merging --- - To unsubscribe, e-mail

[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 Thanks @ajaysaini725 for your work (and permission given offline to take this over)! We can close this issue now

spark git commit: [SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-09-01 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master aba9492d2 -> 900f14f6f [SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure consistent output columns ## What changes were proposed in this pull request? Add test for prediction using the model with all combinations

[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

2017-09-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19065 Nice! LGTM Thanks @WeichenXu123 and @smurching ! Merging with master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

2017-09-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19065 Taking a look now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #19072: [SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBi...

2017-09-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19072 I merged this with master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17014 Thanks all for discussing this! I'm just catching up now. I'm OK with adding handlePersistence as a new Param, but please do so in a separate PR and JIRA issue. I'd like to backport

spark git commit: [SPARK-21862][ML] Add overflow check in PCA

2017-08-31 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 96028e36b -> f5e10a34e [SPARK-21862][ML] Add overflow check in PCA ## What changes were proposed in this pull request? add overflow check in PCA, otherwise it is possible to throw `NegativeArraySizeException` when `k` and `numFeatures`

[GitHub] spark issue #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-31 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19078 LGTM Merging with master Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

spark git commit: [SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBinary` for casting to BinaryLogisticRegressionSummary

2017-08-31 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master cba69aeb4 -> 96028e36b [SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBinary` for casting to BinaryLogisticRegressionSummary ## What changes were proposed in this pull request? add an "asBinary" method to LogisticRegressionSummary

[GitHub] spark pull request #19072: [SPARK-17139][ML][FOLLOW-UP] Add convenient metho...

2017-08-31 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19072#discussion_r136470877 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1473,6 +1473,17 @@ sealed trait

[GitHub] spark pull request #19072: [SPARK-17139][ML][FOLLOW-UP] Add convenient metho...

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19072#discussion_r136249376 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1471,6 +1471,17 @@ sealed trait

[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19078#discussion_r136248809 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -110,3 +115,17 @@ class PCAModel private[spark

[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19078#discussion_r136249083 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -44,6 +44,11 @@ class PCA @Since("1.4.0") (@Since("1.4

[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19078#discussion_r136248974 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -44,6 +44,11 @@ class PCA @Since("1.4.0") (@Since("1.4

[GitHub] spark issue #19072: [SPARK-17133][ML][FOLLOW-UP] Add convenient method `asBi...

2017-08-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19072 Actually, can you please tag this with SPARK-17139 instead of SPARK-17133? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #19072: [SPARK-17133][ML][FOLLOW-UP] Add convenient metho...

2017-08-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19072#discussion_r135951191 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1471,6 +1471,14 @@ sealed trait

[GitHub] spark issue #19078: [SPARK-21862] Add overflow check in PCA

2017-08-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19078 Can you please add the "[ML]" tag to the PR title? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project doe

[GitHub] spark pull request #19078: [SPARK-21862] Add overflow check in PCA

2017-08-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19078#discussion_r135950507 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -44,6 +44,14 @@ class PCA @Since("1.4.0") (@Since("1.4

[GitHub] spark pull request #19078: [SPARK-21862] Add overflow check in PCA

2017-08-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19078#discussion_r135950323 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -44,6 +44,14 @@ class PCA @Since("1.4.0") (@Since("1.4

[GitHub] spark pull request #19078: [SPARK-21862] Add overflow check in PCA

2017-08-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19078#discussion_r135950242 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -44,6 +44,13 @@ class PCA @Since("1.4.0") (@Since("1.4

spark git commit: [MINOR][ML] Document treatment of instance weights in logreg summary

2017-08-29 Thread jkbradley
ion summary traits. Author: Joseph K. Bradley <jos...@databricks.com> Closes #19071 from jkbradley/lr-summary-minor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/840ba053 Tree: http://git-wip-us.apache.org/repos/asf/s

[GitHub] spark issue #19071: [MINOR][ML] Document treatment of instance weights in lo...

2017-08-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19071 Merging with master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17862 Catching up here... To make sure I caught the decisions made in the discussion above, is it correct that this PR will: * Add support for squared hinge loss, and use that as the default (which

[GitHub] spark issue #19071: [MINOR][ML] Document treatment of instance weights in lo...

2017-08-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19071 CC @WeichenXu123 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #19071: [MINOR][ML] Document treatment of instance weight...

2017-08-28 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/19071 [MINOR][ML] Document treatment of instance weights in logreg summary ## What changes were proposed in this pull request? Add Scaladoc noting that instance weights are currently ignored

spark git commit: [SPARK-17139][ML] Add model summary for MultinomialLogisticRegression

2017-08-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 73e64f7d5 -> c7270a46f [SPARK-17139][ML] Add model summary for MultinomialLogisticRegression ## What changes were proposed in this pull request? Add 4 traits, using the following hierarchy: LogisticRegressionSummary

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15435 LGTM Merging with master Thanks a lot @WeichenXu123 and everyone who reviewed! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #19026: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19026 @WeichenXu123 could you please close this manually? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

spark git commit: [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero (backport PR for 2.2)

2017-08-24 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.2 a58536741 -> 2b4bd7910 [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero (backport PR for 2.2) ## What changes were proposed in this pull request? This is backport PR of

[GitHub] spark issue #19026: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19026 LGTM Thanks @WeichenXu123 ! Merging with branch-2.2 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15435 OK, then let's reinstate the Experimental tags. @weichen could you please mark the 4 public summary traits Experimental? --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17373 Merging with master Thanks @WeichenXu123 ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

spark git commit: [SPARK-12664][ML] Expose probability in mlp model

2017-08-22 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master d58a3507e -> d6b30edd4 [SPARK-12664][ML] Expose probability in mlp model ## What changes were proposed in this pull request? Modify MLP model to inherit `ProbabilisticClassificationModel` and so that it can expose the probability column

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17373 Thanks! Will merge after rerunning tests --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-08-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18313 +1 for merging https://github.com/apache/spark/pull/16774 before proceeding with the other work since it will affect everything else. @MLnick I'd be Ok with adding options for best/all/k

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18896 @WeichenXu123 would you mind sending a backport PR for 2.2? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

spark git commit: [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero

2017-08-22 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 01a8e4627 -> d56c26210 [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero ## What changes were proposed in this pull request? fix bug of MLOR do not work correctly when featureStd contains zero We can

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18896 Merging with master and branch-2.2 Thanks @WeichenXu123 @MLnick @sethah ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r134628661 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,136 @@ private[ml] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r134627545 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1354,147 @@ private[ml] class

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18896 Thanks @WeichenXu123 and @sethah ! @WeichenXu123 I'd leave the MLOR test since it's cheap and has a clear purpose, even if it overlaps a little. LGTM @sethah any more

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-21 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17373 Oh OK makes sense. @WeichenXu123 could you please open a JIRA (linked from this task's JIRA) and CC @felixcheung on it? Thanks! I'll rerun tests to be safe and merge this afterwards

[GitHub] spark issue #18702: [SPARK-21485][SQL][DOCS] Spark SQL documentation generat...

2017-08-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18702 I tested locally, and it's fine if I install mkdocs and run sql/create-docs.sh Thanks for fixing it! --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18896 How about updating the LogisticAggregatorSuite so it catches this error: ``` diff --git a/mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala b/mllib

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133844018 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,136 @@ private[ml] class

[GitHub] spark issue #18702: [SPARK-21485][SQL][DOCS] Spark SQL documentation generat...

2017-08-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18702 @HyukjinKwon It looks like the Jenkins doc build was broken by this PR (and I can't build it locally either). Is it just that the script needs to be updated to run sql/create-docs.sh

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17373 @felixcheung How will this affect ```spark.mlp``` in R? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133828976 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1484,3 +1556,96 @@ class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133771590 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -882,21 +882,28 @@ class LogisticRegression @Since

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133772559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,136 @@ private[ml] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133830867 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -199,15 +199,57 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133824486 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,136 @@ private[ml] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133829306 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1484,3 +1556,96 @@ class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133830662 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -199,15 +199,57 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133771738 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1018,19 +1025,33 @@ class LogisticRegressionModel

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133828620 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1484,3 +1556,96 @@ class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133772374 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1018,19 +1025,33 @@ class LogisticRegressionModel

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133830499 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -199,15 +199,57 @@ class LogisticRegressionSuite

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133636149 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -882,21 +882,28 @@ class LogisticRegression @Since

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133589180 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -882,21 +882,28 @@ class LogisticRegression @Since

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133598714 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,130 @@ private[ml] class

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133598792 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,130 @@ private[ml] class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15435 I'll do a review pass now, despite the merge conflicts --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17373 Thinking more about the proposal about separating the classification-specific logic out of the generic Topology, it's something we should definitely do at some point, but I'm OK with leaving

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-15 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18896 LGTM except for making the test's title more descriptive. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r133324363 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -527,9 +550,21 @@ private[ml] class FeedForwardModel private

[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r133322927 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -82,6 +83,49 @@ class

[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r133323889 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable

[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r133081809 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -527,9 +550,21 @@ private[ml] class FeedForwardModel private

[GitHub] spark pull request #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r133079098 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -374,6 +380,22 @@ private[ann] trait TopologyModel extends Serializable

<    3   4   5   6   7   8   9   10   11   12   >