[GitHub] spark pull request #17078: [SPARK-19746][ML] Faster indexing for logistic ag...

2017-02-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/17078#discussion_r103342591 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -456,6 +456,32 @@ class LogisticRegressionSuite

[GitHub] spark pull request #17078: [SPARK-19746][ML] Faster indexing for logistic ag...

2017-02-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/17078#discussion_r103342093 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1447,7 +1447,7 @@ private class LogisticAggregator

[GitHub] spark pull request #17078: [SPARK-19746][ML] Faster indexing for logistic ag...

2017-02-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/17078#discussion_r103342317 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1431,7 +1431,12 @@ private class LogisticAggregator

[GitHub] spark pull request #17078: [SPARK-19746][ML] Faster indexing for logistic ag...

2017-02-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/17078#discussion_r103154658 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1447,7 +1447,7 @@ private class LogisticAggregator

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-14 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/16037 Sorry for late review. Just come back to US. LGTM too! Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #15893: [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for c...

2016-11-19 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15893 LGTM. Since this doesn't have impact on performance, and make the codebase cleaner, I merged this PR into master and branch 2.1. Thanks.

[GitHub] spark pull request #15893: [SPARK-18456][ML][FOLLOWUP] Use matrix abstractio...

2016-11-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15893#discussion_r88767607 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -586,15 +577,24 @@ class LogisticRegression @Since("

[GitHub] spark issue #15893: [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for c...

2016-11-18 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15893 Only minor naming. LGTM. My interest can not access ssh to merge the code, will merge later tonight. Thanks.

[GitHub] spark issue #15876: [SPARK-11496][GraphX][FOLLOWUP] Add param checking for r...

2016-11-14 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15876 LGTM. Thanks. Merged into master and 2.1 branch. @srowen Are you referring the MLOR PR which has many followup PRs? If so, the changes in the main MLOR PR is very big, and many of the issues

[GitHub] spark issue #15593: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-11-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15593 Thanks all for working on this PR. I merged this into master, and I'll create a followup task and PR to handle the abstraction together with handling the smoothing in the initialization

[GitHub] spark issue #15593: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-11-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15593 @MLnick I'm hoping that we could abstract out the implementation of using column major format as much as possible; as a result, in the future, new developers can understand the code without

[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...

2016-11-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15593#discussion_r87518789 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1486,57 +1504,65 @@ private class LogisticAggregator

[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...

2016-11-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15593#discussion_r87516339 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -489,13 +485,14 @@ class LogisticRegression @Since("

[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...

2016-11-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15593#discussion_r87501621 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -489,13 +485,14 @@ class LogisticRegression @Since("

[GitHub] spark issue #15593: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-10-31 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15593 @sethah I'm recently busy on company work. Will start to work on open source code review soon this week.

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/11119 I was also thinking that most of people will use this for daily retraining by passing in the previous model which will cause the model larger and larger due to the model chain which is unnecessary

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r83703876 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -446,6 +459,11 @@ private[ml] object DefaultParamsReader { val cls

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r83600176 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark issue #12761: [SPARK-14464] [MLLIB] Better support for logistic regres...

2016-10-17 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/12761 I'm benchmarking LOR with 14M features of internal company dataset (unfortunately, it's not public). Regarding using sparse data structure for aggregation, I'm not so sure how much

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/11119 Please remove `WIP` in the description.

[GitHub] spark issue #15488: [SPARK-17941][ML][TEST] Logistic regression tests should...

2016-10-14 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15488 Merged into master. Thanks.

[GitHub] spark issue #15488: [SPARK-17941][ML][TEST] Logistic regression tests should...

2016-10-14 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15488 LGTM.

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/11119 +1 on what @sethah proposed. We can log with warn when k is modified by setting the initial model. Thanks.

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82497437 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -137,18 +143,64 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82489783 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -137,18 +143,64 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #15074: [SPARK-17520] Implement a better __eq__ for Spars...

2016-10-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15074#discussion_r82461977 --- Diff: python/pyspark/mllib/linalg/__init__.py --- @@ -1296,9 +1296,19 @@ def asML(self): return newlinalg.SparseMatrix(self.numRows

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82230001 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +312,10 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82229991 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +312,10 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-05 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82105944 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -137,18 +142,53 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-05 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82105608 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +312,10 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark issue #15349: [SPARK-17239][ML][DOC] Update user guide for multiclass ...

2016-10-05 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15349 LGTM. Merged into master. Thanks.

[GitHub] spark issue #15293: [SPARK-17718] [Update MLib Classification Documentation]

2016-09-29 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15293 +1 on just having the note. For y = 0/1, just more confusing to have complicated formulation in the doc.

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/11119 Ping @yinxusen on update. Would like to have it merged soon so we can work on LiR and LoR parts. Thanks.

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r79936136 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -81,11 +81,23 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark issue #15177: [SPARK-11918] [ML] Better error from WLS for cases like ...

2016-09-21 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15177 LGTM. Merged into master. Thanks.

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 Merged into master. Thanks.

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 LGTM. Wait for the test.

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79517559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -508,11 +680,51 @@ object LogisticRegression extends

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79509186 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -676,39 +941,53 @@ object LogisticRegressionModel extends

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79507627 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -508,11 +680,51 @@ object LogisticRegression extends

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79510132 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -60,7 +60,8 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79510068 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala --- @@ -244,7 +244,8 @@ class CrossValidatorSuite test("read/

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79509747 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -67,11 +105,15 @@ class LogisticRegressionSuite

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79510095 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala --- @@ -133,7 +133,8 @@ class TrainValidationSplitSuite

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79509600 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -22,28 +22,49 @@ import scala.language.existentials

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79507486 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -508,11 +680,51 @@ object LogisticRegression extends

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79508998 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1103,6 +1382,7 @@ class BinaryLogisticRegressionSummary

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79506294 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -66,11 +69,37 @@ private[classification] trait

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79509539 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -430,8 +431,9 @@ class LogisticRegressionWithLBFGS

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r79507775 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -508,11 +680,51 @@ object LogisticRegression extends

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r78682701 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +322,29 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-13 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 Only couple minor issues; otherwise, LGTM.

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78674556 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala --- @@ -201,11 +201,24 @@ abstract class

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r7867 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -676,39 +936,54 @@ object LogisticRegressionModel extends

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78674092 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -595,55 +831,104 @@ class LogisticRegressionModel private

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78673689 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -508,11 +680,42 @@ object LogisticRegression extends

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78483816 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +577,74 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78420053 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +577,74 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78419383 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -311,8 +350,28 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78321887 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +577,74 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78321247 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -323,32 +382,33 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78321146 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -311,8 +350,28 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78316060 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -311,8 +350,28 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78315600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -261,6 +299,7 @@ class LogisticRegression @Since("

[GitHub] spark issue #14998: [SPARK-11496][GRAPHX] Parallel implementation of persona...

2016-09-10 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14998 Merged into master. Great work! Thanks.

[GitHub] spark issue #14998: [SPARK-11496][GRAPHX] Parallel implementation of persona...

2016-09-09 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14998 Jenkins, retest this please

[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

2016-09-09 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 @sethah Could you create another JIRA to track the issue that when there is a class which is not in training, centering the intercepts doesn't make any sense at all. The intercepts should be just

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78128581 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +564,74 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78128374 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +564,74 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78128102 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -452,6 +555,7 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78127943 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -370,49 +420,102 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78127321 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -370,49 +420,102 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78126776 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -370,49 +420,102 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78111319 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -370,49 +420,102 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78111049 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -370,49 +420,102 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78109655 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -333,22 +387,18 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78101965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -333,22 +387,18 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78101327 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -311,8 +348,25 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78093424 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -50,6 +50,8 @@ private[classification] trait

[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

2016-09-08 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 @sethah For sparse MLOR problems with L1, the models will be sparse in row. As a result, in the sparse, we need to store the models in CSR format, and CSR models can be used for model prediction

[GitHub] spark issue #14998: [SPARK-11496][GRAPHX] Parallel implementation of persona...

2016-09-08 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14998 Some minor issues, and LGTM. Thanks.

[GitHub] spark pull request #14998: [SPARK-11496][GRAPHX] Parallel implementation of ...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14998#discussion_r78090880 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala --- @@ -163,6 +166,85 @@ object PageRank extends Logging

[GitHub] spark pull request #14998: [SPARK-11496][GRAPHX] Parallel implementation of ...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14998#discussion_r78090050 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala --- @@ -163,6 +166,85 @@ object PageRank extends Logging

[GitHub] spark pull request #14998: [SPARK-11496][GRAPHX] Parallel implementation of ...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14998#discussion_r78089652 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala --- @@ -163,6 +166,85 @@ object PageRank extends Logging

[GitHub] spark issue #14998: [SPARK-11496][GRAPHX] Parallel implementation of persona...

2016-09-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14998 Jenkins, retest this please

[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

2016-09-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 @sethah I remember that a `compressed` method for `Matrix` is one of the TODOs in the follow-up tasks. For sparse binary logistic regression, if we store the models as `1 x numFeatures` compressed

[GitHub] spark pull request #14998: [SPARK-11496][GRAPHX] Parallel implementation of ...

2016-09-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14998#discussion_r77925377 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala --- @@ -19,8 +19,11 @@ package org.apache.spark.graphx.lib import

[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

2016-09-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 @sethah +1 for this approach. A couple of minor questions. With L1, the coefficients can be very sparse. Currently, we store them as a sparse vector and use that sparse vector for prediction

[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

2016-09-06 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 @sethah Thank you for coming up with a PR with detailed documentation. For option 2, if a two-class model is trained with the multinomial family, how do you store it? I was thinking about maybe we could

[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans

2016-08-31 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/11119 Thanks @sethah for reviewing. @yinxusen once it's ready to merge, please remove WIP and I will be more than happy to do a final pass and help merge the code. Thanks.

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14785 LGTM. Merged into master. Thanks.

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-25 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14785 Please also add test cases for matrices. Thanks.

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-24 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14785 Can you also fix `https://github.com/apache/spark/blob/master/mllib-local/src/test/scala/org/apache/spark/ml/util/TestingUtils.scala`? Please add tests showing the issue is addressed. Thanks

[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...

2016-08-23 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14766 LGTM. Since I mostly work on the Scala part, I'll wait for other votes. Thanks.

[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/13796 The solution of this overparameterized problem in the link is just adding the regularization, and users may not want it. I think we need to optimize it on (k-1) parameters, and then put the final

[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/13796 @WeichenXu123 Did you run into this potential issue with any dataset? If so, we may need to consider optimizing softmax with pivoting when `reg == 0`.

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75588252 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params

[GitHub] spark issue #14717: [SPARK-17090][ML]Make tree aggregation level in linear/l...

2016-08-20 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14717 LGTM. Merge into master. Thanks.

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75587723 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -256,6 +256,17 @@ class LogisticRegression @Since("

[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...

2016-08-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75587709 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -48,7 +48,7 @@ import
