[GitHub] spark pull request: [SPARK-14734][ML][MLLIB] Added toNew, fromNew ...

2016-04-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12504#discussion_r60489912 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -158,6 +159,13 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request: [SPARK-14734][ML][MLLIB] Added toNew, fromNew ...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12504#discussion_r60313819 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -158,6 +159,13 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request: [SPARK-14734][ML][MLLIB] Added toNew, fromNew ...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12504#discussion_r60295795 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -158,6 +159,13 @@ sealed trait Matrix extends Serializable

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60190633 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60190604 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/MatrixUDT.scala --- @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60181626 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60178343 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala --- @@ -0,0 +1,99 @@ +/* --- End diff -- Let's create

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60178107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60172565 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60172550 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/VectorUDT.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60172539 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211702268 @viirya Since we will use this in `mllib`, let's test it in `mllib` `test` scope. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211488734 retest this please

[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...

2016-04-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-211487563 I like @srowen 's idea: have the shared annotation in `common/tags`, and move the current ones under `src/test`. @pravingadakh, can you update this PR? Thanks

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-15 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/9#issuecomment-210649354 This will be very useful for many use cases. Nice to have it in 2.0 :)

[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...

2016-04-15 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-210553855 @srowen I think using a Scala annotation is easier, and I like it more. What do you think about keeping a copy of the `@Since` annotation in the `mllib-local` jar? Thanks

[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...

2016-04-15 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-210553033 All the public methods need to have `Since` for the doc. See the following for reference. Thanks. https://github.com/apache/spark/blob/master/mllib/src/main

[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...

2016-04-15 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-210552137 ok to test

[GitHub] spark pull request: [SPARK-14549][ML] Copy the Vector and Matrix c...

2016-04-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12317#issuecomment-210201008 +cc @srowen @MLnick

[GitHub] spark pull request: [SPARK-14549][ML] Copy the Vector and Matrix c...

2016-04-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12317#issuecomment-210145620 @mengxr The versions of the dependencies have been moved to the parent POM.

[GitHub] spark pull request: [SPARK-14612] [ML] Consolidate the version of ...

2016-04-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12390#issuecomment-210073766 Thanks. Merged into master.

[GitHub] spark pull request: [SPARK-14549][ML] Copy the Vector and Matrix c...

2016-04-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12317#discussion_r59647435 --- Diff: mllib-local/pom.xml --- @@ -62,6 +62,15 @@ test + com.google.guava + guava

[GitHub] spark pull request: [SPARK-14549][ML][WIP] Copy the Vector and Mat...

2016-04-11 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/12317 [SPARK-14549][ML][WIP] Copy the Vector and Matrix classes from mllib to ml in mllib-local ## What changes were proposed in this pull request? This task will copy the Vector and Matrix

[GitHub] spark pull request: [SPARK-14462] [HOTFIX] Let DummyTestingSuite i...

2016-04-11 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12276#issuecomment-208234927 @tedyu Thanks. I created a new PR to address this issue. https://github.com/apache/spark/pull/12298

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] Add the mllib-local b...

2016-04-11 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/12298 [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom ## What changes were proposed in this pull request? In order to separate the linear algebra, and vector matrix classes

[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-11 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-208229364 + @mengxr @rxin In [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944), the `matrix` and `vector` classes will be moved out to `spark-mllib-local

[GitHub] spark pull request: [SPARK-14498][ML][PYTHON][SQL] Many cleanups t...

2016-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12266#issuecomment-207692381 Thanks. Merged into master.

[GitHub] spark pull request: [SPARK-14498][ML][PYTHON][SQL] Many cleanups t...

2016-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12266#issuecomment-207663956 Both look good to me. Thanks. I'll go ahead and merge it soon.

[GitHub] spark pull request: [SPARK-14390][GraphX] Make initialization step...

2016-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12159#issuecomment-207632470 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] add the mllib-local b...

2016-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12241#issuecomment-207632164 test this please

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] add the mllib-local b...

2016-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12241#issuecomment-207632064 This is the minimal change for creating a new jar build. Let's wait for the Jenkins result. We'll move the code in a separate PR once this is merged. Thanks

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] add the mllib-local b...

2016-04-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12241#discussion_r59079729 --- Diff: dev/sparktestsupport/modules.py --- @@ -256,9 +256,21 @@ def __hash__(self): ) +mllib_local = Module( +name="

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] add the mllib-local b...

2016-04-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12241#discussion_r59081509 --- Diff: dev/sparktestsupport/modules.py --- @@ -256,9 +256,21 @@ def __hash__(self): ) +mllib_local = Module( +name="

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] add the mllib-local b...

2016-04-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12241#discussion_r59081171 --- Diff: core/pom.xml --- @@ -35,6 +35,11 @@ http://spark.apache.org/ + org.apache.spark --- End diff -- I

[GitHub] spark pull request: [SPARK-14462][ML][MLLIB] add the mllib-local b...

2016-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12241#issuecomment-207349784 test this please

[GitHub] spark pull request: [SPARK-13944][ML][WIP] Separate out local line...

2016-04-08 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/12172

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-04-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-207184221 LGTM. This PR dramatically improves our S3 performance at Netflix. @andrewor14 @srowen @JoshRosen @davies @marmbrus @yhuai, any further feedback? Thanks

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-04-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-207140497 add to whitelist

[GitHub] spark pull request: [SPARK-13944][ML][MLLIB] add the mllib-local b...

2016-04-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12241#issuecomment-207122586 @JoshRosen Thanks. It's working now. @holdenk I thought each jar needs to have its own `package-info.java` to generate the Java doc and Scala doc. I'm now

[GitHub] spark pull request: [SPARK-13944][ML][MLLIB] add the mllib-local b...

2016-04-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12241#issuecomment-207075404 +cc @JoshRosen who may be able to give me insight on the MiMa failure caused by adding a new jar.

[GitHub] spark pull request: [SPARK-13944][ML][MLLIB] add the mllib-local b...

2016-04-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12241#issuecomment-207069702 +cc @mengxr @jkbradley @srowen @holdenk @MLnick

[GitHub] spark pull request: [SPARK-13944][ML][MLLIB] add the mllib-local b...

2016-04-07 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/12241 [SPARK-13944][ML][MLLIB] add the mllib-local build to maven pom ## What changes were proposed in this pull request? In order to separate the linear algebra, and vector matrix classes

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-04-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-207063759 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-14427][SQL] Support persisting partitio...

2016-04-06 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12204#issuecomment-206689374 +@rdblue

[GitHub] spark pull request: [SPARK-13944][ML][WIP] Separate out local line...

2016-04-05 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/12172 [SPARK-13944][ML][WIP] Separate out local linear algebra as a standalone module without Spark dependency ## What changes were proposed in this pull request? Separate out linear algebra

[GitHub] spark pull request: [SPARK-14390][GraphX] Make initialization step...

2016-04-04 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/12159#issuecomment-205538259 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-12555][SQL] Result should not be corrup...

2016-03-25 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11623#issuecomment-201383428 add to whitelist

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56891305 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56432610 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,21 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56719728 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56432618 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,21 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56747209 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-19 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56747153 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: [SPARK-13927][MLLIB] add row/column iterator t...

2016-03-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11757#issuecomment-197553903 LGTM. Merged into master. Thanks.

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11242#discussion_r56432616 --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala --- @@ -62,7 +64,21 @@ class UnionRDD[T: ClassTag]( var rdds: Seq[RDD[T

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-198535147 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197213572 I'm not an expert in this area, but after thinking about it more, I don't think we can use `DGELSD`, which minimizes `||b - A*x||` using the singular value decomposition (SVD
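
For readers unfamiliar with the objective named in the comment above, here is a minimal sketch of what "minimizes `||b - A*x||`" means on a toy overdetermined system. It does not call `DGELSD` or use the SVD; the helper `residual_norm` and the data are hypothetical, and the closed-form minimizer used here only applies to the one-column case.

```python
import math

def residual_norm(A, x, b):
    """Return the Euclidean norm ||b - A*x|| for a dense row-major matrix A."""
    residual = [bi - sum(aij * xj for aij, xj in zip(row, x))
                for row, bi in zip(A, b)]
    return math.sqrt(sum(r * r for r in residual))

# Toy overdetermined system (illustrative data): three equations, one unknown.
A = [[1.0], [2.0], [3.0]]
b = [1.0, 2.0, 2.0]

# With a single column a, the least-squares minimizer is x = (a . b) / (a . a).
a = [row[0] for row in A]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)

# Any other x gives a residual norm at least as large as the one at x_star.
best = residual_norm(A, [x_star], b)
nearby = [residual_norm(A, [x_star + d], b) for d in (-0.1, 0.1)]
```

A LAPACK least-squares driver solves the same minimization for general `A`; the sketch only makes the quantity being minimized concrete.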

[GitHub] spark pull request: SPARK-9926: Parallelize partition logic in Uni...

2016-03-15 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11242#issuecomment-197052153 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196662210 I will vote for approach 1. SVD will be the most stable algorithm, but the slowest, O(mn^2 + n^3), compared with Cholesky, O(mn^2), or QR, O(mn^2 - n^3/3), decomposition
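
To put the three cost expressions from the comment above side by side, the following sketch evaluates them for a given `m` (rows) and `n` (columns). The expressions are copied from the comment with constant factors dropped, so the numbers are rough orders of magnitude rather than measured timings; the function name `least_squares_cost_estimates` is hypothetical.

```python
def least_squares_cost_estimates(m: int, n: int) -> dict:
    """Evaluate the leading-order cost expressions quoted in the comment.

    Constant factors are omitted, exactly as in the comment, so the
    values are only meaningful relative to each other.
    """
    return {
        "svd": m * n**2 + n**3,
        "cholesky": m * n**2,
        "qr": m * n**2 - n**3 / 3,
    }

# Illustrative sizes: many rows, few columns.
estimates = least_squares_cost_estimates(m=1_000_000, n=100)
for name, cost in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: {cost:.3e}")
```

When `m` is much larger than `n`, the `mn^2` term dominates all three expressions, so the gap between them shrinks; the `n^3` terms matter mostly when the matrix is close to square.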

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r55967102 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -108,6 +113,21 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r55966718 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -30,11 +33,13 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r55965910 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -118,6 +138,11 @@ object KMeansSuite { sql.createDataFrame

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r55965857 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -118,6 +138,11 @@ object KMeansSuite { sql.createDataFrame

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964826 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964808 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964768 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964451 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964211 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55963818 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55963496 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13545] [MLlib] [PySpark] Make MLlib Log...

2016-02-29 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11424#issuecomment-190108578 Thanks. Merged into master.

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-26 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11136#issuecomment-189192045 Gonna do another detailed pass of the code tomorrow.

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11136#discussion_r54141380 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -157,6 +157,12 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-13372] [ML] Fix LogisticRegression when...

2016-02-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11247#issuecomment-188428112 @yanboliang I share the same concern with you. However, users may set `standardization = false` but still want good convergence when the scales are quite

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-23 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/9#issuecomment-187598373 Yes, but busy with work. :( Will start on it in a couple of days.

[GitHub] spark pull request: [SPARK-13379] [MLlib] Fix MLlib LogisticRegres...

2016-02-21 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11258#issuecomment-186998722 LGTM. Merged into master. Thanks.

[GitHub] spark pull request: [SPARK-13379] [MLlib] Fix MLlib LogisticRegres...

2016-02-21 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11258#issuecomment-186994891 The default value in R's GLMNET is `1E-7`, and the default value in the original LBFGS implementation is `1E-8`. In order to provide better and more consistent results, let's

[GitHub] spark pull request: [SPARK-13379] [MLlib] Fix MLlib LogisticRegres...

2016-02-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11258#issuecomment-186088729 +1 on copying the tests from ML LOR tests.

[GitHub] spark pull request: [SPARK-13372] [ML] Fix LogisticRegression when...

2016-02-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11247#issuecomment-185826603 @yanboliang In #7080, it was intentional that `standardization = false` runs the same route as `standardization = true` without regularization

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-12 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/9#issuecomment-183248930 @yinxusen I'll be away for Spark summit east. Gonna work on this again when I'm back. Thanks.

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/9#issuecomment-181766454 Agree, for code-gen, if we want to do it in this way, we would rather put them in a separate place. But it would be nice to extend the code-gen framework so it can use one

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52282168 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -248,6 +269,11 @@ class KMeans @Since("

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52282317 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -248,6 +269,11 @@ class KMeans @Since("

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52281022 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -237,6 +237,27 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52281252 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -237,6 +237,27 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52282898 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -106,6 +106,38 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52279449 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -78,7 +78,24 @@ private[shared] object SharedParamsCodeGen

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52279466 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala --- @@ -78,7 +78,24 @@ private[shared] object SharedParamsCodeGen

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r52281352 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -237,6 +237,27 @@ class KMeans @Since("1.5.0") ( @Si

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-178787032 I meant comparing the result with your solution when `yStd != 0`, and `regParm != 0`. I suspect that you will get a different result since the GLMNET one forces to standardize

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-179000215 Yes, that's what I meant. Without standardizing the labels, there is no way to match glmnet, but this makes the problem ill-defined when `yStd == 0`.

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r5162 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -558,6 +583,86 @@ class LinearRegressionSuite

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-179002845 LGTM. Merged into master. Thanks.

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51536905 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -74,7 +74,8 @@ class LinearRegression @Since("1.3.0"

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51536910 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -83,7 +84,8 @@ class LinearRegression @Since("1.3.0"

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51537328 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -398,7 +422,8 @@ class LinearRegressionModel private[ml

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-178441655 For case (3), I agree with your argument completely. Can you try your normal equation solution with L2 without any standardization (nonzero ystd data) and see

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-177385145 Commenting on your issues. Issue 1: With `WeightedLeastSquares`, we have the option to standardize the label and features separately. As a result, if the label

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51354803 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -558,6 +575,47 @@ class LinearRegressionSuite
