Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12504#discussion_r60489912
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -158,6 +159,13 @@ sealed trait Matrix extends Serializable
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12504#discussion_r60313819
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -158,6 +159,13 @@ sealed trait Matrix extends Serializable
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12504#discussion_r60295795
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -158,6 +159,13 @@ sealed trait Matrix extends Serializable
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60190633
--- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala
---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60190604
--- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/MatrixUDT.scala
---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60181626
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60178343
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala ---
@@ -0,0 +1,99 @@
+/*
--- End diff --
Let's create
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60178107
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60172565
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60172550
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/VectorUDT.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12259#discussion_r60172539
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12259#issuecomment-211702268
@viirya Since we will use this in `mllib`, let's test it in `mllib` `test`
scope.
---
If your project is set up for it, you can reply to this email and have your
reply
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12259#issuecomment-211488734
retest this please
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12416#issuecomment-211487563
I like @srowen's idea. Let's have the shared annotation in `common/tags` and
move the current ones under `src/test`. @pravingadakh, can you update this PR?
Thanks
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/9#issuecomment-210649354
This will be very useful for many use cases. Nice to have it in 2.0 :)
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12416#issuecomment-210553855
@srowen I think using a Scala annotation is easier, and I prefer it. What
do you think about keeping a copy of the `@Since` annotation in the `mllib-local` jar?
Thanks
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12416#issuecomment-210553033
All the public methods need to have `Since` for the doc. See the following
for reference. Thanks.
https://github.com/apache/spark/blob/master/mllib/src/main
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12416#issuecomment-210552137
ok to test
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12317#issuecomment-210201008
+cc @srowen @MLnick
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12317#issuecomment-210145620
@mengxr The versions of the dependencies have been moved to the parent POM.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12390#issuecomment-210073766
Thanks. Merged into master.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12317#discussion_r59647435
--- Diff: mllib-local/pom.xml ---
@@ -62,6 +62,15 @@
test
+ com.google.guava
+ guava
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/12317
[SPARK-14549][ML][WIP] Copy the Vector and Matrix classes from mllib to ml
in mllib-local
## What changes were proposed in this pull request?
This task will copy the Vector and Matrix
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12276#issuecomment-208234927
@tedyu Thanks. I created a new PR to address this issue.
https://github.com/apache/spark/pull/12298
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/12298
[SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom
## What changes were proposed in this pull request?
In order to separate the linear algebra, and vector matrix classes
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12259#issuecomment-208229364
+ @mengxr
@rxin In [SPARK-13944](https://issues.apache.org/jira/browse/SPARK-13944),
the `matrix` and `vector` classes will be moved out to `spark-mllib-local
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12266#issuecomment-207692381
Thanks. Merged into master.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12266#issuecomment-207663956
Both look good to me. Thanks. I'll go ahead and merge it soon.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12159#issuecomment-207632470
Jenkins, test this please.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12241#issuecomment-207632164
test this please
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12241#issuecomment-207632064
This is the minimal change for creating a new jar build. Let's wait for the
Jenkins result. We'll move the code in a separate PR once this is merged.
Thanks
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12241#discussion_r59079729
--- Diff: dev/sparktestsupport/modules.py ---
@@ -256,9 +256,21 @@ def __hash__(self):
)
+mllib_local = Module(
+name="
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12241#discussion_r59081509
--- Diff: dev/sparktestsupport/modules.py ---
@@ -256,9 +256,21 @@ def __hash__(self):
)
+mllib_local = Module(
+name="
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/12241#discussion_r59081171
--- Diff: core/pom.xml ---
@@ -35,6 +35,11 @@
http://spark.apache.org/
+ org.apache.spark
--- End diff --
I
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12241#issuecomment-207349784
test this please
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/12172
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11242#issuecomment-207184221
LGTM. This PR dramatically improves our S3 performance at Netflix.
@andrewor14 @srowen @JoshRosen @davies @marmbrus @yhuai, any further
feedback? Thanks
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11242#issuecomment-207140497
add to whitelist
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12241#issuecomment-207122586
@JoshRosen Thanks. It's working now.
@holdenk I thought each jar needs to have its own `package-info.java` to
generate the Java doc and Scala doc. I'm now
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12241#issuecomment-207075404
+cc @JoshRosen, who may be able to give me insight into the MiMa failure
caused by adding a new jar.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12241#issuecomment-207069702
+cc @mengxr @jkbradley @srowen @holdenk @MLnick
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/12241
[SPARK-13944][ML][MLLIB] add the mllib-local build to maven pom
## What changes were proposed in this pull request?
In order to separate the linear algebra, and vector matrix classes
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11242#issuecomment-207063759
Jenkins, test this please.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12204#issuecomment-206689374
+@rdblue
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/12172
[SPARK-13944][ML][WIP] Separate out local linear algebra as a standalone
module without Spark dependency
## What changes were proposed in this pull request?
Separate out linear algebra
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/12159#issuecomment-205538259
Jenkins, test this please.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11623#issuecomment-201383428
add to whitelist
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56891305
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56432610
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,21 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56719728
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56432618
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,21 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56747209
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56747153
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11757#issuecomment-197553903
LGTM. Merged into master. Thanks.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11242#discussion_r56432616
--- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
@@ -62,7 +64,21 @@ class UnionRDD[T: ClassTag](
var rdds: Seq[RDD[T
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11242#issuecomment-198535147
Jenkins, test this please.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11610#issuecomment-197213572
I'm not an expert in this area, but after thinking about it more, I don't think
we can use `DGELSD` which minimizes `||b - A*x||` using the singular value
decomposition (SVD
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11242#issuecomment-197052153
Jenkins, test this please.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11610#issuecomment-196662210
I will vote for approach 1.
SVD will be the most stable algorithm, but the slowest, at O(mn^2 + n^3), compared
with Cholesky at O(mn^2) or QR at O(mn^2 - n^3/3) decomposition
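For context on the three options compared above, the least-squares problem and its standard solution routes can be sketched as follows. This is a textbook summary, not a description of what the pull request implements; the symbols $A \in \mathbb{R}^{m \times n}$ (assumed here to have full column rank, with $m \ge n$), $b$, and $x$ are notation introduced for illustration.

```latex
\begin{aligned}
&\text{Problem:} && \min_{x}\ \lVert b - A x \rVert_2 \\
&\text{Cholesky (normal equations):} && A^{\top} A\, x = A^{\top} b, \quad A^{\top} A = L L^{\top} \\
&\text{QR:} && A = Q R, \quad R\, x = Q^{\top} b \\
&\text{SVD:} && A = U \Sigma V^{\top}, \quad x = V \Sigma^{+} U^{\top} b
\end{aligned}
```

The normal-equations route works with $A^{\top} A$, whose 2-norm condition number is the square of that of $A$. This is the usual reason Cholesky is the cheapest but least stable of the three, while the SVD route is the most robust.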
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r55967102
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -108,6 +113,21 @@ class KMeansSuite extends SparkFunSuite
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r55966718
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -30,11 +33,13 @@ class KMeansSuite extends SparkFunSuite
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r55965910
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -118,6 +138,11 @@ object KMeansSuite {
sql.createDataFrame
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r55965857
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -118,6 +138,11 @@ object KMeansSuite {
sql.createDataFrame
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55964826
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55964808
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55964768
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55964546
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55964451
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55964211
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55963818
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11610#discussion_r55963496
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11424#issuecomment-190108578
Thanks. Merged into master.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11136#issuecomment-189192045
Gonna do another detailed pass over the code tomorrow.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11136#discussion_r54141380
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -157,6 +157,12 @@ private[ml] class WeightedLeastSquares
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11247#issuecomment-188428112
@yanboliang I share your concern. However, users may set
`standardization = false` but still want good convergence when the
scales are quite
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/9#issuecomment-187598373
Yes, but I'm busy with work. :( I'll start on it in a couple of days.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11258#issuecomment-186998722
LGTM. Merged into master. Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11258#issuecomment-186994891
The default value in R's GLMNET is `1E-7`, and the default value in the
original LBFGS implementation is `1E-8`. In order to provide better and
more consistent results, let's
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11258#issuecomment-186088729
+1 on copying the tests from ML LOR tests.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11247#issuecomment-185826603
@yanboliang In #7080, it was intentional that `standardization = false`
takes the same route as `standardization = true` without
regularization
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/9#issuecomment-183248930
@yinxusen I'll be away for Spark Summit East. Gonna work on this again when
I'm back. Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/9#issuecomment-181766454
Agreed. For code-gen, if we want to do it this way, we should put
them in a separate place. But it would be nice to extend the code-gen framework so
it can use one
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52282168
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -248,6 +269,11 @@ class KMeans @Since("
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52282317
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -248,6 +269,11 @@ class KMeans @Since("
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52281022
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -237,6 +237,27 @@ class KMeans @Since("1.5.0") (
@Si
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52281252
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -237,6 +237,27 @@ class KMeans @Since("1.5.0") (
@Si
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52282898
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -106,6 +106,38 @@ class KMeansSuite extends SparkFunSuite
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52279449
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
---
@@ -78,7 +78,24 @@ private[shared] object SharedParamsCodeGen
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52279466
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
---
@@ -78,7 +78,24 @@ private[shared] object SharedParamsCodeGen
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r52281352
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -237,6 +237,27 @@ class KMeans @Since("1.5.0") (
@Si
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178787032
I meant comparing the result with your solution when `yStd != 0` and
`regParm != 0`. I suspect that you will get a different result since the GLMNET one
forces to standardize
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-179000215
Yes, that's what I meant. Without standardizing the labels, there is no way to match
glmnet, but this makes the problem ill-defined when `yStd == 0`.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r5162
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +583,86 @@ class LinearRegressionSuite
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-179002845
LGTM. Merged into master. Thanks.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51536905
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -74,7 +74,8 @@ class LinearRegression @Since("1.3.0"
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51536910
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -83,7 +84,8 @@ class LinearRegression @Since("1.3.0"
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51537328
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -398,7 +422,8 @@ class LinearRegressionModel private[ml
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178441655
For case (3), I agree with your argument completely. Can you try your
normal equation solution with L2 without any standardization (nonzero ystd
data) and see
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177385145
Commenting on your issues.
Issue 1:
With `WeightedLeastSquares`, we have the option to standardize the label and
features separately. As a result, if the label
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51354803
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite