Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/22236
Take it and good luck.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/22236
just FYI about another related PR:
https://github.com/apache/spark/pull/17280
and maybe I should close it? @srowen
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21981
BTW, @HyukjinKwon, do you know who's still reviewing the ML PRs? I have a
few old PRs and I really want to know which are considered meaningful
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21981
Thanks for the review @HyukjinKwon.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21981
Ah, this triggers the doc check. Updating.
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/21981
[SPARK-25011][ML]add prefix to __all__ in fpm.py
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-25011
add prefix to __all__
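For context, `__all__` is the list Python consults to decide which names `from module import *` exports. The sketch below is only an illustration under stated assumptions: `demo_fpm`, `Listed`, and `Unlisted` are hypothetical names built in memory, not the actual contents of `fpm.py`.

```python
import sys
import types

# Hypothetical stand-in module source; NOT the real pyspark fpm.py.
source = '''
__all__ = ["Listed"]

class Listed:
    pass

class Unlisted:
    pass
'''

# Build the module in memory and register it so it can be imported by name.
demo = types.ModuleType("demo_fpm")
exec(source, demo.__dict__)
sys.modules["demo_fpm"] = demo

# Run a star-import into a fresh namespace and inspect what arrived.
namespace = {}
exec("from demo_fpm import *", namespace)

# Ignore dunder entries such as __builtins__ that exec adds on its own.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

A class left out of `__all__` is still reachable as `demo_fpm.Unlisted`; it is only skipped by the star-import.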
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/21942
I think it's better to move the code and unit test in one PR. But since
it's not a trivial change, I suggest you wait for the committers' opinion
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21942#discussion_r207103066
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala ---
@@ -160,15 +160,89 @@ class StandardScalerModel private[ml
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
Updated to support backward model loading compatibility.
@MLnick @jkbradley
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/16158
Gentle ping @MLnick. Thanks for the review; I'd appreciate it if you have
some time for further comments.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/20028
Thanks for the comments @zhengruifeng @felixcheung
It's been nearly 8 months and it took me a while to recall what this PR
does. While the PR did provide some improvement for the current
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21501#discussion_r194098947
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21501#discussion_r194099298
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189465979
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189466147
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189466112
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189464881
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189464891
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189464861
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189465580
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189465534
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189465567
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/20028
Please advise whether this is a good feature to add. If not, I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
Please advise whether this is a good feature to add. If not, I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17583
Please advise whether this is a good feature to add. If not, I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
Please advise whether this is a good feature to add. If not, I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/16158
Please advise whether this is a good feature to add. If not, I'll close it.
Thanks.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19565
It's probably better to wait for the opinion from a committer.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17280#discussion_r164942458
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
Thanks for taking a look @MLnick
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17280#discussion_r161962624
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/16158#discussion_r159016507
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -133,7 +134,10 @@ class CrossValidator @Since("1.2.0") (@Si
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158344862
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,31 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158170154
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala
---
@@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158154050
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala
---
@@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158153277
--- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala
---
@@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r158153048
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,31 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19993
To make it available for other classes, we need to support checking for
both `fit` and `transform`, that means we also need a sample input Dataset, so
we may have to add the explicit test in each
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/20028
[SPARK-19053][ML]Supporting multiple evaluation metrics in DataFrame-based
API
## What changes were proposed in this pull request?
As an initial step, the PR creates
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157870176
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157871042
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157867496
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -137,18 +137,17 @@ final class Bucketizer @Since("1.4.0")
(@Si
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157870214
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r157869596
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -249,6 +250,29 @@ object ParamValidators {
def arrayLengthGt[T
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19993
I would suggest developing the common infrastructure and unit tests first;
then other PRs can build on it, or we can send follow-up fixes.
cc @MLnick for advice
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
Updated: now uses $lc and adds a new unit test for the doc and the exception.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
> One option that came to my mind was that $ returns lowercase, so this is
used at most places but when you really need it you can access the original
(not necessarily lowercase) va
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156772807
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156771298
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156770135
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r156758913
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -480,10 +640,14 @@ object LinearRegression extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r156758353
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
---
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19020
LGTM.
One thing I noticed is that we did not really compare the loss with other
libraries (like sklearn), which is also missing for other linear
algorithms. Do you think it would
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
Many thanks for the review @smurakozi and @attilapiros.
> The PR is not complete (did not convert all Param[String] instances to
StringParam consistently) so it should be marked as
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156556262
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156556136
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r156555847
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19525
Thanks for the review @yanboliang
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19894
LGTM. Good fix.
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/10803
No, it's not merged. Feel free to use the code as you wish.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r151854676
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r151029101
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19588
Also, we need a JIRA for Python.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150748259
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r150432943
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r150432221
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19525#discussion_r150430257
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148969162
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148968172
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148968255
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967559
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967956
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967766
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148967853
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148966706
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r148968530
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/impl/Word2VecCBOWSolver.scala
---
@@ -0,0 +1,371 @@
+/*
+ * Licensed to the Apache
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148634321
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala
---
@@ -49,8 +49,8 @@ final class RegressionEvaluator @Since("
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148634479
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -722,6 +722,17 @@ class LinearRegressionSummary private
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148440930
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148442070
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148440879
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444910
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444535
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148444218
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148442709
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -287,9 +315,12 @@ class VectorIndexerModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r148440785
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19599
I used two ways to switch String params among different options:
1. In NaiveBayes: convert StringParam and String constants to lowercase.
2. In LinearRegression: .equalsIgnoreCase
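The two approaches listed above can be sketched roughly as follows. This is a Python analogue, not the Scala code from the PR: the function names are hypothetical, the option values are used only as sample strings, and `str.lower()` stands in for the JVM's `equalsIgnoreCase`, which it only approximates.

```python
# Sample option strings, used for illustration only.
SUPPORTED = ("multinomial", "bernoulli")


def normalize_option(value):
    """Approach 1: lowercase the user value once, then match lowercase constants."""
    lowered = value.lower()
    if lowered not in SUPPORTED:
        raise ValueError(f"unsupported option: {value!r}")
    return lowered


def equals_ignore_case(left, right):
    """Approach 2: keep the value as given and compare case-insensitively per use."""
    return left.lower() == right.lower()


print(normalize_option("Multinomial"))
print(equals_ignore_case("Huber", "huber"))
```

Normalizing once keeps later comparisons plain but discards the caller's original spelling; comparing at each use site preserves the spelling but requires every comparison to remember to ignore case.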
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19565#discussion_r148438581
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -497,40 +481,46 @@ final class OnlineLDAOptimizer extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19565#discussion_r148438759
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19565#discussion_r148437931
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19565
Let me know if I missed anything, but I don't quite catch the part
> all the batches have the same length
IMO
`docs.sample(withReplacement = sampleWithReplacem
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/19565
@akopich I'm actually leaning towards "filter after sample".
1. so we don't need to change `miniBatchFraction` in
` docs.sample(withReplacement = sampleWithR
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562600
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc:
String
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562823
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -133,7 +134,7 @@ class NaiveBayes @Since("1.5.0") (
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562645
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19599#discussion_r147562530
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc:
String
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/19599
[SPARK-22381] [ML] Add StringParam that supports valid options
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-22381
During
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r147321334
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r147324011
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -142,6 +221,9 @@ class LinearRegression @Since("1.3.0"
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/19020#discussion_r147326457
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -344,33 +449,58 @@ class LinearRegression @Since("