[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules

2018-08-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/22236 Take it and good luck. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules

2018-08-27 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/22236 just FYI about another related PR: https://github.com/apache/spark/pull/17280 and maybe I should close it? @srowen

[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-03 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/21981 BTW, @HyukjinKwon, do you know who's still reviewing the ML PRs? I have a few old PRs and I really want to know which're considered meaningful

[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-03 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/21981 Thanks for the review @HyukjinKwon.

[GitHub] spark issue #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/21981 Ah, this triggers the doc check. Updating.

[GitHub] spark pull request #21981: [SPARK-25011][ML]add prefix to __all__ in fpm.py

2018-08-02 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/21981 [SPARK-25011][ML]add prefix to __all__ in fpm.py ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-25011 add prefix to __all__

[GitHub] spark issue #21942: [SPARK-24283][ML] Make ml.StandardScaler skip conversion...

2018-08-02 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/21942 I think it's better to move the code and unit test in one PR. But since it's not a trivial change, I suggest you wait for the committers' opinion

[GitHub] spark pull request #21942: [SPARK-24283][ML] Make ml.StandardScaler skip con...

2018-08-01 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21942#discussion_r207103066 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala --- @@ -160,15 +160,89 @@ class StandardScalerModel private[ml

[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...

2018-07-31 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 Updated to support backward model loading compatibility. @MLnick @jkbradley

[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2018-07-29 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Gentle ping @MLnick. Thanks for the review; I'd appreciate it if you have some time for further comments.

[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...

2018-07-29 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/20028 Thanks for the comments @zhengruifeng @felixcheung It's been nearly 8 months and it took me a while to recall what this PR does. While the PR did provide some improvement for the current

[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21501#discussion_r194098947 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala --- @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0"

[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21501#discussion_r194099298 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala --- @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0"

[GitHub] spark pull request #21248: [SPARK-24191][ML]Scala Example code for Power Ite...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21248#discussion_r189465979 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala --- @@ -0,0 +1,114 @@ +/* + * Licensed

[GitHub] spark pull request #21248: [SPARK-24191][ML]Scala Example code for Power Ite...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21248#discussion_r189466147 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala --- @@ -0,0 +1,114 @@ +/* + * Licensed

[GitHub] spark pull request #21248: [SPARK-24191][ML]Scala Example code for Power Ite...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21248#discussion_r189466112 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala --- @@ -0,0 +1,114 @@ +/* + * Licensed

[GitHub] spark pull request #21283: [SPARK-24224][ML-Examples]Java example code for P...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21283#discussion_r189464881 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #21283: [SPARK-24224][ML-Examples]Java example code for P...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21283#discussion_r189464891 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #21283: [SPARK-24224][ML-Examples]Java example code for P...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21283#discussion_r189464861 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #21283: [SPARK-24224][ML-Examples]Java example code for P...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21283#discussion_r189465580 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #21283: [SPARK-24224][ML-Examples]Java example code for P...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21283#discussion_r189465534 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #21283: [SPARK-24224][ML-Examples]Java example code for P...

2018-05-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/21283#discussion_r189465567 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...

2018-03-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/20028 Please advise if this is a good feature to add. If not, I'll close it. Thanks.

[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2018-03-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 Please advise if this is a good feature to add. If not, I'll close it. Thanks.

[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2018-03-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 Please advise if this is a good feature to add. If not, I'll close it. Thanks.

[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...

2018-03-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 Please advise if this is a good feature to add. If not, I'll close it. Thanks.

[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2018-03-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Please advise if this is a good feature to add. If not, I'll close it. Thanks.

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2018-02-03 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19565 It's probably better to wait for the opinion from a committer.

[GitHub] spark pull request #17280: [SPARK-19939] [ML] Add support for association ru...

2018-01-30 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17280#discussion_r164942458 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel

[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...

2018-01-16 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 Thanks for taking a look @MLnick

[GitHub] spark pull request #17280: [SPARK-19939] [ML] Add support for association ru...

2018-01-16 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17280#discussion_r161962624 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -319,9 +323,11 @@ object FPGrowthModel extends MLReadable[FPGrowthModel

[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-12-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16158#discussion_r159016507 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -133,7 +134,10 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-21 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r158344862 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,31 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r158170154 --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala --- @@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r158154050 --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala --- @@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r158153277 --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala --- @@ -430,4 +433,49 @@ object ParamsSuite extends SparkFunSuite

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r158153048 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,31 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2017-12-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19993 To make it available for other classes, we need to support checking for both `fit` and `transform`, which means we also need a sample input Dataset, so we may have to add the explicit test in each

[GitHub] spark pull request #20028: [SPARK-19053][ML]Supporting multiple evaluation m...

2017-12-19 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/20028 [SPARK-19053][ML]Supporting multiple evaluation metrics in DataFrame-based API ## What changes were proposed in this pull request? As an initial step, the PR creates

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157870176 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,29 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157871042 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,29 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157867496 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -137,18 +137,17 @@ final class Bucketizer @Since("1.4.0") (@Si

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157870214 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,29 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157869596 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +250,29 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2017-12-18 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19993 I would suggest developing the common infrastructure and unit tests first; then other PRs can use it, or we can send follow-up fixes. cc @MLnick for advice

[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 Updated to use $lc and added a new unit test for the doc and exception.

[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 > One option that came to my mind was that $ returns lowercase, so this is used at most places but when you really need it you can access the original (not necessarily lowercase) va

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r156772807 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r156771298 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r156770135 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r156758913 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -480,10 +640,14 @@ object LinearRegression extends

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r156758353 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-12-13 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19020 LGTM. One thing I noticed is that we did not really compare the loss with other libraries (like sklearn), which is something also missing for other linear algorithms. Do you think it would

[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-12-12 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 Many thanks for the review @smurakozi and @attilapiros. > The PR is not complete (did not convert all Param[String] instances to StringParam consistently) so it should be marked as

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r156556262 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0"

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r156556136 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-12-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r156555847 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

2017-12-12 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19525 Thanks for the review @yanboliang

[GitHub] spark issue #19894: [SPARK-22700][ML] Bucketizer.transform incorrectly drops...

2017-12-07 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19894 LGTM. Good fix.

[GitHub] spark issue #10803: [SPARK-12875] [ML] Add Weight of Evidence and Informatio...

2017-12-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/10803 No, it's not merged. Feel free to use the code as you wish.

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-11-18 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r151854676 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-14 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r151029101 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +346,39 @@ class VectorIndexerModel private[ml

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-13 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19588 Also, we need a JIRA for Python.

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r150748259 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +346,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-11-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r150432943 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-11-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r150432221 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-11-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r150430257 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-12 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148969162 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148968172 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148968255 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148967559 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148967956 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148967766 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148967853 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148966706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -105,6 +106,56 @@ private[feature] trait Word2VecBase extends Params

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-11-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r148968530 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/impl/Word2VecCBOWSolver.scala --- @@ -0,0 +1,371 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148634321 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala --- @@ -49,8 +49,8 @@ final class RegressionEvaluator @Since("

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148634479 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -722,6 +722,17 @@ class LinearRegressionSummary private

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148440930 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148442070 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148440879 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148444910 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148444535 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148444218 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148442709 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -287,9 +315,12 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148440785 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -37,7 +38,25 @@ import org.apache.spark.sql.types.{StructField

[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

2017-11-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19599 I used two ways to switch String params among different options: 1. In NaiveBayes: convert StringParam and String constants to lowercase. 2. In LinearRegression: .equalsIgnoreCase
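The two approaches named in this comment can be illustrated with a minimal sketch in plain Scala. This is not the code from the PR: the object name, the method names, and the option strings (`multinomial`, `bernoulli`, `normal`) are assumptions chosen for illustration only.

```scala
import java.util.Locale

// Hypothetical helper, not part of Spark: contrasts the two ways of
// matching a user-supplied string option without regard to case.
object StringOptionMatching {
  // Illustrative option constants (assumed names), kept in lowercase.
  val Multinomial = "multinomial"
  val Bernoulli = "bernoulli"

  // Approach 1: normalize the input to lowercase once, then match
  // against lowercase constants.
  def describeByLowercase(modelType: String): String =
    modelType.toLowerCase(Locale.ROOT) match {
      case Multinomial => "multinomial model"
      case Bernoulli => "bernoulli model"
      case other => throw new IllegalArgumentException(s"Unknown option: $other")
    }

  // Approach 2: leave the input untouched and compare with equalsIgnoreCase.
  def isNormalSolver(solver: String): Boolean =
    solver.equalsIgnoreCase("normal")
}
```

Approach 1 keeps the comparison logic in one place and works with pattern matching; approach 2 avoids allocating a normalized copy but has to be repeated at every comparison site.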

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-11-01 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r148438581 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -497,40 +481,46 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-11-01 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r148438759 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-11-01 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r148437931 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-31 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19565 Let me know if I missed anything, but I don't quite catch the part > all the batches have the same length IMO `docs.sample(withReplacement = sampleWithReplacem

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-31 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19565 @akopich I'm actually leaning towards "filter after sample". 1. so we don't need to change `miniBatchFraction` in ` docs.sample(withReplacement = sampleWithR
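As a rough illustration of the "filter after sample" ordering mentioned in this comment, the sketch below uses plain Scala collections rather than Spark RDDs. Everything in it is an assumption made for illustration: the `Doc` type, the `sampleThenFilter` name, the Bernoulli-style sampling, and the definition of an empty document as one whose term counts are all zero. It is not the implementation from the PR.

```scala
import scala.util.Random

// Hypothetical sketch, not Spark code: sample a mini-batch first,
// then drop empty documents from the sampled batch.
object FilterAfterSample {
  // A document is an (id, termCounts) pair; this shape is an assumption.
  type Doc = (Long, Array[Double])

  def sampleThenFilter(docs: Seq[Doc], miniBatchFraction: Double, seed: Long): Seq[Doc] = {
    val rng = new Random(seed)
    // Step 1: keep each document with probability miniBatchFraction,
    // so the fraction still refers to the full corpus.
    val sampled = docs.filter(_ => rng.nextDouble() < miniBatchFraction)
    // Step 2: remove documents that have no non-zero term counts.
    sampled.filter { case (_, counts) => counts.exists(_ != 0.0) }
  }
}
```

With this ordering the sampling fraction does not need to be rescaled to account for removed documents, at the cost of mini-batches whose size varies with the number of empty documents drawn.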

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-10-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-10-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562823 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -133,7 +134,7 @@ class NaiveBayes @Since("1.5.0") (

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-10-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562645 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0"

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-10-28 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562530 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc: String

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

2017-10-28 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/19599 [SPARK-22381] [ML] Add StringParam that supports valid options ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-22381 During

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147321334 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147324011 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -142,6 +221,9 @@ class LinearRegression @Since("1.3.0"

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147326457 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -344,33 +449,58 @@ class LinearRegression @Since("
