[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-11-07 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 @sethah Sorry, I got stuck in other things. I'll update this PR tonight. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #11374: [SPARK-12042] Python API for mllib.stat.test.StreamingTe...

2016-10-28 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11374 Ping @mengxr @feynmanliang @yanboliang

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-23 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 How about the following: 1. Since the newly generated model is derived from an estimator, the model should have the same params as its parent estimator. That's why there is no need

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-19 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 @MLnick @dbtsai @sethah Any thoughts on the new version?

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r83764721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -252,8 +254,10 @@ object KMeansModel extends MLReadable[KMeansModel

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 I agree that an overly long lineage would hurt performance and is also unnecessary. How about cutting any lineage that is longer than 2? Namely, we keep only the direct parent model when saving. Keeping direct

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-16 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 @sethah Thanks, I fixed these problems.

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-16 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 @sethah New behavior of `setK` is adapted.

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-14 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r83510194 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -300,15 +301,23 @@ private[ml] object DefaultParamsWriter

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-14 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r83509799 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -137,18 +143,64 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark issue #13794: [SPARK-15574][ML][PySpark] Python meta-algorithms in Sca...

2016-10-07 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13794 Thanks @holdenk Yes, I am still interested in this. @jkbradley Do we still need the PR to support meta-algorithms in PySpark?

[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...

2016-10-07 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14006 @holdenk I'll update a version ASAP

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-07 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 @dbtsai @sethah I updated the code. Now we check the equivalence of K when setting initialModel if K is set previously. We also check the equivalence when fitting a model.
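
The check described in this comment can be pictured with a small, self-contained sketch. This is not the code from the PR: `KMeansSketch`, `set_k`, `set_initial_model`, and `fit` are hypothetical names in plain Python with no Spark dependency, and raising `ValueError` on a mismatch is an assumption about how the error might be reported.

```python
class KMeansSketch:
    """Hypothetical estimator illustrating the k / initial-model consistency check."""

    def __init__(self):
        self.k = None                # number of clusters, if set explicitly
        self.initial_centers = None  # centers of the initial model, if any

    def _check_k(self):
        # Compare only when both values are known.
        if self.k is not None and self.initial_centers is not None:
            if self.k != len(self.initial_centers):
                raise ValueError(
                    "k=%d does not match the %d centers of the initial model"
                    % (self.k, len(self.initial_centers))
                )

    def set_k(self, k):
        # No check here, so a later mismatch is caught at fit time.
        self.k = k
        return self

    def set_initial_model(self, centers):
        previous = self.initial_centers
        self.initial_centers = [list(c) for c in centers]
        try:
            self._check_k()  # check when setting, if k was set previously
        except ValueError:
            self.initial_centers = previous  # leave the estimator unchanged
            raise
        return self

    def fit(self, points):
        self._check_k()  # check again when fitting
        # Clustering itself is omitted; return the k that would be used.
        if self.k is not None:
            return self.k
        if self.initial_centers is not None:
            return len(self.initial_centers)
        return None
```

With this shape, setting `k` first and then a model with a different number of centers fails immediately, while setting the model first and `k` afterwards is caught by the second check in `fit`.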

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-29 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r81262768 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -300,15 +301,23 @@ private[ml] object DefaultParamsWriter

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-29 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r81262198 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala --- @@ -107,24 +133,34 @@ trait DefaultReadWriteTest extends

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-27 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 Ping @dbtsai @sethah

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r80630417 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -81,11 +81,23 @@ private[clustering] trait KMeansParams extends Params

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r80630435 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala --- @@ -39,7 +40,7 @@ trait DefaultReadWriteTest extends TempDirectory

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r80630450 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala --- @@ -107,24 +133,34 @@ trait DefaultReadWriteTest extends

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r80630372 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala --- @@ -107,24 +133,34 @@ trait DefaultReadWriteTest extends

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-12 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 ping @dbtsai @sethah

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-06 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 @dbtsai It's ready for your review.

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r77414688 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -139,16 +145,32 @@ class KMeansSuite extends SparkFunSuite

[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans

2016-08-31 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/11119 Thanks @sethah and @dbtsai, I'll fix them soon.

[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-17 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14229 @felixcheung Merged with master.

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r75041418 --- Diff: R/pkg/R/mllib.R --- @@ -605,6 +701,69 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formul

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r75034333 --- Diff: R/pkg/R/mllib.R --- @@ -299,6 +306,94 @@ setMethod("summary", signature(object = "NaiveBayesModel"),

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74993602 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74992693 --- Diff: R/pkg/R/mllib.R --- @@ -299,6 +307,92 @@ setMethod("summary", signature(object = "NaiveBayesModel"),

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74990815 --- Diff: R/pkg/R/generics.R --- @@ -1279,6 +1279,19 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74990526 --- Diff: R/pkg/R/mllib.R --- @@ -299,6 +307,92 @@ setMethod("summary", signature(object = "NaiveBayesModel"),

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74852343 --- Diff: R/pkg/R/mllib.R --- @@ -605,6 +701,69 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formul

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74852090 --- Diff: R/pkg/R/mllib.R --- @@ -605,6 +701,69 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formul

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74851793 --- Diff: R/pkg/R/mllib.R --- @@ -605,6 +701,69 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formul

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-15 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74841214 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-12 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14229#discussion_r74662435 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -0,0 +1,210 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-11 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14229 @felixcheung I added some aliases for the spark.lda-related functions. However, I don't quite understand it. From [here](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html) I can see

[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-11 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14229 @felixcheung Yes. Sorry I missed the email.

[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...

2016-08-01 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14212 @MLnick They serve different purposes. This one is for users who have built their tools upon it. The `LatentDirichletAllocationExample` is for ML docs.

[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...

2016-07-21 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14212 Ping @mengxr @srowen

[GitHub] spark pull request #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-07-15 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/14229 [SPARK-16447][ML][SparkR] LDA wrapper in SparkR ## What changes were proposed in this pull request? Add LDA Wrapper in SparkR with the following interfaces: - spark.lda(data

[GitHub] spark pull request #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDA...

2016-07-14 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/14212 [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample should use MLVector instead of MLlib Vector ## What changes were proposed in this pull request? mllib.LDAExample uses ML pipeline

[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...

2016-07-08 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14006 @mengxr

[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...

2016-07-07 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14049 @srowen

[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

2016-07-05 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14051#discussion_r69656311 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -537,7 +537,7 @@ class RowMatrix @Since("

[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

2016-07-05 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14051#discussion_r69655115 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -537,7 +537,7 @@ class RowMatrix @Since("

[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

2016-07-05 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14051#discussion_r69612547 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -537,7 +537,7 @@ class RowMatrix @Since("

[GitHub] spark pull request #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix sh...

2016-07-05 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/14049#discussion_r69602682 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -538,20 +538,29 @@ class RowMatrix @Since("

[GitHub] spark pull request #14051: [SPARK-16372][MLlib] RowMatrix constructor should...

2016-07-04 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/14051 [SPARK-16372][MLlib] RowMatrix constructor should use retag for Java compatibility ## What changes were proposed in this pull request? The following Java code because of type erasing

[GitHub] spark pull request #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix sh...

2016-07-04 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/14049 [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aware of empty partition ## What changes were proposed in this pull request? tallSkinnyQR of RowMatrix should aware of empty

[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...

2016-06-30 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14006 @liancheng @mengxr

[GitHub] spark pull request #14006: [SPARK-13015][MLlib][DOC] Replace example code in...

2016-06-30 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/14006 [SPARK-13015][MLlib][DOC] Replace example code in mllib-data-types.md using include_example ## What changes were proposed in this pull request? 1. Add more specific error prompt

[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-30 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 @mengxr Sure

[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 @mengxr With this PR merged, I think we can also fix the [SPARK-13015 (mllib-data-types.md)](https://issues.apache.org/jira/browse/SPARK-13015) with a consolidated example file.

[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 LGTM

[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 One small matter is that we can't use intersecting labels, e.g. ```scala // $example on:init_session$ some code // $example on:build_session$ some code // $example
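
To make the "intersecting labels" limitation concrete, here is a minimal sketch of extracting one labeled region from an example file. It is an illustration only, not the actual `include_example` implementation: the function `extract_labeled` is hypothetical, the closing marker `$example off:label$` is assumed by analogy with the `on:` markers quoted above, and the sketch is stricter than strictly necessary because it rejects any other label's marker inside the selected region.

```python
import re

# Matches markers such as "$example on:init_session$" or "$example off$".
MARKER = re.compile(r"\$example (on|off)(?::(\w+))?\$")


def extract_labeled(lines, label):
    """Return the lines between the on/off markers of `label`.

    Raises ValueError if a marker for a different label appears while the
    requested region is open, which is how intersecting labels show up.
    """
    selected = []
    inside = False
    for line in lines:
        match = MARKER.search(line)
        if match is None:
            if inside:
                selected.append(line)
            continue
        state, name = match.group(1), match.group(2)
        if name == label:
            inside = state == "on"
        elif inside:
            raise ValueError(
                "marker for %r intersects region %r" % (name, label)
            )
    return selected
```

Under these assumptions, a file whose regions are opened and closed one after another extracts cleanly, while a file that opens `build_session` before closing `init_session` is rejected for either label.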

[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 I'll take a look.

[GitHub] spark pull request #11119: [SPARK-10780][ML][WIP] Add initial model to kmean...

2016-06-28 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r68715841 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedGeneralTypeParams.scala --- @@ -0,0 +1,34 @@ +/* --- End diff

[GitHub] spark issue #13921: [SPARK-16140][MLlib][SparkR][Docs] Group k-means method ...

2016-06-28 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13921 @keypointt You need to add title for `predict` and `write.ml`. Like the first line below. ```Rd #' This is title for write.ml #' @rdname write.ml #' @export setGeneric

[GitHub] spark issue #13794: [SPARK-15574][ML][PySpark] Python meta-algorithms in Sca...

2016-06-24 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13794 @jkbradley Update: Now I've added the PythonEstimator and PythonModel. For PythonEvaluator, it's better to commit it along with the changes to CrossValidator. It's ready for review.

[GitHub] spark issue #13794: [SPARK-15574][WIP][ML][PySpark] Python transformer wrapp...

2016-06-24 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13794 retest this please

[GitHub] spark issue #13794: [SPARK-15574][WIP][ML][PySpark] Python transformer wrapp...

2016-06-23 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13794 test it please

[GitHub] spark issue #13794: [SPARK-15574][WIP][ML][PySpark] Python transformer wrapp...

2016-06-23 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13794 test it please

[GitHub] spark issue #13794: [SPARK-15574][ML][PySpark] Python transformer wrapper an...

2016-06-20 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13794 @jkbradley For [SPARK-15574](https://issues.apache.org/jira/browse/SPARK-15574), only Pipeline and Transformer are changed in this PR. I will add Estimator/Model later. For pure Python

[GitHub] spark pull request #13794: [SPARK-15574][ML][PySpark] Python transformer wra...

2016-06-20 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/13794 [SPARK-15574][ML][PySpark] Python transformer wrapper and Pipeline ## What changes were proposed in this pull request? 1. Add a PythonTransformerWrapper in Scala for pure Python

[GitHub] spark issue #13536: [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec

2016-06-06 Thread yinxusen
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13536 retest it please

[GitHub] spark pull request #13536: [SPARK-15793][ML] Add maxSentenceLength for ml.Wo...

2016-06-06 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/13536 [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15793 Word2vec in ML

[GitHub] spark pull request: [SPARK-15008][ML][PySpark] Add integration tes...

2016-05-25 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12875#issuecomment-221748096 Sorry for the mistake. Fixed it.

[GitHub] spark pull request: [SPARK-15008][ML][PySpark] Add integration tes...

2016-05-03 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/12875#discussion_r61972815 --- Diff: python/pyspark/ml/classification.py --- @@ -1172,6 +1175,53 @@ def getClassifier(self): """ return se

[GitHub] spark pull request: [SPARK-15008][ML][PySpark] Add integration tes...

2016-05-03 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/12875 [SPARK-15008][ML][PySpark] Add integration test for OneVsRest ## What changes were proposed in this pull request? 1. Add `_transfer_param_map_to/from_java` for OneVsRest; 2. Add

[GitHub] spark pull request: [SPARK-14706][SPARK-14973][ML][PySpark] Python...

2016-05-01 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-216081217 @jkbradley I am preparing the SPARK-15008 now, but I am wondering why we need another persistence test for OneVsRest since we already have the `test_save_load

[GitHub] spark pull request: [SPARK-14973] The CrossValidator and TrainVali...

2016-05-01 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12825#issuecomment-216076628 @jkbradley

[GitHub] spark pull request: [SPARK-14973] The CrossValidator and TrainVali...

2016-05-01 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/12825 [SPARK-14973] The CrossValidator and TrainValidationSplit miss the seed when saving and loading ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse

[GitHub] spark pull request: [SPARK-14706][SPARK-14973][ML][PySpark] Python...

2016-04-30 Thread yinxusen
Github user yinxusen closed the pull request at: https://github.com/apache/spark/pull/12604

[GitHub] spark pull request: [SPARK-14706][SPARK-14973][ML][PySpark] Python...

2016-04-30 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-216017708 Close this one and prepare the new PR for SPARK-15008

[GitHub] spark pull request: [SPARK-14931][ML][PySpark] Mismatched default ...

2016-04-30 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12738#issuecomment-216017506 I'll close this one.

[GitHub] spark pull request: [SPARK-14931][ML][PySpark] Mismatched default ...

2016-04-30 Thread yinxusen
Github user yinxusen closed the pull request at: https://github.com/apache/spark/pull/12738

[GitHub] spark pull request: [SPARK-14931][ML][PYTHON] Mismatched default v...

2016-04-30 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12816#issuecomment-216017502 LGTM

[GitHub] spark pull request: [SPARK-14931][ML][PYTHON] Mismatched default v...

2016-04-30 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12816#issuecomment-216017182 Yeah, I understand. Skipped two days for family issues. :)

[GitHub] spark pull request: [SPARK-14931][ML][PYTHON] Mismatched default v...

2016-04-30 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12816#issuecomment-216017084 @jkbradley Sorry for the delay, reviewing now.

[GitHub] spark pull request: [SPARK-14931][ML][PySpark] Mismatched default ...

2016-04-28 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12738#issuecomment-215548179 @holdenk It doesn't seem good enough. I'll add a unit test to check the mismatch; then it will be easy to fix the mismatched default values issue.

[GitHub] spark pull request: [SPARK-14931][ML][PySpark] Mismatched default ...

2016-04-28 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12738#issuecomment-215534356 @holdenk Yes this PR is just a quick fix. I'll create a new JIRA for auditing default values.

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-28 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-215318737 @jkbradley Another bug found: The `CrossValidator` and `TrainValidationSplit` miss the `seed` when saving and loading. I'd prefer to create a JIRA and fix
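
A round-trip check is one way to surface this kind of bug. The sketch below is plain Python with hypothetical helpers (`save_params`, `load_params`, `round_trip_diff`); it does not use the Spark persistence API. It assumes, purely for illustration, that the bug takes the form of a writer that skips one param, so the reader falls back to a default value.

```python
import json


def save_params(params, persisted_keys):
    """Serialize only the params the (hypothetical) writer knows about."""
    kept = {k: params[k] for k in persisted_keys if k in params}
    return json.dumps(kept, sort_keys=True)


def load_params(text, defaults):
    """Start from defaults and overlay whatever was persisted."""
    loaded = dict(defaults)
    loaded.update(json.loads(text))
    return loaded


def round_trip_diff(params, persisted_keys, defaults):
    """Return {name: (before, after)} for params changed by save + load."""
    restored = load_params(save_params(params, persisted_keys), defaults)
    names = sorted(set(params) | set(restored))
    return {
        name: (params.get(name), restored.get(name))
        for name in names
        if params.get(name) != restored.get(name)
    }
```

If the writer in this sketch omits `seed`, the diff reports it as changed; once `seed` is included in `persisted_keys`, the diff comes back empty.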

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r61368867 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedGeneralTypeParams.scala --- @@ -0,0 +1,34 @@ +/* --- End diff

[GitHub] spark pull request: [SPARK-14931][ML][PySpark] Mismatched default ...

2016-04-27 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12738#issuecomment-215171223 See example here https://github.com/apache/spark/pull/12604#issuecomment-214852153

[GitHub] spark pull request: [SPARK-14931][ML][PySpark] Mismatched default ...

2016-04-27 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/12738 [SPARK-14931][ML][PySpark] Mismatched default values between pipelines in Spark and PySpark ## What changes were proposed in this pull request? When transferring params from Spark

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214897400 @jkbradley Here we go.

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214890020 I'll do it.

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214852153 @jkbradley Found another bug: for LogisticRegression in PySpark, if you write it and then reload it, the result is not identical to the original, because some params

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-26 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-214834313 JIRA created: https://issues.apache.org/jira/browse/SPARK-14924 for OneVsRest with a classifier in the estimatorParamMaps of tuning failing to persist

[GitHub] spark pull request: [SPARK-11399] Add label support in include_exa...

2016-04-25 Thread yinxusen
Github user yinxusen closed the pull request at: https://github.com/apache/spark/pull/9430

[GitHub] spark pull request: [SPARK-11399] Add label support in include_exa...

2016-04-25 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/9430#issuecomment-214472349 I'll close this one according to https://github.com/apache/spark/pull/11128 for now

[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...

2016-04-25 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/11128#issuecomment-214436420 @keypointt How about we close this for now? There are too many small code examples. If we need all code examples moved out of the markdown files, we can reopen it then. Thanks

[GitHub] spark pull request: [SPARK-7861][ML] PySpark OneVsRest

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12124#issuecomment-213635964 @jkbradley Do you still have plans to solve the metadata problem for tree methods? I find that [SPARK-7126](https://issues.apache.org/jira/browse/SPARK-7126) aims

[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/9#issuecomment-213607193 @jkbradley @mengxr I updated the MLReadable/MLWritable to the previous DefaultMLReadable/DefaultMLWritable to prevent a MiMa failure. Can we get it into 2.0?

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213578858 @jkbradley There is also a possible enhancement for `OneVsRest`. Currently `OneVsRest` requires `classifier` as its parameter, which is not easy to use, especially

[GitHub] spark pull request: [Spark-14300][Docs][MLLIB]Scala MLlib examples...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12195#issuecomment-213575408 @mengxr LGTM

[GitHub] spark pull request: [Spark-14300][Docs][MLLIB]Scala MLlib examples...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12195#issuecomment-213571733 @keypointt Sorry for the delay. I'm taking a look now.

[GitHub] spark pull request: [SPARK-14706][ML][PySpark] Python ML persisten...

2016-04-22 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/12604#issuecomment-213571053 @jkbradley Here is what I found in the integration test. The UIDs of non-JavaParams classes look good so far with the current tests.

[GitHub] spark pull request: [SPARK-14706] Python ML persistence integratio...

2016-04-22 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/12604 [SPARK-14706] Python ML persistence integration test ## What changes were proposed in this pull request? This patch tests Python ML persistence integration. - Add persistency
