[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-09 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12788#discussion_r62573653 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-14340][EXAMPLE][DOC] Update Examples an...

2016-05-09 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11844#issuecomment-217954275 @zhengruifeng Can you share it with GMM? Once your PR is merged, I can change mine to use your data. Thanks! --- If your project is set up for it, you

[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-09 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12788#discussion_r62533303 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...

2016-05-09 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12969#discussion_r62532641 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -744,7 +744,13 @@ private[classification] class

[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...

2016-05-09 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12969#discussion_r62525721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -744,7 +744,13 @@ private[classification] class

[GitHub] spark pull request: [SPARK-15145][ML]:spark.ml binary classificati...

2016-05-09 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12922#issuecomment-217909119 @MLnick Do you want me to do "adding accuracy to the ml binary classification evaluator" in this JIRA or in a separate JIRA? Thanks!

[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...

2016-05-07 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12969#discussion_r62420025 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -744,7 +744,13 @@ private[classification] class

[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...

2016-05-06 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12969 [SPARK-15096][ML]:LogisticRegression MultiClassSummarizer numClasses can fail if no valid labels are found ## What changes were proposed in this pull request? (Please fill in changes
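One way a failure like the one in this title can arise, sketched in plain Python: if the class count is derived as the largest observed label plus one (an assumption; the truncated excerpt does not show the actual code), then taking a maximum over an empty set of labels raises an error. The `num_classes` helper below is hypothetical and only illustrates the guard.

```python
def num_classes(labels):
    """Number of classes, assuming labels are the integers 0..k-1.

    Hypothetical helper for illustration; not Spark's MultiClassSummarizer.
    Returns 0 instead of raising when no valid labels were seen.
    """
    # max() on an empty iterable raises ValueError unless a default is given.
    return max(labels, default=-1) + 1


print(num_classes([0, 2, 1]))
print(num_classes([]))
```

Returning 0 for the empty case is one possible design choice; raising a descriptive error would be another.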

[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12788#issuecomment-217562662 retest this please

[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12788#issuecomment-217541412 @sethah @zhengruifeng @yanboliang I made changes to address comments.

[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12788#issuecomment-217534939 retest it please.

[GitHub] spark pull request: [SPARK-15145][ML]:spark.ml binary classificati...

2016-05-05 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12922#issuecomment-217285315 @MLnick We can add accuracy to BinaryClassificationEvaluator, but we need to add a new API to calculate accuracy as a Double. Now, it is RDD[Double, Double

[GitHub] spark pull request: [SPARK-15145][ML]:spark.ml binary classificati...

2016-05-04 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12922 [SPARK-15145][ML]:spark.ml binary classification should include accuracy ## What changes were proposed in this pull request? Add accuracy into binary classification metrics
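For reference, accuracy is the fraction of predictions that match their labels. A minimal plain-Python sketch of that definition follows; the `accuracy` function and the (prediction, label) pair layout are illustrative assumptions, not the API added by this pull request.

```python
def accuracy(prediction_and_labels):
    """Return the fraction of (prediction, label) pairs that agree.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    pairs = list(prediction_and_labels)
    if not pairs:
        raise ValueError("accuracy is undefined for an empty input")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)


pairs = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0)]
print(accuracy(pairs))
```

The result is a single floating-point number, which is the shape the discussion in this thread asks for.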

[GitHub] spark pull request: [SPARK-14900][ML]:spark.ml classification metr...

2016-05-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12882#discussion_r62083764 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -97,6 +98,7 @@ class

[GitHub] spark pull request: [SPARK-14900][ML]:spark.ml classification metr...

2016-05-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12882#discussion_r62082958 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -97,6 +98,7 @@ class

[GitHub] spark pull request: [SPARK-14900][ML]:spark.ml classification metr...

2016-05-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12882#discussion_r62079416 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -97,6 +98,7 @@ class

[GitHub] spark pull request: [SPARK-14900][ML]:spark.ml classification metr...

2016-05-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12882#discussion_r62078845 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -151,6 +151,14 @@ class MulticlassMetrics @Since

[GitHub] spark pull request: [SPARK-14900][ML]:spark.ml classification metr...

2016-05-03 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12882 [SPARK-14900][ML]:spark.ml classification metrics should include accuracy ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) Add

[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-04-30 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12788#issuecomment-215999406 cc @yanboliang @jkbradley @MLnick @holdenk

[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-04-29 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12788 [SPARK-14434][ML]:User guide doc and examples for GaussianMixture in spark.ml ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-215235728 @thunterdb What do you think about our discussions? Thanks! Miao

[GitHub] spark pull request: [SPARK-14937][ML][Document]spark.ml LogisticRe...

2016-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12717#discussion_r61306574 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionSummaryExample.scala --- @@ -30,11 +30,11 @@ object

[GitHub] spark pull request: [SPARK-14937][ML][Document]spark.ml LogisticRe...

2016-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12717#discussion_r61305059 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionSummaryExample.scala --- @@ -30,11 +30,11 @@ object

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-215135795 @MLnick @yanboliang Any further comments? Thanks!

[GitHub] spark pull request: [SPARK-14937][ML][Document]spark.ml LogisticRe...

2016-04-27 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12717#discussion_r61288394 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionSummaryExample.scala --- @@ -30,11 +30,11 @@ object

[GitHub] spark pull request: [SPARK-14937][ML][Document]spark.ml LogisticRe...

2016-04-26 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12717#issuecomment-214970534 @yanboliang Can you take a look? It is a simple fix. Thanks!

[GitHub] spark pull request: [SPARK-14937][ML][Document]spark.ml LogisticRe...

2016-04-26 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12717#issuecomment-214919250 retest this please.

[GitHub] spark pull request: [SPARK-14937][ML][Document]spark.ml LogisticRe...

2016-04-26 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12717 [SPARK-14937][ML][Document]spark.ml LogisticRegression sqlCtx in scala is inconsistent with java and python ## What changes were proposed in this pull request? In spark.ml document

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-25 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-214515243 @MLnick I agree. I will remove the feature log now and only log parameters. I will keep the named feature method.

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-25 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-214474647 @MLnick Yanbo does not like the change to the train() API. The new parameter is optional, so users of train should not notice this change. In addition, I

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-25 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60963521 --- Diff: python/pyspark/ml/clustering.py --- @@ -22,7 +22,151 @@ from pyspark.mllib.common import inherit_doc __all__

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-22 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12402#issuecomment-213514940 @yanboliang @jkbradley I made all suggested changes and improved the documentation in the comments. Thanks!

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-22 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12560#discussion_r60701976 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -607,7 +611,8 @@ object ALS extends DefaultParamsReadable[ALS

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-22 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12402#issuecomment-213306253 retest this please.

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-22 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60694336 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -104,6 +105,17 @@ class GaussianMixtureModel private[ml

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-21 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-213264393 retest this please

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-21 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12402#issuecomment-213232640 @jkbradley Thanks for your review! I will make the changes accordingly.

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-21 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12402#issuecomment-213096256 @jkbradley @yanboliang I made changes and removed the unused import.

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-21 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-213059111 @thunterdb train method has count information, but it will change the signature of the train method. I am learning how to avoid collect and changing signature

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-21 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-213023729 Thanks all for your comments! Let me figure out how to collect the information without slowing the algorithm. @MLnick The names are passed to the log. For example

[GitHub] spark pull request: [SPARK-14346] [SQL] Show create table

2016-04-20 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12406#issuecomment-212752658 @xwu0226 Use git rebase upstream/master. Do not use git merge upstream/master. I had the same issue before. git merge will add others' commits to your PR. git

[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS

2016-04-20 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12560 [SPARK-14571][ML]Log instrumentation in ALS ## What changes were proposed in this pull request? Add log instrumentation for parameters: rank, numUserBlocks, numItemBlocks

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-20 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12402#issuecomment-212622646 @jkbradley I replied to your inline comment to clarify your suggestion before making any changes. Thanks!

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-20 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60493825 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -105,6 +108,15 @@ class GaussianMixtureModel private[ml

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-19 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60313413 --- Diff: python/pyspark/ml/clustering.py --- @@ -20,9 +20,150 @@ from pyspark.ml.wrapper import JavaEstimator, JavaModel from

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-19 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60290299 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -105,6 +108,15 @@ class GaussianMixtureModel private[ml

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-18 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r60123953 --- Diff: python/pyspark/ml/clustering.py --- @@ -22,7 +22,151 @@ from pyspark.mllib.common import inherit_doc __all__

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-15 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12402#discussion_r59895216 --- Diff: python/pyspark/ml/clustering.py --- @@ -22,7 +22,151 @@ from pyspark.mllib.common import inherit_doc __all__

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-14 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12402#issuecomment-210177866 ./dev/lint-python passed, but the integration test still failed. Did I miss anything for the unit tests?

[GitHub] spark pull request: [SPARK-14433][PySpark][ML]:PySpark ml Gaussian...

2016-04-14 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12402 [SPARK-14433][PySpark][ML]:PySpark ml GaussianMixture ## What changes were proposed in this pull request? Add Python API in ML for GaussianMixture ## How was this patch

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-07 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12116#discussion_r58981236 --- Diff: python/pyspark/ml/regression.py --- @@ -433,12 +440,12 @@ class DecisionTreeRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredi

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58929398 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-07 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12116#discussion_r58909275 --- Diff: python/pyspark/ml/regression.py --- @@ -425,6 +425,10 @@ class DecisionTreeRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredi

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206999812 @MLnick I will revise the test accordingly. I think after testing the estimator, I need to turn off the flag of the trained model first. Otherwise, the binary

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58754307 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58750775 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58748929 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58746316 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58744522 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58740250 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58739580 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -42,7 +42,8 @@ private[feature] trait CountVectorizerParams

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12116#issuecomment-206449257 @holdenk I am thinking about what tests should be added. Do you have any suggestions? Thanks! Miao

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206181665 @MLnick Can you trigger the auto test? It seems that I am not on the whitelist. I had one JIRA merged to master. Thanks! Miao

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58662007 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -115,6 +115,27 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12116#issuecomment-206145891 @jkbradley Can you add me to the whitelist to trigger the integration test? Thanks! Miao

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12200 [SPARK-14392][ML]CountVectorizer Estimator should include binary toggle Param ## What changes were proposed in this pull request? CountVectorizerModel has a binary toggle param
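As an illustration of what a binary toggle means for count vectors, here is a minimal plain-Python sketch: with the toggle on, every nonzero term count is clipped to 1.0. The `count_vector` function, its parameters, and the sample vocabulary are hypothetical and do not reproduce Spark's CountVectorizer implementation.

```python
from collections import Counter


def count_vector(tokens, vocabulary, binary=False):
    """Map a token list to a dense count vector over `vocabulary`.

    Hypothetical helper for illustration; not Spark's CountVectorizer.
    With binary=True, every nonzero count is clipped to 1.0.
    """
    counts = Counter(tokens)  # missing terms count as 0
    vector = [float(counts[term]) for term in vocabulary]
    if binary:
        vector = [1.0 if value > 0 else 0.0 for value in vector]
    return vector


vocabulary = ["a", "b", "c"]
print(count_vector(["a", "b", "a"], vocabulary))
print(count_vector(["a", "b", "a"], vocabulary, binary=True))
```

Binary vectors of this kind suit models that treat a term as present or absent rather than weighting it by frequency.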

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-05 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12116#issuecomment-205896732 @holdenk Thanks for your comments! I will make changes accordingly.

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-04 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12116#issuecomment-205520304 @holdenk I made the changes and tested the gen code. Can you review it? Thanks!

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-04 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12116#issuecomment-205513619 @holdenk Thanks for pointing it out. I will revise it.

[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-01 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12116 [SPARK-12569][PySpark][ML]:DecisionTreeRegressor: provide variance of prediction: Python AP ## What changes were proposed in this pull request? A new column VarianceCol has been

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-28 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/11945#discussion_r57636219 --- Diff: python/pyspark/ml/tests.py --- @@ -655,6 +656,20 @@ def test_nested_pipeline_persistence(self): except OSError

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-28 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/11945#discussion_r57632372 --- Diff: python/pyspark/ml/tests.py --- @@ -655,6 +656,20 @@ def test_nested_pipeline_persistence(self): except OSError

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-27 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/11945#discussion_r57545765 --- Diff: python/pyspark/ml/tests.py --- @@ -655,6 +656,20 @@ def test_nested_pipeline_persistence(self): except OSError

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-26 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11945#issuecomment-201947768 @jkbradley I am not sure whether the property tag will change the appearance of the members in the doc. I can do a quick check by rolling back the change to check

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-25 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11945#issuecomment-201375961 Jenkins, test this please

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-25 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11945#issuecomment-201174996 Jenkins, test this please

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-24 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11945#issuecomment-201146670 Found the issue: PEP8 checks failed. ./python/pyspark/ml/tests.py:658:5: E301 expected 1 blank line, found 0
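E301 is reported when a method definition inside a class is not preceded by a blank line. A small sketch of the layout that passes the check follows; the `Example` class is hypothetical and unrelated to the test file named in the message.

```python
class Example:
    """Hypothetical class showing method spacing that satisfies E301."""

    def first(self):
        return 1

    def second(self):  # the single blank line above this def avoids E301
        return 2


print(Example().first(), Example().second())
```

Removing the blank line between `first` and `second` would reproduce the "expected 1 blank line, found 0" message under a PEP8 checker.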

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-24 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11945#issuecomment-201146467 Build finished. The HTML pages are in _build/html. [error] running /home/jenkins/workspace/SparkPullRequestBuilder@3/dev/lint-python ; received return code 1

[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...

2016-03-24 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/11945 [SPARK-14071][PySpark][ML]Change MLWritable.write to be a property Add property to MLWritable.write method, so we can use .write instead of .write(). Add a new test to ml/test.py
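The effect of such a change can be sketched with Python's built-in `@property` decorator, which lets callers write `obj.write` instead of `obj.write()`. The `Writer` and `Writable` classes below are hypothetical stand-ins, not PySpark's classes, and the sketch says nothing about how PySpark behaves today.

```python
class Writer:
    """Hypothetical stand-in for a writer object."""

    def __init__(self, instance):
        self.instance = instance

    def save(self, path):
        # A real writer would persist the instance; this only reports it.
        return "saved {} to {}".format(type(self.instance).__name__, path)


class Writable:
    """Hypothetical class exposing its writer as a property."""

    @property
    def write(self):
        # Accessed as `obj.write`, without parentheses.
        return Writer(self)


model = Writable()
print(model.write.save("/tmp/model"))
```

One trade-off of this design is that existing callers using `obj.write()` would break, since the returned `Writer` is not callable.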

[GitHub] spark pull request: SPARK-13034[ML]:PySpark ml.classification supp...

2016-03-19 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11582#issuecomment-198064456 Closing this one, as it has been merged with 11707.

[GitHub] spark pull request: SPARK-13034[ML]:PySpark ml.classification supp...

2016-03-19 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/11582

[GitHub] spark pull request: [SPARK-13034] Add export/import for all estima...

2016-03-09 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11552#issuecomment-194441901 @GayathriMurali Thanks! I see you added one more classifier other than LogisticRegression and naive Bayes. When I was working on my code base, that classifier

[GitHub] spark pull request: SPARK-13034[ML]:PySpark ml.classification supp...

2016-03-08 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11582#issuecomment-194014892 @srowen I added the title in the pull request. Sorry for causing the confusion here. I only made changes in one python file. All other changes are merged from

[GitHub] spark pull request: [SPARK-13034] Add export/import for all estima...

2016-03-08 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11552#issuecomment-193931859 Hi Gayathri, I put my comments in the JIRA about 2 weeks ago and worked with Yanbo on putting some code. Can we work together to get it merged? I

[GitHub] spark pull request: SPARK-13034

2016-03-08 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/11582 SPARK-13034 I added Import and Export for Logisticregression and Naive Bayes Test ./python/run-tests --python-executables=python2.7 --modules=pyspark-ml Result: Running

[GitHub] spark pull request: merge code

2016-02-25 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11380#issuecomment-189038485 Sorry for sending this out by mistake. I wanted to merge the master code into my own branch.

[GitHub] spark pull request: merge code

2016-02-25 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/11380

[GitHub] spark pull request: merge code

2016-02-25 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/11380 merge code ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how
