Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12788#discussion_r62573653
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11844#issuecomment-217954275
@zhengruifeng Can you share it with GMM? Once your PR is merged, I
can change mine to use your data. Thanks!
---
If your project is set up for it, you
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12788#discussion_r62533303
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12969#discussion_r62532641
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -744,7 +744,13 @@ private[classification] class
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12969#discussion_r62525721
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -744,7 +744,13 @@ private[classification] class
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12922#issuecomment-217909119
@MLnick Do you want me to add accuracy to the ml binary
classification evaluator in this JIRA or in a separate JIRA? Thanks!
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12969#discussion_r62420025
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -744,7 +744,13 @@ private[classification] class
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12969
[SPARK-15096][ML]:LogisticRegression MultiClassSummarizer numClasses can
fail if no valid labels are found
## What changes were proposed in this pull request?
(Please fill in changes
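The failure named in the title above can be illustrated in isolation. The sketch below assumes the class count is computed as the largest observed label plus one, which this snippet does not confirm. The names num_classes_unsafe and num_classes_safe are hypothetical, and this is plain Python, not Spark code.

```python
def num_classes_unsafe(label_counts):
    """Assumed rule: number of classes = largest label seen + 1.

    max() over an empty dict raises ValueError, which mirrors the
    "no valid labels found" failure described in the PR title.
    """
    return max(label_counts) + 1


def num_classes_safe(label_counts):
    """Same rule, but an empty summary reports zero classes instead of failing."""
    return max(label_counts) + 1 if label_counts else 0


# label -> number of rows seen with that label (illustrative data only)
seen = {0: 5, 2: 1}
print(num_classes_safe(seen))
print(num_classes_safe({}))
```

Returning 0 for the empty case is one possible choice; raising a clearer, descriptive error would be another.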
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12788#issuecomment-217562662
retest this please
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12788#issuecomment-217541412
@sethah @zhengruifeng @yanboliang I made changes to address comments.
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12788#issuecomment-217534939
retest it please.
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12922#issuecomment-217285315
@MLnick We can add accuracy to BinaryClassificationEvaluator, but we need
to add a new API that calculates accuracy as a Double. Currently, it is RDD[Double,
Double
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12922
[SPARK-15145][ML]:spark.ml binary classification should include accuracy
## What changes were proposed in this pull request?
Add accuracy into binary classification metrics
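As a reminder of the metric under discussion, accuracy is the fraction of predictions that match their labels. The sketch below is plain Python over (prediction, label) pairs; the function name and sample data are hypothetical, and it does not reproduce any Spark API.

```python
def accuracy(prediction_and_labels):
    """Return the fraction of (prediction, label) pairs that agree."""
    pairs = list(prediction_and_labels)
    if not pairs:
        # Accuracy is undefined without data, so fail loudly.
        raise ValueError("accuracy is undefined for an empty input")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)


# Illustrative data only: three of the four predictions match their labels.
example_pairs = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0)]
example_accuracy = accuracy(example_pairs)
print(example_accuracy)
```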
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12882#discussion_r62083764
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
---
@@ -97,6 +98,7 @@ class
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12882#discussion_r62082958
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
---
@@ -97,6 +98,7 @@ class
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12882#discussion_r62079416
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
---
@@ -97,6 +98,7 @@ class
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12882#discussion_r62078845
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala
---
@@ -151,6 +151,14 @@ class MulticlassMetrics @Since
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12882
[SPARK-14900][ML]:spark.ml classification metrics should include accuracy
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
Add
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12788#issuecomment-215999406
cc @yanboliang @jkbradley @MLnick @holdenk
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12788
[SPARK-14434][ML]:User guide doc and examples for GaussianMixture in
spark.ml
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-215235728
@thunterdb What do you think about our discussions? Thanks!
Miao
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12717#discussion_r61306574
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionSummaryExample.scala
---
@@ -30,11 +30,11 @@ object
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12717#discussion_r61305059
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionSummaryExample.scala
---
@@ -30,11 +30,11 @@ object
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-215135795
@MLnick @yanboliang Any further comments? Thanks!
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12717#discussion_r61288394
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionSummaryExample.scala
---
@@ -30,11 +30,11 @@ object
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12717#issuecomment-214970534
@yanboliang Can you take a look? It is a simple fix. Thanks!
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12717#issuecomment-214919250
retest this please.
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12717
[SPARK-14937][ML][Document]spark.ml LogisticRegression sqlCtx in scala is
inconsistent with java and python
## What changes were proposed in this pull request?
In spark.ml document
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-214515243
@MLnick I agree. I will remove the feature log now and only log parameters.
I will keep the named feature method.
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-214474647
@MLnick Yanbo does not like the change to the train() API. The new parameter is
optional, so users of train should not notice this change. In addition,
I
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60963521
--- Diff: python/pyspark/ml/clustering.py ---
@@ -22,7 +22,151 @@
from pyspark.mllib.common import inherit_doc
__all__
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12402#issuecomment-213514940
@yanboliang @jkbradley I made all the suggested changes and improved the
documentation in the comments. Thanks!
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12560#discussion_r60701976
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -607,7 +611,8 @@ object ALS extends DefaultParamsReadable[ALS
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12402#issuecomment-213306253
retest this please.
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60694336
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -104,6 +105,17 @@ class GaussianMixtureModel private[ml
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-213264393
retest this please
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12402#issuecomment-213232640
@jkbradley Thanks for your review! I will make the changes accordingly.
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12402#issuecomment-213096256
@jkbradley @yanboliang I made the changes and removed the unused import.
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-213059111
@thunterdb The train method has the count information, but using it will change
the signature of the train method. I am learning how to avoid collect and changing
signature
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12560#issuecomment-213023729
Thanks all for your comments! Let me figure out how to collect the
information without slowing the algorithm. @MLnick The names are passed to the
log. For example
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12406#issuecomment-212752658
@xwu0226 Use git rebase upstream/master. Do not use git merge
upstream/master. I had the same issue before. git merge will add others'
commits to your PR. git
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12560
[SPARK-14571][ML]Log instrumentation in ALS
## What changes were proposed in this pull request?
Add log instrumentation for parameters:
rank, numUserBlocks, numItemBlocks
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12402#issuecomment-212622646
@jkbradley I replied to your inline comment to clarify your suggestion before
making any changes. Thanks!
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60493825
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -105,6 +108,15 @@ class GaussianMixtureModel private[ml
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60313413
--- Diff: python/pyspark/ml/clustering.py ---
@@ -20,9 +20,150 @@
from pyspark.ml.wrapper import JavaEstimator, JavaModel
from
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60290299
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -105,6 +108,15 @@ class GaussianMixtureModel private[ml
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r60123953
--- Diff: python/pyspark/ml/clustering.py ---
@@ -22,7 +22,151 @@
from pyspark.mllib.common import inherit_doc
__all__
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12402#discussion_r59895216
--- Diff: python/pyspark/ml/clustering.py ---
@@ -22,7 +22,151 @@
from pyspark.mllib.common import inherit_doc
__all__
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12402#issuecomment-210177866
./dev/lint-python passed, but the integration test still failed. Did I miss
anything for the unit tests?
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12402
[SPARK-14433][PySpark][ML]:PySpark ml GaussianMixture
## What changes were proposed in this pull request?
Add Python API in ML for GaussianMixture
## How was this patch
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12116#discussion_r58981236
--- Diff: python/pyspark/ml/regression.py ---
@@ -433,12 +440,12 @@ class DecisionTreeRegressor(JavaEstimator,
HasFeaturesCol, HasLabelCol, HasPredi
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58929398
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12116#discussion_r58909275
--- Diff: python/pyspark/ml/regression.py ---
@@ -425,6 +425,10 @@ class DecisionTreeRegressor(JavaEstimator,
HasFeaturesCol, HasLabelCol, HasPredi
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206999812
@MLnick I will revise the test accordingly. I think after testing the
estimator, I need to turn off the flag of the trained model first. Otherwise,
the binary
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58754307
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58750775
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58748929
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58746316
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58744522
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -183,6 +183,26 @@ class CountVectorizerSuite extends
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58740250
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58739580
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -42,7 +42,8 @@ private[feature] trait CountVectorizerParams
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12116#issuecomment-206449257
@holdenk I am thinking about which tests should be added. Do you have any suggestions?
Thanks!
Miao
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12200#issuecomment-206181665
@MLnick Can you trigger the automated tests? It seems that I am not on the
whitelist. I have had one JIRA merged to master. Thanks!
Miao
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12200#discussion_r58662007
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -115,6 +115,27 @@ class CountVectorizerSuite extends
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12116#issuecomment-206145891
@jkbradley Can you add me to the whitelist so I can trigger the integration test?
Thanks!
Miao
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12200
[SPARK-14392][ML]CountVectorizer Estimator should include binary toggle
Param
## What changes were proposed in this pull request?
CountVectorizerModel has a binary toggle param
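To make the "binary toggle" concrete: with the toggle on, every non-zero term count is reported as 1, so the vector records presence rather than frequency. The sketch below is plain Python with a hypothetical count_vector helper and sample vocabulary; it illustrates the idea only and is not Spark's CountVectorizer.

```python
from collections import Counter


def count_vector(tokens, vocabulary, binary=False):
    """Count how often each vocabulary term appears in tokens.

    With binary=True, any non-zero count is clamped to 1.
    """
    counts = Counter(tokens)
    # Counter returns 0 for terms that never appear in tokens.
    vector = [counts[term] for term in vocabulary]
    if binary:
        vector = [1 if count > 0 else 0 for count in vector]
    return vector


vocabulary = ["a", "b", "c"]   # illustrative vocabulary
document = ["a", "b", "a"]     # illustrative token list
print(count_vector(document, vocabulary))
print(count_vector(document, vocabulary, binary=True))
```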
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12116#issuecomment-205896732
@holdenk Thanks for your comments! I will make changes accordingly.
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12116#issuecomment-205520304
@holdenk I made the changes and tested the gen code. Can you review it?
Thanks!
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/12116#issuecomment-205513619
@holdenk Thanks for pointing it out. I will revise it.
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/12116
[SPARK-12569][PySpark][ML]:DecisionTreeRegressor: provide variance of
prediction: Python AP
## What changes were proposed in this pull request?
A new column VarianceCol has been
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/11945#discussion_r57636219
--- Diff: python/pyspark/ml/tests.py ---
@@ -655,6 +656,20 @@ def test_nested_pipeline_persistence(self):
except OSError
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/11945#discussion_r57632372
--- Diff: python/pyspark/ml/tests.py ---
@@ -655,6 +656,20 @@ def test_nested_pipeline_persistence(self):
except OSError
Github user wangmiao1981 commented on a diff in the pull request:
https://github.com/apache/spark/pull/11945#discussion_r57545765
--- Diff: python/pyspark/ml/tests.py ---
@@ -655,6 +656,20 @@ def test_nested_pipeline_persistence(self):
except OSError
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11945#issuecomment-201947768
@jkbradley I am not sure whether the property tag will change the
appearance of the members in the doc. I can do a quick check by rolling back the
change to check
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11945#issuecomment-201375961
Jenkins, test this please
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11945#issuecomment-201174996
Jenkins, test this please
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11945#issuecomment-201146670
Found the issue:
PEP8 checks failed.
./python/pyspark/ml/tests.py:658:5: E301 expected 1 blank line, found 0
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11945#issuecomment-201146467
Build finished. The HTML pages are in _build/html.
[error] running
/home/jenkins/workspace/SparkPullRequestBuilder@3/dev/lint-python ; received
return code 1
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/11945
[SPARK-14071][PySpark][ML]Change MLWritable.write to be a property
Add the property decorator to the MLWritable.write method, so we can use .write
instead of .write()
Add a new test to ml/tests.py
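The difference between calling .write() and reading .write comes down to Python's built-in property decorator. The sketch below uses hypothetical stand-in classes (SketchWriter, MethodStyle, PropertyStyle) and made-up paths; it is not PySpark's MLWritable and says nothing about how that class is implemented.

```python
class SketchWriter:
    """Hypothetical writer that only records where it was asked to save."""

    def __init__(self, instance):
        self.instance = instance
        self.saved_to = None

    def save(self, path):
        # A real writer would serialize the instance; this one just notes the path.
        self.saved_to = path
        return self


class MethodStyle:
    def write(self):
        # Plain method: callers must write obj.write().save(path).
        return SketchWriter(self)


class PropertyStyle:
    @property
    def write(self):
        # Property: callers write obj.write.save(path), with no parentheses.
        return SketchWriter(self)


method_writer = MethodStyle().write().save("/tmp/model-a")
property_writer = PropertyStyle().write.save("/tmp/model-b")
print(method_writer.saved_to, property_writer.saved_to)
```

One consequence of the property form is that existing callers using .write() would break, since the returned writer object is not callable.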
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11582#issuecomment-198064456
Closing this one, as it has been merged with 11707.
Github user wangmiao1981 closed the pull request at:
https://github.com/apache/spark/pull/11582
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11552#issuecomment-194441901
@GayathriMurali Thanks! I see you added one more classifier besides
logistic regression and naive Bayes. When I was working on my code base, that
classifier
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11582#issuecomment-194014892
@srowen I added the title to the pull request. Sorry for causing
confusion here. I only made changes in one Python file. All other changes are
merged from
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11552#issuecomment-193931859
Hi Gayathri,
I put my comments in the JIRA about 2 weeks ago and worked with Yanbo on
putting together some code.
Can we work together to get it merged? I
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/11582
SPARK-13034
I added import and export for LogisticRegression and Naive Bayes
Test ./python/run-tests --python-executables=python2.7 --modules=pyspark-ml
Result:
Running
Github user wangmiao1981 commented on the pull request:
https://github.com/apache/spark/pull/11380#issuecomment-189038485
Sorry for mistakenly sending it out. I wanted to merge the master code into my own
branch.
Github user wangmiao1981 closed the pull request at:
https://github.com/apache/spark/pull/11380
GitHub user wangmiao1981 opened a pull request:
https://github.com/apache/spark/pull/11380
merge code
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how