[GitHub] spark issue #19072: [SPARK-17133][ML][FOLLOW-UP] Add convenient method `asBi...

2017-08-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19072 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17014: [SPARK-18608][ML] Fix double-caching in ML algori...

2017-08-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17014#discussion_r135695930 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -85,6 +86,10 @@ abstract class Predictor[ M <: PredictionMo

[GitHub] spark pull request #19072: [SPARK-17133][ML][FOLLOW-UP] Add convenient metho...

2017-08-28 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19072 [SPARK-17133][ML][FOLLOW-UP] Add convenient method `asBinary` for casting to BinaryLogisticRegressionSummary ## What changes were proposed in this pull request? add an "asB

[GitHub] spark pull request #19026: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-28 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19026

[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...

2017-08-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19018 @felixcheung In the Jenkins log I only found failures for random forest and decision tree; random forest failed more frequently. Thanks!

[GitHub] spark pull request #17014: [SPARK-18608][ML] Fix double-caching in ML algori...

2017-08-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17014#discussion_r135534873 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -85,6 +86,10 @@ abstract class Predictor[ M <: PredictionMo

[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

2017-08-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19065 Jenkins, test this please.

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

2017-08-28 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19065 [SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure consistent output columns ## What changes were proposed in this pull request? Add test for prediction using

[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...

2017-08-27 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19018 ping @felixcheung We can make all R tests for trees deterministic (not only random trees) and leave other problems to a separate PR. It would be great to fix this soon, thanks!

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins test this please

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins test this please

[GitHub] spark pull request #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateO...

2017-08-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19029#discussion_r135186430 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -438,6 +438,10 @@ private[ml] object SummaryBuilderImpl extends Logging

[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...

2017-08-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19018 @felixcheung I think this error occurs in the OneHotEncoder inside RFormula. After searching the project, only OneHotEncoder prints this error message...

[GitHub] spark pull request #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateO...

2017-08-23 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19029 [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.variance generate negative result ## What changes were proposed in this pull request? Because of numerical error
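
The numerical issue named in the title of #19029 can be illustrated in isolation. The following is a minimal Python sketch, not the code from the pull request (whose description is truncated above): it assumes a textbook one-pass variance formula and uses hypothetical helper names `raw_variance` and `clamped_variance`. It shows that clamping at zero guarantees a non-negative result even if floating-point cancellation pushes the raw value slightly below zero.

```python
def raw_variance(values):
    """One-pass sample variance: (sum(x^2) - n * mean^2) / (n - 1).

    Mathematically non-negative, but floating-point cancellation can
    produce a tiny negative number when the values are nearly identical.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    sum_sq = sum(v * v for v in values)
    return (sum_sq - n * mean * mean) / (n - 1)


def clamped_variance(values):
    """Same formula, with the result clamped so it is never below zero."""
    return max(raw_variance(values), 0.0)


# Nearly identical large values are a typical trigger for cancellation.
samples = [1e9 + 0.1, 1e9 + 0.1, 1e9 + 0.1]
print(raw_variance(samples))      # may be slightly negative, zero, or slightly positive
print(clamped_variance(samples))  # never below 0.0
```

Whether the pull request uses this exact guard is not visible in the truncated text; the sketch only demonstrates why a guard of this kind removes negative results.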

[GitHub] spark issue #19026: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-23 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19026 Jenkins, test this please.

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-08-23 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16774 @BryanCutler @MLnick I agree with picking `HasParallel` into this PR because the `trait` has very little code. Another feature is pending on this PR, so we hope this gets merged soon! cc @jkbradley

[GitHub] spark pull request #19026: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-23 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19026 [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero (backport PR for 2.2) ## What changes were proposed in this pull request? This is backport PR

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @jkbradley OK. (Can this be merged directly into 2.2?)

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r134449164 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed

[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-08-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18924 Thanks! I will take a look later.

[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17849 What do you think about this ? @jkbradley

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins, test this please.

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 Jenkins, test this please.

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins, test this please.

[GitHub] spark pull request #18992: [SPARK-19762][ML][FOLLOWUP]Add necessary comments...

2017-08-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18992#discussion_r134109929 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -57,6 +61,11 @@ private[ml] class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins, test this please.

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 Jenkins, test this please.

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133895488 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1574,18 +1588,17 @@ sealed trait

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133895254 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1357,23 +1361,23 @@ sealed trait

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 @felixcheung So it does not cause bugs in SparkR; can we leave it to a separate PR?

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133883361 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1324,90 +1350,136 @@ private[ml] class

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-17 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @jkbradley OK. So I can remove the test I added?

[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

2017-08-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15435#discussion_r133636062 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -882,21 +882,28 @@ class LogisticRegression @Since

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @MLnick I debugged the test case you mentioned. The reason is that the zero variance causes the computation to generate `Infinite` and `NaN`, so the result is unpredictable; in this case, it happened

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @MLnick Yes, it is always trained in scaled space. But the test case you mentioned does not take the "scale" step, so it does not trigger the bug...

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18896 @MLnick That's because this bug is triggered only when we standardize the features first and then train...
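
The three comments above on #18896 describe a failure that only appears once features are standardized and some feature has zero standard deviation. A minimal Python sketch of the underlying arithmetic follows. It is not the Spark code: the function name `standardize` and the skip-on-zero guard are assumptions chosen for illustration. On the JVM, dividing a double by 0.0 yields `Infinity` or `NaN` rather than raising, which matches the symptom quoted above; plain Python raises `ZeroDivisionError` instead, so some guard is needed in either case.

```python
def standardize(values, stds):
    """Scale each value by its feature's standard deviation.

    A feature with zero standard deviation is constant, so it is mapped
    to 0.0 instead of being divided by zero (illustrative guard only).
    """
    scaled = []
    for value, std in zip(values, stds):
        if std != 0.0:
            scaled.append(value / std)
        else:
            scaled.append(0.0)
    return scaled


features = [2.0, 5.0, 0.0]
feature_stds = [4.0, 0.0, 0.0]  # second and third features are constant
print(standardize(features, feature_stds))  # [0.5, 0.0, 0.0]
```

Mapping a constant feature to 0.0 is one possible convention; it means that feature contributes nothing in the scaled space, which is consistent with it carrying no information.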

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 Jenkins, test this please.

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r133191237 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17373 cc @jkbradley Code updated, thanks!

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r133121659 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @yanboliang I will update ASAP, thanks!

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r133119397 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @viirya Sure! comment updated.

[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18872#discussion_r133081682 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -109,14 +112,15 @@ class LibSVMRelationSuite

[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18872#discussion_r133082255 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -109,14 +112,15 @@ class LibSVMRelationSuite

[GitHub] spark pull request #18872: [SPARK-21723][ML] Fix writing LibSVM (key not fou...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18872#discussion_r133081995 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -109,14 +112,15 @@ class LibSVMRelationSuite

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r133080543 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r133080201 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala --- @@ -69,6 +69,20 @@ class HashingTFSuite extends SparkFunSuite

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r132253487 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark pull request #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operation...

2017-08-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r132249512 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1722,25 +1723,22 @@ private class

[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-09 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18896 [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero ## What changes were proposed in this pull request? fix bug of MLOR do not work correctly

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/16774 @BryanCutler You are right. Once the `Future` completes, the model can be cleaned up by GC, so the memory cost of the code has already been optimized. I didn't look at the code carefully a few days ago

[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17894 I am also interested in an implementation using level-3 BLAS. Can you post a design doc first?

[GitHub] spark pull request #18888: [Spark-17025][ML][Python] Persistence for Pipelin...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18888#discussion_r132072527 --- Diff: python/pyspark/ml/pipeline.py --- @@ -242,3 +327,65 @@ def _to_java(self): JavaParams._new_java_obj

[GitHub] spark pull request #18888: [Spark-17025][ML][Python] Persistence for Pipelin...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18888#discussion_r132070100 --- Diff: python/pyspark/ml/pipeline.py --- @@ -204,13 +282,20 @@ def copy(self, extra=None): @since("2.0.0") def

[GitHub] spark pull request #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operation...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r132068663 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1722,25 +1723,22 @@ private class

[GitHub] spark pull request #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operation...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17894#discussion_r132069046 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1722,25 +1723,22 @@ private class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @jkbradley @MrBago thanks!

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132058898 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r132058692 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -90,10 +92,22 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132052106 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r132021227 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -107,9 +103,9 @@ class

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131992917 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -83,6 +83,36 @@ class

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 Jenkins, test this please.

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r131979173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,593 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131824713 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -82,6 +83,23 @@ class

[GitHub] spark pull request #17373: [SPARK-12664] Expose probability in mlp model

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17373#discussion_r131790673 --- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala --- @@ -463,7 +479,7 @@ private[ml] class FeedForwardModel private( private

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131772274 --- Diff: python/pyspark/ml/util.py --- @@ -61,33 +66,86 @@ def _randomUID(cls): @inherit_doc -class MLWriter(object): +class

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131771002 --- Diff: python/pyspark/ml/tests.py --- @@ -1158,6 +1165,33 @@ def test_decisiontree_regressor(self): except OSError

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r131766119 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r131766525 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r131767248 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r131763824 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r131762320 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r131764683 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18798: [WIP] [SPARK-19634][ML] Multivariate summarizer -...

2017-08-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r131735748 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -547,35 +533,11 @@ object SummaryBuilderImpl extends Logging

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 @srowen Great! thanks!

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 Thanks! Waiting for the AFT test code author to figure out how to modify the test case.

[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17849 Thanks for your work on this, but I am curious what the benefit of doing this is. In PySpark there is currently no param in Model itself; what problems or bugs can it resolve after adding

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 @srowen Yeah, the third case is another problem (I think we can simply change the iteration number from 7 to 6 in the test case). I am curious about the first two cases: why do they trigger the `require` failure

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18797 Strange thing, the code failed this `require` at https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/StrongWolfe.scala#L73 in the three case

[GitHub] spark pull request #18746: [ML][Python] Implemented UnaryTransformer in Pyth...

2017-08-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r130940500 --- Diff: python/pyspark/ml/base.py --- @@ -116,3 +121,44 @@ class Model(Transformer): """ __metacl

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18313 @jkbradley I think this is simple. When the persist model list param is `false`, the code logic stays the same and **it won't increase the memory cost** (this is the default case

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130747756 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130746993 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130746893 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @thunterdb 1) Deserializing the dataframe from binary data adds overhead (there may or may not be compaction, depending on the datatype; cc @liancheng), about 1x performance in my test

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130745275 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 performance data attached. cc @thunterdb @jkbradley

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130683266 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130684686 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -0,0 +1,619 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130684135 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130680940 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130683584 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130682301 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18798#discussion_r130684437 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18798: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-08-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18798 [SPARK-19634][ML] Multivariate summarizer - dataframes API ## What changes were proposed in this pull request? This patch adds the DataFrames API to the multivariate summarizer (mean

[GitHub] spark pull request #18797: [SPARK-21523] update breeze to 0.13.1 for an emer...

2017-08-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/18797 [SPARK-21523] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.1 for an emergency

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130523538 --- Diff: python/pyspark/ml/util.py --- @@ -283,3 +289,124 @@ def numFeatures(self): Returns the number of features the model was trained

[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...

2017-07-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r130519716 --- Diff: python/pyspark/ml/param/__init__.py --- @@ -375,6 +375,18 @@ def copy(self, extra=None): that._defaultParamMap
