[GitHub] spark pull request #17478: [SPARK-18901][ML]:Require in LR LogisticAggregato...

2017-03-30 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17478 [SPARK-18901][ML]:Require in LR LogisticAggregator is redundant ## What changes were proposed in this pull request? In MultivariateOnlineSummarizer, `add` and `merge` have

[GitHub] spark pull request #17474: [Minor][SparkR]: Add run command comment in examp...

2017-03-29 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17474 [Minor][SparkR]: Add run command comment in examples ## What changes were proposed in this pull request? Two examples in the r folder are missing the run commands

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-03-27 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 gentle ping @jkbradley @yanboliang @thunterdb --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587332 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587261 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587357 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587496 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587413 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587130 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587054 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587315 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17170#discussion_r106587292 --- Diff: R/pkg/R/mllib_fpm.R --- @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley can you take a look? Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-03-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 It passed locally. I will fix the issue.

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-03-14 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley I changed the input data format by using (list of neighbor IDs, list of weights), which are two columns of the input dataset. For the result, I appended the predicted cluster ids

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-03-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16784 Resolved. Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-03-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Update: Sorry for the delay. I am working on some other items. Now, I am using (node-id, list of neighbor IDs, list of weights) by adding two additional columns. I
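
The input layout mentioned in this comment, one row per node carrying a list of neighbor IDs and a parallel list of weights, can be pictured with a small sketch. This is plain Python for illustration only, not the Spark code from the PR; the names `NodeRow`, `node_id`, `neighbors`, `weights`, and `to_edge_triples` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class NodeRow(NamedTuple):
    """One input row: a node with its neighbor list and parallel weight list.

    Field names are hypothetical, chosen only for this illustration.
    """
    node_id: int
    neighbors: List[int]
    weights: List[float]


def to_edge_triples(rows: List[NodeRow]) -> List[Tuple[int, int, float]]:
    """Flatten adjacency-list rows into (src, dst, weight) triples."""
    triples = []
    for row in rows:
        # The two lists are parallel, so they must have the same length.
        if len(row.neighbors) != len(row.weights):
            raise ValueError(
                f"node {row.node_id}: neighbors and weights differ in length"
            )
        for dst, weight in zip(row.neighbors, row.weights):
            triples.append((row.node_id, dst, weight))
    return triples


# A tiny sample graph with three nodes.
rows = [
    NodeRow(0, [1, 2], [0.9, 0.1]),
    NodeRow(1, [0], [0.9]),
    NodeRow(2, [0], [0.1]),
]
edges = to_edge_triples(rows)
```

In this sample every edge is listed from both endpoints, which is one way an undirected graph can be written in such a layout; whether the PR requires that is not stated in the snippet above.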

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-03-01 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Option 2 doesn't break our pipeline scheme which only appends the result column to the input dataframe. Besides the discussions above, the graph is undirected and the weight list will appear

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-28 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r103535626 --- Diff: R/pkg/R/mllib_tree.R --- @@ -143,14 +143,15 @@ print.summary.treeEnsemble <- function(x) { #' #' # fit a Gradient Boosted T

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-02-28 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16784 @jkbradley I simplified the test cases and modified the data generation API by using the toSparse method, which eliminates the index variable. "Is this multivariate online summarizer

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-02-28 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16784 @jkbradley I simplified the tests and modified the data generation API by using the toSparse method, which eliminates the index variable. "Is this multivariate online summarizer issue r

[GitHub] spark issue #17032: [SPARK-19460][SparkR]:Update dataset used in R documenta...

2017-02-26 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17032 @felixcheung I have made the changes per our review discussion. Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-26 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley Thanks for your reply! I quickly went through your suggestions. If I understand correctly, you prefer making it a `Transformer`, as we previously discussed, but changing the input

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-24 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r103004945 --- Diff: examples/src/main/r/ml/glm.R --- @@ -25,12 +25,12 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-glm-ex

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102857809 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -565,11 +565,10 @@ We use a simple example to demonstrate `spark.logit` usage. In general

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb @yanboliang Have we reached an agreement on whether to make it a transformer or an estimator?

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102856602 --- Diff: examples/src/main/r/ml/kmeans.R --- @@ -26,10 +26,12 @@ sparkR.session(appName = "SparkR-ML-kmeans-example")

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102823739 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -25,20 +25,21 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-bisectingK

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102823636 --- Diff: examples/src/main/r/ml/glm.R --- @@ -25,11 +25,12 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-glm-ex

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102802725 --- Diff: examples/src/main/r/ml/glm.R --- @@ -25,11 +25,12 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-glm-ex

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102802291 --- Diff: R/pkg/R/mllib_tree.R --- @@ -143,14 +143,15 @@ print.summary.treeEnsemble <- function(x) { #' #' # fit a Gradient Boosted T

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102801964 --- Diff: R/pkg/R/mllib_tree.R --- @@ -143,14 +143,15 @@ print.summary.treeEnsemble <- function(x) { #' #' # fit a Gradient Boosted T

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102801930 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -565,11 +565,10 @@ We use a simple example to demonstrate `spark.logit` usage. In general

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102801880 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -25,20 +25,21 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-bisectingK

[GitHub] spark issue #17032: [SPARK-19460][SparkR]:Update dataset used in R documenta...

2017-02-22 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17032 cc @felixcheung

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-22 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17032 [SPARK-19460][SparkR]:Update dataset used in R documentation, examples to reduce warning noise and confusions ## What changes were proposed in this pull request? Replace `iris

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-22 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Per discussion with Yanbo, there is one concern about making it an Estimator. For every `transform`, there is an additional data shuffle. cc @yanboliang @jkbradley Thanks

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 I am checking ALS out to understand your suggestions. Thanks!

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @thunterdb Can you take a look? Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Yanbo Liang added a comment - 02/Nov/16 09:30 - edited I prefer #1 and #3, but it looks like we can achieve both goals. Graph can be represented by GraphX/GraphFrame

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Joseph K. Bradley added a comment - 31/Oct/16 18:14 Miao Wang Sorry for the slow response here. I do want us to add PIC to spark.ml, but we should discuss the design before the PR

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Thanks for your response. In the original JIRA, we have discussed why we want it to be a transformer. Let me find it and post it here.

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102337526 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102330772 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102308292 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 @felixcheung I have made the suggested changes. Thanks!

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-17 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r101871069 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-17 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r101870770 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 I added a test of weightCol for spark.logit.

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Thanks for your review! I will address the comments soon.

[GitHub] spark pull request #16969: [SPARK-19639][SPARKR][Example]:Add spark.svmLinea...

2017-02-16 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16969 [SPARK-19639][SPARKR][Example]:Add spark.svmLinear example and update vignettes ## What changes were proposed in this pull request? We recently added the spark.svmLinear API for SparkR

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 I will add tests. Now I am looking for a dataset other than iris to use in the documentation.

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @erikerlandson I am just helping clear the stale PRs. :) I have no idea whether they intend to accept it.

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @erikerlandson Are you still working on this PR? Thanks! Miao

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 cc @felixcheung

[GitHub] spark pull request #16945: [SPARK-19616][SparkR]:weightCol and aggregationDe...

2017-02-15 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16945 [SPARK-19616][SparkR]:weightCol and aggregationDepth should be improved for some SparkR APIs ## What changes were proposed in this pull request? This is a follow-up PR of #16800

[GitHub] spark pull request #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans...

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/16761

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 @felixcheung I will do the example and vignettes today. For the documentation, I will wait for @hhbyyh to merge his main document first. Thanks!

[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/12675 @HyukjinKwon @srowen This should be closed. Thanks!

[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/12675 #15777 has resolved this issue. We should close this one.

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r101176046 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,131 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r101176118 --- Diff: R/pkg/R/generics.R --- @@ -1380,6 +1380,10 @@ setGeneric("spark.kstest", function(data, ...) { standardGeneric("

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100976207 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LinearSVCWrapper.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100976153 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-13 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100929519 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710481 --- Diff: R/pkg/R/mllib_utils.R --- @@ -35,7 +35,8 @@ #' @seealso \link{spark.als}, \link{spark.bisectingKmeans}, \link{spark.gaussianMixture

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710453 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LinearSVCWrapper.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710403 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710369 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710345 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710324 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 @felixcheung I have addressed the comments. cc @yanboliang @hhbyyh Thanks!

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/16800

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
GitHub user wangmiao1981 reopened a pull request: https://github.com/apache/spark/pull/16800 [SPARK-19456][SparkR]:Add LinearSVC R API ## What changes were proposed in this pull request? A linear SVM classifier was newly added to ML, and a Python API has been added. This JIRA

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 Open to trigger

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 close to trigger windows test

[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/12675 @GayathriMurali If you are not able to proceed, I can take over. Thanks!

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r99755967 --- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R --- @@ -27,6 +27,44 @@ absoluteSparkPath <- function(x) { file.path(sparkHome

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r99755912 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LinearSVCWrapper.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16800: [SPARK-19456][SparkR][WIP]:Add LinearSVC R API

2017-02-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r99755781 --- Diff: R/pkg/R/generics.R --- @@ -1376,6 +1376,10 @@ setGeneric("spark.kstest", function(data, ...) { standardGeneric("

[GitHub] spark pull request #16800: [SPARK-19456][SparkR][WIP]:Add LinearSVC R API

2017-02-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r99474739 --- Diff: R/pkg/R/generics.R --- @@ -1376,6 +1376,10 @@ setGeneric("spark.kstest", function(data, ...) { standardGeneric("

[GitHub] spark issue #16800: [SPARK-19456][SparkR][WIP]:Add LinearSVC R API

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 retest this please

[GitHub] spark issue #16799: [SPARK-19386][SparkR][Followup] fix error in vignettes

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16799 LGTM. Let us wait for Jenkins to build it.

[GitHub] spark issue #16799: [SPARK-19386][SparkR][Followup] fix error in vignettes

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16799 Jenkins, test this please.

[GitHub] spark issue #16796: [SPARK-10063] Follow-up: remove dead code related to an ...

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16796 #16767 breaks the build. The issue was not caught by Jenkins because the last revision didn't trigger a build. A hot fix has been created: #16799.

[GitHub] spark issue #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in SparkR d...

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16767 Jenkins build was not triggered for the last revision. It breaks the build.

[GitHub] spark issue #16799: [SPARK-19386][SparkR][Followup] fix error in vignettes

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16799 Good catch! Thanks for fixing it! @felixcheung This is a hot-fix for the build break. I don't know why Jenkins didn't catch it in the original PR.

[GitHub] spark pull request #16800: [SPARK-19456][SparkR][WIP]:Add LinearSVC R API

2017-02-03 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16800 [SPARK-19456][SparkR][WIP]:Add LinearSVC R API ## What changes were proposed in this pull request? The Linear SVM classifier was newly added to ML, and a Python API has been added

[GitHub] spark issue #16794: [SPARK-19452][SparkR] Fix bug in the name assignment met...

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16794 still document failures.

[GitHub] spark issue #16794: [SPARK-19452][SparkR] Fix bug in the name assignment met...

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16794 retest this please.

[GitHub] spark issue #16794: [SPARK-19452][SparkR] Fix bug in the name assignment met...

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16794 try `{r, warning=FALSE}` for the warning cases?

[GitHub] spark issue #16794: [SPARK-19452][SparkR] Fix bug in the name assignment met...

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16794 Warning in FUN(X[[1L]], ...) : Use Sepal_Length instead of Sepal.Length as column name Warning in FUN(X[[2L]], ...) : Use Sepal_Width instead of Sepal.Width as column name

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16784 Jenkins, retest this please.

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16784 retest this please

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-02-03 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16784 Jenkins, retest this please.

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-02 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16784 [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite ## What changes were proposed in this pull request? Add unit tests for testing SparseVector. We can't add mixed

[GitHub] spark issue #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in SparkR d...

2017-02-02 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16767 LGTM

[GitHub] spark issue #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans summar...

2017-02-02 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16761 If we don't want to change the parameters in 2.1, it is not necessary to port it back. It is because the bug occurs only if you use `random` mode with specific `seed`. If we don't provide seed

[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98973879 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -0,0 +1,42 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98974153 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -0,0 +1,42 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more
