[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16767


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r99055560
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -819,6 +821,18 @@ perplexity <- spark.perplexity(model, corpusDF)
 perplexity
 ```
 
+ Bisecting k-means
--- End diff --

Same here: the model sections are in alphabetical order.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r99055524
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -494,6 +494,8 @@ SparkR supports the following machine learning models 
and algorithms.
 
 * Latent Dirichlet Allocation (LDA)
 
+* Bisecting $k$-means
--- End diff --

These model names are in alphabetical order. Currently:
```
 Clustering

 * Gaussian Mixture Model (GMM)

 * $k$-means Clustering

 * Latent Dirichlet Allocation (LDA)

 * Bisecting $k$-means
```

should be
```
 Clustering

 * Bisecting $k$-means

 * Gaussian Mixture Model (GMM)

 * $k$-means Clustering

 * Latent Dirichlet Allocation (LDA)
```
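
The ordering requested above can be checked mechanically. This is a minimal base R sketch, assuming the `$` math delimiters are ignored and the comparison is case-insensitive (both are assumptions, not something the review states); `models`, `sort_key`, and `sorted_models` are illustrative names only.

```r
# Clustering entries in the order they appear in the diff under review.
models <- c("Gaussian Mixture Model (GMM)",
            "$k$-means Clustering",
            "Latent Dirichlet Allocation (LDA)",
            "Bisecting $k$-means")

# Assumed sort key: strip the `$` math delimiters and lower-case the name.
sort_key <- tolower(gsub("$", "", models, fixed = TRUE))

# Reorder the original entries by that key.
sorted_models <- models[order(sort_key)]
print(sorted_models)
```

Under those assumptions the result starts with the bisecting k-means entry, which matches the order asked for in the comment.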





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98973879
  
--- Diff: examples/src/main/r/ml/bisectingKmeans.R ---
@@ -0,0 +1,42 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/ml/bisectingKmeans.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-ML-bisectingKmeans-example")
+
+# $example on$
+irisDF <- suppressWarnings(createDataFrame(iris))
--- End diff --

`suppressWarnings` should not appear in the example.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98974153
  
--- Diff: examples/src/main/r/ml/bisectingKmeans.R ---
@@ -0,0 +1,42 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/ml/bisectingKmeans.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-ML-bisectingKmeans-example")
+
+# $example on$
+irisDF <- suppressWarnings(createDataFrame(iris))
+
+# Fit bisecting k-means model with four centers
+model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+
+# get fitted result from a bisecting k-means model
+fitted.model <- fitted(model, "centers")
+
+# Model summary
+showDF(fitted.model)
--- End diff --

The "Model summary" step should use the `summary` method; otherwise, change the comment.
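
For reference, a sketch of how the example might look with both review comments applied. It is not the version that was eventually merged. Two things here are assumptions rather than part of the PR: renaming the `iris` columns locally so that `createDataFrame` has no dotted names to warn about (which removes the need for `suppressWarnings`), and passing `irisDF` to the fit call, since the quoted diff passes `df`, which is never defined. `irisLocal` is an illustrative name. Running it requires a working Spark installation with SparkR.

```r
library(SparkR)

# Initialize SparkSession
sparkR.session(appName = "SparkR-ML-bisectingKmeans-example")

# Assumption: replace dots in column names up front (Sepal.Length -> Sepal_Length)
# so createDataFrame does not need to warn about renaming them.
irisLocal <- iris
names(irisLocal) <- gsub(".", "_", names(irisLocal), fixed = TRUE)
irisDF <- createDataFrame(irisLocal)

# Fit bisecting k-means model with four centers (formula taken from the diff)
model <- spark.bisectingKmeans(irisDF, Sepal_Length ~ Sepal_Width, k = 4)

# Model summary
summary(model)

# Fitted values, labelled as such rather than as a summary
fitted.model <- fitted(model, "centers")
showDF(fitted.model)

sparkR.session.stop()
```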





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread krishnakalyan3
Github user krishnakalyan3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98974015
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -494,6 +494,8 @@ SparkR supports the following machine learning models 
and algorithms.
 
 * Latent Dirichlet Allocation (LDA)
 
+* Bisecting $k$-means
--- End diff --

@felixcheung could you please let me know what's wrong here?





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98969396
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -561,7 +563,7 @@ summary(model)
 
  Multilayer Perceptron
 
-Multilayer perceptron classifier (MLPC) is a classifier based on the 
[feedforward artificial neural 
network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC 
consists of multiple layers of nodes. Each layer is fully connected to the next 
layer in the network. Nodes in the input layer represent the input data. All 
other nodes map inputs to outputs by a linear combination of the inputs with 
the node’s weights $w$ and bias $b$ and applying an activation function. This 
can be written in matrix form for MLPC with $K+1$ layers as follows:
+Multilayer perceptron classifier (MLPC) is a classifier based on the 
[feedforward artificial neural 
network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC 
consists of multiple layers of nodes. Each layer is fully connected to the next 
layer in the network. Nodes in the input layer represent the input data. All 
other nodes map inputs to outputs by a linear combination of the inputs with 
the node???s weights $w$ and bias $b$ and applying an activation function. This 
can be written in matrix form for MLPC with $K+1$ layers as follows:
--- End diff --

I don't see the change here either.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98969259
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -373,7 +373,7 @@ head(out, 3)
 ```
 
  Apply by Group
-`gapply` can apply a function to each group of a `SparkDataFrame`. The 
function is to be applied to each group of the `SparkDataFrame` and should have 
only two parameters: grouping key and R `data.frame` corresponding to that key. 
The groups are chosen from `SparkDataFrames` column(s). The output of function 
should be a `data.frame`. Schema specifies the row format of the resulting 
`SparkDataFrame`. It must represent R function’s output schema on the basis 
of Spark data types. The column names of the returned `data.frame` are set by 
user. See [here](#DataTypes) for mapping between R and Spark.
+`gapply` can apply a function to each group of a `SparkDataFrame`. The 
function is to be applied to each group of the `SparkDataFrame` and should have 
only two parameters: grouping key and R `data.frame` corresponding to that key. 
The groups are chosen from `SparkDataFrames` column(s). The output of function 
should be a `data.frame`. Schema specifies the row format of the resulting 
`SparkDataFrame`. It must represent R function???s output schema on the basis 
of Spark data types. The column names of the returned `data.frame` are set by 
user. See [here](#DataTypes) for mapping between R and Spark.
--- End diff --

I don't see any change here.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98968984
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -84,7 +84,7 @@ class(carsGP)
 
 SparkR supports a number of commonly used machine learning algorithms. 
Under the hood, SparkR uses MLlib to train the model. Users can call `summary` 
to print a summary of the fitted model, `predict` to make predictions on new 
data, and `write.ml`/`read.ml` to save/load fitted models.
 
-SparkR supports a subset of R formula operators for model fitting, 
including ‘~’, ‘.’, ‘:’, ‘+’, and ‘-‘. We use linear 
regression as an example.
+SparkR supports a subset of R formula operators for model fitting, 
including ???~???, ???.???, ???:???, ???+???, and ???-???. We use linear 
regression as an example.
--- End diff --

Why `???`?





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98958887
  
--- Diff: docs/ml-clustering.md ---
@@ -167,6 +167,13 @@ Refer to the [Python API 
docs](api/python/pyspark.ml.html#pyspark.ml.clustering.
 
 {% include_example python/ml/bisecting_k_means_example.py %}
 
+
+
+
+Refer to the [R API docs](api/R/spark.bisectingKmeans.html) for more 
details. {% include_example r/ml/bisectingKmeans.R %}
+
+{% include_example r/ml/lda.R %}
--- End diff --

This is incorrect: `r/ml/lda.R` does not belong in the bisecting k-means section.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98958842
  
--- Diff: docs/ml-clustering.md ---
@@ -167,6 +167,13 @@ Refer to the [Python API 
docs](api/python/pyspark.ml.html#pyspark.ml.clustering.
 
 {% include_example python/ml/bisecting_k_means_example.py %}
 
+
+
+
+Refer to the [R API docs](api/R/spark.bisectingKmeans.html) for more 
details. {% include_example r/ml/bisectingKmeans.R %}
--- End diff --

And this `{% include_example r/ml/bisectingKmeans.R %}` should be on a 
separate line.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98958713
  
--- Diff: docs/ml-clustering.md ---
@@ -167,6 +167,13 @@ Refer to the [Python API 
docs](api/python/pyspark.ml.html#pyspark.ml.clustering.
 
 {% include_example python/ml/bisecting_k_means_example.py %}
 
+
+
+
+Refer to the [R API docs](api/R/spark.bisectingKmeans.html) for more 
details. {% include_example r/ml/bisectingKmeans.R %}
--- End diff --

You will need to add this file: `r/ml/bisectingKmeans.R`.





[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...

2017-02-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16767#discussion_r98958385
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -494,6 +494,8 @@ SparkR supports the following machine learning models 
and algorithms.
 
 * Latent Dirichlet Allocation (LDA)
 
+* Bisecting $k$-means
--- End diff --

please sort this

