[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16767 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r99055560 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -819,6 +821,18 @@ perplexity <- spark.perplexity(model, corpusDF) perplexity ``` + Bisecting k-means --- End diff -- same here. the model sections are in alphabetic order --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r99055524 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -494,6 +494,8 @@ SparkR supports the following machine learning models and algorithms. * Latent Dirichlet Allocation (LDA) +* Bisecting $k$-means --- End diff -- these model names are in order ``` Clustering * Gaussian Mixture Model (GMM) * $k$-means Clustering * Latent Dirichlet Allocation (LDA) * Bisecting $k$-means ``` should be ``` Clustering * Bisecting $k$-means * Gaussian Mixture Model (GMM) * $k$-means Clustering * Latent Dirichlet Allocation (LDA) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98973879 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -0,0 +1,42 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/bisectingKmeans.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-bisectingKmeans-example") + +# $example on$ +irisDF <- suppressWarnings(createDataFrame(iris)) --- End diff -- `suppressWarnings` should not appear in example. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98974153 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -0,0 +1,42 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/bisectingKmeans.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-bisectingKmeans-example") + +# $example on$ +irisDF <- suppressWarnings(createDataFrame(iris)) + +# Fit bisecting k-means model with four centers +model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4) + +# get fitted result from a bisecting k-means model +fitted.model <- fitted(model, "centers") + +# Model summary +showDF(fitted.model) --- End diff -- Model Summary should use `summary` method. Otherwise, change the comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user krishnakalyan3 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98974015 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -494,6 +494,8 @@ SparkR supports the following machine learning models and algorithms. * Latent Dirichlet Allocation (LDA) +* Bisecting $k$-means --- End diff -- @felixcheung could you please let me know whats wrong here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98969396 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -561,7 +563,7 @@ summary(model) Multilayer Perceptron -Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC consists of multiple layers of nodes. Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs by a linear combination of the inputs with the nodeâs weights $w$ and bias $b$ and applying an activation function. This can be written in matrix form for MLPC with $K+1$ layers as follows: +Multilayer perceptron classifier (MLPC) is a classifier based on the [feedforward artificial neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network). MLPC consists of multiple layers of nodes. Each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs by a linear combination of the inputs with the node???s weights $w$ and bias $b$ and applying an activation function. This can be written in matrix form for MLPC with $K+1$ layers as follows: --- End diff -- I don't see the change here either. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98969259 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -373,7 +373,7 @@ head(out, 3) ``` Apply by Group -`gapply` can apply a function to each group of a `SparkDataFrame`. The function is to be applied to each group of the `SparkDataFrame` and should have only two parameters: grouping key and R `data.frame` corresponding to that key. The groups are chosen from `SparkDataFrames` column(s). The output of function should be a `data.frame`. Schema specifies the row format of the resulting `SparkDataFrame`. It must represent R functionâs output schema on the basis of Spark data types. The column names of the returned `data.frame` are set by user. See [here](#DataTypes) for mapping between R and Spark. +`gapply` can apply a function to each group of a `SparkDataFrame`. The function is to be applied to each group of the `SparkDataFrame` and should have only two parameters: grouping key and R `data.frame` corresponding to that key. The groups are chosen from `SparkDataFrames` column(s). The output of function should be a `data.frame`. Schema specifies the row format of the resulting `SparkDataFrame`. It must represent R function???s output schema on the basis of Spark data types. The column names of the returned `data.frame` are set by user. See [here](#DataTypes) for mapping between R and Spark. --- End diff -- I don't see any change here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98968984 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -84,7 +84,7 @@ class(carsGP) SparkR supports a number of commonly used machine learning algorithms. Under the hood, SparkR uses MLlib to train the model. Users can call `summary` to print a summary of the fitted model, `predict` to make predictions on new data, and `write.ml`/`read.ml` to save/load fitted models. -SparkR supports a subset of R formula operators for model fitting, including â~â, â.â, â:â, â+â, and â-â. We use linear regression as an example. +SparkR supports a subset of R formula operators for model fitting, including ???~???, ???.???, ???:???, ???+???, and ???-???. We use linear regression as an example. --- End diff -- why ??? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98958887 --- Diff: docs/ml-clustering.md --- @@ -167,6 +167,13 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering. {% include_example python/ml/bisecting_k_means_example.py %} + + + +Refer to the [R API docs](api/R/spark.bisectingKmeans.html) for more details. {% include_example r/ml/bisectingKmeans.R %} + +{% include_example r/ml/lda.R %} --- End diff -- this is incorrect --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98958842 --- Diff: docs/ml-clustering.md --- @@ -167,6 +167,13 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering. {% include_example python/ml/bisecting_k_means_example.py %} + + + +Refer to the [R API docs](api/R/spark.bisectingKmeans.html) for more details. {% include_example r/ml/bisectingKmeans.R %} --- End diff -- and this ` {% include_example r/ml/bisectingKmeans.R %}` should be a separate line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98958713 --- Diff: docs/ml-clustering.md --- @@ -167,6 +167,13 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering. {% include_example python/ml/bisecting_k_means_example.py %} + + + +Refer to the [R API docs](api/R/spark.bisectingKmeans.html) for more details. {% include_example r/ml/bisectingKmeans.R %} --- End diff -- you will need to add this file r/ml/bisectingKmeans.R --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16767: [SPARK-19386][SPARKR][DOC] Bisecting k-means in S...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16767#discussion_r98958385 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -494,6 +494,8 @@ SparkR supports the following machine learning models and algorithms. * Latent Dirichlet Allocation (LDA) +* Bisecting $k$-means --- End diff -- please sort this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org