http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.bisectingKmeans.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.bisectingKmeans.html b/site/docs/2.2.2/api/R/spark.bisectingKmeans.html new file mode 100644 index 0000000..43b8cab --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.bisectingKmeans.html @@ -0,0 +1,179 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Bisecting K-Means Clustering Model</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.bisectingKmeans {SparkR}"><tr><td>spark.bisectingKmeans {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Bisecting K-Means Clustering Model</h2> + +<h3>Description</h3> + +<p>Fits a bisecting k-means clustering model against a SparkDataFrame. +Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make +predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +</p> +<p>Get fitted result from a bisecting k-means model. +Note: A saved-loaded model does not support this method. +</p> + + +<h3>Usage</h3> + +<pre> +spark.bisectingKmeans(data, formula, ...) 
+ +## S4 method for signature 'SparkDataFrame,formula' +spark.bisectingKmeans(data, formula, k = 4, + maxIter = 20, seed = NULL, minDivisibleClusterSize = 1) + +## S4 method for signature 'BisectingKMeansModel' +summary(object) + +## S4 method for signature 'BisectingKMeansModel' +predict(object, newData) + +## S4 method for signature 'BisectingKMeansModel' +fitted(object, method = c("centers", + "classes")) + +## S4 method for signature 'BisectingKMeansModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'. +Note that the response variable of formula is empty in spark.bisectingKmeans.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s) passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>k</code></td> +<td> +<p>the desired number of leaf clusters. Must be > 1. +The actual number could be smaller if there are no divisible leaf clusters.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>maximum iteration number.</p> +</td></tr> +<tr valign="top"><td><code>seed</code></td> +<td> +<p>the random seed.</p> +</td></tr> +<tr valign="top"><td><code>minDivisibleClusterSize</code></td> +<td> +<p>The minimum number of points (if greater than or equal to 1.0) +or the minimum proportion of points (if less than 1.0) of a divisible cluster. +Note that it is an expert parameter. 
The default value should be good enough +for most cases.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a fitted bisecting k-means model.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>method</code></td> +<td> +<p>type of fitted results, <code>"centers"</code> for cluster centers +or <code>"classes"</code> for assigned classes.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.bisectingKmeans</code> returns a fitted bisecting k-means model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. +The list includes the model's <code>k</code> (number of cluster centers), +<code>coefficients</code> (model cluster centers), +<code>size</code> (number of data points in each cluster), <code>cluster</code> +(cluster centers of the transformed data; cluster is NULL if is.loaded is TRUE), +and <code>is.loaded</code> (whether the model is loaded from a saved file). +</p> +<p><code>predict</code> returns the predicted values based on a bisecting k-means model. +</p> +<p><code>fitted</code> returns a SparkDataFrame containing fitted values. 
+</p> + + +<h3>Note</h3> + +<p>spark.bisectingKmeans since 2.2.0 +</p> +<p>summary(BisectingKMeansModel) since 2.2.0 +</p> +<p>predict(BisectingKMeansModel) since 2.2.0 +</p> +<p>fitted since 2.2.0 +</p> +<p>write.ml(BisectingKMeansModel, character) since 2.2.0 +</p> + + +<h3>See Also</h3> + +<p><a href="predict.html">predict</a>, <a href="read.ml.html">read.ml</a>, <a href="write.ml.html">write.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D t <- as.data.frame(Titanic) +##D df <- createDataFrame(t) +##D model <- spark.bisectingKmeans(df, Class ~ Survived, k = 4) +##D summary(model) +##D +##D # get fitted result from a bisecting k-means model +##D fitted.model <- fitted(model, "centers") +##D showDF(fitted.model) +##D +##D # fitted values on training data +##D fitted <- predict(model, df) +##D head(select(fitted, "Class", "prediction")) +##D +##D # save fitted model to input path +##D path <- "path/to/model" +##D write.ml(model, path) +##D +##D # can also read back the saved model and print +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html>
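The minDivisibleClusterSize argument described above is read either as a point count (values of 1.0 or more) or as a proportion of the data (values below 1.0). The following base R sketch shows one way to read that rule. The helper name min_divisible_points is hypothetical and not part of SparkR, and rounding up to a whole number of points is an assumption rather than something the page states.

```r
# Hypothetical helper (not part of SparkR): translate minDivisibleClusterSize
# into a minimum number of points for a dataset with n rows.
# Assumption: fractional results are rounded up to a whole point count.
min_divisible_points <- function(minDivisibleClusterSize, n) {
  stopifnot(minDivisibleClusterSize > 0, n >= 1)
  if (minDivisibleClusterSize >= 1) {
    # Interpreted as an absolute number of points.
    ceiling(minDivisibleClusterSize)
  } else {
    # Interpreted as a proportion of all n points.
    ceiling(minDivisibleClusterSize * n)
  }
}

min_divisible_points(1, 1000)     # default: any cluster with at least 1 point
min_divisible_points(20, 1000)    # absolute threshold of 20 points
min_divisible_points(0.25, 1000)  # a quarter of the data, i.e. 250 points
```

Under this reading, a leaf cluster smaller than the returned threshold would not be split further, which is why the fitted model can end up with fewer than k leaf clusters.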
http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.fpGrowth.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.fpGrowth.html b/site/docs/2.2.2/api/R/spark.fpGrowth.html new file mode 100644 index 0000000..558be4d --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.fpGrowth.html @@ -0,0 +1,180 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: FP-growth</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.fpGrowth {SparkR}"><tr><td>spark.fpGrowth {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>FP-growth</h2> + +<h3>Description</h3> + +<p>A parallel FP-growth algorithm to mine frequent itemsets. +<code>spark.fpGrowth</code> fits an FP-growth model on a SparkDataFrame. Users can call +<code>spark.freqItemsets</code> to get frequent itemsets, <code>spark.associationRules</code> to get +association rules, <code>predict</code> to make predictions on new data based on generated association +rules, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +For more details, see +<a href="https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#fp-growth"> +FP-growth</a>. +</p> + + +<h3>Usage</h3> + +<pre> +spark.fpGrowth(data, ...) 
+ +spark.freqItemsets(object) + +spark.associationRules(object) + +## S4 method for signature 'SparkDataFrame' +spark.fpGrowth(data, minSupport = 0.3, + minConfidence = 0.8, itemsCol = "items", numPartitions = NULL) + +## S4 method for signature 'FPGrowthModel' +spark.freqItemsets(object) + +## S4 method for signature 'FPGrowthModel' +spark.associationRules(object) + +## S4 method for signature 'FPGrowthModel' +predict(object, newData) + +## S4 method for signature 'FPGrowthModel,character' +write.ml(object, path, overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>A SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s) passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a fitted FPGrowth model.</p> +</td></tr> +<tr valign="top"><td><code>minSupport</code></td> +<td> +<p>Minimal support level.</p> +</td></tr> +<tr valign="top"><td><code>minConfidence</code></td> +<td> +<p>Minimal confidence level.</p> +</td></tr> +<tr valign="top"><td><code>itemsCol</code></td> +<td> +<p>Features column name.</p> +</td></tr> +<tr valign="top"><td><code>numPartitions</code></td> +<td> +<p>Number of partitions used for fitting.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>logical value indicating whether to overwrite if the output path +already exists. Default is FALSE which means throw exception +if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.fpGrowth</code> returns a fitted FPGrowth model. +</p> +<p>A <code>SparkDataFrame</code> with frequent itemsets. 
+The <code>SparkDataFrame</code> contains two columns: +<code>items</code> (an array of the same type as the input column) +and <code>freq</code> (frequency of the itemset). +</p> +<p>A <code>SparkDataFrame</code> with association rules. +The <code>SparkDataFrame</code> contains three columns: +<code>antecedent</code> (an array of the same type as the input column), +<code>consequent</code> (an array of the same type as the input column), +and <code>confidence</code> (confidence). +</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted values. +</p> + + +<h3>Note</h3> + +<p>spark.fpGrowth since 2.2.0 +</p> +<p>spark.freqItemsets(FPGrowthModel) since 2.2.0 +</p> +<p>spark.associationRules(FPGrowthModel) since 2.2.0 +</p> +<p>predict(FPGrowthModel) since 2.2.0 +</p> +<p>write.ml(FPGrowthModel, character) since 2.2.0 +</p> + + +<h3>See Also</h3> + +<p><a href="read.ml.html">read.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D raw_data <- read.df( +##D "data/mllib/sample_fpgrowth.txt", +##D source = "csv", +##D schema = structType(structField("raw_items", "string"))) +##D +##D data <- selectExpr(raw_data, "split(raw_items, ' ') as items") +##D model <- spark.fpGrowth(data) +##D +##D # Show frequent itemsets +##D frequent_itemsets <- spark.freqItemsets(model) +##D showDF(frequent_itemsets) +##D +##D # Show association rules +##D association_rules <- spark.associationRules(model) +##D showDF(association_rules) +##D +##D # Predict on new data +##D new_itemsets <- data.frame(items = c("t", "t,s")) +##D new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items") +##D predict(model, new_data) +##D +##D # Save and load model +##D path <- "/path/to/model" +##D write.ml(model, path) +##D read.ml(path) +##D +##D # Optional arguments +##D baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets") +##D another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5, 
+##D itemsCol = "baskets", numPartitions = 10) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.gaussianMixture.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.gaussianMixture.html b/site/docs/2.2.2/api/R/spark.gaussianMixture.html new file mode 100644 index 0000000..556a177 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.gaussianMixture.html @@ -0,0 +1,156 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Multivariate Gaussian Mixture Model (GMM)</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.gaussianMixture {SparkR}"><tr><td>spark.gaussianMixture {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Multivariate Gaussian Mixture Model (GMM)</h2> + +<h3>Description</h3> + +<p>Fits multivariate gaussian mixture model against a SparkDataFrame, similarly to R's +mvnormalmixEM(). Users can call <code>summary</code> to print a summary of the fitted model, +<code>predict</code> to make predictions on new data, and <code>write.ml</code>/<code>read.ml</code> +to save/load fitted models. 
+</p> + + +<h3>Usage</h3> + +<pre> +spark.gaussianMixture(data, formula, ...) + +## S4 method for signature 'SparkDataFrame,formula' +spark.gaussianMixture(data, formula, k = 2, + maxIter = 100, tol = 0.01) + +## S4 method for signature 'GaussianMixtureModel' +summary(object) + +## S4 method for signature 'GaussianMixtureModel' +predict(object, newData) + +## S4 method for signature 'GaussianMixtureModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'. +Note that the response variable of formula is empty in spark.gaussianMixture.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional arguments passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>k</code></td> +<td> +<p>number of independent Gaussians in the mixture model.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>maximum iteration number.</p> +</td></tr> +<tr valign="top"><td><code>tol</code></td> +<td> +<p>the convergence tolerance.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a fitted gaussian mixture model.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>overwrites or not if the output path already exists. 
Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.gaussianMixture</code> returns a fitted multivariate gaussian mixture model. +</p> +<p><code>summary</code> returns summary of the fitted model, which is a list. +The list includes the model's <code>lambda</code> (lambda), <code>mu</code> (mu), +<code>sigma</code> (sigma), <code>loglik</code> (loglik), and <code>posterior</code> (posterior). +</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named +"prediction". +</p> + + +<h3>Note</h3> + +<p>spark.gaussianMixture since 2.1.0 +</p> +<p>summary(GaussianMixtureModel) since 2.1.0 +</p> +<p>predict(GaussianMixtureModel) since 2.1.0 +</p> +<p>write.ml(GaussianMixtureModel, character) since 2.1.0 +</p> + + +<h3>See Also</h3> + +<p>mixtools: <a href="https://cran.r-project.org/package=mixtools">https://cran.r-project.org/package=mixtools</a> +</p> +<p><a href="predict.html">predict</a>, <a href="read.ml.html">read.ml</a>, <a href="write.ml.html">write.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D library(mvtnorm) +##D set.seed(100) +##D a <- rmvnorm(4, c(0, 0)) +##D b <- rmvnorm(6, c(3, 4)) +##D data <- rbind(a, b) +##D df <- createDataFrame(as.data.frame(data)) +##D model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2) +##D summary(model) +##D +##D # fitted values on training data +##D fitted <- predict(model, df) +##D head(select(fitted, "V1", "prediction")) +##D +##D # save fitted model to input path +##D path <- "path/to/model" +##D write.ml(model, path) +##D +##D # can also read back the saved model and print +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> 
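The summary components listed above (lambda, mu, sigma, posterior) can be related to the "prediction" column with a small base R sketch. It is a simplification under stated assumptions: a univariate two-component mixture stands in for the multivariate case, lambda is taken to hold the mixing weights (as in mixtools), sigma is used as a standard deviation, and the predicted label is assumed to be the zero-based index of the component with the largest posterior probability. All parameter values are made up for illustration and no SparkR call is involved.

```r
# Illustrative parameters (made up): two univariate Gaussian components.
lambda <- c(0.4, 0.6)  # assumed mixing weights, summing to 1
mu     <- c(0, 3)      # component means
sigma  <- c(1, 1)      # component standard deviations (univariate stand-in)

x <- c(-0.5, 2, 3.2)   # points to assign

# Weighted density of every point under every component (rows = points).
weighted <- sapply(seq_along(lambda),
                   function(j) lambda[j] * dnorm(x, mean = mu[j], sd = sigma[j]))

# Posterior membership probabilities: normalise each row to sum to 1.
posterior <- weighted / rowSums(weighted)

# Assumed labelling rule: zero-based index of the most probable component.
prediction <- max.col(posterior, ties.method = "first") - 1L

print(round(posterior, 3))
print(prediction)
```

Each row of posterior sums to 1, so the matrix can be read as soft cluster assignments, with prediction as the corresponding hard assignment.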
http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.gbt.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.gbt.html b/site/docs/2.2.2/api/R/spark.gbt.html new file mode 100644 index 0000000..b53edb8 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.gbt.html @@ -0,0 +1,245 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Gradient Boosted Tree Model for Regression and Classification</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.gbt {SparkR}"><tr><td>spark.gbt {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Gradient Boosted Tree Model for Regression and Classification</h2> + +<h3>Description</h3> + +<p><code>spark.gbt</code> fits a Gradient Boosted Tree Regression model or Classification model on a +SparkDataFrame. Users can call <code>summary</code> to get a summary of the fitted +Gradient Boosted Tree model, <code>predict</code> to make predictions on new data, and +<code>write.ml</code>/<code>read.ml</code> to save/load fitted models. 
+For more details, see +<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression"> +GBT Regression</a> and +<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier"> +GBT Classification</a> +</p> + + +<h3>Usage</h3> + +<pre> +spark.gbt(data, formula, ...) + +## S4 method for signature 'SparkDataFrame,formula' +spark.gbt(data, formula, + type = c("regression", "classification"), maxDepth = 5, maxBins = 32, + maxIter = 20, stepSize = 0.1, lossType = NULL, seed = NULL, + subsamplingRate = 1, minInstancesPerNode = 1, minInfoGain = 0, + checkpointInterval = 10, maxMemoryInMB = 256, cacheNodeIds = FALSE) + +## S4 method for signature 'GBTRegressionModel' +summary(object) + +## S3 method for class 'summary.GBTRegressionModel' +print(x, ...) + +## S4 method for signature 'GBTClassificationModel' +summary(object) + +## S3 method for class 'summary.GBTClassificationModel' +print(x, ...) + +## S4 method for signature 'GBTRegressionModel' +predict(object, newData) + +## S4 method for signature 'GBTClassificationModel' +predict(object, newData) + +## S4 method for signature 'GBTRegressionModel,character' +write.ml(object, path, + overwrite = FALSE) + +## S4 method for signature 'GBTClassificationModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. 
Currently only a few formula +operators are supported, including '~', ':', '+', and '-'.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional arguments passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>type</code></td> +<td> +<p>type of model, one of "regression" or "classification", to fit</p> +</td></tr> +<tr valign="top"><td><code>maxDepth</code></td> +<td> +<p>Maximum depth of the tree (>= 0).</p> +</td></tr> +<tr valign="top"><td><code>maxBins</code></td> +<td> +<p>Maximum number of bins used for discretizing continuous features and for choosing +how to split on features at each node. More bins give higher granularity. Must be +>= 2 and >= number of categories in any categorical feature.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>Param for maximum number of iterations (>= 0).</p> +</td></tr> +<tr valign="top"><td><code>stepSize</code></td> +<td> +<p>Param for Step size to be used for each iteration of optimization.</p> +</td></tr> +<tr valign="top"><td><code>lossType</code></td> +<td> +<p>Loss function which GBT tries to minimize. +For classification, must be "logistic". For regression, must be one of +"squared" (L2) and "absolute" (L1), default is "squared".</p> +</td></tr> +<tr valign="top"><td><code>seed</code></td> +<td> +<p>integer seed for random number generation.</p> +</td></tr> +<tr valign="top"><td><code>subsamplingRate</code></td> +<td> +<p>Fraction of the training data used for learning each decision tree, in +range (0, 1].</p> +</td></tr> +<tr valign="top"><td><code>minInstancesPerNode</code></td> +<td> +<p>Minimum number of instances each child must have after split. If a +split causes the left or right child to have fewer than +minInstancesPerNode, the split will be discarded as invalid. 
Should be +>= 1.</p> +</td></tr> +<tr valign="top"><td><code>minInfoGain</code></td> +<td> +<p>Minimum information gain for a split to be considered at a tree node.</p> +</td></tr> +<tr valign="top"><td><code>checkpointInterval</code></td> +<td> +<p>Param to set the checkpoint interval (>= 1) or disable checkpointing (-1).</p> +</td></tr> +<tr valign="top"><td><code>maxMemoryInMB</code></td> +<td> +<p>Maximum memory in MB allocated to histogram aggregation.</p> +</td></tr> +<tr valign="top"><td><code>cacheNodeIds</code></td> +<td> +<p>If FALSE, the algorithm will pass trees to executors to match instances with +nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching +can speed up training of deeper trees. Users can set how often the +cache is checkpointed, or disable checkpointing, by setting checkpointInterval.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>A fitted Gradient Boosted Tree regression model or classification model.</p> +</td></tr> +<tr valign="top"><td><code>x</code></td> +<td> +<p>summary object of Gradient Boosted Tree regression model or classification model +returned by <code>summary</code>.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>The directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>Whether to overwrite if the output path already exists. Default is FALSE, +which means an exception is thrown if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.gbt</code> returns a fitted Gradient Boosted Tree model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. 
+The list of components includes <code>formula</code> (formula), +<code>numFeatures</code> (number of features), <code>features</code> (list of features), +<code>featureImportances</code> (feature importances), <code>maxDepth</code> (max depth of trees), +<code>numTrees</code> (number of trees), and <code>treeWeights</code> (tree weights). +</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named +"prediction". +</p> + + +<h3>Note</h3> + +<p>spark.gbt since 2.1.0 +</p> +<p>summary(GBTRegressionModel) since 2.1.0 +</p> +<p>print.summary.GBTRegressionModel since 2.1.0 +</p> +<p>summary(GBTClassificationModel) since 2.1.0 +</p> +<p>print.summary.GBTClassificationModel since 2.1.0 +</p> +<p>predict(GBTRegressionModel) since 2.1.0 +</p> +<p>predict(GBTClassificationModel) since 2.1.0 +</p> +<p>write.ml(GBTRegressionModel, character) since 2.1.0 +</p> +<p>write.ml(GBTClassificationModel, character) since 2.1.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D # fit a Gradient Boosted Tree Regression Model +##D df <- createDataFrame(longley) +##D model <- spark.gbt(df, Employed ~ ., type = "regression", maxDepth = 5, maxBins = 16) +##D +##D # get the summary of the model +##D summary(model) +##D +##D # make predictions +##D predictions <- predict(model, df) +##D +##D # save and load the model +##D path <- "path/to/model" +##D write.ml(model, path) +##D savedModel <- read.ml(path) +##D summary(savedModel) +##D +##D # fit a Gradient Boosted Tree Classification Model +##D # label must be binary - Only binary classification is supported for GBT. 
+##D t <- as.data.frame(Titanic) +##D df <- createDataFrame(t) +##D model <- spark.gbt(df, Survived ~ Age + Freq, "classification") +##D +##D # numeric label is also supported +##D t2 <- as.data.frame(Titanic) +##D t2$NumericGender <- ifelse(t2$Sex == "Male", 0, 1) +##D df <- createDataFrame(t2) +##D model <- spark.gbt(df, NumericGender ~ ., type = "classification") +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.getSparkFiles.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.getSparkFiles.html b/site/docs/2.2.2/api/R/spark.getSparkFiles.html new file mode 100644 index 0000000..dbefe59 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.getSparkFiles.html @@ -0,0 +1,59 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Get the absolute path of a file added through spark.addFile.</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.getSparkFiles {SparkR}"><tr><td>spark.getSparkFiles {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Get the absolute path of a file added through spark.addFile.</h2> + +<h3>Description</h3> + +<p>Get the absolute 
path of a file added through spark.addFile. +</p> + + +<h3>Usage</h3> + +<pre> +spark.getSparkFiles(fileName) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>fileName</code></td> +<td> +<p>The name of the file added through spark.addFile</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p>the absolute path of a file added through spark.addFile. +</p> + + +<h3>Note</h3> + +<p>spark.getSparkFiles since 2.1.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D spark.getSparkFiles("myfile") +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.getSparkFilesRootDirectory.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.getSparkFilesRootDirectory.html b/site/docs/2.2.2/api/R/spark.getSparkFilesRootDirectory.html new file mode 100644 index 0000000..57529d0 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.getSparkFilesRootDirectory.html @@ -0,0 +1,49 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Get the root directory that contains files added through...</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for 
spark.getSparkFilesRootDirectory {SparkR}"><tr><td>spark.getSparkFilesRootDirectory {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Get the root directory that contains files added through spark.addFile.</h2> + +<h3>Description</h3> + +<p>Get the root directory that contains files added through spark.addFile. +</p> + + +<h3>Usage</h3> + +<pre> +spark.getSparkFilesRootDirectory() +</pre> + + +<h3>Value</h3> + +<p>the root directory that contains files added through spark.addFile +</p> + + +<h3>Note</h3> + +<p>spark.getSparkFilesRootDirectory since 2.1.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D spark.getSparkFilesRootDirectory() +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.glm.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.glm.html b/site/docs/2.2.2/api/R/spark.glm.html new file mode 100644 index 0000000..5541699 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.glm.html @@ -0,0 +1,207 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Generalized Linear Models</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for 
spark.glm {SparkR}"><tr><td>spark.glm {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Generalized Linear Models</h2> + +<h3>Description</h3> + +<p>Fits a generalized linear model against a SparkDataFrame. +Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make +predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +</p> + + +<h3>Usage</h3> + +<pre> +spark.glm(data, formula, ...) + +## S4 method for signature 'SparkDataFrame,formula' +spark.glm(data, formula, family = gaussian, + tol = 1e-06, maxIter = 25, weightCol = NULL, regParam = 0, + var.power = 0, link.power = 1 - var.power) + +## S4 method for signature 'GeneralizedLinearRegressionModel' +summary(object) + +## S3 method for class 'summary.GeneralizedLinearRegressionModel' +print(x, ...) + +## S4 method for signature 'GeneralizedLinearRegressionModel' +predict(object, newData) + +## S4 method for signature 'GeneralizedLinearRegressionModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional arguments passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>family</code></td> +<td> +<p>a description of the error distribution and link function to be used in the model. +This can be a character string naming a family function, a family function or +the result of a call to a family function. 
Refer to the R family documentation at +<a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>. +Currently these families are supported: <code>binomial</code>, <code>gaussian</code>, +<code>Gamma</code>, <code>poisson</code> and <code>tweedie</code>. +</p> +<p>Note that there are two ways to specify the tweedie family. +</p> + +<ul> +<li><p> Set <code>family = "tweedie"</code> and specify the var.power and link.power; +</p> +</li> +<li><p> When package <code>statmod</code> is loaded, the tweedie family is specified using the +family definition therein, i.e., <code>tweedie(var.power, link.power)</code>. +</p> +</li></ul> +</td></tr> +<tr valign="top"><td><code>tol</code></td> +<td> +<p>positive convergence tolerance of iterations.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>integer giving the maximal number of IRLS iterations.</p> +</td></tr> +<tr valign="top"><td><code>weightCol</code></td> +<td> +<p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance +weights as 1.0.</p> +</td></tr> +<tr valign="top"><td><code>regParam</code></td> +<td> +<p>regularization parameter for L2 regularization.</p> +</td></tr> +<tr valign="top"><td><code>var.power</code></td> +<td> +<p>the power in the variance function of the Tweedie distribution which provides +the relationship between the variance and mean of the distribution. Only +applicable to the Tweedie family.</p> +</td></tr> +<tr valign="top"><td><code>link.power</code></td> +<td> +<p>the index in the power link function. 
Only applicable to the Tweedie family.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a fitted generalized linear model.</p> +</td></tr> +<tr valign="top"><td><code>x</code></td> +<td> +<p>summary object of a fitted generalized linear model returned by the <code>summary</code> function.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.glm</code> returns a fitted generalized linear model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. +The list of components includes at least the <code>coefficients</code> (coefficients matrix, which includes +coefficients, standard error of coefficients, t value and p value), +<code>null.deviance</code> (deviance of the null model), <code>aic</code> (AIC) +and <code>iter</code> (number of iterations IRLS takes). If there are collinear columns in the data, +the coefficients matrix only provides coefficients. +</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named +"prediction". 
+</p> + + +<h3>Note</h3> + +<p>spark.glm since 2.0.0 +</p> +<p>summary(GeneralizedLinearRegressionModel) since 2.0.0 +</p> +<p>print.summary.GeneralizedLinearRegressionModel since 2.0.0 +</p> +<p>predict(GeneralizedLinearRegressionModel) since 1.5.0 +</p> +<p>write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0 +</p> + + +<h3>See Also</h3> + +<p><a href="glm.html">glm</a>, <a href="read.ml.html">read.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D t <- as.data.frame(Titanic) +##D df <- createDataFrame(t) +##D model <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian") +##D summary(model) +##D +##D # fitted values on training data +##D fitted <- predict(model, df) +##D head(select(fitted, "Freq", "prediction")) +##D +##D # save fitted model to input path +##D path <- "path/to/model" +##D write.ml(model, path) +##D +##D # can also read back the saved model and print +##D savedModel <- read.ml(path) +##D summary(savedModel) +##D +##D # fit tweedie model +##D model <- spark.glm(df, Freq ~ Sex + Age, family = "tweedie", +##D var.power = 1.2, link.power = 0) +##D summary(model) +##D +##D # use the tweedie family from statmod +##D library(statmod) +##D model <- spark.glm(df, Freq ~ Sex + Age, family = tweedie(1.2, 0)) +##D summary(model) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.isoreg.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.isoreg.html b/site/docs/2.2.2/api/R/spark.isoreg.html new file mode 100644 index 0000000..7d57cb7 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.isoreg.html @@ -0,0 +1,146 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html 
xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Isotonic Regression Model</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.isoreg {SparkR}"><tr><td>spark.isoreg {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Isotonic Regression Model</h2> + +<h3>Description</h3> + +<p>Fits an Isotonic Regression model against a SparkDataFrame, similarly to R's isoreg(). +Users can print, make predictions on the produced model and save the model to the input path. +</p> + + +<h3>Usage</h3> + +<pre> +spark.isoreg(data, formula, ...) + +## S4 method for signature 'SparkDataFrame,formula' +spark.isoreg(data, formula, + isotonic = TRUE, featureIndex = 0, weightCol = NULL) + +## S4 method for signature 'IsotonicRegressionModel' +summary(object) + +## S4 method for signature 'IsotonicRegressionModel' +predict(object, newData) + +## S4 method for signature 'IsotonicRegressionModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>A symbolic description of the model to be fitted. 
Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional arguments passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>isotonic</code></td> +<td> +<p>Whether the output sequence should be isotonic/increasing (TRUE) or +antitonic/decreasing (FALSE).</p> +</td></tr> +<tr valign="top"><td><code>featureIndex</code></td> +<td> +<p>The index of the feature if <code>featuresCol</code> is a vector column +(default: 0), no effect otherwise.</p> +</td></tr> +<tr valign="top"><td><code>weightCol</code></td> +<td> +<p>The weight column name.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a fitted IsotonicRegressionModel.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>The directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>Overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.isoreg</code> returns a fitted Isotonic Regression model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. +The list includes model's <code>boundaries</code> (boundaries in increasing order) +and <code>predictions</code> (predictions associated with the boundaries at the same index). +</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted values. 
+</p> + + +<h3>Note</h3> + +<p>spark.isoreg since 2.1.0 +</p> +<p>summary(IsotonicRegressionModel) since 2.1.0 +</p> +<p>predict(IsotonicRegressionModel) since 2.1.0 +</p> +<p>write.ml(IsotonicRegressionModel, character) since 2.1.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0), +##D list(5.0, 3.0), list(1.0, 4.0)) +##D df <- createDataFrame(data, c("label", "feature")) +##D model <- spark.isoreg(df, label ~ feature, isotonic = FALSE) +##D # return model boundaries and prediction as lists +##D result <- summary(model) +##D # prediction based on fitted model +##D predict_data <- list(list(-2.0), list(-1.0), list(0.5), +##D list(0.75), list(1.0), list(2.0), list(9.0)) +##D predict_df <- createDataFrame(predict_data, c("feature")) +##D # get prediction column +##D predict_result <- collect(select(predict(model, predict_df), "prediction")) +##D +##D # save fitted model to input path +##D path <- "path/to/model" +##D write.ml(model, path) +##D +##D # can also read back the saved model and print +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.kmeans.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.kmeans.html b/site/docs/2.2.2/api/R/spark.kmeans.html new file mode 100644 index 0000000..53356d8 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.kmeans.html @@ -0,0 +1,166 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: K-Means Clustering Model</title> +<meta http-equiv="Content-Type" content="text/html; 
charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.kmeans {SparkR}"><tr><td>spark.kmeans {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>K-Means Clustering Model</h2> + +<h3>Description</h3> + +<p>Fits a k-means clustering model against a SparkDataFrame, similarly to R's kmeans(). +Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make +predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +</p> + + +<h3>Usage</h3> + +<pre> +spark.kmeans(data, formula, ...) + +## S4 method for signature 'SparkDataFrame,formula' +spark.kmeans(data, formula, k = 2, + maxIter = 20, initMode = c("k-means||", "random"), seed = NULL, + initSteps = 2, tol = 1e-04) + +## S4 method for signature 'KMeansModel' +summary(object) + +## S4 method for signature 'KMeansModel' +predict(object, newData) + +## S4 method for signature 'KMeansModel,character' +write.ml(object, path, overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'. 
+Note that the response variable of formula is empty in spark.kmeans.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s) passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>k</code></td> +<td> +<p>number of centers.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>maximum iteration number.</p> +</td></tr> +<tr valign="top"><td><code>initMode</code></td> +<td> +<p>the initialization algorithm chosen to fit the model.</p> +</td></tr> +<tr valign="top"><td><code>seed</code></td> +<td> +<p>the random seed for cluster initialization.</p> +</td></tr> +<tr valign="top"><td><code>initSteps</code></td> +<td> +<p>the number of steps for the k-means|| initialization mode. +This is an advanced setting; the default of 2 is almost always enough. Must be > 0.</p> +</td></tr> +<tr valign="top"><td><code>tol</code></td> +<td> +<p>convergence tolerance of iterations.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a fitted k-means model.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.kmeans</code> returns a fitted k-means model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. 
+The list includes the model's <code>k</code> (the configured number of cluster centers), +<code>coefficients</code> (model cluster centers), +<code>size</code> (number of data points in each cluster), <code>cluster</code> +(cluster centers of the transformed data), <code>is.loaded</code> (whether the model is loaded +from a saved file), and <code>clusterSize</code> +(the actual number of cluster centers; when using initMode = "random", +<code>clusterSize</code> may not be equal to <code>k</code>). +</p> +<p><code>predict</code> returns the predicted values based on a k-means model. +</p> + + +<h3>Note</h3> + +<p>spark.kmeans since 2.0.0 +</p> +<p>summary(KMeansModel) since 2.0.0 +</p> +<p>predict(KMeansModel) since 2.0.0 +</p> +<p>write.ml(KMeansModel, character) since 2.0.0 +</p> + + +<h3>See Also</h3> + +<p><a href="predict.html">predict</a>, <a href="read.ml.html">read.ml</a>, <a href="write.ml.html">write.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D t <- as.data.frame(Titanic) +##D df <- createDataFrame(t) +##D model <- spark.kmeans(df, Class ~ Survived, k = 4, initMode = "random") +##D summary(model) +##D +##D # fitted values on training data +##D fitted <- predict(model, df) +##D head(select(fitted, "Class", "prediction")) +##D +##D # save fitted model to input path +##D path <- "path/to/model" +##D write.ml(model, path) +##D +##D # can also read back the saved model and print +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.kstest.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.kstest.html b/site/docs/2.2.2/api/R/spark.kstest.html new file mode 100644 index 0000000..75d6354 --- 
/dev/null +++ b/site/docs/2.2.2/api/R/spark.kstest.html @@ -0,0 +1,130 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: (One-Sample) Kolmogorov-Smirnov Test</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.kstest {SparkR}"><tr><td>spark.kstest {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>(One-Sample) Kolmogorov-Smirnov Test</h2> + +<h3>Description</h3> + +<p><code>spark.kstest</code> conducts the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a +continuous distribution. +</p> +<p>By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution, we can provide a test for the +null hypothesis that the sample data comes from that theoretical distribution. +</p> +<p>Users can call <code>summary</code> to obtain a summary of the test, and <code>print.summary.KSTest</code> +to print out a summary result. +</p> + + +<h3>Usage</h3> + +<pre> +spark.kstest(data, ...) + +## S4 method for signature 'SparkDataFrame' +spark.kstest(data, testCol = "test", + nullHypothesis = c("norm"), distParams = c(0, 1)) + +## S4 method for signature 'KSTest' +summary(object) + +## S3 method for class 'summary.KSTest' +print(x, ...) 
+</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a SparkDataFrame of user data.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s) passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>testCol</code></td> +<td> +<p>name of the column that holds the test data. It should be a column of double type.</p> +</td></tr> +<tr valign="top"><td><code>nullHypothesis</code></td> +<td> +<p>name of the theoretical distribution tested against. Currently only +<code>"norm"</code> for normal distribution is supported.</p> +</td></tr> +<tr valign="top"><td><code>distParams</code></td> +<td> +<p>parameter(s) of the distribution. For <code>nullHypothesis = "norm"</code>, +we can provide as a vector the mean and standard deviation of +the distribution. If none is provided, then standard normal will be used. +If only one is provided, then the standard deviation will be set to be one.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>test result object of KSTest returned by <code>spark.kstest</code>.</p> +</td></tr> +<tr valign="top"><td><code>x</code></td> +<td> +<p>summary object of KSTest returned by <code>summary</code>.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.kstest</code> returns a test result object. +</p> +<p><code>summary</code> returns summary information of the KSTest object, which is a list. +The list includes the <code>p.value</code> (p-value), <code>statistic</code> (test statistic +computed for the test), <code>nullHypothesis</code> (the null hypothesis with its +parameters tested against) and <code>degreesOfFreedom</code> (degrees of freedom of the test). 
+</p> + + +<h3>Note</h3> + +<p>spark.kstest since 2.1.0 +</p> +<p>summary(KSTest) since 2.1.0 +</p> +<p>print.summary.KSTest since 2.1.0 +</p> + + +<h3>See Also</h3> + +<p><a href="http://spark.apache.org/docs/latest/mllib-statistics.html#hypothesis-testing"> +MLlib: Hypothesis Testing</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25)) +##D df <- createDataFrame(data) +##D test <- spark.kstest(df, "test", "norm", c(0, 1)) +##D +##D # get a summary of the test result +##D testSummary <- summary(test) +##D testSummary +##D +##D # print out the summary in an organized way +##D print.summary.KSTest(testSummary) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.lapply.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.lapply.html b/site/docs/2.2.2/api/R/spark.lapply.html new file mode 100644 index 0000000..5372603 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.lapply.html @@ -0,0 +1,95 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Run a function over a list of elements, distributing the...</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + 
+<table width="100%" summary="page for spark.lapply {SparkR}"><tr><td>spark.lapply {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Run a function over a list of elements, distributing the computations with Spark</h2> + +<h3>Description</h3> + +<p>Run a function over a list of elements, distributing the computations with Spark. Applies a +function in a manner that is similar to doParallel or lapply to elements of a list. +The computations are distributed using Spark. It is conceptually the same as the following code: +lapply(list, func) +</p> + + +<h3>Usage</h3> + +<pre> +spark.lapply(list, func) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>list</code></td> +<td> +<p>the list of elements</p> +</td></tr> +<tr valign="top"><td><code>func</code></td> +<td> +<p>a function that takes one argument.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p>Known limitations: +</p> + +<ul> +<li><p> variable scoping and capture: compared to R's rich support for variable resolutions, +the distributed nature of SparkR limits how variables are resolved at runtime. All the +variables that are available through lexical scoping are embedded in the closure of the +function and available as read-only variables within the function. The environment variables +should be stored into temporary variables outside the function, and not directly accessed +within the function. +</p> +</li> +<li><p> loading external packages: In order to use a package, you need to load it inside the +closure. 
For example, if you rely on the MASS package, here is how you would use it: +</p> +<pre> + train <- function(hyperparam) { + library(MASS) + model <- lm.ridge(y ~ x + z, data, lambda = hyperparam) + model + } + </pre> +</li></ul> + + + +<h3>Value</h3> + +<p>a list of results (the exact type being determined by the function) +</p> + + +<h3>Note</h3> + +<p>spark.lapply since 2.0.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D doubled <- spark.lapply(1:10, function(x){2 * x}) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.lda.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.lda.html b/site/docs/2.2.2/api/R/spark.lda.html new file mode 100644 index 0000000..2bbb430 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.lda.html @@ -0,0 +1,246 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Latent Dirichlet Allocation</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.lda {SparkR}"><tr><td>spark.lda {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Latent Dirichlet Allocation</h2> + +<h3>Description</h3> + 
+<p><code>spark.lda</code> fits a Latent Dirichlet Allocation model on a SparkDataFrame. Users can call +<code>summary</code> to get a summary of the fitted LDA model, <code>spark.posterior</code> to compute +posterior probabilities on new data, <code>spark.perplexity</code> to compute log perplexity on new +data and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +</p> + + +<h3>Usage</h3> + +<pre> +spark.lda(data, ...) + +spark.posterior(object, newData) + +spark.perplexity(object, data) + +## S4 method for signature 'SparkDataFrame' +spark.lda(data, features = "features", k = 10, + maxIter = 20, optimizer = c("online", "em"), subsamplingRate = 0.05, + topicConcentration = -1, docConcentration = -1, + customizedStopWords = "", maxVocabSize = bitwShiftL(1, 18)) + +## S4 method for signature 'LDAModel' +summary(object, maxTermsPerTopic) + +## S4 method for signature 'LDAModel,SparkDataFrame' +spark.perplexity(object, data) + +## S4 method for signature 'LDAModel,SparkDataFrame' +spark.posterior(object, newData) + +## S4 method for signature 'LDAModel,character' +write.ml(object, path, overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>A SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s) passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>A Latent Dirichlet Allocation model fitted by <code>spark.lda</code>.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>A SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>features</code></td> +<td> +<p>Features column name. 
Either libSVM-format column or character-format column is +valid.</p> +</td></tr> +<tr valign="top"><td><code>k</code></td> +<td> +<p>Number of topics.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>Maximum iterations.</p> +</td></tr> +<tr valign="top"><td><code>optimizer</code></td> +<td> +<p>Optimizer to train an LDA model, "online" or "em", default is "online".</p> +</td></tr> +<tr valign="top"><td><code>subsamplingRate</code></td> +<td> +<p>(For online optimizer) Fraction of the corpus to be sampled and used in +each iteration of mini-batch gradient descent, in range (0, 1].</p> +</td></tr> +<tr valign="top"><td><code>topicConcentration</code></td> +<td> +<p>concentration parameter (commonly named <code>beta</code> or <code>eta</code>) for +the prior placed on topic distributions over terms, default -1 to set automatically on the +Spark side. Use <code>summary</code> to retrieve the effective topicConcentration. Only 1-size +numeric is accepted.</p> +</td></tr> +<tr valign="top"><td><code>docConcentration</code></td> +<td> +<p>concentration parameter (commonly named <code>alpha</code>) for the +prior placed on documents distributions over topics (<code>theta</code>), default -1 to set +automatically on the Spark side. Use <code>summary</code> to retrieve the effective +docConcentration. Only 1-size or <code>k</code>-size numeric is accepted.</p> +</td></tr> +<tr valign="top"><td><code>customizedStopWords</code></td> +<td> +<p>stopwords that need to be removed from the given corpus. Ignore the +parameter if libSVM-format column is used as the features column.</p> +</td></tr> +<tr valign="top"><td><code>maxVocabSize</code></td> +<td> +<p>maximum vocabulary size, default 1 << 18</p> +</td></tr> +<tr valign="top"><td><code>maxTermsPerTopic</code></td> +<td> +<p>Maximum number of terms to collect for each topic. 
Default value of 10.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>The directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>Overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.lda</code> returns a fitted Latent Dirichlet Allocation model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. +The list includes +</p> +<table summary="R valueblock"> +<tr valign="top"><td><code><code>docConcentration</code></code></td> +<td> +<p>concentration parameter commonly named <code>alpha</code> for +the prior placed on documents distributions over topics <code>theta</code></p> +</td></tr> +<tr valign="top"><td><code><code>topicConcentration</code></code></td> +<td> +<p>concentration parameter commonly named <code>beta</code> or +<code>eta</code> for the prior placed on topic distributions over terms</p> +</td></tr> +<tr valign="top"><td><code><code>logLikelihood</code></code></td> +<td> +<p>log likelihood of the entire corpus</p> +</td></tr> +<tr valign="top"><td><code><code>logPerplexity</code></code></td> +<td> +<p>log perplexity</p> +</td></tr> +<tr valign="top"><td><code><code>isDistributed</code></code></td> +<td> +<p>TRUE for distributed model while FALSE for local model</p> +</td></tr> +<tr valign="top"><td><code><code>vocabSize</code></code></td> +<td> +<p>number of terms in the corpus</p> +</td></tr> +<tr valign="top"><td><code><code>topics</code></code></td> +<td> +<p>top 10 terms and their weights of all topics</p> +</td></tr> +<tr valign="top"><td><code><code>vocabulary</code></code></td> +<td> +<p>whole terms of the training corpus, NULL if libsvm format file +used as training set</p> +</td></tr> +<tr valign="top"><td><code><code>trainingLogLikelihood</code></code></td> +<td> +<p>Log likelihood of the 
observed tokens in the training set, +given the current parameter estimates: +log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters) +It is only for distributed LDA model (i.e., optimizer = "em")</p> +</td></tr> +<tr valign="top"><td><code><code>logPrior</code></code></td> +<td> +<p>Log probability of the current parameter estimate: +log P(topics, topic distributions for docs | Dirichlet hyperparameters) +It is only for distributed LDA model (i.e., optimizer = "em")</p> +</td></tr> +</table> +<p><code>spark.perplexity</code> returns the log perplexity of given SparkDataFrame, or the log +perplexity of the training data if missing argument "data". +</p> +<p><code>spark.posterior</code> returns a SparkDataFrame containing posterior probabilities +vectors named "topicDistribution". +</p> + + +<h3>Note</h3> + +<p>spark.lda since 2.1.0 +</p> +<p>summary(LDAModel) since 2.1.0 +</p> +<p>spark.perplexity(LDAModel) since 2.1.0 +</p> +<p>spark.posterior(LDAModel) since 2.1.0 +</p> +<p>write.ml(LDAModel, character) since 2.1.0 +</p> + + +<h3>See Also</h3> + +<p>topicmodels: <a href="https://cran.r-project.org/package=topicmodels">https://cran.r-project.org/package=topicmodels</a> +</p> +<p><a href="read.ml.html">read.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D text <- read.df("data/mllib/sample_lda_libsvm_data.txt", source = "libsvm") +##D model <- spark.lda(data = text, optimizer = "em") +##D +##D # get a summary of the model +##D summary(model) +##D +##D # compute posterior probabilities +##D posterior <- spark.posterior(model, text) +##D showDF(posterior) +##D +##D # compute perplexity +##D perplexity <- spark.perplexity(model, text) +##D +##D # save and load the model +##D path <- "path/to/model" +##D write.ml(model, path) +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a 
href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.logit.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.logit.html b/site/docs/2.2.2/api/R/spark.logit.html new file mode 100644 index 0000000..80f9ad5 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.logit.html @@ -0,0 +1,201 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Logistic Regression Model</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.logit {SparkR}"><tr><td>spark.logit {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Logistic Regression Model</h2> + +<h3>Description</h3> + +<p>Fits a logistic regression model against a SparkDataFrame. It supports "binomial": Binary logistic regression +with pivoting; "multinomial": Multinomial logistic (softmax) regression without pivoting, similar to glmnet. +Users can print the produced model, make predictions with it, and save it to the input path. +</p> + + +<h3>Usage</h3> + +<pre> +spark.logit(data, formula, ...) 
+ +## S4 method for signature 'SparkDataFrame,formula' +spark.logit(data, formula, regParam = 0, + elasticNetParam = 0, maxIter = 100, tol = 1e-06, family = "auto", + standardization = TRUE, thresholds = 0.5, weightCol = NULL, + aggregationDepth = 2) + +## S4 method for signature 'LogisticRegressionModel' +summary(object) + +## S4 method for signature 'LogisticRegressionModel' +predict(object, newData) + +## S4 method for signature 'LogisticRegressionModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>SparkDataFrame for training.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>A symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional arguments passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>regParam</code></td> +<td> +<p>the regularization parameter.</p> +</td></tr> +<tr valign="top"><td><code>elasticNetParam</code></td> +<td> +<p>the ElasticNet mixing parameter. For alpha = 0.0, the penalty is an L2 penalty. +For alpha = 1.0, it is an L1 penalty. For 0.0 < alpha < 1.0, the penalty is a combination +of L1 and L2. Default is 0.0 which is an L2 penalty.</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>maximum iteration number.</p> +</td></tr> +<tr valign="top"><td><code>tol</code></td> +<td> +<p>convergence tolerance of iterations.</p> +</td></tr> +<tr valign="top"><td><code>family</code></td> +<td> +<p>the name of family which is a description of the label distribution to be used in the model. +Supported options: +</p> + +<ul> +<li><p>"auto": Automatically select the family based on the number of classes: +If number of classes == 1 || number of classes == 2, set to "binomial". +Else, set to "multinomial". 
+</p> +</li> +<li><p>"binomial": Binary logistic regression with pivoting. +</p> +</li> +<li><p>"multinomial": Multinomial logistic (softmax) regression without pivoting. +</p> +</li></ul> +</td></tr> +<tr valign="top"><td><code>standardization</code></td> +<td> +<p>whether to standardize the training features before fitting the model. The coefficients +of the model are always returned on the original scale, so standardization is transparent to +users. Note that with or without standardization, the models should always converge +to the same solution when no regularization is applied. Default is TRUE, same as glmnet.</p> +</td></tr> +<tr valign="top"><td><code>thresholds</code></td> +<td> +<p>in binary classification, a single threshold in range [0, 1]. If the estimated probability of class label 1 +is > threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 +more often; a low threshold encourages the model to predict 1 more often. Note: Setting this with +threshold p is equivalent to setting thresholds c(1-p, p). In multiclass (or binary) classification, an array of +thresholds adjusts the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, +except that at most one value may be 0. The class with the largest value p/t is predicted, where p +is the original probability of that class and t is the class's threshold.</p> +</td></tr> +<tr valign="top"><td><code>weightCol</code></td> +<td> +<p>The weight column name.</p> +</td></tr> +<tr valign="top"><td><code>aggregationDepth</code></td> +<td> +<p>The depth for treeAggregate (greater than or equal to 2). If the dimensions of features +or the number of partitions are large, this param could be adjusted to a larger size. +This is an expert parameter. 
Default value should be good for most cases.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a LogisticRegressionModel fitted by <code>spark.logit</code>.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>The directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>Overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.logit</code> returns a fitted logistic regression model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. +The list includes <code>coefficients</code> (coefficients matrix of the fitted model). +</p> +<p><code>predict</code> returns the predicted values based on a LogisticRegressionModel. +</p> + + +<h3>Note</h3> + +<p>spark.logit since 2.1.0 +</p> +<p>summary(LogisticRegressionModel) since 2.1.0 +</p> +<p>predict(LogisticRegressionModel) since 2.1.0 +</p> +<p>write.ml(LogisticRegressionModel, character) since 2.1.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D sparkR.session() +##D # binary logistic regression +##D t <- as.data.frame(Titanic) +##D training <- createDataFrame(t) +##D model <- spark.logit(training, Survived ~ ., regParam = 0.5) +##D summary <- summary(model) +##D +##D # fitted values on training data +##D fitted <- predict(model, training) +##D +##D # save fitted model to input path +##D path <- "path/to/model" +##D write.ml(model, path) +##D +##D # can also read back the saved model and predict +##D # Note that summary does not work on loaded model +##D savedModel <- read.ml(path) +##D summary(savedModel) +##D +##D # multinomial logistic regression +##D +##D model <- spark.logit(training, Class ~ ., regParam = 0.5) +##D summary <- 
summary(model) +##D +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.mlp.html ---------------------------------------------------------------------- diff --git a/site/docs/2.2.2/api/R/spark.mlp.html b/site/docs/2.2.2/api/R/spark.mlp.html new file mode 100644 index 0000000..1226a55 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.mlp.html @@ -0,0 +1,180 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Multilayer Perceptron Classification Model</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.mlp {SparkR}"><tr><td>spark.mlp {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Multilayer Perceptron Classification Model</h2> + +<h3>Description</h3> + +<p><code>spark.mlp</code> fits a multi-layer perceptron neural network model against a SparkDataFrame. +Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make +predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +Only categorical data is supported. 
+For more details, see +<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html"> +Multilayer Perceptron</a> +</p> + + +<h3>Usage</h3> + +<pre> +spark.mlp(data, formula, ...) + +## S4 method for signature 'SparkDataFrame,formula' +spark.mlp(data, formula, layers, + blockSize = 128, solver = "l-bfgs", maxIter = 100, tol = 1e-06, + stepSize = 0.03, seed = NULL, initialWeights = NULL) + +## S4 method for signature 'MultilayerPerceptronClassificationModel' +summary(object) + +## S4 method for signature 'MultilayerPerceptronClassificationModel' +predict(object, newData) + +## S4 method for signature 'MultilayerPerceptronClassificationModel,character' +write.ml(object, + path, overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. 
Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional arguments passed to the method.</p> +</td></tr> +<tr valign="top"><td><code>layers</code></td> +<td> +<p>integer vector containing the number of nodes for each layer.</p> +</td></tr> +<tr valign="top"><td><code>blockSize</code></td> +<td> +<p>blockSize parameter.</p> +</td></tr> +<tr valign="top"><td><code>solver</code></td> +<td> +<p>solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs".</p> +</td></tr> +<tr valign="top"><td><code>maxIter</code></td> +<td> +<p>maximum iteration number.</p> +</td></tr> +<tr valign="top"><td><code>tol</code></td> +<td> +<p>convergence tolerance of iterations.</p> +</td></tr> +<tr valign="top"><td><code>stepSize</code></td> +<td> +<p>stepSize parameter.</p> +</td></tr> +<tr valign="top"><td><code>seed</code></td> +<td> +<p>seed parameter for weights initialization.</p> +</td></tr> +<tr valign="top"><td><code>initialWeights</code></td> +<td> +<p>initialWeights parameter for weights initialization, it should be a +numeric vector.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a Multilayer Perceptron Classification Model fitted by <code>spark.mlp</code></p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.mlp</code> returns a fitted Multilayer Perceptron Classification Model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. 
+The list includes <code>numOfInputs</code> (number of inputs), <code>numOfOutputs</code> +(number of outputs), <code>layers</code> (array of layer sizes including input +and output layers), and <code>weights</code> (the weights of layers). +<code>weights</code> is a numeric vector with length equal to the number of weights expected +given the architecture (e.g., for an 8-10-2 network, 112 connection weights). +</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named +"prediction". +</p> + + +<h3>Note</h3> + +<p>spark.mlp since 2.1.0 +</p> +<p>summary(MultilayerPerceptronClassificationModel) since 2.1.0 +</p> +<p>predict(MultilayerPerceptronClassificationModel) since 2.1.0 +</p> +<p>write.ml(MultilayerPerceptronClassificationModel, character) since 2.1.0 +</p> + + +<h3>See Also</h3> + +<p><a href="read.ml.html">read.ml</a> +</p> +<p><a href="write.ml.html">write.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") +##D +##D # fit a Multilayer Perceptron Classification Model +##D model <- spark.mlp(df, label ~ features, blockSize = 128, layers = c(4, 3), solver = "l-bfgs", +##D maxIter = 100, tol = 0.5, stepSize = 1, seed = 1, +##D initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) +##D +##D # get the summary of the model +##D summary(model) +##D +##D # make predictions +##D predictions <- predict(model, df) +##D +##D # save and load the model +##D path <- "path/to/model" +##D write.ml(model, path) +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/e1001463/site/docs/2.2.2/api/R/spark.naiveBayes.html ---------------------------------------------------------------------- diff 
--git a/site/docs/2.2.2/api/R/spark.naiveBayes.html b/site/docs/2.2.2/api/R/spark.naiveBayes.html new file mode 100644 index 0000000..7b602f1 --- /dev/null +++ b/site/docs/2.2.2/api/R/spark.naiveBayes.html @@ -0,0 +1,143 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Naive Bayes Models</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for spark.naiveBayes {SparkR}"><tr><td>spark.naiveBayes {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Naive Bayes Models</h2> + +<h3>Description</h3> + +<p><code>spark.naiveBayes</code> fits a Bernoulli naive Bayes model against a SparkDataFrame. +Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make +predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. +Only categorical data is supported. +</p> + + +<h3>Usage</h3> + +<pre> +spark.naiveBayes(data, formula, ...) 
+ +## S4 method for signature 'SparkDataFrame,formula' +spark.naiveBayes(data, formula, + smoothing = 1) + +## S4 method for signature 'NaiveBayesModel' +summary(object) + +## S4 method for signature 'NaiveBayesModel' +predict(object, newData) + +## S4 method for signature 'NaiveBayesModel,character' +write.ml(object, path, + overwrite = FALSE) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>data</code></td> +<td> +<p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p> +</td></tr> +<tr valign="top"><td><code>formula</code></td> +<td> +<p>a symbolic description of the model to be fitted. Currently only a few formula +operators are supported, including '~', '.', ':', '+', and '-'.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s) passed to the method. Currently only <code>smoothing</code>.</p> +</td></tr> +<tr valign="top"><td><code>smoothing</code></td> +<td> +<p>smoothing parameter.</p> +</td></tr> +<tr valign="top"><td><code>object</code></td> +<td> +<p>a naive Bayes model fitted by <code>spark.naiveBayes</code>.</p> +</td></tr> +<tr valign="top"><td><code>newData</code></td> +<td> +<p>a SparkDataFrame for testing.</p> +</td></tr> +<tr valign="top"><td><code>path</code></td> +<td> +<p>the directory where the model is saved.</p> +</td></tr> +<tr valign="top"><td><code>overwrite</code></td> +<td> +<p>overwrites or not if the output path already exists. Default is FALSE +which means throw exception if the output path exists.</p> +</td></tr> +</table> + + +<h3>Value</h3> + +<p><code>spark.naiveBayes</code> returns a fitted naive Bayes model. +</p> +<p><code>summary</code> returns summary information of the fitted model, which is a list. +The list includes <code>apriori</code> (the label distribution) and +<code>tables</code> (conditional probabilities given the target label). 
+</p> +<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named +"prediction". +</p> + + +<h3>Note</h3> + +<p>spark.naiveBayes since 2.0.0 +</p> +<p>summary(NaiveBayesModel) since 2.0.0 +</p> +<p>predict(NaiveBayesModel) since 2.0.0 +</p> +<p>write.ml(NaiveBayesModel, character) since 2.0.0 +</p> + + +<h3>See Also</h3> + +<p>e1071: <a href="https://cran.r-project.org/package=e1071">https://cran.r-project.org/package=e1071</a> +</p> +<p><a href="write.ml.html">write.ml</a> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D data <- as.data.frame(UCBAdmissions) +##D df <- createDataFrame(data) +##D +##D # fit a Bernoulli naive Bayes model +##D model <- spark.naiveBayes(df, Admit ~ Gender + Dept, smoothing = 0) +##D +##D # get the summary of the model +##D summary(model) +##D +##D # make predictions +##D predictions <- predict(model, df) +##D +##D # save and load the model +##D path <- "path/to/model" +##D write.ml(model, path) +##D savedModel <- read.ml(path) +##D summary(savedModel) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.2.2 <a href="00Index.html">Index</a>]</div> +</body></html>
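Editorial note on the spark.mlp page above: the summary description quotes 112 connection weights for an 8-10-2 network. The arithmetic below is a worked check of that figure, not part of the committed page. It assumes each fully connected layer carries one bias term per output node in addition to its input-to-output weights. That assumption is not stated on the page itself, but it matches both the 112 figure and the 15 values passed as initialWeights for layers = c(4, 3) in the spark.mlp example. The symbols n_0, ..., n_L (layer sizes from input to output) are introduced here for illustration only.

```latex
\begin{align*}
% Assumed count: (inputs + 1 bias) weights for each output node of each layer
\mathrm{weights}(n_0, \dots, n_L) &= \sum_{l=1}^{L} (n_{l-1} + 1)\, n_l \\
% 8-10-2 network from the summary description
\mathrm{weights}(8, 10, 2) &= (8 + 1) \cdot 10 + (10 + 1) \cdot 2 = 90 + 22 = 112 \\
% layers = c(4, 3) from the spark.mlp example
\mathrm{weights}(4, 3) &= (4 + 1) \cdot 3 = 15
\end{align*}
```

Under the same assumption, a numeric vector passed as initialWeights would need a length equal to this count for the chosen layers.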