Repository: spark
Updated Branches:
  refs/heads/branch-2.0 3dbe8097f -> 50f6be759


[SPARKR][DOC] minor formatting and output cleanup for R vignettes

Clean up output, format table, truncate long example output, hide warnings

(new version on the left; existing version on the right)
![image](https://cloud.githubusercontent.com/assets/8969467/19064018/5dcde4d0-89bc-11e6-857b-052df3f52a4e.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064034/6db09956-89bc-11e6-8e43-232d5c3fe5e6.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064058/88f09590-89bc-11e6-9993-61639e29dfdd.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064066/95ccbf64-89bc-11e6-877f-45af03ddcadc.png)

![image](https://cloud.githubusercontent.com/assets/8969467/19064082/a8445404-89bc-11e6-8532-26d8bc9b206f.png)

Tested by running create-doc.sh manually.

Author: Felix Cheung <felixcheun...@hotmail.com>

Closes #15340 from felixcheung/vignettes.

(cherry picked from commit 068c198e956346b90968a4d74edb7bc820c4be28)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/50f6be75
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/50f6be75
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/50f6be75

Branch: refs/heads/branch-2.0
Commit: 50f6be7598547fed5190a920fd3cebb4bc908524
Parents: 3dbe809
Author: Felix Cheung <felixcheun...@hotmail.com>
Authored: Tue Oct 4 09:22:26 2016 -0700
Committer: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Committed: Tue Oct 4 09:28:56 2016 -0700

----------------------------------------------------------------------
 R/pkg/vignettes/sparkr-vignettes.Rmd | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/50f6be75/R/pkg/vignettes/sparkr-vignettes.Rmd
----------------------------------------------------------------------
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 5156c9e..babfb71 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -26,7 +26,7 @@ library(SparkR)
 
 We use default settings, in which Spark runs in local mode. It automatically downloads the Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession).
 
-```{r, message=FALSE}
+```{r, message=FALSE, results="hide"}
 sparkR.session()
 ```
 
@@ -114,10 +114,12 @@ In particular, the following Spark driver properties can be set in `sparkConfig`
 
 Property Name | Property group | spark-submit equivalent
 ---------------- | ------------------ | ----------------------
-spark.driver.memory | Application Properties | --driver-memory
-spark.driver.extraClassPath | Runtime Environment | --driver-class-path
-spark.driver.extraJavaOptions | Runtime Environment | --driver-java-options
-spark.driver.extraLibraryPath | Runtime Environment | --driver-library-path
+`spark.driver.memory` | Application Properties | `--driver-memory`
+`spark.driver.extraClassPath` | Runtime Environment | `--driver-class-path`
+`spark.driver.extraJavaOptions` | Runtime Environment | `--driver-java-options`
+`spark.driver.extraLibraryPath` | Runtime Environment | `--driver-library-path`
+`spark.yarn.keytab` | Application Properties | `--keytab`
+`spark.yarn.principal` | Application Properties | `--principal`
 
 **For Windows users**: Due to different file path prefixes across operating systems, a current workaround for potential wrong-prefix issues is to specify `spark.sql.warehouse.dir` when starting the `SparkSession`.
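
To make the table above concrete, here is a minimal sketch of passing driver properties through `sparkConfig` when starting the session; the specific value (`"2g"`) is an assumption for illustration, not part of this commit:

```{r, eval=FALSE}
# Sketch: set a driver property at session startup via sparkConfig.
# The property name comes from the table above; the value is illustrative.
sparkR.session(sparkConfig = list(spark.driver.memory = "2g"))
```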
 
@@ -161,7 +163,7 @@ head(df)
 ### Data Sources
 SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. You can check the Spark SQL programming guide for more [specific options](https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
 
-The general method for creating `SparkDataFrame` from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active Spark Session will be used automatically. SparkR supports reading CSV, JSON and Parquet files natively and through Spark Packages you can find data source connectors for popular file formats like Avro. These packages can be added with `sparkPackages` parameter when initializing SparkSession using `sparkR.session'.`
+The general method for creating `SparkDataFrame` from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active Spark Session will be used automatically. SparkR supports reading CSV, JSON and Parquet files natively and through Spark Packages you can find data source connectors for popular file formats like Avro. These packages can be added with `sparkPackages` parameter when initializing SparkSession using `sparkR.session`.
 
 ```{r, eval=FALSE}
 sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
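
As a hedged illustration of the `read.df` description above, a minimal sketch assuming the `people.json` sample file bundled with a Spark installation (the path is an assumption, not part of this commit):

```{r, eval=FALSE}
# Sketch: load a JSON data source into a SparkDataFrame with read.df.
# The file path assumes SPARK_HOME points at a Spark installation
# that ships the standard examples.
people <- read.df(file.path(Sys.getenv("SPARK_HOME"),
                            "examples/src/main/resources/people.json"), "json")
head(people)
```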
@@ -406,10 +408,17 @@ class(model.summaries)
 ```
 
 
-To avoid lengthy display, we only present the result of the second fitted model. You are free to inspect other models as well.
+To avoid lengthy display, we only present the partial result of the second fitted model. You are free to inspect other models as well.
+```{r, include=FALSE}
+ops <- options()
+options(max.print=40)
+```
 ```{r}
 print(model.summaries[[2]])
 ```
+```{r, include=FALSE}
+options(ops)
+```
 
 
 ### SQL Queries
@@ -534,7 +543,7 @@ head(select(kmeansPredictions, "model", "mpg", "hp", "wt", "prediction"), n = 20
 Survival analysis studies the expected duration of time until an event happens, and often the relationship with risk factors or treatment taken on the subject. In contrast to standard regression analysis, survival modeling has to deal with special characteristics in the data including non-negative survival time and censoring.
 
 The Accelerated Failure Time (AFT) model is a parametric survival model for censored data that assumes the effect of a covariate is to accelerate or decelerate the life course of an event by some constant. For more information, refer to the Wikipedia page [AFT Model](https://en.wikipedia.org/wiki/Accelerated_failure_time_model) and the references there. Unlike a [Proportional Hazards Model](https://en.wikipedia.org/wiki/Proportional_hazards_model) designed for the same purpose, the AFT model is easier to parallelize because each instance contributes to the objective function independently.
-```{r}
+```{r, warning=FALSE}
 library(survival)
 ovarianDF <- createDataFrame(ovarian)
 aftModel <- spark.survreg(ovarianDF, Surv(futime, fustat) ~ ecog_ps + rx)
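
The vignette goes on to score the fitted model (the next hunk's header shows `head(aftPredictions)`); a minimal sketch of that step, assuming the model is scored on the training DataFrame as the surrounding context suggests:

```{r, eval=FALSE}
# Sketch: generate predictions from the fitted AFT model.
# Scoring on the training DataFrame mirrors the hunk context below.
aftPredictions <- predict(aftModel, ovarianDF)
head(aftPredictions)
```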
@@ -545,8 +554,8 @@ head(aftPredictions)
 
 ### Model Persistence
 The following example shows how to save and load an ML model with SparkR.
-```{r}
-irisDF <- suppressWarnings(createDataFrame(iris))
+```{r, warning=FALSE}
+irisDF <- createDataFrame(iris)
 gaussianGLM <- spark.glm(irisDF, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")
 
 # Save and then load a fitted MLlib model
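
The hunk ends before the save/load calls themselves; a minimal sketch of the round trip using SparkR's `write.ml`/`read.ml` generics (the temporary path is a hypothetical location, not part of this commit):

```{r, eval=FALSE}
# Sketch: persist the fitted model and load it back.
# modelPath is a hypothetical temporary location for illustration.
modelPath <- file.path(tempdir(), "gaussianGLM")
write.ml(gaussianGLM, modelPath)
gaussianGLM2 <- read.ml(modelPath)
summary(gaussianGLM2)
```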

