spark git commit: [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example

2017-05-04 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 5fe9313d7 -> 6c5c594b7


[SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example

## What changes were proposed in this pull request?

Add
- R vignettes
- R programming guide
- SS programming guide
- R example

Also disable spark.als in vignettes for now since it's failing (SPARK-20402)

## How was this patch tested?

manually

Author: Felix Cheung 

Closes #17814 from felixcheung/rdocss.

(cherry picked from commit b8302ccd02265f9d7a7895c7b033441fa2d8ffd1)
Signed-off-by: Felix Cheung 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6c5c594b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6c5c594b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6c5c594b

Branch: refs/heads/branch-2.2
Commit: 6c5c594b77fb36d531cdaba5a34abe85b138d0a6
Parents: 5fe9313
Author: Felix Cheung 
Authored: Thu May 4 00:27:10 2017 -0700
Committer: Felix Cheung 
Committed: Thu May 4 00:29:20 2017 -0700

--
 R/pkg/vignettes/sparkr-vignettes.Rmd|  77 -
 docs/sparkr.md  |   4 +
 docs/structured-streaming-programming-guide.md  | 285 ---
 .../r/streaming/structured_network_wordcount.R  |  57 
 4 files changed, 380 insertions(+), 43 deletions(-)
--
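
The new example file, `examples/src/main/r/streaming/structured_network_wordcount.R`, is not reproduced in this excerpt. For orientation, here is a minimal sketch assuming the SparkR structured streaming API available in Spark 2.2 (`read.stream`, `write.stream`); the host and port values are placeholders, not taken from the committed file.

```r
library(SparkR)

# Start a SparkR session; the configuration here is purely illustrative.
sparkR.session(appName = "StructuredNetworkWordCountSketch")

# Stream lines of text from a socket source (placeholder host/port).
lines <- read.stream("socket", host = "localhost", port = 9999)

# Split each line into words and keep a running count per word.
words <- selectExpr(lines, "explode(split(value, ' ')) as word")
wordCounts <- count(groupBy(words, "word"))

# Print the complete set of counts to the console after every trigger.
query <- write.stream(wordCounts, "console", outputMode = "complete")
awaitTermination(query)
```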


http://git-wip-us.apache.org/repos/asf/spark/blob/6c5c594b/R/pkg/vignettes/sparkr-vignettes.Rmd
--
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index f81dbab..b933c59 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -182,7 +182,7 @@ head(df)
 ```
 
 ### Data Sources
-SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. You can check the Spark SQL programming guide for more [specific options](https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
+SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. You can check the Spark SQL Programming Guide for more [specific options](https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
 
 The general method for creating `SparkDataFrame` from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active Spark Session will be used automatically. SparkR supports reading CSV, JSON and Parquet files natively and through Spark Packages you can find data source connectors for popular file formats like Avro. These packages can be added with `sparkPackages` parameter when initializing SparkSession using `sparkR.session`.
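
A minimal sketch of the `read.df` pattern described above, assuming an active SparkR session and a hypothetical local `people.json` file:

```r
# Start (or reuse) a SparkR session; read.df picks it up automatically.
sparkR.session()

# Load a JSON file into a SparkDataFrame ("people.json" is a placeholder path).
people <- read.df("people.json", source = "json")
head(people)
```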
 
@@ -232,7 +232,7 @@ write.df(people, path = "people.parquet", source = "parquet", mode = "overwrite"
 ```
 
 ### Hive Tables
-You can also create SparkDataFrames from Hive tables. To do this we will need to create a SparkSession with Hive support which can access tables in the Hive MetaStore. Note that Spark should have been built with Hive support and more details can be found in the [SQL programming guide](https://spark.apache.org/docs/latest/sql-programming-guide.html). In SparkR, by default it will attempt to create a SparkSession with Hive support enabled (`enableHiveSupport = TRUE`).
+You can also create SparkDataFrames from Hive tables. To do this we will need to create a SparkSession with Hive support which can access tables in the Hive MetaStore. Note that Spark should have been built with Hive support and more details can be found in the [SQL Programming Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html). In SparkR, by default it will attempt to create a SparkSession with Hive support enabled (`enableHiveSupport = TRUE`).
 
 ```{r, eval=FALSE}
 sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
@@ -657,6 +657,7 @@ head(select(naiveBayesPrediction, "Class", "Sex", "Age", "Survived", "prediction
 Survival analysis studies the expected duration of time until an event happens, and often the relationship with risk factors or treatment taken on the subject. In contrast to standard regression analysis, survival modeling has to deal with special characteristics in the data including non-negative survival time and censoring.
 
 Accelerated Failure Time (AFT) model is a parametric survival model for censored data that assumes the effect of a covariate is to accelerate or decelerate the life course of an event by some constant. For more information, refer to the Wikipedia page [AFT Model](https://en.wikipedia.org/wiki/Accelerated_failure_time_model)
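
The message is truncated here. For context, a hedged sketch of fitting an AFT survival regression with SparkR's `spark.survreg`; the `ovarian` dataset from the R `survival` package is used purely for illustration and is not taken from this diff.

```r
library(survival)  # provides the ovarian dataset used below

ovarianDF <- createDataFrame(ovarian)

# Fit an accelerated failure time regression on the censored survival times.
aftModel <- spark.survreg(ovarianDF, Surv(futime, fustat) ~ ecog_ps + rx)
summary(aftModel)
```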

spark git commit: [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example

2017-05-04 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master fc472bddd -> b8302ccd0


[SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example

## What changes were proposed in this pull request?

Add
- R vignettes
- R programming guide
- SS programming guide
- R example

Also disable spark.als in vignettes for now since it's failing (SPARK-20402)

## How was this patch tested?

manually

Author: Felix Cheung 

Closes #17814 from felixcheung/rdocss.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b8302ccd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b8302ccd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b8302ccd

Branch: refs/heads/master
Commit: b8302ccd02265f9d7a7895c7b033441fa2d8ffd1
Parents: fc472bd
Author: Felix Cheung 
Authored: Thu May 4 00:27:10 2017 -0700
Committer: Felix Cheung 
Committed: Thu May 4 00:27:10 2017 -0700

--
 R/pkg/vignettes/sparkr-vignettes.Rmd|  79 -
 docs/sparkr.md  |   4 +
 docs/structured-streaming-programming-guide.md  | 285 ---
 .../r/streaming/structured_network_wordcount.R  |  57 
 4 files changed, 381 insertions(+), 44 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b8302ccd/R/pkg/vignettes/sparkr-vignettes.Rmd
--
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 4b9d6c3..d38ec4f 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -182,7 +182,7 @@ head(df)
 ```
 
 ### Data Sources
-SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. You can check the Spark SQL programming guide for more [specific options](https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
+SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. You can check the Spark SQL Programming Guide for more [specific options](https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
 
 The general method for creating `SparkDataFrame` from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active Spark Session will be used automatically. SparkR supports reading CSV, JSON and Parquet files natively and through Spark Packages you can find data source connectors for popular file formats like Avro. These packages can be added with `sparkPackages` parameter when initializing SparkSession using `sparkR.session`.
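
A short hedged sketch of the `sparkPackages` mechanism mentioned above; the Avro package coordinate and file name below are illustrative, not taken from this commit.

```r
# Start a session with an extra data source connector pulled in via sparkPackages.
sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.2.0")

# The connector can then be used by name with read.df ("people.avro" is a placeholder).
avroDF <- read.df("people.avro", source = "com.databricks.spark.avro")
```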
 
@@ -232,7 +232,7 @@ write.df(people, path = "people.parquet", source = "parquet", mode = "overwrite"
 ```
 
 ### Hive Tables
-You can also create SparkDataFrames from Hive tables. To do this we will need to create a SparkSession with Hive support which can access tables in the Hive MetaStore. Note that Spark should have been built with Hive support and more details can be found in the [SQL programming guide](https://spark.apache.org/docs/latest/sql-programming-guide.html). In SparkR, by default it will attempt to create a SparkSession with Hive support enabled (`enableHiveSupport = TRUE`).
+You can also create SparkDataFrames from Hive tables. To do this we will need to create a SparkSession with Hive support which can access tables in the Hive MetaStore. Note that Spark should have been built with Hive support and more details can be found in the [SQL Programming Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html). In SparkR, by default it will attempt to create a SparkSession with Hive support enabled (`enableHiveSupport = TRUE`).
 
 ```{r, eval=FALSE}
 sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
@@ -314,7 +314,7 @@ Use `cube` or `rollup` to compute subtotals across multiple dimensions.
 mean(cube(carsDF, "cyl", "gear", "am"), "mpg")
 ```
 
-generates groupings for {(`cyl`, `gear`, `am`), (`cyl`, `gear`), (`cyl`), ()}, while 
+generates groupings for {(`cyl`, `gear`, `am`), (`cyl`, `gear`), (`cyl`), ()}, while
 
 ```{r}
 mean(rollup(carsDF, "cyl", "gear", "am"), "mpg")
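# For contrast (illustrative, not in the committed vignette): a plain groupBy
# computes only the full (cyl, gear, am) grouping, with no subtotal rows.
head(mean(groupBy(carsDF, "cyl", "gear", "am"), "mpg"))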
@@ -672,6 +672,7 @@ head(select(naiveBayesPrediction, "Class", "Sex", "Age", "Survived", "prediction
 Survival analysis studies the expected duration of time until an event happens, and often the relationship with risk factors or treatment taken on the subject. In contrast to standard regression analysis, survival modeling has to deal with special characteristics in the data including non-negative survival time and censoring.