[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r144330025
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,26 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a CSV file you can use:
+
+
+
+{% include_example manual_load_options_csv 
scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+
+
+
+{% include_example manual_load_options_csv 
java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+
+
+
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+
+
+
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+
+
 ### Run SQL on files directly
--- End diff --

Yup, a newline between 503 and 504.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-12 Thread jomach
Github user jomach commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r144321507
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,26 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a CSV file you can use:
+
+
+
+{% include_example manual_load_options_csv 
scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+
+
+
+{% include_example manual_load_options_csv 
java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+
+
+
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+
+
+
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+
+
 ### Run SQL on files directly
--- End diff --

@HyukjinKwon, should I add a newline between lines 503 and 504?
For example:
```
{% include_example generic_load_save_functions r/RSparkSQLExample.R %}




### Manually Specifying Options
```


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r144201090
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,26 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a CSV file you can use:
+
+
+
+{% include_example manual_load_options_csv 
scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+
+
+
+{% include_example manual_load_options_csv 
java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+
+
+
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+
+
+
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+
+
 ### Run SQL on files directly
--- End diff --

Yup, that's okay. BTW, what I initially meant in 
https://github.com/apache/spark/pull/19429#discussion_r143932389 was a newline 
between `` and `### Run ..` (not between `...ample.R %}` and ``). This 
breaks rendering:

https://user-images.githubusercontent.com/6477701/31481516-cd9ddb80-af5e-11e7-970b-d2c279f025d4.png


Let's not forget to fix this before the release if the follow-up 
can't be made ahead of it.



---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19429


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143935085
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,25 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a csv file you can use:
--- End diff --

ditto


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143933800
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -116,6 +116,13 @@ private static void 
runBasicDataSourceExample(SparkSession spark) {
   
spark.read().format("json").load("examples/src/main/resources/people.json");
 peopleDF.select("name", 
"age").write().format("parquet").save("namesAndAges.parquet");
 // $example off:manual_load_options$
+// $example on:manual_load_options_csv$
+Dataset peopleDFCsv = spark.read().format("csv")
+ .option("sep", ";")
--- End diff --

ditto


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143932389
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,25 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a csv file you can use:
+
+
+
+{% include_example manual_load_options_csv 
scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+
+
+
+{% include_example manual_load_options_csv 
java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+
+
+
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+
+
+
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+
--- End diff --

Let's add another newline here. It breaks rendering.


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143932676
  
--- Diff: docs/sql-programming-guide.md ---
@@ -461,6 +461,8 @@ name (i.e., `org.apache.spark.sql.parquet`), but for 
built-in sources you can al
 names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). 
DataFrames loaded from any data
 source type can be converted into other types using this syntax.
 
+To load a json file you can use:
--- End diff --

I'd say `JSON` instead of `json`.


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143933737
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
 ---
@@ -49,6 +49,14 @@ object SQLDataSourceExample {
 val peopleDF = 
spark.read.format("json").load("examples/src/main/resources/people.json")
 peopleDF.select("name", 
"age").write.format("parquet").save("namesAndAges.parquet")
 // $example off:manual_load_options$
+// $example on:manual_load_options_csv$
+val peopleDFCsv = spark.read.format("csv")
+ .option("sep", ";")
--- End diff --

Two-space indentation here (no tabs, of course).


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143933594
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -112,6 +112,11 @@ namesAndAges <- select(df, "name", "age")
 write.df(namesAndAges, "namesAndAges.parquet", "parquet")
 # $example off:manual_load_options$
 
+# $example on:manual_load_options_csv$
--- End diff --

I'd add a newline above this line to stay consistent with the rest of the file.


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143929178
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
 ---
@@ -49,6 +49,14 @@ object SQLDataSourceExample {
 val peopleDF = 
spark.read.format("json").load("examples/src/main/resources/people.json")
 peopleDF.select("name", 
"age").write.format("parquet").save("namesAndAges.parquet")
 // $example off:manual_load_options$
+// $example on:manual_load_options_csv$
+val peopleDFCsv = spark.read.format("csv")
+ .option("sep", ";")
+ .option("inferSchema", "true")
+ .option("header", "true")
+ .load("examples/src/main/resources/people.csv")
--- End diff --

Could you change the indents of lines 54-57 to 2 spaces?


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143929114
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -116,6 +116,13 @@ private static void 
runBasicDataSourceExample(SparkSession spark) {
   
spark.read().format("json").load("examples/src/main/resources/people.json");
 peopleDF.select("name", 
"age").write().format("parquet").save("namesAndAges.parquet");
 // $example off:manual_load_options$
+// $example on:manual_load_options_csv$
+Dataset peopleDFCsv = spark.read().format("csv")
+  .option("sep", ";")
+  .option("inferSchema", "true")
+  .option("header", "true")
--- End diff --

Could you change the indents of lines 121-123 to 2 spaces?


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143288308
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,47 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a csv file you can use:
+
+
+
+{% include_example manual_load_options_csv 
scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+
+
+
+{% include_example manual_load_options_csv 
java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+
+
+
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+
+
+
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+
+
+To load a csv file you can use:
--- End diff --

This is also a duplicate. 


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143288202
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -115,7 +115,20 @@ private static void 
runBasicDataSourceExample(SparkSession spark) {
 Dataset peopleDF =
   
spark.read().format("json").load("examples/src/main/resources/people.json");
 peopleDF.select("name", 
"age").write().format("parquet").save("namesAndAges.parquet");
-// $example off:manual_load_options$
+// $example on:manual_load_options_csv$
+Dataset peopleDFCsv = spark.read().format("csv")
+  .option("sep", ";")
+  .option("inferSchema", "true")
+  .option("header", "true")
+  .load("examples/src/main/resources/people.csv");
+// $example off:manual_load_options_csv$
+// $example on:manual_load_options_csv$
+Dataset peopleDFCsv = spark.read().format("csv")
+  .option("sep", ";")
+  .option("inferSchema", "true")
+  .option("header", "true")
+  .load("examples/src/main/resources/people.csv");
+// $example off:manual_load_options_csv$
--- End diff --

Lines 125-131 are a duplicate.


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143287943
  
--- Diff: examples/src/main/resources/people.csv ---
@@ -0,0 +1,3 @@
+name;age;job
+Jorge;30;Developer
+Bob;32;Developer
--- End diff --

Add an empty line.


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143287807
  
--- Diff: examples/src/main/python/sql/datasource.py ---
@@ -53,6 +53,11 @@ def basic_datasource_example(spark):
 df.select("name", "age").write.save("namesAndAges.parquet", 
format="parquet")
 # $example off:manual_load_options$
 
+# $example on:manual_load_options_csv$
+df = spark.read.load("examples/src/main/resources/people.csv",
+ format="csv", sep=":", inferSchema="true", 
header="true")
+# $example off:manual_load_options_csv
--- End diff --

This needs to be corrected to 
> # $example off:manual_load_options_csv$
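
To illustrate why the trailing `$` matters: the `include_example` tag in the docs picks up the lines between matching `$example on:<label>$` and `$example off:<label>$` comments. Below is a simplified Python sketch of that kind of extraction. It is not the actual Jekyll plugin, and the function name `extract_example` is hypothetical.

```python
import re


def extract_example(source, label):
    """Return the lines between '$example on:label$' and '$example off:label$'.

    Simplified illustration only; both markers must end with a '$'.
    """
    on = re.compile(r"\$example on:" + re.escape(label) + r"\$")
    off = re.compile(r"\$example off:" + re.escape(label) + r"\$")
    collected = []
    inside = False
    for line in source.splitlines():
        if on.search(line):
            inside = True
        elif off.search(line):
            inside = False
        elif inside:
            collected.append(line)
    if inside:
        # The opening marker was seen but no well-formed closing marker followed.
        raise ValueError("no closing marker for %r" % label)
    return "\n".join(collected)
```

Under this sketch, a closing comment written as `# $example off:manual_load_options_csv` (no final `$`) is never recognized as a marker, so the labeled region is never closed.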


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143287505
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -115,7 +115,20 @@ private static void 
runBasicDataSourceExample(SparkSession spark) {
 Dataset peopleDF =
   
spark.read().format("json").load("examples/src/main/resources/people.json");
 peopleDF.select("name", 
"age").write().format("parquet").save("namesAndAges.parquet");
-// $example off:manual_load_options$
+// $example on:manual_load_options_csv$
--- End diff --

You still need to keep 
> // $example off:manual_load_options$
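
A quick way to catch this kind of slip is to check that every `$example on:<label>$` marker has a matching `off` marker. The following Python sketch does that for a source string; the helper name `unclosed_labels` is hypothetical and not part of the Spark build.

```python
import re

# Matches well-formed markers such as '$example on:manual_load_options$'.
MARKER = re.compile(r"\$example (on|off):(\w+)\$")


def unclosed_labels(source):
    """Return labels that were opened with 'on' but never closed with 'off'."""
    open_labels = []
    for state, label in MARKER.findall(source):
        if state == "on" and label not in open_labels:
            open_labels.append(label)
        elif state == "off" and label in open_labels:
            open_labels.remove(label)
    return open_labels
```

For the diff above, where the `off:manual_load_options` line was replaced by `on:manual_load_options_csv`, this check would report `manual_load_options` as still open.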


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-04 Thread jomach
GitHub user jomach opened a pull request:

https://github.com/apache/spark/pull/19429

[SPARK-20055] [Docs] Added documentation for loading csv files into 
DataFrames

 

## What changes were proposed in this pull request?

Added documentation for loading CSV files into DataFrames
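
As a rough illustration of what the three options used in the new examples (`sep`, `header`, `inferSchema`) mean, here is a plain-Python analogue built on the standard `csv` module. It does not use Spark, and it is not how Spark implements these options. The helper `load_people` and its crude integer detection are assumptions made for this sketch; the data is the `people.csv` added by this pull request.

```python
import csv
import io

# Contents of examples/src/main/resources/people.csv from this pull request.
PEOPLE_CSV = "name;age;job\nJorge;30;Developer\nBob;32;Developer\n"


def load_people(text, sep=";"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    # DictReader treats the first row as column names, similar in spirit
    # to header="true"; 'delimiter' plays the role of the "sep" option.
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    rows = []
    for row in reader:
        # Crude stand-in for inferSchema: digit-only values become ints,
        # everything else stays a string.
        rows.append({k: int(v) if v.isdigit() else v for k, v in row.items()})
    return rows
```

Calling `load_people(PEOPLE_CSV)` yields one dict per person with `age` as an integer. Passing a separator that does not occur in the file, such as `sep=":"`, leaves each line as a single unsplit column.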

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jomach/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19429.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19429


commit f5941bf196a36afe8715d713fcaaf3f1a136d9e8
Author: Jorge Machado 
Date:   2017-10-04T13:09:16Z

SPARK-20055 Documentation
 -Added documentation for loading csv files into Dataframes




---
