[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r144330025 --- Diff: docs/sql-programming-guide.md --- @@ -479,6 +481,26 @@ source type can be converted into other types using this syntax. +To load a CSV file you can use: + + + +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %} + + + +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %} + + + +{% include_example manual_load_options_csv python/sql/datasource.py %} + + + +{% include_example manual_load_options_csv r/RSparkSQLExample.R %} + + + ### Run SQL on files directly --- End diff -- Yup, a newline between 503 and 504. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user jomach commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r144321507 --- Diff: docs/sql-programming-guide.md --- @@ -479,6 +481,26 @@ source type can be converted into other types using this syntax. +To load a CSV file you can use: + + + +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %} + + + +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %} + + + +{% include_example manual_load_options_csv python/sql/datasource.py %} + + + +{% include_example manual_load_options_csv r/RSparkSQLExample.R %} + + + ### Run SQL on files directly --- End diff -- @HyukjinKwon should I add a newline between lines 503 and 504? For example: ``` {% include_example generic_load_save_functions r/RSparkSQLExample.R %} ### Manually Specifying Options ```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r144201090 --- Diff: docs/sql-programming-guide.md --- @@ -479,6 +481,26 @@ source type can be converted into other types using this syntax. +To load a CSV file you can use: + + + +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %} + + + +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %} + + + +{% include_example manual_load_options_csv python/sql/datasource.py %} + + + +{% include_example manual_load_options_csv r/RSparkSQLExample.R %} + + + ### Run SQL on files directly --- End diff -- Yup, that's okay. BTW, what I initially meant in https://github.com/apache/spark/pull/19429#discussion_r143932389 was a newline between the closing `</div>` and `### Run ...` (not between `...ample.R %}` and `</div>`). This breaks rendering: https://user-images.githubusercontent.com/6477701/31481516-cd9ddb80-af5e-11e7-970b-d2c279f025d4.png Let's not forget to fix this up before the release if the follow-up can't be made ahead of it.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19429
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143935085 --- Diff: docs/sql-programming-guide.md --- @@ -479,6 +481,25 @@ source type can be converted into other types using this syntax. +To load a csv file you can use: --- End diff -- ditto
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143933800 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -116,6 +116,13 @@ private static void runBasicDataSourceExample(SparkSession spark) { spark.read().format("json").load("examples/src/main/resources/people.json"); peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet"); // $example off:manual_load_options$ +// $example on:manual_load_options_csv$ +Dataset peopleDFCsv = spark.read().format("csv") + .option("sep", ";") --- End diff -- ditto
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143932389 --- Diff: docs/sql-programming-guide.md --- @@ -479,6 +481,25 @@ source type can be converted into other types using this syntax. +To load a csv file you can use: + + + +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %} + + + +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %} + + + +{% include_example manual_load_options_csv python/sql/datasource.py %} + + + +{% include_example manual_load_options_csv r/RSparkSQLExample.R %} + + --- End diff -- Let's add another newline here; otherwise it breaks rendering.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143932676 --- Diff: docs/sql-programming-guide.md --- @@ -461,6 +461,8 @@ name (i.e., `org.apache.spark.sql.parquet`), but for built-in sources you can al names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). DataFrames loaded from any data source type can be converted into other types using this syntax. +To load a json file you can use: --- End diff -- I'd say `JSON` instead of `json`.
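For background on the line under review: Spark's `json` source reads newline-delimited JSON, one object per line (unless the `multiLine` option is set). A minimal stdlib sketch of that format, independent of Spark (the helper name `read_json_lines` is illustrative, not a Spark API):

```python
import json

def read_json_lines(text):
    # Parse newline-delimited JSON: one JSON object per non-empty line,
    # which is the layout Spark's json source expects by default.
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = read_json_lines('{"name": "Jorge", "age": 30}\n{"name": "Bob", "age": 32}\n')
print(records[1]["age"])  # 32
```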
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143933737 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala --- @@ -49,6 +49,14 @@ object SQLDataSourceExample { val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json") peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet") // $example off:manual_load_options$ +// $example on:manual_load_options_csv$ +val peopleDFCsv = spark.read.format("csv") + .option("sep", ";") --- End diff -- double-spaced (no tabs, of course)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143933594 --- Diff: examples/src/main/r/RSparkSQLExample.R --- @@ -112,6 +112,11 @@ namesAndAges <- select(df, "name", "age") write.df(namesAndAges, "namesAndAges.parquet", "parquet") # $example off:manual_load_options$ +# $example on:manual_load_options_csv$ --- End diff -- I'd add a newline above here to keep this file consistent
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143929178 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala --- @@ -49,6 +49,14 @@ object SQLDataSourceExample { val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json") peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet") // $example off:manual_load_options$ +// $example on:manual_load_options_csv$ +val peopleDFCsv = spark.read.format("csv") + .option("sep", ";") + .option("inferSchema", "true") + .option("header", "true") + .load("examples/src/main/resources/people.csv") --- End diff -- Could you change the indents of lines 54-57 to 2 spaces?
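For readers following the snippet under review: `sep` sets the field delimiter, `header` says the first row holds column names, and `inferSchema` asks Spark to guess column types from the data. A rough stdlib sketch of those semantics (the helper names `load_csv` and `_infer` are illustrative, and the inference here is deliberately naive; Spark's CSV reader is far more elaborate):

```python
import csv
import io

def _infer(value):
    # Naive stand-in for inferSchema: try int, then float, else keep the string.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def load_csv(text, sep=";", header=True, infer_schema=True):
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    # header=True: first row supplies column names; otherwise synthesize
    # _c0, _c1, ... as Spark does for header-less CSV files.
    names = rows[0] if header else ["_c%d" % i for i in range(len(rows[0]))]
    data = rows[1:] if header else rows
    if infer_schema:
        data = [[_infer(v) for v in row] for row in data]
    return names, data

names, data = load_csv("name;age;job\nJorge;30;Developer\nBob;32;Developer\n")
print(names)    # ['name', 'age', 'job']
print(data[0])  # ['Jorge', 30, 'Developer']
```

Note how `inferSchema` turns `"30"` into the integer `30` while leaving `"Jorge"` as a string; with `inferSchema` off, every column would stay a string, which mirrors Spark's default.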
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143929114 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -116,6 +116,13 @@ private static void runBasicDataSourceExample(SparkSession spark) { spark.read().format("json").load("examples/src/main/resources/people.json"); peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet"); // $example off:manual_load_options$ +// $example on:manual_load_options_csv$ +Dataset peopleDFCsv = spark.read().format("csv") + .option("sep", ";") + .option("inferSchema", "true") + .option("header", "true") --- End diff -- Could you change the indents of lines 121-123 to 2 spaces?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143288308 --- Diff: docs/sql-programming-guide.md --- @@ -479,6 +481,47 @@ source type can be converted into other types using this syntax. +To load a csv file you can use: + + + +{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %} + + + +{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %} + + + +{% include_example manual_load_options_csv python/sql/datasource.py %} + + + +{% include_example manual_load_options_csv r/RSparkSQLExample.R %} + + + +To load a csv file you can use: --- End diff -- This is also a duplicate.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143288202 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -115,7 +115,20 @@ private static void runBasicDataSourceExample(SparkSession spark) { Dataset peopleDF = spark.read().format("json").load("examples/src/main/resources/people.json"); peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet"); -// $example off:manual_load_options$ +// $example on:manual_load_options_csv$ +Dataset peopleDFCsv = spark.read().format("csv") + .option("sep", ";") + .option("inferSchema", "true") + .option("header", "true") + .load("examples/src/main/resources/people.csv"); +// $example off:manual_load_options_csv$ +// $example on:manual_load_options_csv$ +Dataset peopleDFCsv = spark.read().format("csv") + .option("sep", ";") + .option("inferSchema", "true") + .option("header", "true") + .load("examples/src/main/resources/people.csv"); +// $example off:manual_load_options_csv$ --- End diff -- Lines 125-131 are a duplicate.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143287943 --- Diff: examples/src/main/resources/people.csv --- @@ -0,0 +1,3 @@ +name;age;job +Jorge;30;Developer +Bob;32;Developer --- End diff -- Add an empty line.
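The `people.csv` sample above is `;`-separated with a header row, so even the Python stdlib can read it directly; values stay strings, since type inference (Spark's `inferSchema`) is a separate step. A small sketch using inline data rather than the actual file:

```python
import csv
import io

# Same content as examples/src/main/resources/people.csv, inlined for the demo.
sample = "name;age;job\nJorge;30;Developer\nBob;32;Developer\n"

# DictReader uses the first row as field names, matching header=true in Spark.
people = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
print(people[0]["name"], people[1]["age"])  # Jorge 32
```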
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143287807 --- Diff: examples/src/main/python/sql/datasource.py --- @@ -53,6 +53,11 @@ def basic_datasource_example(spark): df.select("name", "age").write.save("namesAndAges.parquet", format="parquet") # $example off:manual_load_options$ +# $example on:manual_load_options_csv$ +df = spark.read.load("examples/src/main/resources/people.csv", + format="csv", sep=":", inferSchema="true", header="true") +# $example off:manual_load_options_csv --- End diff -- This needs to be corrected to > # $example off:manual_load_options_csv$
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19429#discussion_r143287505 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java --- @@ -115,7 +115,20 @@ private static void runBasicDataSourceExample(SparkSession spark) { Dataset peopleDF = spark.read().format("json").load("examples/src/main/resources/people.json"); peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet"); -// $example off:manual_load_options$ +// $example on:manual_load_options_csv$ --- End diff -- You still need to keep > // $example off:manual_load_options$
GitHub user jomach opened a pull request: https://github.com/apache/spark/pull/19429 [SPARK-20055] [Docs] Added documentation for loading csv files into DataFrames ## What changes were proposed in this pull request? Added documentation for loading CSV files into DataFrames ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jomach/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19429.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19429 commit f5941bf196a36afe8715d713fcaaf3f1a136d9e8 Author: Jorge Machado Date: 2017-10-04T13:09:16Z SPARK-20055 Documentation -Added documentation for loading csv files into Dataframes