[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17602


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110814617
  
--- Diff: docs/sql-programming-guide.md ---
@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
 
 Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a `Dataset[Row]`.
-This conversion can be done using `SparkSession.read.json()` on either an 
RDD of String,
+This conversion can be done using `SparkSession.read.json()` on either a 
`Dataset[String]`,
--- End diff --

Output:

![2017-04-11 1 43 06](https://cloud.githubusercontent.com/assets/6477701/24893164/dbbd4300-1ebc-11e7-91f4-45d6a48f2da1.png)

Example:

![2017-04-11 1 43 10](https://cloud.githubusercontent.com/assets/6477701/24893165/dbe61b0e-1ebc-11e7-9ab6-1a12ef351bb2.png)






[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110814635
  
--- Diff: docs/sql-programming-guide.md ---
@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` 
option to `true`.
 
 
 Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a `Dataset<Row>`.
-This conversion can be done using `SparkSession.read().json()` on either 
an RDD of String,
+This conversion can be done using `SparkSession.read().json()` on either a 
`Dataset<String>`,
--- End diff --

Output:
![2017-04-11 1 43 15](https://cloud.githubusercontent.com/assets/6477701/24893173/ee6fdb66-1ebc-11e7-85cf-fe5605d5a7c5.png)

Example:
![2017-04-11 1 43 18](https://cloud.githubusercontent.com/assets/6477701/24893175/f174490a-1ebc-11e7-8434-55f45fa8805b.png)







[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110812638
  
--- Diff: docs/sql-programming-guide.md ---
@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` 
option to `true`.
 
 
 Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a `Dataset<Row>`.
-This conversion can be done using `SparkSession.read().json()` on either 
an RDD of String,
+This conversion can be done using `SparkSession.read().json()` on either 
an `Dataset<String>`,
--- End diff --

Java example uses `Dataset<String>` as below:

![2017-04-11 1 14 54](https://cloud.githubusercontent.com/assets/6477701/24892622/fcad75ac-1eb8-11e7-8141-d0ea59d66cfb.png)

Output:

![2017-04-11 1 14 57](https://cloud.githubusercontent.com/assets/6477701/24892623/ff6a93a6-1eb8-11e7-994a-c1d4654a767e.png)






[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110811804
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -634,7 +634,9 @@ def saveAsTable(self, name, format=None, mode=None, 
partitionBy=None, **options)
 
 @since(1.4)
 def json(self, path, mode=None, compression=None, dateFormat=None, 
timestampFormat=None):
-"""Saves the content of the :class:`DataFrame` in JSON format at 
the specified path.
+"""Saves the content of the :class:`DataFrame` in JSON format
+(`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the
--- End diff --

**Before**
![2017-04-11 10 02 21](https://cloud.githubusercontent.com/assets/6477701/24892210/c53d6f9e-1eb5-11e7-9360-7fc172089ae4.png)

**After**

![2017-04-11 12 49 38](https://cloud.githubusercontent.com/assets/6477701/24892184/8d72b5e2-1eb5-11e7-8f34-c6edc562c37f.png)

Note that this is not consistent with Scala/Java ones:

![2017-04-11 12 50 13](https://cloud.githubusercontent.com/assets/6477701/24892182/8c0e0080-1eb5-11e7-847b-3df347b3e5c1.png)
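
For readers unfamiliar with the format named in the docstring, here is a minimal sketch of what "JSON Lines text format or newline-delimited JSON" output looks like. It uses only the Python standard library, not Spark; `write_json_lines` and `sample_records` are hypothetical names chosen for illustration and say nothing about how `DataFrameWriter.json` is implemented.

```python
import io
import json


def write_json_lines(records, out):
    """Write each record as one compact JSON object per line (hypothetical helper)."""
    for record in records:
        # json.dumps without `indent` emits no newlines, so one record stays on one line.
        out.write(json.dumps(record))
        out.write("\n")


# Illustrative data only; not taken from the pull request.
sample_records = [{"name": "Andy", "age": 30}, {"name": "Justin", "age": 19}]

buffer = io.StringIO()
write_json_lines(sample_records, buffer)
written = buffer.getvalue()
print(written, end="")
```

Each line of `written` is a complete JSON value, so the output can be parsed back one line at a time.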





[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110811551
  
--- Diff: docs/sql-programming-guide.md ---
@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
 
 Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a `Dataset[Row]`.
-This conversion can be done using `SparkSession.read.json()` on either an 
RDD of String,
+This conversion can be done using `SparkSession.read.json()` on either an 
`Dataset[String]`,
--- End diff --

Scala example uses `Dataset` as below:

![2017-04-11 10 21 12](https://cloud.githubusercontent.com/assets/6477701/24892046/c9c5dfac-1eb4-11e7-938c-fe6be4ef8b39.png)






[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110811091
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -268,8 +268,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
   }
 
   /**
-   * Loads a JSON file (<a href="http://jsonlines.org/">JSON Lines text format or
-   * newline-delimited JSON</a>) and returns the result as a `DataFrame`.
+   * Loads a JSON file and returns the results as a `DataFrame`.
+   *
--- End diff --

This de-duplicates the documentation, as it points to the overloaded `json()` 
below.

**Before**

![2017-04-11 10 33 18](https://cloud.githubusercontent.com/assets/6477701/24892234/ff72e70c-1eb5-11e7-9096-dc29f2ed6a4d.png)


**After**

![2017-04-11 12 36 03](https://cloud.githubusercontent.com/assets/6477701/24892237/0215a68e-1eb6-11e7-813d-e1451d542655.png)






[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110810961
  
--- Diff: python/pyspark/sql/streaming.py ---
@@ -405,8 +405,8 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 """
 Loads a JSON file stream and returns the results as a 
:class:`DataFrame`.
 
-`JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
--- End diff --

**Before**

![2017-04-11 10 10 08](https://cloud.githubusercontent.com/assets/6477701/24892218/d3b2cbf0-1eb5-11e7-8f0d-8071e7c65832.png)

**After**

![2017-04-11 10 11 46](https://cloud.githubusercontent.com/assets/6477701/24892223/dea9f240-1eb5-11e7-9137-c74960d2bf6d.png)






[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110810827
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -634,7 +634,9 @@ def saveAsTable(self, name, format=None, mode=None, 
partitionBy=None, **options)
 
 @since(1.4)
 def json(self, path, mode=None, compression=None, dateFormat=None, 
timestampFormat=None):
-"""Saves the content of the :class:`DataFrame` in JSON format at 
the specified path.
+"""Saves the content of the :class:`DataFrame` in JSON format
+(`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the
+specified path.
--- End diff --

**Before**
![2017-04-11 10 11 46](https://cloud.githubusercontent.com/assets/6477701/24892138/3c52f686-1eb5-11e7-8aae-c698c762bb8b.png)

**After**

![2017-04-11 12 49 38](https://cloud.githubusercontent.com/assets/6477701/24892184/8d72b5e2-1eb5-11e7-8f34-c6edc562c37f.png)

Note that this is not consistent with Scala/Java ones:

![2017-04-11 12 50 13](https://cloud.githubusercontent.com/assets/6477701/24892182/8c0e0080-1eb5-11e7-847b-3df347b3e5c1.png)





[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110810554
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -173,8 +173,8 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 """
 Loads JSON files and returns the results as a :class:`DataFrame`.
 
-`JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
--- End diff --

**Before**

![2017-04-11 10 10 08](https://cloud.githubusercontent.com/assets/6477701/24892123/215f27fa-1eb5-11e7-8587-3c873ce4a895.png)


**After**

![2017-04-11 10 06 33](https://cloud.githubusercontent.com/assets/6477701/24892110/1587d06c-1eb5-11e7-8f7c-1aca568713cc.png)
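
The distinction the docstring draws between JSON Lines input and a single record spanning a whole file can be sketched roughly as follows. This is plain standard-library Python, not PySpark, and the helper names `parse_json_lines` and `parse_whole_document` are assumptions made for illustration; the sketch only shows why line-by-line parsing cannot handle a multi-line record.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text: one JSON value per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_document(text):
    """Parse text that holds a single JSON record, possibly spanning several lines."""
    return json.loads(text)


# Illustrative inputs only; not taken from the pull request.
json_lines_text = '{"name": "Andy", "age": 30}\n{"name": "Justin", "age": 19}\n'
multi_line_text = '{\n  "name": "Andy",\n  "age": 30\n}'

records = parse_json_lines(json_lines_text)
single = parse_whole_document(multi_line_text)

# A multi-line record is not valid JSON Lines: its first line is just "{".
try:
    parse_json_lines(multi_line_text)
    line_by_line_ok = True
except json.JSONDecodeError:
    line_by_line_ok = False
```

In this sketch the line-by-line parser succeeds on `json_lines_text` but fails on `multi_line_text`, which is the situation the `wholeFile` parameter is documented to address.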



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110810413
  
--- Diff: docs/sql-programming-guide.md ---
@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` 
option to `true`.
 
 
 Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a `Dataset<Row>`.
-This conversion can be done using `SparkSession.read().json()` on either 
an RDD of String,
+This conversion can be done using `SparkSession.read().json()` on either 
an Dataset of String,
--- End diff --

Java example uses `Dataset<String>` as below:

![2017-04-11 10 21 18](https://cloud.githubusercontent.com/assets/6477701/24892067/e34b0538-1eb4-11e7-96cf-933388bc3937.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17602#discussion_r110810364
  
--- Diff: docs/sql-programming-guide.md ---
@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
 
 Spark SQL can automatically infer the schema of a JSON dataset and load it 
as a `Dataset[Row]`.
-This conversion can be done using `SparkSession.read.json()` on either an 
RDD of String,
+This conversion can be done using `SparkSession.read.json()` on either an 
Dataset of String,
--- End diff --

Scala example uses `Dataset` as below:

![2017-04-11 10 21 12](https://cloud.githubusercontent.com/assets/6477701/24892046/c9c5dfac-1eb4-11e7-938c-fe6be4ef8b39.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17602: [MINOR][DOCS] JSON APIs related documentation fix...

2017-04-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/17602

[MINOR][DOCS] JSON APIs related documentation fixes

## What changes were proposed in this pull request?

This PR proposes corrections related to the JSON APIs: rendering links 
correctly in the Python documentation, replacing `RDD` with `Dataset` in the 
programming guide, adding the missing description of JSON Lines consistently 
in `DataFrameReader.json` in the Python API, and de-duplicating a little of 
the `DataFrameReader.json` documentation in the Scala/Java API.

## How was this patch tested?

Manually built the documentation via `jekyll build`. Corresponding 
snapshots will be left as comments on the code.

Note that currently there are Javadoc8 breaks in several places. These are 
proposed to be handled in https://github.com/apache/spark/pull/17477. So, this 
PR does not fix those.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark minor-json-documentation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17602


commit fd64e49cf715ca8a5e04321415adacdb955dad5a
Author: hyukjinkwon 
Date:   2017-04-11T03:37:01Z

JSON related documentation fixes



