[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

cloud-fan Wed, 15 Feb 2017 22:17:13 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r101451677
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -159,18 +159,21 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
                  allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None,
                  allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None,
                  mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None,
    -             timeZone=None):
    +             timeZone=None, wholeFile=None):
             """
    -        Loads a JSON file (`JSON Lines text format or newline-delimited JSON
    -        <http://jsonlines.org/>`_) or an RDD of Strings storing JSON objects (one object per
    -        record) and returns the result as a :class`DataFrame`.
    +        Loads a JSON file and returns the results as a :class:`DataFrame`.
    +
    +        Both JSON (one record per file) and `JSON Lines <http://jsonlines.org/>`_
    +        (newline-delimited JSON) are supported and can be selected with the `wholeFile` parameter.
     
             If the ``schema`` parameter is not specified, this function goes
             through the input once to determine the input schema.
     
             :param path: string represents path to the JSON dataset,
                          or RDD of Strings storing JSON objects.
             :param schema: an optional :class:`pyspark.sql.types.StructType` for the input schema.
    +        :param wholeFile: parse one record, which may span multiple lines, per file. If None is
    --- End diff --
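For readers unfamiliar with the two layouts the new docstring distinguishes, here is a minimal pure-Python sketch (standard library only, not Spark's implementation) of how each layout is split into records. The helper names `parse_json_lines` and `parse_whole_file` are hypothetical and only illustrate the `wholeFile` semantics described in the docstring above.

```python
import json
from typing import Any, Dict, List


def parse_json_lines(text: str) -> List[Dict[str, Any]]:
    """JSON Lines: every non-blank line is a complete, independent JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_file(text: str) -> List[Dict[str, Any]]:
    """Whole-file mode (hypothetical helper): the entire file is one JSON record,
    which may span many lines."""
    return [json.loads(text)]


jsonl_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

pretty_text = """{
  "id": 3,
  "name": "c"
}"""

print(parse_json_lines(jsonl_text))   # two records, one per line
print(parse_whole_file(pretty_text))  # one record spanning four lines

# The pretty-printed record cannot be read line by line: its first line, "{",
# is not a complete JSON value on its own, so json.loads raises.
try:
    parse_json_lines(pretty_text)
except json.JSONDecodeError as err:
    print("line-by-line parsing failed:", err.msg)
```

This is why a record spanning multiple lines needs a separate mode: a line-oriented reader has no way to know where such a record ends.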
    
    The parameter docs should follow the same order as the parameter list, so let's move the `wholeFile` doc to the end.
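The ordering convention requested here can be checked mechanically. The sketch below is a hypothetical helper, not part of PySpark or its test suite. It compares the `:param name:` entries in a docstring with the order of the function's signature using only `inspect` and `re`. The two stand-in functions are also hypothetical, and their `schema`/`timeZone` descriptions are placeholders.

```python
import inspect
import re


def param_doc_order_matches(func) -> bool:
    """Return True if every ``:param name:`` entry in func's docstring names a
    real parameter and the entries follow the signature's order."""
    documented = re.findall(r":param (\w+):", inspect.getdoc(func) or "")
    # Signature order, skipping ``self`` and any parameter left undocumented.
    signature_order = [
        name for name in inspect.signature(func).parameters
        if name != "self" and name in documented
    ]
    return documented == signature_order


def reordered_example(path, schema=None, timeZone=None, wholeFile=None):
    """Hypothetical stand-in with ``wholeFile`` documented out of order.

    :param path: string represents path to the JSON dataset.
    :param schema: placeholder description.
    :param wholeFile: parse one record, which may span multiple lines, per file.
    :param timeZone: placeholder description.
    """


def fixed_example(path, schema=None, timeZone=None, wholeFile=None):
    """Same stand-in with ``wholeFile`` documented last, matching the signature.

    :param path: string represents path to the JSON dataset.
    :param schema: placeholder description.
    :param timeZone: placeholder description.
    :param wholeFile: parse one record, which may span multiple lines, per file.
    """


print(param_doc_order_matches(reordered_example))  # False
print(param_doc_order_matches(fixed_example))      # True
```

Because `wholeFile` is appended after `timeZone` in the signature, its `:param` entry has to come last for the check to pass.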


