GitHub user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r183511452
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -237,6 +237,9 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
             :param allowUnquotedControlChars: allows JSON Strings to contain unquoted control
                                               characters (ASCII characters with value less than 32,
                                               including tab and line feed characters) or not.
    +        :param encoding: standard encoding (charset) name, for example UTF-8, UTF-16LE and UTF-32BE.
    +                         If None is set, the encoding of input JSON will be detected automatically
    +                         when the multiLine option is set to ``true``.
    --- End diff ---
    
    No, it doesn't. If that were true, it would break backward compatibility. In the comment we just want to highlight that encoding auto-detection (meaning **correct** auto-detection in all cases) is officially supported in the multiLine mode only.
    
    In per-line mode, the auto-detection mechanism (when `encoding` is not set) can fail in some cases, for example when the actual encoding of the JSON file is `UTF-16` with a BOM; in other cases it works (for example, when the file's encoding is `UTF-8` and the actual line separator is `\n`). That's why @HyukjinKwon suggested mentioning only the working case.
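    
    For context, here is a minimal sketch of how the `encoding` option could be used from PySpark, assuming the reader options discussed in this PR (`encoding`, `lineSep`, `multiLine`); the file paths below are just placeholders:
    
    ```python
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.getOrCreate()
    
    # multiLine mode: encoding auto-detection is officially supported,
    # so `encoding` can be left unset (the path is a placeholder).
    df_auto = spark.read.option("multiLine", True).json("/tmp/people_utf16.json")
    
    # Per-line mode: set `encoding` explicitly for non-UTF-8 input
    # (e.g. UTF-16LE), since auto-detection may fail here; `lineSep`
    # pins the line separator to match the file.
    df_explicit = (spark.read
        .option("encoding", "UTF-16LE")
        .option("lineSep", "\n")
        .json("/tmp/people_utf16le.json"))
    ```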
      

