Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r183511452

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -237,6 +237,9 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         :param allowUnquotedControlChars: allows JSON Strings to contain unquoted control
                                           characters (ASCII characters with value less than 32,
                                           including tab and line feed characters) or not.
+        :param encoding: standard encoding (charset) name, for example UTF-8, UTF-16LE and UTF-32BE.
+                         If None is set, the encoding of input JSON will be detected automatically
+                         when the multiLine option is set to ``true``.
--- End diff --

No, it doesn't. If that were true, it would break backward compatibility. In the comment we just want to highlight that encoding auto-detection (meaning **correct** auto-detection in all cases) is officially supported only in the multiLine mode. In per-line mode, the auto-detection mechanism (used when `encoding` is not set) can fail in some cases, for example when the actual encoding of the JSON file is `UTF-16` with a BOM; in other cases it works (for example, when the file's encoding is `UTF-8` and the actual line separator is `\n`). That's why @HyukjinKwon suggested mentioning only the working case.
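A minimal sketch of how the two modes look from the Python API, assuming Spark 2.4+ where the `encoding` and `lineSep` reader options are available; the file paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Multi-line mode: encoding auto-detection is officially supported,
# so `encoding` may be omitted; it can also be set explicitly.
df = (spark.read
      .option("multiLine", True)
      .option("encoding", "UTF-16LE")
      .json("/tmp/people.json"))  # placeholder path

# Per-line mode with a non-UTF-8 file: auto-detection may fail here
# (e.g. UTF-16 with a BOM), so set `encoding` explicitly and give
# the line separator, which is interpreted in that encoding.
df_lines = (spark.read
            .option("encoding", "UTF-16LE")
            .option("lineSep", "\n")
            .json("/tmp/people-lines.json"))  # placeholder path
```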