Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180014167 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -366,6 +366,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * `java.text.SimpleDateFormat`. This applies to timestamp type.</li> * <li>`multiLine` (default `false`): parse one record, which may span multiple lines, * per file</li> + * <li>`encoding` (by default it is not set): allows to forcibly set one of standard basic + * or extended charsets for input jsons. For example UTF-8, UTF-16BE, UTF-32. If the encoding + * is not specified (by default), it will be detected automatically.</li> --- End diff -- > If encoding is not set, it will be detected by Jackson independently from multiline. Jackson detects but Spark doesn't correctly when `multiLine` is disabled even with this PR, as we talked. We found many holes. Why did you bring this again?
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org