GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20849#discussion_r175282994

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
    @@ -85,6 +85,12 @@ private[sql] class JSONOptions(
       val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)

    +  /**
    +   * Standard charset name. For example UTF-8, UTF-16 and UTF-32.
    +   * If charset is not specified (None), it will be detected automatically.
    --- End diff --

    JSON schema inference uses the text datasource to separate the lines before they reach the Jackson parser, and that is the step where the charset has to be respected when detecting newlines. Shouldn't we fix the text datasource with Hadoop's line reader first?
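To illustrate the concern, here is a minimal sketch, not Spark or Hadoop code. The object `NewlineBytes` and the helpers `newlineBytes` and `splitOnLfByte` are hypothetical names introduced only for this example. It shows that the bytes encoding "\n" depend on the charset, so a splitter that looks only for the single byte 0x0A cuts UTF-16 input in the middle of a character.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object NewlineBytes {
  // Hypothetical helper: the byte sequence that encodes "\n" in the given charset.
  def newlineBytes(cs: Charset): Seq[Byte] = "\n".getBytes(cs).toSeq

  // Hypothetical helper: a naive splitter that treats the single byte 0x0A as the
  // record delimiter, ignoring the charset of the data. This is an assumption made
  // for illustration, not a description of any real line reader.
  def splitOnLfByte(data: Array[Byte]): Seq[Array[Byte]] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Array[Byte]]
    var start = 0
    for (i <- data.indices if data(i) == 0x0A.toByte) {
      out += data.slice(start, i)
      start = i + 1
    }
    out += data.slice(start, data.length)
    out.toSeq
  }

  // Splits the encoded text on the 0x0A byte, then decodes each piece
  // with the same charset that was used for encoding.
  def naiveLines(text: String, cs: Charset): Seq[String] =
    splitOnLfByte(text.getBytes(cs)).map(bytes => new String(bytes, cs))

  def main(args: Array[String]): Unit = {
    val text = "{\"a\":1}\n{\"a\":2}"
    // UTF-8: the delimiter is exactly one 0x0A byte, so both records survive.
    println(naiveLines(text, StandardCharsets.UTF_8))
    // UTF-16LE: "\n" is 0x0A 0x00, so the second piece starts with a stray 0x00
    // and no longer decodes to the original record.
    println(naiveLines(text, StandardCharsets.UTF_16LE))
  }
}
```

Under these assumptions, a charset-aware splitter would have to search for the full encoded delimiter at character-aligned offsets, which is why the line-splitting step matters before any JSON parsing happens.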