Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r180000138

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected automatically.
    --- End diff ---

OK, speaking about this concrete exception handling: the exception with this message is thrown ONLY when `options.encoding.isEmpty` is `true`, i.e. `encoding` is not set and the actual encoding of the file was auto-detected. The `msg` actually says so: `Encoding was detected automatically`. It is possible that the `encoding` was detected correctly but the file contains a wrong char; in that case the first sentence covers it: `Failed to parse a character`. The same could happen if you set `encoding` explicitly, because you cannot guarantee that inputs are always correct.

> I think automatic detection is true only when multiline is enabled.

A wrong char can appear in a UTF-8 file read with `multiline = false` just as well as in a UTF-16LE file read with `multiline = true`. My point is that mentioning the `multiline` option in the error message doesn't help the user solve the issue. A possible solution is to set `encoding` explicitly, which is what the message actually says.
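To make the discussed guard concrete, here is a minimal self-contained sketch of the pattern the diff introduces: wrap a low-level `CharConversionException` in a user-facing message pointing at the `encoding` option, but only when the charset was auto-detected (i.e. `encoding` is empty). `Options` and `parse` are hypothetical stand-ins for Spark's `JSONOptions` and `JacksonParser` logic, not the actual classes, and the `'\uFFFD'` check merely simulates a decoding failure:

```scala
import java.io.CharConversionException

// Hypothetical stand-in for JSONOptions: `encoding` is None when the
// charset must be auto-detected from the input.
case class Options(encoding: Option[String])

// Sketch: rethrow CharConversionException with a hint about the
// `encoding` option, but only when the charset was auto-detected.
def parse(record: String, options: Options): String = {
  try {
    // Simulated decode step: treat the replacement char as a bad byte.
    if (record.contains('\uFFFD')) {
      throw new CharConversionException("Invalid byte sequence")
    }
    record
  } catch {
    case e: CharConversionException if options.encoding.isEmpty =>
      val msg =
        """Failed to parse a character. Encoding was detected automatically.
          |You might want to set it explicitly via the encoding option.""".stripMargin
      throw new RuntimeException(msg, e)
  }
}
```

Note that when `encoding` is set explicitly the guard does not match, so the raw `CharConversionException` propagates unchanged, which is exactly the distinction the comment is arguing about.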