Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20937#discussion_r180000138
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, 
e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected 
automatically.
    --- End diff --
    
    ok, speaking about this concrete exception handling. The exception with this 
message is thrown ONLY when `options.encoding.isEmpty` is `true`. It means 
`encoding` is not set and the actual encoding of the file was auto-detected. 
The `msg` actually says so: `Encoding was detected automatically`.
    
    Maybe `encoding` was detected correctly but the file contains a wrong char. 
In that case, the first sentence covers it: `Failed to parse a character`. The 
same could happen if you set `encoding` explicitly, because you cannot guarantee 
that inputs are always correct.
    
    > I think automatic detection is true only when multiline is enabled.
    
    A wrong char can appear in a UTF-8 file read with `multiline = false` just 
as in a UTF-16LE file read with `multiline = true`.
    
    My point is that mentioning the `multiline` option in the error message 
doesn't help the user solve the issue. A possible solution is to set 
`encoding` explicitly, which is what the message actually says.
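    To make the guard concrete, here is a hedged, self-contained Scala sketch 
(not the actual `JacksonParser` code; `parseRecord` and the simulated failure 
are hypothetical stand-ins for Jackson's internal decoding) showing how the 
hint fires only when `encoding` was left unset, i.e. auto-detected:

```scala
import java.io.CharConversionException

// Hypothetical stand-in for the parser discussed above.
def parseRecord(record: String, encoding: Option[String]): String = {
  try {
    // Simulate a decoder failure on a malformed character.
    if (record.contains('\uFFFD')) {
      throw new CharConversionException("malformed input")
    }
    record
  } catch {
    // Mirror of the discussed handling: the extra hint is added ONLY when
    // the encoding option is empty, i.e. the encoding was auto-detected.
    case e: CharConversionException if encoding.isEmpty =>
      throw new RuntimeException(
        "Failed to parse a character. Encoding was detected automatically. " +
          "You might want to set the encoding option explicitly.", e)
  }
}
```

    With an explicit `encoding`, the guard does not match and the original 
`CharConversionException` propagates unchanged, so the hint never misleads a 
user who already set the option.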

