[ https://issues.apache.org/jira/browse/SPARK-23724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23724.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 20937
[https://github.com/apache/spark/pull/20937]

> Custom record separator for jsons in charsets different from UTF-8
> ------------------------------------------------------------------
>
>                 Key: SPARK-23724
>                 URL: https://issues.apache.org/jira/browse/SPARK-23724
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The option should define the sequence of bytes between two consecutive JSON records. Currently the separator is detected automatically by the Hadoop library:
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L185-L254]
>
> That method recognizes only *\r*, *\n* and *\r\n* in UTF-8 encoding, so it does not work when the encoding of the input stream differs from UTF-8. The option should allow users to set the separator/delimiter of JSON records explicitly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
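To see why automatic separator detection fails outside UTF-8, the sketch below (plain Python, no Spark required; the charset and record contents are purely illustrative) writes JSON records in UTF-16LE. There, a newline is the two-byte sequence b'\n\x00', so a reader that scans for the single UTF-8 byte b'\n' (as Hadoop's LineReader does) splits the stream at a misaligned position:

```python
import json

# Two toy JSON records, joined by an explicit, charset-aware separator.
records = [{"id": 1}, {"id": 2}]
sep = "\n".encode("utf-16-le")  # b'\n\x00', not the single UTF-8 byte b'\n'
data = sep.join(json.dumps(r).encode("utf-16-le") for r in records)

# Splitting on the explicitly specified separator recovers the records.
parts = [json.loads(chunk.decode("utf-16-le")) for chunk in data.split(sep)]
print(parts)  # the original two records

# Naive UTF-8 newline splitting is off by one byte: the second chunk starts
# with the stray b'\x00' from the separator, so it is not a decodable record.
naive = data.split(b"\n")
print(naive[1][:4])
```

In Spark 2.4 the fix exposes this as an option on the JSON datasource, used together with the encoding option, along the lines of `spark.read.option("encoding", "UTF-16LE").option("lineSep", "\n").json(path)`; the option names here are a best-effort recollection, and the linked pull request is the authoritative reference.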