Maxim Gekk created SPARK-23725:
----------------------------------

             Summary: Improve Hadoop's LineReader to support charsets different from UTF-8
                 Key: SPARK-23725
                 URL: https://issues.apache.org/jira/browse/SPARK-23725
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Maxim Gekk

If the record delimiter is not specified, Hadoop's LineReader splits lines/records on '\n', '\r', or '\r\n', matching those delimiters as single-byte UTF-8 sequences: [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L173-L177]. As a result, input in encodings where a line break is not encoded as those bytes (for example UTF-16 or UTF-32) is split incorrectly. The implementation should be improved to support any charset.
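
To illustrate the problem, here is a minimal Python sketch (not Hadoop's or Spark's code). The function `split_records` and its `unit` parameter are hypothetical names introduced only for this example, and the sketch handles a single delimiter rather than the full '\n', '\r', '\r\n' set. It assumes a BOM-free encoding with a fixed code unit width, such as UTF-16LE. The idea is to encode the delimiter in the input's charset and to accept a match only when it starts on a code unit boundary.

```python
from typing import List


def split_records(data: bytes, charset: str, delimiter: str = "\n",
                  unit: int = 1) -> List[str]:
    """Split encoded bytes into decoded records (illustrative sketch only).

    `unit` is the code unit width in bytes (assumption: 1 for UTF-8,
    2 for UTF-16LE/BE, 4 for UTF-32LE/BE). Matches that do not start
    on a code unit boundary are ignored.
    """
    delim = delimiter.encode(charset)  # delimiter bytes in the target charset
    records = []
    start = pos = 0
    while True:
        idx = data.find(delim, pos)
        if idx == -1:
            break
        if idx % unit != 0:
            # The match straddles two characters, so it is not a real delimiter.
            pos = idx + 1
            continue
        records.append(data[start:idx].decode(charset))
        start = pos = idx + len(delim)
    # Whatever follows the last delimiter is the final record.
    records.append(data[start:].decode(charset))
    return records


utf16 = "a\nb".encode("utf-16-le")

# Splitting on the single byte 0x0A leaves a stray 0x00 in the second chunk,
# so that chunk is no longer valid UTF-16LE.
naive = utf16.split(b"\n")

# Splitting on the delimiter as encoded in the input charset keeps records intact.
aware = split_records(utf16, "utf-16-le", unit=2)
```

In this sketch `naive` contains a three-byte second chunk that cannot be decoded as UTF-16LE, while `aware` holds the two expected records. The boundary check matters because the delimiter's byte pattern can also appear across two adjacent characters in a multi-byte encoding.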