Maxim Gekk created SPARK-23725:
----------------------------------

             Summary: Improve Hadoop's LineReader to support charsets different from UTF-8
                 Key: SPARK-23725
                 URL: https://issues.apache.org/jira/browse/SPARK-23725
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Maxim Gekk


If the record delimiter is not specified, Hadoop's LineReader splits 
lines/records on '\n', '\r', or '\r\n' as encoded in UTF-8: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L173-L177].
The implementation should be improved to support any charset.
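
To illustrate why byte-level splitting on the UTF-8 delimiter bytes is a problem for other charsets, here is a minimal Python sketch. It is not Hadoop's code and not a proposed patch. The function names split_naive and split_charset_aware are hypothetical, only '\n' is handled (not '\r' or '\r\n'), and the charset is assumed to use fixed-width code units with no byte order mark (for example UTF-16LE or UTF-32LE).

```python
def split_naive(data: bytes) -> list:
    """Split on the single byte 0x0A, the way a UTF-8-oriented reader would."""
    return data.split(b"\n")


def split_charset_aware(data: bytes, charset: str) -> list:
    """Split on '\\n' encoded in `charset`, checking only aligned code units.

    Assumption: `charset` has fixed-width code units and adds no BOM
    (e.g. 'utf-16-le', 'utf-32-le', or a single-byte charset).
    """
    delim = "\n".encode(charset)   # e.g. b"\x0a\x00" for UTF-16LE
    width = len(delim)             # size of one code unit in bytes
    records = []
    start = 0
    i = 0
    while i + width <= len(data):
        if data[i:i + width] == delim:
            records.append(data[start:i].decode(charset))
            start = i + width
        i += width                 # advance one code unit at a time
    records.append(data[start:].decode(charset))
    return records


# U+010A encodes to the bytes 0A 01 in UTF-16LE, so it contains a raw 0x0A byte.
text = "a\u010a\nb"
data = text.encode("utf-16-le")

naive = split_naive(data)                       # splits inside code units
aware = split_charset_aware(data, "utf-16-le")  # splits only at the real '\n'
```

In this sketch the naive split cuts the input at every 0x0A byte, including the one inside U+010A, and leaves fragments that no longer decode as UTF-16LE. The charset-aware variant returns the two intended records.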



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

