[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...

justinuang Tue, 16 Oct 2018 11:40:02 -0700

Github user justinuang commented on the issue:

    https://github.com/apache/spark/pull/22503
  
    Hadoop's LineReader appears to handle CR, LF, and CRLF:
    
    
https://github.com/apache/hadoop/blob/f90c64e6242facf38c2baedeeda42e4a8293e642/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L36
    
    Univocity also handles CR, LF, and CRLF. The logic is a bit convoluted, but it appears to behave the same way: when it sees a CR, it checks whether the next character is an LF:
    
    
https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/input/LineSeparatorDetector.java
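
    To make the "see a CR, then look for an LF" behavior concrete, here is a minimal Python sketch. It is not the Hadoop or Univocity code; `split_lines` is a hypothetical helper that only illustrates the one-character lookahead both readers appear to rely on.

```python
def split_lines(text):
    """Split text on CR, LF, or CRLF, treating CRLF as a single terminator.

    Illustrative only: a hypothetical helper, not Hadoop's or Univocity's code.
    """
    lines = []
    current = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            # Saw a CR: peek at the next character and swallow a following LF
            # so that CRLF ends exactly one line instead of two.
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
            lines.append("".join(current))
            current = []
        elif ch == "\n":
            # A bare LF also ends the line.
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    # Keep a final line that has no trailing terminator.
    if current:
        lines.append("".join(current))
    return lines
```

    With this sketch, input that mixes all three terminators splits into the same records as input that uses only one of them.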
    
    I agree we should expose `setLineSeparator` as an option, but regardless of that, the default handling of CR, LF, and CRLF should be the same in single-line and multiline mode.
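
    As a rough illustration of why the defaults should match, the Python sketch below compares two hypothetical splitters on the same CSV text. Neither function is Spark code; `split_lf_only` and `split_any_terminator` are made-up names standing in for a reader that only recognizes LF and one that recognizes CR, LF, and CRLF.

```python
import re


def split_lf_only(text):
    # Hypothetical reader that treats only LF as a record terminator.
    return text.split("\n")


def split_any_terminator(text):
    # Hypothetical reader that accepts CRLF, CR, or LF.
    # CRLF is listed first so it is consumed as one terminator.
    return re.split(r"\r\n|\r|\n", text)


cr_data = "id,name\r1,a\r2,b"      # CR-terminated records
crlf_data = "id,name\r\n1,a"       # CRLF-terminated records

lf_only_cr = split_lf_only(cr_data)
any_cr = split_any_terminator(cr_data)
lf_only_crlf = split_lf_only(crlf_data)
any_crlf = split_any_terminator(crlf_data)
```

    In this sketch the LF-only splitter returns the CR-terminated input as a single record and leaves a stray CR on the CRLF header, while the other splitter yields clean records in both cases. That is the kind of mismatch a user would see if the two modes disagreed.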


