[GitHub] spark pull request #18304: [SPARK-21098] Add lineseparator parameter to csv ...

HyukjinKwon Thu, 15 Jun 2017 01:05:28 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18304#discussion_r122142421
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -90,6 +90,7 @@ class CSVOptions(
       val quote = getChar("quote", '\"')
       val escape = getChar("escape", '\\')
       val comment = getChar("comment", '\u0000')
    +  val lineSeparator = parameters.getOrElse("lineSeparator", "\n")
    --- End diff --
    
    In short, as far as I know, the current behavior is as follows (please
correct me if I am wrong):
    
    CSV read -> cover `\n` and `\r\n`
    CSV read with `wholeFile` (`multiLine`) -> OS-dependent newline (by Univocity)
    CSV write -> OS-dependent newline (by Univocity)
    
    JSON read -> cover `\n` and `\r\n`
    JSON read with `wholeFile` (`multiLine`) -> N/A (it currently reads the whole file as a single record)
    JSON write -> `\n`
    
    TEXT read -> cover `\n` and `\r\n`
    TEXT write -> `\n`
    
    For the hardcoded newline, fixing it to `\n` is probably better. I wouldn't
mind if this PR were changed to fix this for 'CSV read with wholeFile (multiLine)'
and 'CSV write'.
    
    As for a configurable line separator, it could not be configured this way
for 'CSV read', and we should be able to read back what we write out.
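
    To illustrate the round-trip concern, here is a minimal sketch in plain
Scala. It is only a model, not Spark's or Univocity's actual code: the
`LineSeparatorRoundTrip` object and its `write` and `read` helpers are
hypothetical, and the reader is assumed to recognize only `\n` and `\r\n`,
as in the 'CSV read' case listed above.

```scala
// Illustrative model only; these helpers are hypothetical and are not
// part of Spark or Univocity.
object LineSeparatorRoundTrip {

  // Writer with a configurable line separator: each row is terminated
  // by the given separator.
  def write(rows: Seq[String], lineSeparator: String): String =
    rows.map(_ + lineSeparator).mkString

  // Reader that only recognizes "\n" and "\r\n" as record boundaries
  // (an assumption mirroring the 'CSV read' behavior described above).
  def read(data: String): Seq[String] =
    data.split("\r?\n").toSeq

  def main(args: Array[String]): Unit = {
    val rows = Seq("a,b", "c,d")
    for (sep <- Seq("\n", "\r\n", "\r")) {
      val roundTripped = read(write(rows, sep))
      // Print the separator as character codes so it is visible.
      val codes = sep.map(_.toInt).mkString(",")
      println(s"separator codes=$codes roundTripOk=${roundTripped == rows}")
    }
  }
}
```

    In this model, data written with `\n` or `\r\n` reads back as the same
rows, while data written with a lone `\r` comes back as a single record,
so a separator that only the writer honors breaks the round trip.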


