Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20727#discussion_r172682591

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
    @@ -42,7 +52,12 @@ class HadoopFileLinesReader(
               Array.empty)
             val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
             val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
    -        val reader = new LineRecordReader()
    +        val reader = if (lineSeparator != "\n") {
    +          new LineRecordReader(lineSeparator.getBytes("UTF-8"))
    --- End diff --

    I mean, it's initially a Unicode string via the datasource interface, and we need to convert it to bytes at some point since LineRecordReader takes bytes. Do you mean adding another option for specifying the charset, or did I maybe miss something?
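    For context, a minimal sketch of what a separate charset option could look like; the `charsetName` option and the surrounding wiring are hypothetical illustrations, not part of this PR:

        import java.nio.charset.Charset
        import org.apache.hadoop.mapreduce.lib.input.LineRecordReader

        // Hypothetical: both values would come from the datasource options map.
        val lineSeparator: String = "\r\n"   // always arrives as a Unicode string
        val charsetName: String = "UTF-8"    // a possible second option for the encoding

        // The string-to-bytes conversion has to happen somewhere, because
        // LineRecordReader's delimiter constructor takes Array[Byte].
        val separatorBytes: Array[Byte] =
          lineSeparator.getBytes(Charset.forName(charsetName))

        val reader =
          if (lineSeparator == "\n") new LineRecordReader()   // default \n / \r\n handling
          else new LineRecordReader(separatorBytes)            // custom delimiter bytes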