GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20727#discussion_r172682591
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
    @@ -42,7 +52,12 @@ class HadoopFileLinesReader(
           Array.empty)
         val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
         val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
    -    val reader = new LineRecordReader()
    +    val reader = if (lineSeparator != "\n") {
    +      new LineRecordReader(lineSeparator.getBytes("UTF-8"))
    --- End diff --
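
    For context, the quoted hunk is cut off at the comment anchor, so the else branch is not shown. Presumably the complete change reads roughly like the sketch below; this is a reconstruction from the visible lines, not a quote from the PR:

        // Sketch of the full conditional implied by the diff; the else
        // branch is inferred, since the review comment truncates the hunk.
        val reader = if (lineSeparator != "\n") {
          // Hadoop's LineRecordReader accepts a custom record delimiter
          // as raw bytes, hence the getBytes("UTF-8") conversion.
          new LineRecordReader(lineSeparator.getBytes("UTF-8"))
        } else {
          // Default constructor: splits on \n, \r, or \r\n.
          new LineRecordReader()
        }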
    
    I mean, the separator initially arrives as a Unicode string via the 
datasource interface, and since the reader takes bytes we have to convert it 
to bytes at some point. Do you mean adding another option for specifying the 
charset, or did I maybe miss something?
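
    To make the charset question concrete, here is a minimal sketch of what a separate charset option might look like. The plain Map and the "charset" option name are stand-ins for illustration only; the PR under discussion only introduces "lineSep":

        import java.nio.charset.{Charset, StandardCharsets}

        // Illustrative stand-in for the datasource's option map; the
        // "charset" key is hypothetical, only "lineSep" exists in the PR.
        val options = Map("lineSep" -> "\r\n", "charset" -> "UTF-16LE")

        // Decode the separator with the requested charset, falling back
        // to the UTF-8 that the diff above hardcodes.
        val charset: Charset = options.get("charset")
          .map(Charset.forName)
          .getOrElse(StandardCharsets.UTF_8)
        val sepBytes: Option[Array[Byte]] =
          options.get("lineSep").map(_.getBytes(charset))

        // For UTF-16LE, "\r\n" becomes the four bytes 13 0 10 0.
        println(sepBytes.map(_.mkString(" ")))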

