Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r172360489 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -42,7 +52,12 @@ class HadoopFileLinesReader( Array.empty) val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0) val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId) - val reader = new LineRecordReader() + val reader = if (lineSeparator != "\n") { + new LineRecordReader(lineSeparator.getBytes("UTF-8")) + } else { + // This behavior follows Hive. `\n` covers `\r`, `\r\n` and `\n`. --- End diff -- I initially did this but reverted it back - https://github.com/apache/spark/pull/18581#discussion_r134814393. Let me bring the option back.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org