You can use newAPIHadoopFile with a custom record delimiter:

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Tell Hadoop's TextInputFormat to treat "\r\n" as the record delimiter
val conf = new Configuration()
conf.set("textinputformat.record.delimiter", "\r\n")

// Read the file as (offset, line) pairs, keep only the line text, and
// convert to a DataFrame (toDF needs import spark.implicits._ outside spark-shell)
val df = sc.newAPIHadoopFile("path/to/file", classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text], conf)
  .map(_._2.toString)
  .toDF()
You would get a DataFrame with just a single string column. You'd then have
to split that column yourself to get it into columnar form; it can be done
(see the sketch below). If you need help, feel free to ping back.
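
For example, a minimal sketch of that splitting step, assuming the records
are comma-separated with no quoted fields; the column names c0/c1/c2 are
placeholders for whatever your file actually contains:

import org.apache.spark.sql.functions.{col, split}

// toDF() on an RDD[String] yields a single column named "value";
// split each raw line on commas into an array column
val parts = df.select(split(col("value"), ",").as("cols"))

// Project the array elements into named columns (hypothetical names)
val columnar = parts.select(
  col("cols").getItem(0).as("c0"),
  col("cols").getItem(1).as("c1"),
  col("cols").getItem(2).as("c2"))

For fields that contain embedded quotes or commas you'd need real CSV
parsing rather than a plain split.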
Regards
On Tue, Mar 24, 2020 at 1:23 AM Steven Parkes <[email protected]> wrote:
> SPARK-26108 <https://issues.apache.org/jira/browse/SPARK-26108> / PR#23080
> <https://github.com/apache/spark/pull/23080> added a require on
> CSVOptions#lineSeparator to be a single character.
>
> AFAICT, this keeps us from writing CSV files with \r\n line terminators.
>
> Wondering if this was intended or a bug? Is there an alternative mechanism
> or something else I'm missing?
>