You can use newAPIHadoopFile with a custom record delimiter:

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Tell Hadoop's TextInputFormat to treat "\r\n" as the record delimiter
val conf = new Configuration()
conf.set("textinputformat.record.delimiter", "\r\n")

// Read the file as (offset, line) pairs, keep only the line text, and
// convert to a DataFrame (toDF needs import spark.implicits._ outside spark-shell)
val df = sc.newAPIHadoopFile("path/to/file", classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text], conf)
  .map(_._2.toString)
  .toDF()
You would get a DataFrame with just a single string column. You'd then have
to split that column yourself to get it into columnar form; it can be done
(see the sketch below). If you need help, feel free to ping back.
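
For example, a minimal sketch of that splitting step, assuming the records
are comma-separated with no quoted fields; the column names c0/c1/c2 are
placeholders for whatever your file actually contains:

import org.apache.spark.sql.functions.{col, split}

// toDF() on an RDD[String] yields a single column named "value";
// split each raw line on commas into an array column
val parts = df.select(split(col("value"), ",").as("cols"))

// Project the array elements into named columns (hypothetical names)
val columnar = parts.select(
  col("cols").getItem(0).as("c0"),
  col("cols").getItem(1).as("c1"),
  col("cols").getItem(2).as("c2"))

For fields that contain embedded quotes or commas you'd need real CSV
parsing rather than a plain split.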
Regards
On Tue, Mar 24, 2020 at 1:23 AM Steven Parkes <[email protected]> wrote:
> SPARK-26108 <https://issues.apache.org/jira/browse/SPARK-26108> / PR#23080
> <https://github.com/apache/spark/pull/23080> added a require on
> CSVOptions#lineSeparator to be a single character.
>
> AFAICT, this keeps us from writing CSV files with \r\n line terminators.
>
> Wondering if this was intended or a bug? Is there an alternative mechanism
> or something else I'm missing?
>