Hello,
My name is Mario Fernández and I'm a Big Data developer. I usually program
Apache Spark in Java, and we have a big problem reading a CSV file properly.
The issue is this:
When I read a CSV file with, for instance, a semicolon delimiter, the
DataFrame treats the semicolon as a delimiter, which is correct, but it also
treats the comma as a delimiter, and that is the problem.
I checked this in Apache Spark 2.10, 1.6.2 with DataFrame and also in Apache
Spark 2.11, 2.0.2 with Dataset, and the problem is the same in both. This is
how I read the file:
dfFile1 = sqlContext.read()
    .format("com.databricks.spark.csv")
    .schema(customSchema)
    .option("charset", "Cp1252")
    .option("header", "true")
    .option("delimiter", ";")
    .load(path);
When I read a CSV file like the one below, the DataFrame treats both the
semicolons and the comma as delimiters (they were highlighted in yellow in
the original message):
Number;Name;Surname;Category
129.363;Mathew, Thomas;Johnson;Centers Technician
The comma between Mathew and Thomas should not be treated as a delimiter.
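To make the expected result concrete, here is a minimal plain-Java sketch (no
Spark involved) of the split I would expect for the sample row. The class and
method names are hypothetical, and it ignores quoting and escaping, so it only
illustrates the expected fields and is not how spark-csv parses the file.

```java
// Hypothetical helper for illustration only; not part of Spark or spark-csv.
class SemicolonSplitDemo {

    // Splits one row on ';' only, so commas stay inside their field.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    // Assumption: the row contains no quoted or escaped semicolons.
    static String[] splitOnSemicolon(String line) {
        return line.split(";", -1);
    }

    public static void main(String[] args) {
        String row = "129.363;Mathew, Thomas;Johnson;Centers Technician";
        String[] fields = splitOnSemicolon(row);

        // Print the field count, then one field per line.
        System.out.println(fields.length);
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

With this split the row has four fields, and "Mathew, Thomas" stays together
as the Name value.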
I would like to know whether this is a bug that you are going to fix, or
whether this is simply the way the reader is meant to work.
Thank you so much in advance.
Kind Regards.