Dear Sid,

Can you please give us more information? Is it true that each line may have a different number of columns? Is there a rule that every line of the file follows? From the information you have sent, I cannot fully understand the "schema" of your data.
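One quick way to see the schema question concretely is to count the comma-separated fields on each physical line. The sketch below uses only Python's standard `csv` module. It assumes the sample is exactly what Sid pasted, including the blank lines inside the third record, and that no value is quoted. `SAMPLE` and `fields_per_line` are illustrative names.

```python
import csv
import io

# Sid's sample, copied as posted (the blank lines belong to record 3).
SAMPLE = """\
2020-12-12,abc,2000,,INR,
2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
2020-12-09,fgh,,software_developer,I only manage the development part.

Since I don't have much experience with the other domains.

It is handled by the other people.,INR
2020-12-12,abc,2000,,USD,
"""


def fields_per_line(text):
    """Count comma-separated fields on each non-blank physical line."""
    # csv.reader yields an empty list for a blank line, so those are skipped.
    return [len(row) for row in csv.reader(io.StringIO(text)) if row]


print(fields_per_line(SAMPLE))  # [6, 6, 5, 1, 2, 6]
```

Only the lines that belong to the third record break the 6-field pattern, which suggests the file does have a fixed schema that is being split by embedded newlines.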

Regards,

Apostolos


On 25/5/22 23:06, Sid wrote:
Hi Experts,

I have the CSV data below, which is generated automatically, and I can't change it manually.

The data looks like this:

2020-12-12,abc,2000,,INR,
2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
2020-12-09,fgh,,software_developer,I only manage the development part.

Since I don't have much experience with the other domains.

It is handled by the other people.,INR
2020-12-12,abc,2000,,USD,

The third record is the problem: the user typed newlines into that value while filling in the form, so the record spans several lines. How do I handle this?

There are 6 columns and 4 records in total; these are sample records.
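Since every record is said to have exactly 6 columns, one possible repair is to keep joining physical lines until the buffer parses into 6 fields. This is a sketch under strong assumptions: the free-text values contain no commas and no double quotes, and blank lines inside a value can be dropped. `merge_by_field_count` and `EXPECTED_FIELDS` are hypothetical names, not part of any library.

```python
import csv

SAMPLE = """\
2020-12-12,abc,2000,,INR,
2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
2020-12-09,fgh,,software_developer,I only manage the development part.

Since I don't have much experience with the other domains.

It is handled by the other people.,INR
2020-12-12,abc,2000,,USD,
"""

EXPECTED_FIELDS = 6  # "There are 6 columns", per the description above


def merge_by_field_count(text, expected=EXPECTED_FIELDS, sep=" "):
    """Join physical lines until the buffer parses into `expected` fields.

    Assumes free-text values contain no commas or double quotes.
    """
    records, buffer = [], ""
    for line in text.splitlines():
        if not line.strip():
            continue  # drop blank lines, including those inside a value
        buffer = line if not buffer else buffer + sep + line
        fields = next(csv.reader([buffer]))
        if len(fields) == expected:
            records.append(fields)
            buffer = ""
        elif len(fields) > expected:
            raise ValueError(f"too many fields ({len(fields)}): {buffer!r}")
    if buffer:
        raise ValueError(f"incomplete record at end of input: {buffer!r}")
    return records


for row in merge_by_field_count(SAMPLE):
    print(row)
```

If a user ever types a comma into the free-text field, this approach closes the record too early, so it is only as safe as that assumption.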

Should I load it as an RDD and then use a regex to eliminate the newlines? Or how should it be done? Perhaps with a pattern like ". \n"?
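The regex idea can work if there is a reliable marker for the start of a record. In the sample, every record begins with a date in the first column, so a minimal sketch (assuming that holds for the whole file, which is a guess rather than something Sid confirmed) treats any line that does not start with `YYYY-MM-DD,` as a continuation of the previous record. `RECORD_START` and `merge_continuation_lines` are illustrative names.

```python
import re

SAMPLE = """\
2020-12-12,abc,2000,,INR,
2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
2020-12-09,fgh,,software_developer,I only manage the development part.

Since I don't have much experience with the other domains.

It is handled by the other people.,INR
2020-12-12,abc,2000,,USD,
"""

# Assumed rule: a line starting with "YYYY-MM-DD," begins a new record.
RECORD_START = re.compile(r"\d{4}-\d{2}-\d{2},")


def merge_continuation_lines(text, sep=" "):
    """Glue lines that do not start with a date onto the previous record."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines inside a value are dropped
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += sep + line  # continuation of the free-text field
    return records


for record in merge_continuation_lines(SAMPLE):
    print(record)
```

In PySpark, `sc.wholeTextFiles(path)` returns (filename, content) pairs, so something like `sc.wholeTextFiles(path).flatMap(lambda kv: merge_continuation_lines(kv[1]))` could yield one string per record, and `spark.read.csv` also accepts an RDD of CSV strings. Keep in mind that `wholeTextFiles` loads each file as a single value, so this suits files that fit comfortably in memory. Spark's built-in CSV `multiLine` option relies on values being enclosed in quotes, so it probably will not help with these unquoted fields. A continuation line that itself happens to start with a date would also be misread by this rule.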

Any suggestions?

Thanks,
Sid

--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
