Dear Sid,
Can you please give us more information? Is it true that every line may
have a different number of columns? Is there any rule that every line of
the file follows? From the information you have sent, I cannot fully
understand the "schema" of your data.
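To illustrate why the rule matters: if every record began with a date in
YYYY-MM-DD form and a continuation line never did, the broken lines could
be glued back together before the CSV parser sees them. Below is a minimal
sketch in plain Python (not Spark) under exactly that assumption. The names
RECORD_START and merge_broken_lines are made up for illustration, and the
sample is the one from your message.

```python
import csv
import re

# Assumption: a real record always starts with a YYYY-MM-DD date and a comma,
# and a continuation line never does.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2},")


def merge_broken_lines(lines):
    """Append every line that does not start a new record to the previous one."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation of a value that the user split with a line break.
            records[-1] += " " + line
    return records


sample = """2020-12-12,abc,2000,,INR,
2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
2020-12-09,fgh,,software_developer,I only manage the development part.
Since I don't have much experience with the other domains.
It is handled by the other people.,INR
2020-12-12,abc,2000,,USD,
"""

records = merge_broken_lines(sample.splitlines())
rows = list(csv.reader(records))

for row in rows:
    print(len(row), row)
```

On the sample this gives 4 records with 6 fields each. It would fail if a
continuation line happened to start with a date, or if the free-text value
contained a comma, so the rule has to be confirmed first. As far as I know,
the multiLine option of Spark's CSV reader only helps when the multi-line
value is enclosed in quotes, which is not the case here.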
Regards,
Apostolos
On 25/5/22 23:06, Sid wrote:
Hi Experts,
I have the CSV data below, which is generated automatically. I can't
change the data manually.
The data looks like this:
2020-12-12,abc,2000,,INR,
2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
2020-12-09,fgh,,software_developer,I only manage the development part.
Since I don't have much experience with the other domains.
It is handled by the other people.,INR
2020-12-12,abc,2000,,USD,
The third record is the problem: the user entered line breaks while
filling out the form, so the value is split across several lines. How do
I handle this?
There are 6 columns and 4 records in total. These are sample records.
Should I load the file as an RDD and then use a regex to eliminate the
new lines? Or how should it be done? With ". /n"?
Any suggestions?
Thanks,
Sid
--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol