Forgot to reply-all last message, whoops. Not very good at email.

You need to normalize the CSV first, using a parser that can handle commas
(and line breaks) inside string fields.
I'm not sure whether Spark has an option for this.
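
As far as I know, Spark's CSV reader only keeps embedded commas or line
breaks together when the field is wrapped in quotes, and the sample rows in
this thread are not quoted. Here is a rough sketch of the normalization step
in plain Python. It assumes every record has exactly 6 columns (as Sid says
below), that the free-text column never contains a comma, and that the
multi-line column is not the last one. The names stitch_records and
normalize are made up for illustration.

```python
import csv
import io

EXPECTED_FIELDS = 6  # assumption: every logical record has exactly 6 columns


def stitch_records(lines, expected_fields=EXPECTED_FIELDS):
    """Yield logical records by joining physical lines until enough commas are seen.

    Heuristic only: it assumes no field contains a comma and that the
    multi-line field is not the last column of the record.
    """
    buffer = []
    commas = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not buffer and not line:
            continue  # ignore blank lines between records
        buffer.append(line)
        commas += line.count(",")
        if commas >= expected_fields - 1:
            # The record is complete: restore the line breaks, then split.
            yield "\n".join(buffer).split(",")
            buffer, commas = [], 0
    if buffer:
        raise ValueError("input ended in the middle of a record")


def normalize(raw_text):
    """Return CSV text in which fields with line breaks are properly quoted."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes only where needed
    for record in stitch_records(raw_text.splitlines()):
        writer.writerow(record)
    return out.getvalue()
```

Once the file has been rewritten this way, the multi-line value sits inside
double quotes, so the options Sid already tried ("multiline" and "quote")
should be able to read it as a single column.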


On Wed, May 25, 2022 at 4:37 PM Sid <flinkbyhe...@gmail.com> wrote:

> Thank you so much for your time.
>
> I have data like the sample below, which I tried to load by setting multiple
> options while reading the file, but I am not able to consolidate the
> 9th column's data into a single field.
>
> [image: image.png]
>
> I tried the below code:
>
> df = (
>     spark.read.option("header", "true")
>     .option("multiline", "true")
>     .option("inferSchema", "true")
>     .option("quote", '"')
>     .option("delimiter", ",")
>     .csv("path")
> )
>
> What else can I do?
>
> Thanks,
> Sid
>
>
> On Thu, May 26, 2022 at 1:46 AM Apostolos N. Papadopoulos <
> papad...@csd.auth.gr> wrote:
>
>> Dear Sid,
>>
>> can you please give us more info? Is it true that every line may have a
>> different number of columns? Is there any rule followed by every line of
>> the file? From the information you have sent I cannot fully understand
>> the "schema" of your data.
>>
>> Regards,
>>
>> Apostolos
>>
>>
>> On 25/5/22 23:06, Sid wrote:
>> > Hi Experts,
>> >
>> > I have the CSV data below, which is generated automatically. I can't
>> > change the data manually.
>> >
>> > The data looks like below:
>> >
>> > 2020-12-12,abc,2000,,INR,
>> > 2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
>> > 2020-12-09,fgh,,software_developer,I only manage the development part.
>> >
>> > Since I don't have much experience with the other domains.
>> >
>> > It is handled by the other people.,INR
>> > 2020-12-12,abc,2000,,USD,
>> >
>> > The third record is the problem: the user split the value across new
>> > lines while filling in the form. So, how do I handle this?
>> >
>> > There are 6 columns and 4 records in total. These are sample records.
>> >
>> > Should I load it as an RDD and then maybe use a regex to eliminate
>> > the new lines? Or how should it be done? With ". /n"?
>> >
>> > Any suggestions?
>> >
>> > Thanks,
>> > Sid
>>
>> --
>> Apostolos N. Papadopoulos, Associate Professor
>> Department of Informatics
>> Aristotle University of Thessaloniki
>> Thessaloniki, GREECE
>> tel: ++0030312310991918
>> email: papad...@csd.auth.gr
>> twitter: @papadopoulos_ap
>> web: http://datalab.csd.auth.gr/~apostol
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
