Thanks,
Ewan
From: Sumit Khanna [mailto:sumit.kha...@askme.in]
Sent: 29 July 2016 13:41
To: Gourav Sengupta <gourav.sengu...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: how to save spark files as parquets efficiently
Hey Gourav,
Well, I think it is my execution plan that is at fault. Basically,
df.write, being an action, shows up as a single Spark job on
localhost:4040, so its reported time will include the time taken by all
the umpteen transformations before it, right? All I wanted to know is
"what apt env/config params are needed to
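[Editorial note: the point above can be checked directly. A minimal sketch follows, assuming a working PySpark installation; the DataFrame pipeline, output path, and app name are illustrative, not from the thread. Caching and counting first forces the upstream transformations, so the subsequent write times mostly the Parquet I/O rather than the whole lineage.]

```python
# Sketch: separating transformation cost from write cost in Spark.
# Assumes pyspark is installed; the pipeline below is a stand-in for the real one.
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-timing-sketch").getOrCreate()

# Hypothetical pipeline standing in for the real transformations.
df = spark.range(1_000_000).withColumnRenamed("id", "value")

# Materialize the lineage once so the write below measures mostly I/O.
df.cache()
t0 = time.time()
df.count()  # action that executes and caches the transformations
t1 = time.time()

df.write.mode("overwrite").parquet("/tmp/write_timing_out.parquet")
t2 = time.time()

print(f"transformations: {t1 - t0:.1f}s, parquet write: {t2 - t1:.1f}s")
spark.stop()
```

If the first number dominates, the cost is in the execution plan, not in the Parquet write itself, which is what the thread is getting at.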
Hi,
The default write format in Spark is parquet, and I have never faced any
issues writing over a billion records in Spark. Are you using
virtualization by any chance, or an obsolete hard disk, or maybe an Intel
Celeron?
Regards,
Gourav Sengupta
On Fri, Jul 29, 2016 at 7:27 AM, Sumit Khanna wrote:
Hey,
So I believe this is the right format to save the file; the optimization
is never in the write part itself but in the head/body of my execution
plan, isn't it?
Thanks,
On Fri, Jul 29, 2016 at 11:57 AM, Sumit Khanna wrote:
> Hey,
>
> master=yarn
> mode=cluster
>
>
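[Editorial note: the master=yarn / mode=cluster settings quoted above are normally passed via spark-submit. A hedged example follows; the script name and resource sizes are hypothetical and would need tuning for the actual cluster.]

```shell
# Illustrative spark-submit invocation matching the quoted settings.
# my_parquet_writer.py and the resource numbers are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 8 \
  --executor-memory 4g \
  --executor-cores 2 \
  my_parquet_writer.py
```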