Hi,

The number of records written simultaneously can be increased by increasing
the number of partitions. You can try raising the partition count and see
how it behaves. There is, though, an upper cap (the one I ran into on
Ubuntu) on the number of parallel writes you can do, imposed by the
operating system configuration, and when you hit it you will see errors.

By the way, why are you expecting high CPU utilisation for writes? Shouldn't
it be more of an IO issue? But I may be wrong.


Regards,
Gourav

On Wed, Feb 10, 2016 at 10:56 AM, Eli Super <eli.su...@gmail.com> wrote:

> Hi
>
> I work with pyspark & spark 1.5.2
>
> Currently saving an RDD to a CSV file is very slow and uses only 2% CPU
>
> I use :
> my_dd.write.format("com.databricks.spark.csv").option("header",
> "false").save('file:///my_folder')
>
> Is there a way to save CSV faster?
>
> Many thanks
>