Hi
I work with PySpark and Spark 1.5.2.
Currently, saving an RDD to a CSV file is very slow; it uses only 2% CPU.
I use :
my_dd.write.format("com.databricks.spark.csv").option("header", "false").save('file:///my_folder')
Is there a way to save the CSV faster?
Many thanks
> On 10 Feb 2016, at 10:56, Eli Super wrote:
Hi,
The write throughput, in terms of the number of records written
simultaneously, can be increased by raising the number of partitions. You
can try increasing the number of partitions and see how that works. There
is, though, an upper cap (one I ran into on Ubuntu) on the number of
parallel writes.