Is there a way to save csv file fast ?

2016-02-10 Thread Eli Super
Hi

I work with pyspark & spark 1.5.2. Currently, saving an RDD to a csv file is very, very slow and uses only 2% CPU.

I use:

    my_dd.write.format("com.databricks.spark.csv").option("header", "false").save('file:///my_folder')

Is there a way to save csv faster?

Many thanks

Re: Is there a way to save csv file fast ?

2016-02-10 Thread Steve Loughran
> On 10 Feb 2016, at 10:56, Eli Super wrote:
>
> Hi
>
> I work with pyspark & spark 1.5.2
>
> Currently saving rdd into csv file is very very slow , uses 2% CPU only
>
> I use :
> my_dd.write.format("com.databricks.spark.csv").option("header",
>

Re: Is there a way to save csv file fast ?

2016-02-10 Thread Gourav Sengupta
Hi,

The write throughput, in terms of the number of records written simultaneously, can be increased by increasing the number of partitions. You can try increasing the number of partitions and check how it works. There is, though, an upper cap (the one that I faced in Ubuntu) on the number of parallel
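
A minimal sketch of that suggestion, assuming my_dd is the DataFrame from the original post, the spark-csv package is on the classpath, and 32 is a hypothetical partition count to tune for your cluster:

    # Increase the number of partitions so more tasks write in parallel.
    # 32 is an illustrative value; pick something proportional to your cores.
    repartitioned = my_dd.repartition(32)

    (repartitioned.write
        .format("com.databricks.spark.csv")
        .option("header", "false")
        .save("file:///my_folder"))

Note that writing to a local file:/// path may still be bottlenecked by local disk throughput, so the gain comes mainly from parallelizing the CSV serialization work.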