Hi
I work with PySpark and Spark 1.5.2.
Currently, saving an RDD to a CSV file is very slow; it uses only 2% CPU.
I use :
my_dd.write.format("com.databricks.spark.csv").option("header", "false").save('file:///my_folder')
Is there a way to save the CSV faster?
Many thanks
> On 10 Feb 2016, at 10:56, Eli Super wrote:
Hi,
The write throughput, in terms of the number of records written
simultaneously, can be increased by raising the number of partitions. You
can try increasing the number of partitions and see how that works. There
is, though, an upper cap (one I ran into on Ubuntu) on the number of
parallel writes.