Is there a way to save csv file fast ?

2016-02-10 Thread Eli Super
Hi, I work with PySpark and Spark 1.5.2. Currently, saving an RDD to a CSV file is very slow and uses only 2% CPU. I use: my_dd.write.format("com.databricks.spark.csv").option("header", "false").save('file:///my_folder') Is there a way to save CSV faster? Many thanks

Re: Is there a way to save csv file fast ?

2016-02-10 Thread Steve Loughran
> On 10 Feb 2016, at 10:56, Eli Super <eli.su...@gmail.com> wrote: > > Hi > > I work with pyspark & spark 1.5.2 > > Currently saving rdd into csv file is very very slow , uses 2% CPU only > > I use : > my_dd.write.format("com.databricks.spark.

Re: Is there a way to save csv file fast ?

2016-02-10 Thread Gourav Sengupta
<eli.su...@gmail.com> wrote: > Hi > > I work with pyspark & spark 1.5.2 > > Currently saving rdd into csv file is very very slow , uses 2% CPU only > > I use : > my_dd.write.format("com.databricks.spark.csv").option("header", > "fa

Re: parquet.io.ParquetEncodingException Warning when trying to save parquet file in Spark

2015-11-09 Thread Fengdong Yu
> at > > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056) > > at > > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998) >

Re: parquet.io.ParquetEncodingException Warning when trying to save parquet file in Spark

2015-11-09 Thread swetha kasireddy
On Nov 9, 2015, at 3:43 PM, swetha <swethakasire...@gmail.com> wrote: > > > > Hi, > > > > I see unwanted Warning when I try to save a Parquet file in hdfs in > Spark. > > Please find below the code and the Warning message. Any idea as to how to > > avoid the un

Re: parquet.io.ParquetEncodingException Warning when trying to save parquet file in Spark

2015-11-09 Thread Fengdong Yu
Which Spark version are you using? It was fixed in Parquet 1.7.x, so Spark 1.5.x will work. > On Nov 9, 2015, at 3:43 PM, swetha <swethakasire...@gmail.com> wrote: > > Hi, > > I see unwanted Warning when I try to save a Parquet file in hdfs in Spark. > Please find below

Re: parquet.io.ParquetEncodingException Warning when trying to save parquet file in Spark

2015-11-09 Thread Ted Yu
Please see https://issues.apache.org/jira/browse/PARQUET-124 > On Nov 8, 2015, at 11:43 PM, swetha <swethakasire...@gmail.com> wrote: > > Hi, > > I see unwanted Warning when I try to save a Parquet file in hdfs in Spark. > Please find below the code and the Warning mes

parquet.io.ParquetEncodingException Warning when trying to save parquet file in Spark

2015-11-08 Thread swetha
Hi, I see an unwanted warning when I try to save a Parquet file to HDFS in Spark. Please find the code and the warning message below. Any idea how to avoid the unwanted warning? activeSessionsToBeSaved.saveAsNewAPIHadoopFile("test", classOf[Void], classOf[ActiveSession],

Re: Exception on save s3n file (1.4.1, hadoop 2.6)

2015-09-25 Thread Steve Loughran
On 25 Sep 2015, at 03:35, Zhang, Jingyu wrote: I got following exception when I run JavPairRDD.values().saveAsTextFile("s3n://bucket); Can anyone help me out? thanks 15/09/25 12:24:32 INFO SparkContext: Successfully stopped

Exception on save s3n file (1.4.1, hadoop 2.6)

2015-09-24 Thread Zhang, Jingyu
I got the following exception when I run JavPairRDD.values().saveAsTextFile("s3n://bucket"); Can anyone help me out? Thanks. 15/09/25 12:24:32 INFO SparkContext: Successfully stopped SparkContext Exception in thread "main" java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException at

save as file

2014-11-11 Thread Naveen Kumar Pokala
Hi, I am using Spark 1.1.0. I need help saving an RDD to a JSON file. How can I do that? And how do I specify an HDFS path in the program? -Naveen

Re: save as file

2014-11-11 Thread Akhil Das
One approach would be to use saveAsNewAPIHadoopFile and specify jsonOutputFormat. Another simple one would be:
    val rdd = sc.parallelize(1 to 100)
    val json = rdd.map(x => { val m: Map[String, Int] = Map("id" -> x); new JSONObject(m) })
    json.saveAsTextFile(output)
Thanks Best
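For PySpark users, the second approach in the reply above can be sketched roughly as follows. This is a minimal sketch, not the poster's code: it assumes an existing SparkContext is passed in as sc, and to_json_line, save_ids_as_json, and output_path are hypothetical names chosen for illustration.

```python
import json


def to_json_line(x):
    # Build the same one-field record as the Scala Map("id" -> x) and
    # serialize it as a single line of JSON text.
    return json.dumps({"id": x})


def save_ids_as_json(sc, output_path):
    # sc is assumed to be an existing SparkContext; output_path is a
    # hypothetical destination directory (local or HDFS).
    rdd = sc.parallelize(range(1, 101))  # same values as Scala's 1 to 100
    # Each element becomes one line of text in the part files under output_path.
    rdd.map(to_json_line).saveAsTextFile(output_path)
```

The output is a directory of part files with one JSON object per line, not a single JSON document.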

Re: save as file

2014-11-11 Thread Ritesh Kumar Singh
We have RDD.saveAsTextFile and RDD.saveAsObjectFile for saving the output to any specified location. The params to be provided are: path of storage location, no. of partitions. For giving an HDFS path we use the following format: /user/user-name/directory-to-store/ On Tue, Nov 11, 2014 at 6:28 PM,
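A minimal PySpark sketch of writing to an HDFS path in the /user/user-name/ form described above. As far as I know, saveAsTextFile takes the output path (plus an optional compression codec) rather than a partition count; the number of part files follows the RDD's partitioning, which coalesce can reduce beforehand. The helper save_text and its arguments are hypothetical names, and rdd is assumed to be an existing RDD.

```python
def save_text(rdd, path, num_partitions=None):
    # rdd is assumed to be an existing RDD; path may be an HDFS path
    # in the /user/user-name/... form (placeholder, not a real location).
    if num_partitions is not None:
        # coalesce lowers the partition count, and with it the number
        # of part files written to the output directory.
        rdd = rdd.coalesce(num_partitions)
    rdd.saveAsTextFile(path)
```

Calling save_text(rdd, path) leaves the partitioning untouched, while passing num_partitions trades write parallelism for fewer output files.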