Hi,

I work with PySpark on Spark 1.5.2.

Currently, saving an RDD to a CSV file is very slow; it uses only about 2% CPU.

I use:

my_dd.write.format("com.databricks.spark.csv").option("header", "false").save('file:///my_folder')

Is there a way to save CSV faster?

Many thanks
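One likely cause of the low CPU usage is that the DataFrame has a single partition, so a single task (one core) writes the whole file; repartitioning before the save, e.g. my_dd.repartition(16), lets Spark write several part-files in parallel. Below is a hedged sketch of that per-partition part-file layout in plain Python, not Spark code; write_partitioned_csv is a hypothetical helper used only to illustrate the idea:

```python
import csv
import os
import tempfile

# Illustration only (not Spark code): Spark writes one part-file per
# partition, and each partition is written by a single task. With one
# partition, one core does all the I/O -- consistent with ~2% CPU.
# With n partitions, n tasks can write their part-files in parallel.
def write_partitioned_csv(rows, out_dir, num_partitions):
    os.makedirs(out_dir, exist_ok=True)
    for p in range(num_partitions):
        part = rows[p::num_partitions]  # round-robin row assignment
        path = os.path.join(out_dir, "part-%05d" % p)
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(part)
    return sorted(os.listdir(out_dir))

out_dir = tempfile.mkdtemp()
rows = [[i, i * i] for i in range(10)]
print(write_partitioned_csv(rows, out_dir, 4))
# ['part-00000', 'part-00001', 'part-00002', 'part-00003']
```

The save() call in the message above produces the same kind of part-file directory; more partitions simply means more writers running at once.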
> On 10 Feb 2016, at 10:56, Eli Super <eli.su...@gmail.com> wrote:
>
> [...]
> > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
> > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
> On Nov 9, 2015, at 3:43 PM, swetha <swethakasire...@gmail.com> wrote:
> >
> > Hi,
> >
> > I see unwanted Warning when I try to save a Parquet file in hdfs in Spark.
> > Please find below the code and the Warning message. Any idea as to how to
> > avoid the unwanted Warning message?
Which Spark version are you using?
It was fixed in Parquet 1.7.x, so Spark 1.5.x should work.
> On Nov 9, 2015, at 3:43 PM, swetha <swethakasire...@gmail.com> wrote:
>
> [...]
Please see https://issues.apache.org/jira/browse/PARQUET-124
> On Nov 8, 2015, at 11:43 PM, swetha <swethakasire...@gmail.com> wrote:
>
> [...]
Hi,
I see unwanted Warning when I try to save a Parquet file in hdfs in Spark.
Please find below the code and the Warning message. Any idea as to how to
avoid the unwanted Warning message?
activeSessionsToBeSaved.saveAsNewAPIHadoopFile("test", classOf[Void],
classOf[ActiveSession],
> On 25 Sep 2015, at 03:35, Zhang, Jingyu wrote:
>
> [...]
I got the following exception when I run
JavaPairRDD.values().saveAsTextFile("s3n://bucket");
Can anyone help me out? Thanks

15/09/25 12:24:32 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoClassDefFoundError:
org/jets3t/service/ServiceException
at
Hi,

I am on Spark 1.1.0. I need help with saving an RDD to a JSON file.
How do I do that, and how do I specify the HDFS path in the program?

-Naveen
One approach would be to use saveAsNewAPIHadoopFile and specify a JSON
output format.

Another simple one would be like:

import scala.util.parsing.json.JSONObject

val rdd = sc.parallelize(1 to 100)
val json = rdd.map(x => {
  val m: Map[String, Int] = Map("id" -> x)
  new JSONObject(m)
})
json.saveAsTextFile("output")

Thanks
Best
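For PySpark, the same idea works with json.dumps as the per-record mapper. A minimal sketch of just the mapping step, runnable without a SparkContext; to_json_line is a hypothetical helper name, and in a real job the list comprehension below would be rdd.map(to_json_line) followed by saveAsTextFile:

```python
import json

# Hypothetical helper: the per-record transform that rdd.map would apply.
# In PySpark: sc.parallelize(range(1, 101)).map(to_json_line).saveAsTextFile("output")
def to_json_line(x):
    return json.dumps({"id": x})

lines = [to_json_line(x) for x in range(1, 101)]
print(lines[0])  # {"id": 1}
```

Each element becomes one line of the text output, which gives the same one-JSON-object-per-line layout as the Scala snippet above.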
We have RDD.saveAsTextFile and RDD.saveAsObjectFile for saving the output
to any location specified. The params to be provided are:
- path of the storage location
- no. of partitions

For giving an HDFS path we use the following format:
/user/user-name/directory-to-store/
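A relative path like the one above resolves against the cluster's default filesystem; a fully qualified HDFS URI also works. A small sketch of composing one, where namenode-host and the port are placeholders for your cluster (8020 is a common NameNode RPC port):

```python
# Sketch only: build a fully qualified HDFS URI from the path form above.
# "namenode-host" and 8020 are placeholders -- substitute your NameNode.
def hdfs_uri(host, port, path):
    return "hdfs://%s:%d%s" % (host, port, path)

print(hdfs_uri("namenode-host", 8020, "/user/user-name/directory-to-store/"))
# hdfs://namenode-host:8020/user/user-name/directory-to-store/
```

Either form can be passed straight to saveAsTextFile.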