Like:
counts.saveAsTextFiles("hdfs://host:port/some/location")
Thanks
Best Regards
On Tue, Sep 29, 2015 at 2:15 AM, Chengi Liu wrote:
Hi,
I am going through this example here:
https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py
If I want to write this data to HDFS, what's the right way to do it?
Thanks
Just to add: rdd.take(1) won't trigger the entire computation; it will just
pull out the first record. You need an rdd.count() or rdd.saveAs*Files call
to trigger the complete pipeline. How many partitions do you see in the
last stage?
Thanks
Best Regards
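The laziness Akhil describes can be illustrated outside Spark. This is a plain-Python sketch (generators stand in for RDD transformations; it is not Spark itself) of why take(1) is cheap while count() forces the whole pipeline:

```python
# Plain-Python sketch (not Spark) of lazy evaluation: like RDD
# transformations, a generator does no work until something pulls from it.
from itertools import islice

evaluated = []  # records which input records actually got processed

def pipeline(records):
    for r in records:
        evaluated.append(r)   # stands in for an expensive transformation
        yield r.upper()

data = ["a", "b", "c", "d"]

# Like rdd.take(1): only the first record flows through the pipeline.
first = list(islice(pipeline(data), 1))

evaluated.clear()

# Like rdd.count(): every record must flow through the pipeline.
total = sum(1 for _ in pipeline(data))
```

After take-style access only "a" was processed; after the count, all four records were.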
On Tue, Aug 4, 2015 at 7:10 AM, ayan guha wrote:
Is your data skewed? What happens if you do rdd.count()?
On 4 Aug 2015 05:49, Jasleen Kaur jasleenkaur1...@gmail.com wrote:
I am executing a spark job on a cluster as a yarn-client(Yarn cluster not
an option due to permission issues).
- num-executors 800
- spark.akka.frameSize=1024
- spark.default.parallelism=25600
- driver-memory=4G
- executor-memory=32G.
- My input size is around 1.5TB.
My problem: writing to HDFS on the master node is much faster (as opposed
to 1.2 min for the slaves).
Any suggestion what the reason might be?
thanks,
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/writing-to-hdfs-on-master-node-much-faster-tp22570.html
Sent from the Apache Spark User List mailing list archive
on the other 2 nodes
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, April 20, 2015 12:57 PM
To: jamborta
Cc: user@spark.apache.org
Subject: Re: writing to hdfs on master node much faster
What machines are HDFS data nodes -- just your master? that would explain
There was already a thread around this; if I understood your question
correctly, you can go through it here:
https://mail-archives.apache.org/mod_mbox/spark-user/201502.mbox/%3ccannjawtrp0nd3odz-5-_ya351rin81q-9+f2u-qn+vruqy+...@mail.gmail.com%3E
Thanks
Best Regards
On Thu, Feb 19, 2015 at 8:16 PM,
Hi all,
In Spark Streaming I want to use DStream.saveAsTextFiles with bulk writing,
because the normal saveAsTextFiles cannot finish within the configured
batch interval.
Maybe a common pool of writers, or another worker assigned to bulk writing?
Thanks!
B/R
Jichao
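Spark has no built-in writer pool for this, but the idea in the question can be sketched in plain Python: hand each batch to a background thread pool so slow bulk writes don't block the processing loop. Everything here (write_batch, the pool size, local files in place of HDFS) is illustrative, not a Spark API:

```python
# Illustrative sketch (not a Spark API): offload slow batch writes to a
# background thread pool so the processing loop isn't blocked on I/O.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_batch(path, records):
    # Stand-in for a slow bulk write (e.g. to HDFS).
    with open(path, "w") as f:
        f.write("\n".join(records))
    return path

outdir = tempfile.mkdtemp()
pool = ThreadPoolExecutor(max_workers=4)   # the "common pool" of writers

futures = []
for batch_id, batch in enumerate([["a", "b"], ["c"], ["d", "e", "f"]]):
    path = os.path.join(outdir, "batch-%d.txt" % batch_id)
    # submit() returns immediately; the write happens in the background.
    futures.append(pool.submit(write_batch, path, batch))

written = [f.result() for f in futures]    # wait for all writes to finish
pool.shutdown()
```

The trade-off is that a batch is acknowledged before its write completes, so a crash can lose in-flight output; Spark's own output operations avoid this by keeping the write inside the job.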
PS this is the real fix to this issue:
https://issues.apache.org/jira/browse/SPARK-5795
I'd like to merge it as I don't think it breaks the API; it actually
fixes it to work as intended.
On Mon, Feb 16, 2015 at 3:25 AM, Bahubali Jain bahub...@gmail.com wrote:
I used the latest assembly jar and the below as suggested by Akhil to fix
this problem...
temp.saveAsHadoopFiles("DailyCSV", ".txt", String.class, String.class,
(Class) TextOutputFormat.class);
Thanks All for the help !
On Wed, Feb 11, 2015 at 1:38 PM, Sean Owen so...@cloudera.com wrote:
That kinda dodges the problem by ignoring generic types. But it may be
simpler than the 'real' solution, which is a bit ugly.
(But first, to double check, are you importing the correct
TextOutputFormat? there are two versions. You use .mapred. with the
old API and .mapreduce. with the new API.)
What I'd like to know is why Spark didn't retry writing the file to HDFS.
It just shows up as a failed job in the Spark UI.
Error:
java.io.IOException: All datanodes x.x.x.x: are bad. Aborting...
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1128)
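At this level Spark surfaces the pipeline failure as a failed job rather than retrying the save itself. One hedged workaround, sketched in plain Python (not a Spark or HDFS API; the names here are illustrative), is to wrap the save in an explicit retry loop with backoff:

```python
# Illustrative sketch: wrap a flaky write action in an explicit retry loop
# with backoff, since a failed save otherwise just surfaces as a failed job.
import time

def with_retries(action, attempts=3, backoff_s=0.01):
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except IOError:
            if attempt == attempts:
                raise          # out of retries: let the job fail for real
            time.sleep(backoff_s * attempt)

calls = {"n": 0}

def flaky_save():
    # Stand-in for a save such as rdd.saveAsTextFile(...);
    # fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("All datanodes are bad. Aborting...")
    return "saved"

result = with_retries(flaky_save)
```

Note this only helps if the save is idempotent (e.g. it writes to a fresh path each attempt); retrying a partially written output directory can fail for other reasons.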
On Sat, Oct 4, 2014 at 5:28 PM, Abraham Jacob abe.jac...@gmail.com wrote:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
Good. There is also a
Are you importing the '.mapred.' version of TextOutputFormat instead
of the new API '.mapreduce.' version?
On Sat, Oct 4, 2014 at 1:08 AM, Abraham Jacob abe.jac...@gmail.com wrote:
Hi Sean/All,
I am importing among various other things the newer mapreduce version -
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import
Hi All,
Would really appreciate if someone in the community can help me with this.
I have a simple Java spark streaming application - NetworkWordCount
SparkConf sparkConf = new
SparkConf().setMaster("yarn-cluster").setAppName("Streaming WordCount");
JavaStreamingContext jssc = new