
I just started a new Spark Streaming project. In this phase of the system,
all we want to do is save the data we receive to HDFS. After running for
a couple of days, it looks like I am missing a lot of data. I wonder if
saveAsTextFile("hdfs:///rawSteamingData"); is overwriting the data I captured
in the previous window? I noticed that after running for a couple of days my
HDFS file system has 25 files. The names are something like "part-00006". I
used 'hadoop fs -dus' to check the total data captured. While the system was
running I would periodically call 'dus', and I was surprised that sometimes
the total number of bytes actually dropped.

Is there a better way to write my data to disk?

Any suggestions would be appreciated


    public static void main(String[] args) {

        SparkConf conf = new SparkConf().setAppName(appName);

        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaStreamingContext ssc = new JavaStreamingContext(jsc, new Duration(5 * 1000));

        [ deleted code ... ]

data.foreachRDD(new Function<JavaRDD<String>, Void>(){

            private static final long serialVersionUID =


            public Void call(JavaRDD<String> jsonStr) throws Exception {

                jsonStr.saveAsTextFile("hdfs:///rawSteamingData"); // /rawSteamingData is a directory

                return null;
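
If every batch really is writing into the same directory, one thing I could
try is giving each batch its own output directory keyed by the batch time.
Below is a minimal sketch of just the path-building part. The class name
BatchPaths, the method batchDir, and the "batch-" naming scheme are all
hypothetical, and I have not confirmed that this is what fixes the missing data.

```java
// Hypothetical helper: builds a distinct output directory for each batch
// so that a later batch cannot reuse an earlier batch's part-NNNNN files.
public final class BatchPaths {

    private BatchPaths() {
        // static helper only, not meant to be instantiated
    }

    // Returns <baseDir>/batch-<batchTimeMs>, tolerating a trailing slash on baseDir.
    public static String batchDir(String baseDir, long batchTimeMs) {
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + "/batch-" + batchTimeMs;
    }

    public static void main(String[] args) {
        // Two consecutive 5-second batches map to two different directories.
        System.out.println(batchDir("hdfs:///rawSteamingData", 5000L));
        System.out.println(batchDir("hdfs:///rawSteamingData", 10000L));
    }
}
```

Inside foreachRDD the call would then become something like
jsonStr.saveAsTextFile(BatchPaths.batchDir("hdfs:///rawSteamingData", batchTimeMs)),
assuming the batch time in milliseconds is available in the callback (I
believe there is a foreachRDD variant that also passes the batch Time, but
I have not checked it against my Spark version).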








