Re: Standalone mode connection failure from worker node to master
I am also facing the same issue. Has anyone figured it out? Please help.
Re: java.lang.IllegalStateException: unread block data
Same issue here. Can anyone help, please?
Re: Standalone Spark cluster. Can't submit job programmatically -> java.io.InvalidClassException
I'm not able to get it working. How exactly did you fix it? I am using a Maven build: I downloaded Spark 1.1.1 and packaged it with

    mvn -Dhadoop.version=1.2.1 -DskipTests clean package

but I keep getting InvalidClassException.
Spark 1.1.1 Maven dependency
Dear all, I am using Spark Streaming. It was working fine when I was on Spark 1.0.2; now I repeatedly get issues like ClassNotFoundException. I am using the same pom.xml with the updated version for all the Spark modules I use: spark-core, spark-streaming, and spark-streaming-kafka. It constantly throws errors about missing commons-configuration, commons-lang, and commons-logging. How do I get all the dependencies needed to run Spark Streaming? Is there a proper way, or do we just have to find them by trial and error? My pom dependencies (groupId : artifactId : version):

    javax.servlet : servlet-api : 2.5
    org.apache.spark : spark-core_2.10 : 1.0.2
    org.apache.spark : spark-streaming_2.10 : 1.0.2
    org.apache.spark : spark-streaming-kafka_2.10 : 1.0.2
    org.slf4j : slf4j-log4j12 : 1.7.5
    commons-logging : commons-logging : 1.1.1
    commons-configuration : commons-configuration : 1.6

Am I missing something here?
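One thing worth checking (an observation, not a confirmed diagnosis): the dependencies above still say 1.0.2 even though the subject says 1.1.1, and mixing application-jar and cluster Spark versions is a common source of ClassNotFoundException. A minimal sketch of the Spark block with every artifact pinned to one release:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>1.1.1</version>
    </dependency>

With all Spark artifacts on one version and the application launched via spark-submit, commons-configuration, commons-lang, and commons-logging normally arrive transitively (through Spark's Hadoop dependencies); having to add them by hand is usually a symptom of a version mix rather than the fix.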
Re: Submitting Spark application through code
I am trying to submit a Spark Streaming program. When I submit a batch job it works, but when I do the same with Spark Streaming it throws the exception below. Can anyone please help?

    14/11/26 17:42:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50016
    14/11/26 17:42:25 INFO server.Server: jetty-8.1.14.v20131031
    14/11/26 17:42:25 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
    14/11/26 17:42:25 INFO ui.SparkUI: Started SparkUI at http://172.18.152.36:4040
    14/11/26 17:42:30 INFO spark.SparkContext: Added JAR /Volumes/Official/workspace/ZBI/target/ZBI-0.0.1-SNAPSHOT-jar-with-dependencies.jar at http://172.18.152.36:50016/jars/ZBI-0.0.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1417003949988
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/SparkUITab
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:161)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:91)
        at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:78)
        at com.zoho.zbi.spark.SparkStreaming$1.create(SparkStreaming.java:51)
        at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$9.apply(JavaStreamingContext.scala:564)
        at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$9.apply(JavaStreamingContext.scala:563)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:545)
        at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:563)
        at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
        at com.zoho.zbi.spark.SparkStreaming.callStreaming(SparkStreaming.java:98)
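NoClassDefFoundError: org/apache/spark/ui/SparkUITab usually points at mixed Spark versions on the classpath: SparkUITab was introduced in Spark 1.1.x, so a 1.1.x spark-streaming jar paired with an older spark-core fails exactly like this when StreamingContext tries to register its UI tab. A quick way to check, assuming a Maven build like the one discussed in this thread:

    mvn dependency:tree -Dincludes=org.apache.spark

Every org.apache.spark artifact in the output, plus the Spark version running on the cluster, should agree on one version.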
Re: Spark Streaming: foreachRDD network output
Anyone have any luck?
Re: Submitting Spark application through code
Thanks, boss, it's working :)
Re: java.io.NotSerializableException: org.apache.spark.SparkEnv
Hi, thanks for replying. I have posted my code at http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tc18060.html
Re: Streaming window operations not producing output
Hi TD, I would like to run streaming 24/7 and am trying to use getOrCreate, but it is not working. Can you please help with this? http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tc18060.html
Re: NullPointerException on reading checkpoint files
My goal is to run streaming 24x7.
Re: NullPointerException on reading checkpoint files
Hi TD, I am trying to use getOrCreate but I am getting a java.io.NotSerializableException. Please help; I have posted it in a different thread: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tc18060.html
Re: Spark Streaming getOrCreate
Has anybody had any luck? I am also trying to return None to delete a key from state. Will null help? How do I use Scala's None in Java? My code goes this way:

    public static class ScalaLang {
        public static <T> Option<T> none() {
            return (Option<T>) None$.MODULE$;
        }
    }

    Function2<List<Double>, Optional<Double>, Optional<Double>> updateFunction =
        new Function2<List<Double>, Optional<Double>, Optional<Double>>() {
            @Override
            public Optional<Double> call(List<Double> values, Optional<Double> state) {
                Double newSum = state.or(0D);
                if (values.isEmpty()) {
                    System.out.println("empty value");
                    return null; // I want to return None here to delete the key,
                                 // but when I use ScalaLang.<Double>none() it shows an error
                } else {
                    for (double i : values) {
                        newSum += i;
                    }
                    return Optional.of(newSum);
                }
            }
        };
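In the Java API for Spark 1.x, the state parameter of updateStateByKey is com.google.common.base.Optional (Guava), not scala.Option, so Scala's None is never the right value to return, and returning null breaks the state RDD. Returning Optional.absent() is what removes the key. A minimal sketch with the same Double-sum state as above (note that dropping a key whenever a batch has no values for it also discards its history; real code usually expires keys on a timeout instead):

    import java.util.List;
    import com.google.common.base.Optional;
    import org.apache.spark.api.java.function.Function2;

    public class SumUpdate implements Function2<List<Double>, Optional<Double>, Optional<Double>> {
        @Override
        public Optional<Double> call(List<Double> values, Optional<Double> state) {
            if (values.isEmpty()) {
                // Returning absent() deletes this key from the state entirely.
                return Optional.absent();
            }
            double newSum = state.or(0D);
            for (double v : values) {
                newSum += v;
            }
            return Optional.of(newSum);
        }
    }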
Spark Streaming getOrCreate
Hi all, I am using Spark Streaming:

    public class SparkStreaming {
        SparkConf sparkConf = new SparkConf().setAppName("Sales");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(5000));
        String chkPntDir = ""; // get checkpoint dir
        jssc.checkpoint(chkPntDir);
        JavaSpark jSpark = new JavaSpark(); // this is where I have the business logic
        JavaStreamingContext newJSC = jSpark.callTest(jssc);
        newJSC.start();
        newJSC.awaitTermination();
    }

where

    public class JavaSpark implements Serializable {
        public JavaStreamingContext callTest(JavaStreamingContext jssc) {
            // logic goes here
        }
    }

This works fine. But I tried getOrCreate, since I want Spark Streaming to run 24/7:

    JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
        @Override
        public JavaStreamingContext create() {
            SparkConf sparkConf = new SparkConf().setAppName(appName).setMaster(master);
            JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(5000));
            jssc.checkpoint("checkpointDir");
            JavaSpark js = new JavaSpark();
            JavaStreamingContext newJssc = js.callTest(jssc); // this is where all the logic is
            return newJssc;
        }
    };

    JavaStreamingContext context = JavaStreamingContext.getOrCreate(checkPointDir, contextFactory);
    context.start();
    context.awaitTermination();

This is not working:

    14/11/04 19:40:37 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    Exception in thread "Thread-37" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    14/11/04 19:40:37 ERROR JobScheduler: Error running job streaming job 141511018 ms.0
    org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)

Please help me out. Earlier the business logic was inside the ContextFactory, but then I got:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.zoho.zbi.spark.PaymentStreaming$1

Then I added private static final long serialVersionUID = -5751968749110204082L; in all the classes; that didn't work either, and I got the same "All masters are unresponsive! Giving up." error and stack trace as above.
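For what it's worth, "All masters are unresponsive" is usually a connectivity problem or a Spark version mismatch between the driver and the standalone master rather than a getOrCreate problem, so the master URL and the cluster build are worth verifying first. For the structure itself, here is a minimal getOrCreate sketch (class names and the checkpoint path are placeholders, assuming the Spark 1.x Java API): everything that builds the DStream graph lives inside the factory, and transformation functions should be static or top-level classes so that checkpoint recovery does not try to deserialize a non-serializable enclosing instance (the cause of NotSerializableException: ...PaymentStreaming$1 above):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

    public final class StreamingMain {
        private static final String CHECKPOINT_DIR = "hdfs:///tmp/streaming-checkpoint"; // placeholder

        private static JavaStreamingContext createContext() {
            SparkConf conf = new SparkConf().setAppName("Sales"); // let spark-submit supply the master
            JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));
            jssc.checkpoint(CHECKPOINT_DIR);
            // build the full DStream graph here: sources, transformations, outputs
            return jssc;
        }

        public static void main(String[] args) {
            JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
                @Override
                public JavaStreamingContext create() {
                    return createContext(); // only called when no checkpoint exists
                }
            };
            JavaStreamingContext context = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
            context.start();
            context.awaitTermination();
        }
    }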
Re: java.io.NotSerializableException: org.apache.spark.SparkEnv
Same issue here. How did you solve it?
Re: Submitting Spark application through code
I tried running it, but it didn't work:

    public static final SparkConf batchConf = new SparkConf();
    String master = "spark://sivarani:7077";
    String spark_home = "/home/sivarani/spark-1.0.2-bin-hadoop2/";
    String jar = "/home/sivarani/build/Test.jar";
    public static final JavaSparkContext batchSparkContext =
        new JavaSparkContext(master, "SparkTest", spark_home, new String[] {jar});

    public static void main(String args[]) {
        runSpark(0, "TestSubmit");
    }

    public static void runSpark(int crit, String dataFile) {
        JavaRDD<String> logData = batchSparkContext.textFile(input, 10);
        // flatMap, mapToPair, reduceByKey
        List<Tuple2<String, Integer>> output1 = counts.collect();
    }

This works fine with spark-submit, but when I try to submit through code with

    LeadBatchProcessing.runSpark(0, "TestSubmit.csv");

I get the following error:

    HTTP Status 500 - javax.servlet.ServletException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: TID 29 on host 172.18.152.36 failed for unknown reason
    Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: TID 29 on host 172.18.152.36 failed for unknown reason
    Driver stacktrace:

Any advice on this?
Spark Streaming Issue not running 24/7
The problem is simple: I want to stream data 24/7, do some calculations, and save the results in a CSV/JSON file so I can use them for visualization with dc.js/d3.js. I opted for Spark Streaming on a YARN cluster with Kafka and tried running it 24/7, using groupByKey and updateStateByKey to keep the computed historical data. Initially streaming works fine, but after a few hours I get:

    14/10/30 23:48:49 ERROR TaskSetManager: Task 2485162.0:3 failed 4 times; aborting job
    14/10/30 23:48:50 ERROR JobScheduler: Error running job streaming job 141469227 ms.1
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 2485162.0:3 failed 4 times, most recent failure: Exception failure in TID 478548 on host 172.18.152.36: java.lang.ArrayIndexOutOfBoundsException
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)

I guess it's due to groupByKey and updateStateByKey; I tried groupByKey(100) to increase the partition count. Also, when data is in state — say at the 10th second 1,000 records are in state, and at the 100th second 20,000 records are in state, of which 19,000 are no longer being updated — how do I remove them from state? updateStateByKey(none): how and when do I do that? How do we know when to send none, and how do we save the data before setting it to none?

I also tried not sending any data for a few hours, but checking the web UI I see the application FINISHED:

    app-20141030203943- NewApp 0 6.0 GB 2014/10/30 20:39:43 hadoop FINISHED 4.2 h

This confuses me. The code calls awaitTermination, and I did not terminate the task. Will streaming stop if no data is received for a significant amount of time? Is there any documentation on how long Spark will run when no data is being streamed?
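On "how and when to send none": since updateStateByKey in Spark 1.x invokes the update function for every key on every batch, one common pattern is to keep a last-updated timestamp inside the state value and return Guava's Optional.absent() once a key has been idle past a TTL, which removes it from state. A sketch with hypothetical types (the 30-minute TTL is an arbitrary choice); to save a value before it expires, keep writing the state DStream out each batch via foreachRDD, so the last version is already persisted by the time the key is dropped:

    import java.io.Serializable;
    import java.util.List;
    import com.google.common.base.Optional;
    import org.apache.spark.api.java.function.Function2;

    // Hypothetical state value: a running sum plus the time it was last updated.
    class SumState implements Serializable {
        final double sum;
        final long lastUpdatedMs;
        SumState(double sum, long lastUpdatedMs) {
            this.sum = sum;
            this.lastUpdatedMs = lastUpdatedMs;
        }
    }

    class ExpiringUpdate implements Function2<List<Double>, Optional<SumState>, Optional<SumState>> {
        private static final long TTL_MS = 30 * 60 * 1000; // assumption: 30-minute expiry

        @Override
        public Optional<SumState> call(List<Double> values, Optional<SumState> state) {
            long now = System.currentTimeMillis();
            if (values.isEmpty()) {
                // No new records for this key: keep it until the TTL passes, then drop it.
                if (state.isPresent() && now - state.get().lastUpdatedMs > TTL_MS) {
                    return Optional.absent(); // removes the key from state
                }
                return state;
            }
            double sum = state.isPresent() ? state.get().sum : 0D;
            for (double v : values) {
                sum += v;
            }
            return Optional.of(new SumState(sum, now));
        }
    }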
Streaming Question regarding lazy calculations
Hi all, I am using Spark Streaming with Kafka, running 24/7. My code is something like:

    JavaDStream<String> data = messages.map(new MapData());
    JavaPairDStream<String, Iterable<Double>> records = data.mapToPair(new dataPair()).groupByKey(100);
    records.print();
    JavaPairDStream<String, Double> result = records.mapValues(new Sum()).updateStateByKey(updateFunction).cache();
    result.foreach { write(result, path); } // writing result to the path

Since result holds historical values, even when there is no new input record for 10 minutes and result does not change, I end up writing it again and again, once every 3 seconds. I tried checking:

    if (record.count() > 0) {
        result.foreach(write file)
    }

but Spark does not honor my check. Any insight on how to achieve this?
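The check is ignored because DStream.count() does not return a number; it returns another DStream, so the if runs once at graph-construction time and never again. Per-batch decisions have to happen inside foreachRDD, where a real RDD and a real count are available. A sketch under those assumptions (Spark 1.x Java API; records, result, write(), and path are the names from the message above): since result comes from updateStateByKey and therefore always contains every historical key, count the input stream instead and remember the answer in a driver-side flag; output operations run one after another on the driver, in registration order, so the flag is set before the write runs:

    import java.util.concurrent.atomic.AtomicBoolean;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function;

    final AtomicBoolean batchHasData = new AtomicBoolean(false);

    records.foreachRDD(new Function<JavaPairRDD<String, Iterable<Double>>, Void>() {
        @Override
        public Void call(JavaPairRDD<String, Iterable<Double>> rdd) {
            batchHasData.set(rdd.count() > 0); // evaluated once per batch, on the driver
            return null;
        }
    });

    result.foreachRDD(new Function<JavaPairRDD<String, Double>, Void>() {
        @Override
        public Void call(JavaPairRDD<String, Double> rdd) {
            if (batchHasData.get()) { // skip the write when this batch brought no input
                write(rdd, path);     // hypothetical helper from the message above
            }
            return null;
        }
    });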
Re: Submitting Spark application through code
Hi, I know we can create a streaming context with new JavaStreamingContext(master, appName, batchDuration, sparkHome, jarFile), but to run the application we still have to use spark-home/bin/spark-submit --class NetworkCount. I want to skip submitting manually; I want to invoke this Spark app when a condition becomes true inside another running Java application.
Re: Spark Streaming - How to remove state for key
I am having the same issue. I am using updateStateByKey, and after a while a set of the data stops changing; I would like to save it and delete it from state. Have you found the answer? Please share your views. Thanks for your time.
Re: Spark Streaming Applications
Hi tdas, is it possible to run Spark 24/7? I am using updateStateByKey and streaming 3 lakh (300,000) records in half an hour, and I am not getting the correct result. I am also not able to run Spark Streaming 24/7: after a few hours I get an ArrayIndexOutOfBoundsException, even when I am not streaming anything. By the way, will the streaming end if I am not streaming anything for a few minutes? Please help me out here.

Also, is it possible to delete state? It keeps growing, and not all the data gets updated; at some point we have to reset it, no? How do I do that? I am able to work with batch processing using Spark successfully, but streaming is quite a mystery to me.

I am submitting the Spark application in the following fashion:

    bin/spark-submit --class "NetworkCount" --master spark://abc.test.com:7077 try/simple-project/target/simple-project-1.0-jar-with-dependencies.jar

But is there any other way to submit a Spark application through code? For example, I am checking a condition, and if it is true I want to run the Spark application:

    if (isConditionTrue) {
        runSpark("NetworkCount", "masterurl", "jar");
    }

I am aware we can set the jar and master URL with the Spark context, but how do I run it from code automatically when a condition becomes true, without actually using spark-submit? Is it possible?
Submitting Spark application through code
Hi, I am submitting the Spark application in the following fashion:

    bin/spark-submit --class "NetworkCount" --master spark://abc.test.com:7077 try/simple-project/target/simple-project-1.0-jar-with-dependencies.jar

But is there any other way to submit a Spark application through code? For example, I am checking a condition, and if it is true I want to run the Spark application:

    if (isConditionTrue) {
        runSpark("NetworkCount", "masterurl", "jar");
    }

I am aware we can set the jar and master URL with the Spark context, but how do I run it from code automatically when a condition becomes true, without actually using spark-submit? Is it possible?
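For anyone finding this thread later: Spark 1.4 added org.apache.spark.launcher.SparkLauncher, a programmatic wrapper around spark-submit, which does exactly this. A minimal sketch (the Spark home path is a placeholder; the jar, class, and master come from the message above):

    import java.io.IOException;
    import org.apache.spark.launcher.SparkLauncher;

    public final class ConditionalSubmit {
        public static void main(String[] args) throws IOException, InterruptedException {
            boolean isConditionTrue = true; // placeholder for the real condition
            if (isConditionTrue) {
                Process spark = new SparkLauncher()
                    .setSparkHome("/opt/spark") // placeholder path
                    .setAppResource("try/simple-project/target/simple-project-1.0-jar-with-dependencies.jar")
                    .setMainClass("NetworkCount")
                    .setMaster("spark://abc.test.com:7077")
                    .launch();
                spark.waitFor(); // block until the submitted application exits
            }
        }
    }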
Re: checkpoint and not running out of disk space
I am new to Spark; I am using Spark Streaming with Kafka, and my batch duration is 1 s. Assume I get 100 records in second 1, 120 records in second 2, and 80 records in second 3:

    --> {sec 1: 1, 2, ... 100}
    --> {sec 2: 1, 2, ... 120}
    --> {sec 3: 1, 2, ... 80}

I apply my logic in second 1 and get a result => result1. I want to use result1 in second 2 and get a combined result of result1 plus the 120 records of second 2 => result2. I tried to cache the result, but I am not able to get the cached result1 back in second 2. Is this possible, or can someone shed some light on how to achieve my goal here?

    JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc,
        String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topicMap, StorageLevel.MEMORY_AND_DISK_SER_2());

I process messages and compute words, which is the result for second 1...

    if (resultCp != null) {
        resultCp.print();
        result = resultCp.union(words.mapValues(new Sum()));
    } else {
        result = words.mapValues(new Sum());
    }
    resultCp = result.cache();

In second 2, resultCp should not be null, but it returns null, so at any given time I only have that particular second's data; I want the cumulative result. Does anyone know how to do this? I have learnt that once Spark Streaming is started with jssc.start(), control no longer lies with us; it lies with Spark. So is it possible to carry the result of second 1 into second 2 to compute the accumulated value? Any help is much appreciated. Thanks in advance.
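The reason caching does not work here: resultCp = result.cache() runs once, while the streaming graph is being built, not once per batch, so no driver-side variable can carry batch 1's result into batch 2. The mechanism Spark Streaming provides for exactly this accumulation is updateStateByKey, which hands each batch the state left by the previous ones (it requires a checkpoint directory). A minimal sketch in the Spark 1.x Java API, assuming a running Double sum per key as in the code above (jssc and words come from the message; the checkpoint path is a placeholder):

    import java.util.List;
    import com.google.common.base.Optional;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    jssc.checkpoint("hdfs:///tmp/streaming-checkpoint"); // required for stateful transformations

    JavaPairDStream<String, Double> cumulative = words.mapValues(new Sum())
        .updateStateByKey(new Function2<List<Double>, Optional<Double>, Optional<Double>>() {
            @Override
            public Optional<Double> call(List<Double> values, Optional<Double> state) {
                double sum = state.or(0D); // the accumulated result of all earlier batches
                for (double v : values) {
                    sum += v;              // fold in this batch's records
                }
                return Optional.of(sum);   // carried forward to the next batch
            }
        });
    cumulative.print(); // each batch now sees the cumulative sums, not just its own second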