Here's an example of a Cassandra ETL job that you can follow, which should exit on its own. I'm using it as a blueprint for building Spark Streaming apps on top of.
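A sketch of the "exit on its own" pattern described above, using Spark Streaming's own shutdown hooks instead of a raw System.exit (assumes Spark 1.3+, where JavaStreamingContext.awaitTerminationOrTimeout is available; the timeout value is illustrative):

```java
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class TimedShutdown {
    public static void main(String[] args) {
        JavaStreamingContext jssc =
            new JavaStreamingContext("local[4]", "SparkStream", new Duration(1000));

        // ... build DStreams and attach outputs (e.g. saveToCassandra) here ...

        jssc.start();
        // Wait up to 5 minutes instead of blocking forever in awaitTermination().
        boolean terminated = jssc.awaitTerminationOrTimeout(5 * 60 * 1000);
        if (!terminated) {
            // stop(stopSparkContext, stopGracefully): a graceful stop lets
            // already-received batches finish processing (and their Cassandra
            // writes complete) before the context shuts down.
            jssc.stop(true, true);
        }
    }
}
```

The graceful stop is what a Ctrl+C does not give you: batches in flight are drained rather than discarded.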
For me, I kill the streaming app with System.exit after a sufficient amount of data is collected. That seems to work for almost any scenario... but I guess you could also kill it on the stream handler side, if you are writing a custom DStream.

https://github.com/jayunit100/SparkBlueprint/blob/master/src/main/scala/sparkapps/tweetstream/Processor.scala

> On Dec 5, 2014, at 1:50 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
> Batch is the batch duration that you specify while creating the
> StreamingContext, so at the end of every batch's computation the data will
> get flushed to Cassandra. And why are you stopping your program with Ctrl +
> C? You can always specify the time with sc.awaitTermination(Duration).
>
> Thanks
> Best Regards
>
>> On Fri, Dec 5, 2014 at 11:53 AM, <m.sar...@accenture.com> wrote:
>> Hi Gerard/Akhil,
>>
>> By "how do I specify a batch" I was trying to ask when the data in
>> the JavaDStream gets flushed into the Cassandra table.
>> I read somewhere that the streaming data gets written to Cassandra in
>> batches, where a batch can be some particular time window, or one particular run.
>> That was what I was trying to understand: how to set that "batch" in my
>> program. Because if a batch means one full run of my streaming app, then
>> in my app I'm hitting Ctrl+C to kill the program. The program is
>> terminating, so would the data still get inserted successfully into my Cassandra
>> table?
>> For example:
>>
>> In Terminal-A I'm running a Kafka producer to stream in messages.
>>
>> In Terminal-B I'm running my streaming app. In my app there is a line
>> jssc.awaitTermination(); which will keep my app running until I kill it.
>> Eventually I hit Ctrl+C in my app's terminal, i.e. Terminal-B, and
>> kill it, so it's a kind of ungraceful termination. In this case, will
>> the data in my app's DStream get written into Cassandra?
>>
>>
>>
>> Thanks and Regards,
>>
>> Md. Aiman Sarosh.
>> Accenture Services Pvt. Ltd.
>> Mob #: (+91) - 9836112841.
>>
>> From: Gerard Maas <gerard.m...@gmail.com>
>> Sent: Thursday, December 4, 2014 10:22 PM
>> To: Akhil Das
>> Cc: Sarosh, M.; user@spark.apache.org
>> Subject: Re: Spark-Streaming: output to cassandra
>>
>> I guess he's already doing so, given the 'saveToCassandra' usage.
>> What I don't understand is the question "how do I specify a batch". That
>> doesn't make much sense to me. Could you explain further?
>>
>> -kr, Gerard.
>>
>>> On Thu, Dec 4, 2014 at 5:36 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>> You can use DataStax's Cassandra connector.
>>>
>>> Thanks
>>> Best Regards
>>>
>>>> On Thu, Dec 4, 2014 at 8:21 PM, <m.sar...@accenture.com> wrote:
>>>> Hi,
>>>>
>>>> I have written the code below, which streams data from Kafka and
>>>> prints it to the console.
>>>> I want to extend this so that my data goes into a Cassandra table instead.
>>>>
>>>> JavaStreamingContext jssc = new JavaStreamingContext("local[4]",
>>>>     "SparkStream", new Duration(1000));
>>>> JavaPairReceiverInputDStream<String, String> messages =
>>>>     KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
>>>>
>>>> System.out.println("Connection done!");
>>>> JavaDStream<String> data = messages.map(
>>>>     new Function<Tuple2<String, String>, String>() {
>>>>         public String call(Tuple2<String, String> message) {
>>>>             return message._2();
>>>>         }
>>>>     });
>>>> // data.print();  --> output to console
>>>> data.foreachRDD(saveToCassandra("mykeyspace", "mytable"));
>>>> jssc.start();
>>>> jssc.awaitTermination();
>>>>
>>>> How should I implement the line
>>>> data.foreachRDD(saveToCassandra("mykeyspace", "mytable"));
>>>> so that data goes into Cassandra in each batch? And how do I specify a
>>>> batch? Because if I hit Ctrl+C on the console of the streaming job jar, nothing
>>>> will be entered into Cassandra for sure, since it is getting killed.
>>>>
>>>> Please help.
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Md. Aiman Sarosh.
>>>> Accenture Services Pvt. Ltd.
>>>> Mob #: (+91) - 9836112841.
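Closing the loop on the quoted question: with the DataStax spark-cassandra-connector that Akhil points to, the placeholder line `data.foreachRDD(saveToCassandra("mykeyspace","mytable"))` is replaced by the connector's Java DStream functions. A sketch, assuming a table like `mykeyspace.mytable (text text PRIMARY KEY)` and a hypothetical `Message` bean whose fields map to the table's columns; the connector then writes each batch's RDD to Cassandra at the end of every batch interval (the Duration given to the StreamingContext), which is the "batch" being asked about:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.javaFunctions;

import java.io.Serializable;
import org.apache.spark.streaming.api.java.JavaDStream;

public class CassandraSink {
    // Hypothetical bean: one field (with getter/setter) per Cassandra column.
    public static class Message implements Serializable {
        private String text;
        public Message() {}
        public Message(String text) { this.text = text; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    public static void attach(JavaDStream<String> data) {
        // Wrap each raw Kafka string in the bean the row writer expects.
        JavaDStream<Message> rows = data.map(s -> new Message(s));
        // Registers a Cassandra write as an output operation: at the end of
        // every batch interval, that batch's records are saved to the table.
        javaFunctions(rows)
            .writerBuilder("mykeyspace", "mytable", mapToRow(Message.class))
            .saveToCassandra();
    }
}
```

Because the write runs per batch, every batch that completes before a Ctrl+C is already in Cassandra; only the batch in flight at kill time is at risk, which is exactly what a graceful stop avoids.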