Here's an example of a Cassandra ETL that you can follow, which exits on 
its own. I'm using it as a blueprint for building Spark Streaming apps on 
top of.

In my case, I kill the streaming app with System.exit once a sufficient 
amount of data has been collected.
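
That pattern looks roughly like the sketch below. This is not the blueprint's exact code, just an illustration: it assumes Spark Streaming 1.2+ for awaitTerminationOrTimeout (older releases have an awaitTermination(timeoutMillis) overload instead), and the app name, master, and timeout are placeholders.

```java
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class TimedStreamingApp {
    public static void main(String[] args) throws Exception {
        JavaStreamingContext jssc =
                new JavaStreamingContext("local[4]", "SparkStream", new Duration(1000));

        // ... set up DStreams and their output actions here ...

        jssc.start();
        // Block for at most 60 seconds instead of forever.
        boolean finished = jssc.awaitTerminationOrTimeout(60000);
        if (!finished) {
            // Stop gracefully so in-flight batches finish their writes first.
            jssc.stop(true, true);
        }
        System.exit(0); // the blunt exit mentioned above
    }
}
```

The graceful stop matters here: it lets the last batch's output (e.g. writes to Cassandra) complete before the JVM goes down.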

That seems to work for almost any scenario... 

But I guess you could also stop it on the stream-handler side if you are 
writing a custom DStream.

https://github.com/jayunit100/SparkBlueprint/blob/master/src/main/scala/sparkapps/tweetstream/Processor.scala
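
For the saveToCassandra question quoted below, the DataStax connector's Java API is the usual route. A hedged sketch, assuming spark-cassandra-connector 1.1+ on the classpath and a table mykeyspace.mytable with a single text column named line; the Message bean is hypothetical:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.javaFunctions;

import java.io.Serializable;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaDStream;

public class CassandraSink {
    // Hypothetical bean whose field name matches the target column "line".
    public static class Message implements Serializable {
        private String line;
        public Message() {}
        public Message(String line) { this.line = line; }
        public String getLine() { return line; }
        public void setLine(String line) { this.line = line; }
    }

    public static void save(JavaDStream<String> data) {
        JavaDStream<Message> rows = data.map(new Function<String, Message>() {
            public Message call(String s) { return new Message(s); }
        });
        // Registers an output action that writes each batch of the
        // stream to Cassandra as it is computed.
        javaFunctions(rows)
                .writerBuilder("mykeyspace", "mytable", mapToRow(Message.class))
                .saveToCassandra();
    }
}
```

No explicit foreachRDD is needed; the connector wires the per-batch writes into the streaming job itself.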

> On Dec 5, 2014, at 1:50 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> 
> The batch is the batch duration that you specify while creating the 
> StreamingContext, so at the end of every batch's computation the data will 
> get flushed to Cassandra. Also, why are you stopping your program with 
> Ctrl+C? You can always bound the run time with ssc.awaitTermination(timeoutMillis). 
> 
> Thanks
> Best Regards
> 
>> On Fri, Dec 5, 2014 at 11:53 AM, <m.sar...@accenture.com> wrote:
>> Hi Gerard/Akhil,
>> 
>> By "how do I specify a batch" I was trying to ask when the data in the 
>> JavaDStream gets flushed into the Cassandra table. 
>> I read somewhere that the streaming data gets written to Cassandra in 
>> batches, where a batch can be a particular time interval or one particular run.
>> That is what I was trying to understand: how to set that "batch" in my 
>> program. Because if a batch means one full run of my streaming app, then 
>> in my app I'm hitting Ctrl+C to kill the program. So the program is 
>> terminating; would the data still get inserted successfully into my 
>> Cassandra table?
>> For example, 
>> 
>> In Terminal-A I'm running a Kafka producer to stream in messages. 
>> 
>> In Terminal-B I'm running my Streaming App. In my App there is a line 
>> jssc.awaitTermination(); which will keep my App running until I kill it.
>> Eventually I hit Ctrl+C in my App terminal, i.e. Terminal-B, and kill it, 
>> so it's a kind of ungraceful termination. In this case, will the data in 
>> my App's DStream get written into Cassandra?
>> 
>> 
>> 
>> Thanks and Regards,
>> 
>> Md. Aiman Sarosh.
>> Accenture Services Pvt. Ltd.
>> Mob #:  (+91) - 9836112841.
>>  
>> From: Gerard Maas <gerard.m...@gmail.com>
>> Sent: Thursday, December 4, 2014 10:22 PM
>> To: Akhil Das
>> Cc: Sarosh, M.; user@spark.apache.org
>> Subject: Re: Spark-Streaming: output to cassandra
>>  
>> I guess he's already doing so, given the 'saveToCassandra' usage.  
>> What I don't understand is the question "how do I specify a batch". That 
>> doesn't make much sense to me. Could you explain further?
>> 
>> -kr, Gerard.
>> 
>>> On Thu, Dec 4, 2014 at 5:36 PM, Akhil Das <ak...@sigmoidanalytics.com> 
>>> wrote:
>>> You can use DataStax's Cassandra connector.
>>> 
>>> Thanks
>>> Best Regards
>>> 
>>>> On Thu, Dec 4, 2014 at 8:21 PM, <m.sar...@accenture.com> wrote:
>>>> Hi,
>>>> 
>>>> I have written the code below, which streams data from Kafka and prints 
>>>> it to the console.
>>>> I want to extend this so that my data goes into a Cassandra table instead.
>>>> 
>>>> JavaStreamingContext jssc = new JavaStreamingContext("local[4]",
>>>>         "SparkStream", new Duration(1000));
>>>> JavaPairReceiverInputDStream<String, String> messages =
>>>>         KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
>>>> 
>>>> System.out.println("Connection done!");
>>>> JavaDStream<String> data = messages.map(
>>>>         new Function<Tuple2<String, String>, String>() {
>>>>             public String call(Tuple2<String, String> message) {
>>>>                 return message._2();
>>>>             }
>>>>         });
>>>> // data.print();   --> output to console
>>>> data.foreachRDD(saveToCassandra("mykeyspace", "mytable"));
>>>> jssc.start();
>>>> jssc.awaitTermination();
>>>> 
>>>> 
>>>> How should I implement the line
>>>> data.foreachRDD(saveToCassandra("mykeyspace", "mytable"));
>>>> so that the data goes into Cassandra on each batch? And how do I specify a 
>>>> batch? Because if I hit Ctrl+C on the console of the streaming job, nothing 
>>>> will be written to Cassandra, since the job is getting killed.
>>>> 
>>>> Please help.
>>>> 
>>>> Thanks and Regards,
>>>> 
>>>> Md. Aiman Sarosh.
>>>> Accenture Services Pvt. Ltd.
>>>> Mob #:  (+91) - 9836112841.
>>>> 
>>>> 
> 
