Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
Streaming _from_ Cassandra, CassandraInputDStream, is coming, BTW:
https://issues.apache.org/jira/browse/SPARK-6283
I am working on it now.

Helena
@helenaedelson




Re: Spark streaming alerting

2015-03-24 Thread Anwar Rizal
Helena,

The CassandraInputDStream sounds interesting. I don't find much detail in
the JIRA ticket, though. Do you have more details on what it tries to achieve?

Thanks,
Anwar.



Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
I created tickets for my work in both the Spark and spark-cassandra-connector
JIRAs; I don't know why you cannot see them.
Users can stream from any Cassandra table, just as one can stream from a Kafka
topic; same principle.
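
To make the analogy concrete, here is a rough sketch. The Kafka half uses the
existing KafkaUtils API from spark-streaming-kafka; the Cassandra half is
purely hypothetical, since the actual API is still being designed under
SPARK-6283, so treat the CassandraUtils name as a placeholder.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("streaming-sources")
val ssc = new StreamingContext(conf, Seconds(1))

// Streaming from a Kafka topic (existing API in spark-streaming-kafka):
val kafkaStream = KafkaUtils.createStream(
  ssc, "zkhost:2181", "consumer-group", Map("events" -> 1))

// Streaming from a Cassandra table (hypothetical shape, same principle;
// the real API is what SPARK-6283 is defining):
// val cassandraStream = CassandraUtils.createStream(ssc, "keyspace", "table")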

Helena
@helenaedelson




Re: Spark streaming alerting

2015-03-23 Thread Mohit Anchlia
I think I didn't explain myself properly :) What I meant to say was that
Spark workers generally run either on HDFS data nodes or on Cassandra nodes,
which are typically in a private (protected) network. When a condition is
matched, it's difficult to send the alerts out directly from the worker nodes
because of security concerns. I was wondering if there is a way to listen to
the events as they occur at the sliding-window scale, or whether the best way
to accomplish this is to post back to a queue?
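
A minimal sketch of the post-back-to-a-queue approach, assuming a Kafka topic
named "alerts" and a broker address that are both placeholders, and that
errors is a DStream[String] of matched lines (e.g. the output of a filter
like Akhil's, below). Each partition opens its own producer on the worker, so
nothing non-serializable is shipped from the driver:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

errors.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // One producer per partition, created on the worker itself.
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")  // placeholder address
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    // Push each matched line onto the queue for a downstream listener.
    records.foreach(r =>
      producer.send(new ProducerRecord[String, String]("alerts", r)))
    producer.close()
  }
}

A separate consumer, running wherever notifications are allowed to leave the
protected network, then reads the "alerts" topic and does the actual alerting.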



Re: Spark streaming alerting

2015-03-23 Thread Jeffrey Jedele
What exactly do you mean by alerts?

Something specific to your data, or general events of the Spark cluster? For
the former, something like Akhil suggested should work. For the latter, I
would suggest having a log consolidation system like logstash in place and
using it to generate alerts.
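
For the log-consolidation route, a minimal sketch, assuming errors is a
DStream[String] of matched lines: the workers only write through a standard
logger, and a shipper such as logstash tails the executor log files and turns
matching lines into notifications downstream.

import org.apache.log4j.Logger

errors.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Log on the worker; logstash (or any log shipper) picks these lines
    // up from the executor logs and raises the actual alert downstream.
    val log = Logger.getLogger("alerts")
    records.foreach(r => log.error("ALERT: " + r))
  }
}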

Regards,
Jeff



Re: Spark streaming alerting

2015-03-23 Thread Akhil Das
What do you mean you can't send it directly from Spark workers? Here's a
simple approach you could take:

// Watch a directory for new files and alert on lines containing "ERROR".
val data = ssc.textFileStream("sigmoid/")
data.filter(_.contains("ERROR")).foreachRDD { rdd =>
  // count() runs on the workers, but alert() itself runs on the driver.
  alert("Errors: " + rdd.count())
}

And the alert() function could be anything that triggers an email or sends an
SMS alert.
Thanks
Best Regards

On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:

 Is there a module in Spark Streaming that lets you listen to the
 alerts/conditions as they happen in the streaming job? Generally, Spark
 Streaming components execute on a large cluster alongside HDFS or Cassandra,
 but when it comes to alerting you generally can't send it directly from the
 Spark workers, which means you need a way to listen to the alerts.



Re: Spark streaming alerting

2015-03-23 Thread Khanderao Kand Gmail
Akhil,

You are right in your answer to what Mohit wrote. However, what Mohit seems to
be alluding to, but did not write clearly, might be different.

Mohit,

You are wrong in saying that streaming generally works on HDFS and Cassandra.
Streaming typically works with a streaming or queuing source like Kafka,
Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read from HDFS and
S3). The streaming context (the receiver within the streaming context) gets
events/messages/records and forms a time-window-based batch (an RDD).

So there is a maximum gap of one window interval between when the alert
message becomes available to Spark and when the processing happens. I think
this is what you meant.

As per the Spark programming model, the RDD is the right way to deal with
data. If you are fine with a minimum delay of, say, a second (based on the
minimum batch interval that Spark Streaming can support), then what Akhil gave
is the right model.
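
For reference, that minimum delay is set by the batch interval passed to the
StreamingContext; with a one-second interval, an event can wait up to roughly
one second before its batch is processed. A minimal sketch:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("alerting")
// Batch interval = 1 second: incoming events are grouped into one RDD per
// second, so the worst-case wait before processing is about one interval.
val ssc = new StreamingContext(conf, Seconds(1))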

Khanderao
