Streaming _from_ Cassandra, CassandraInputDStream, is coming, BTW:
https://issues.apache.org/jira/browse/SPARK-6283
I am working on it now.
Helena
@helenaedelson
On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail khanderao.k...@gmail.com
wrote:
Helena,
The CassandraInputDStream sounds interesting. I don't find much in the
JIRA though. Do you have more details on what it tries to achieve?
Thanks,
Anwar.
On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson helena.edel...@datastax.com
wrote:
Streaming _from_ Cassandra.
I created a JIRA ticket for my work in both the Spark and
spark-cassandra-connector JIRAs; I don't know why you cannot see them.
Users can stream from any Cassandra table, just as one can stream from a Kafka
topic; same principle.
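To make the "same principle" concrete, here is a toy sketch, with no Spark dependency, of what a pull-based streaming source does: poll a table for rows added since the last offset and hand them to the stream. The names `fetchNewRows` and `pollOnce` are hypothetical stand-ins, not the actual CassandraInputDStream API.

```scala
import scala.collection.mutable

// Pretend "table": each call returns the rows appended since `offset`,
// plus the new offset to resume from.
def fetchNewRows(table: Vector[String], offset: Int): (Vector[String], Int) =
  (table.drop(offset), table.length)

// One poll cycle of a receiver: pull new rows, push them into the stream.
// In a real Spark custom receiver this push would be receiver.store(row).
def pollOnce(table: Vector[String], offset: Int,
             stream: mutable.Buffer[String]): Int = {
  val (rows, newOffset) = fetchNewRows(table, offset)
  stream ++= rows
  newOffset
}
```

In an actual receiver, onStart() would run this poll loop on a background thread until onStop() is called; the offset bookkeeping is what makes re-polling the same table behave like consuming a topic.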
Helena
@helenaedelson
On Mar 24, 2015, at 11:29 AM, Anwar
I think I didn't explain myself properly :) What I meant to say was that
generally a Spark worker runs either on HDFS data nodes or on Cassandra
nodes, which are typically in a private (protected) network. When a
condition is matched, it's difficult to send out the alerts directly from
the worker.
What exactly do you mean by alerts?
Something specific to your data, or general events of the Spark cluster? For
the first, something like Akhil suggested should work. For the latter, I would
suggest having a log consolidation system like Logstash in place and using
it to generate alerts.
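One way the log-consolidation route can work from inside a locked-down worker: instead of calling out directly, the worker emits a structured log line that a shipper such as Logstash picks up and turns into an alert. A minimal sketch; the field names are made up for illustration.

```scala
// Build a JSON-ish log line a log shipper can parse and route as an alert.
// Field names ("level", "alert", "count") are illustrative, not a standard.
def alertLogLine(level: String, message: String, count: Long): String =
  s"""{"level":"$level","alert":"$message","count":$count}"""
```

The worker only needs write access to its own log file; the shipper, which does have network egress, handles delivery.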
Regards,
Jeff
What do you mean you can't send it directly from spark workers? Here's a
simple approach which you could do:
val data = ssc.textFileStream("sigmoid/")
val dist = data.filter(_.contains("ERROR")).foreachRDD(rdd =>
  alert("Errors : " + rdd.count()))
And the alert() function could be anything
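As one illustration of "could be anything", here is a possible alert() that only fires once the error count crosses a threshold and returns the message that would be sent (by mail, webhook, etc.). The threshold and message format are my own choices, not from the thread.

```scala
// Fire only at or above `threshold`; return the message to deliver, if any.
// Returning Option keeps the side effect (mail, HTTP post) separate and testable.
def alert(errorCount: Long, threshold: Long = 1): Option[String] =
  if (errorCount >= threshold) Some(s"Errors : $errorCount") else None
```

Inside foreachRDD you would pattern-match on the result and hand Some(msg) to whatever delivery mechanism the network allows.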
Akhil
You are right in your answer to what Mohit wrote. However, what Mohit seems to
be alluding to, but did not write properly, might be different.
Mohit,
You are wrong in saying that streaming generally works with HDFS and Cassandra.
Streaming typically works with a streaming or queuing source like Kafka.
Is there a module in Spark Streaming that lets you listen to
the alerts/conditions as they happen in the streaming module? Generally
Spark Streaming components will execute on a large set of clusters like HDFS
or Cassandra; however, when it comes to alerting, you generally can't send it
directly from