Re: store spark streaming dstream in hdfs or cassandra

2014-07-31 Thread Gerard Maas
To read from and write to Cassandra, I recommend using the Spark-Cassandra
connector [1].
With it, saving a Spark Streaming RDD to Cassandra is fairly easy:

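import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector._  // adds saveToCassandra to RDDs

sparkConfig.set(CassandraConnectionHost, cassandraHost)
val sc = new SparkContext(sparkConfig)
val ssc = new StreamingContext(sc, Seconds(x))
...
// runs once per batch and writes that batch's RDD to Cassandra
stream.foreachRDD{ rdd => {rdd.saveToCassandra(keyspace, table); ()}}
...

The most recent version has additional support for creating the
StreamingContext with the Cassandra config, effectively merging the
SparkContext and StreamingContext lines above.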
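
For reference, here is a fuller sketch of the same pattern as a standalone job. Several things in it are assumptions and not part of the snippet above: the "spark.cassandra.connection.host" property name (the one used by later connector releases), the socket source, the keyspace "demo", and the table "words" with columns "word" and "length". The toRow helper is hypothetical and only there to keep the row mapping in one place.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector._ // adds saveToCassandra to RDDs, provides SomeColumns

object SaveStreamToCassandra {
  // Hypothetical mapping from an input word to a (word, length) row.
  def toRow(word: String): (String, Int) = (word, word.length)

  def main(args: Array[String]): Unit = {
    // Assumption: property name from later connector releases;
    // the master URL is expected to come from spark-submit.
    val conf = new SparkConf()
      .setAppName("SaveStreamToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder source; replace with the real input stream.
    val rows = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(toRow)

    // Assumed schema: demo.words(word text PRIMARY KEY, length int).
    rows.foreachRDD { rdd =>
      rdd.saveToCassandra("demo", "words", SomeColumns("word", "length"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The keyspace and table must already exist; this sketch does not create them.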

-kr, Gerard.

[1] https://github.com/datastax/spark-cassandra-connector/


On Thu, Jul 31, 2014 at 9:12 PM, Hari Shreedharan wrote:

> Off the top of my head, you can use a ForEachDStream: pass in the code
> that writes to Hadoop and register it as an output stream (this is what
> DStream.foreachRDD does), so that the function you pass in is executed
> on every batch and writes the data to HDFS. If you are OK with the data
> being in text format, simply use the saveAsTextFiles method on the DStream.


Re: store spark streaming dstream in hdfs or cassandra

2014-07-31 Thread Hari Shreedharan
Off the top of my head, you can use a ForEachDStream: pass in the code
that writes to Hadoop and register it as an output stream (this is what
DStream.foreachRDD does), so that the function you pass in is executed
on every batch and writes the data to HDFS. If you are OK with the data
being in text format, simply use the saveAsTextFiles method on the DStream.
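
A minimal sketch of both options, under stated assumptions: the socket source, the HDFS URLs, and the batchDir naming helper are placeholders invented for illustration, not something from this thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object SaveStreamToHdfs {
  // Hypothetical naming scheme: one output directory per batch.
  def batchDir(prefix: String, batchTimeMs: Long): String =
    s"$prefix/batch-$batchTimeMs"

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("SaveStreamToHdfs")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source; replace with the real input stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Option 1: plain text. Each batch is written to its own directory,
    // named "<prefix>-<batch time in ms>.<suffix>".
    lines.saveAsTextFiles("hdfs://namenode:8020/streams/lines", "txt")

    // Option 2: custom write logic, run once per batch on that batch's RDD.
    // Here each batch is stored as a SequenceFile of serialized objects.
    lines.foreachRDD { (rdd: RDD[String], time: Time) =>
      rdd.saveAsObjectFile(
        batchDir("hdfs://namenode:8020/streams/objects", time.milliseconds))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Both options write a new directory for every batch, including empty ones, so a short batch interval produces many small files.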





store spark streaming dstream in hdfs or cassandra

2014-07-31 Thread salemi
Hi,

I was wondering what the best way is to store DStreams in HDFS or
Cassandra.
Could somebody provide an example?

Thanks,
Ali



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/store-spark-streaming-dstream-in-hdfs-or-cassandra-tp11064.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.