Hi all,

I am currently trying to save to Cassandra after some Spark Streaming
computation.

I call myDStream.foreachRDD so that I can collect each RDD in the driver
app at runtime, and inside it I do something like this:
myDStream.foreachRDD(rdd => {

  var someCol = Seq[MyType]()

  rdd.foreach(kv => {
    someCol = someCol :+ kv._2 // I only want the RDD value, not the key
  })

  val collectionRDD = sc.parallelize(someCol) // THIS IS WHY IT FAILS TRYING TO RUN ON THE WORKER
  collectionRDD.saveToCassandra(...)
})

I get a NotSerializableException when the task is shipped to the node (I also
tried making someCol a shared variable).
I believe this happens because myDStream doesn't exist yet when the code
is pushed to the node, so parallelize has no structure to relate
to. Inside foreachRDD I should apparently only make RDD calls that
relate to other RDDs. I guess this was just a desperate attempt...
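
For reference, the per-RDD write pattern I believe the connector expects is
something like the sketch below ("my_keyspace" and "my_table" are placeholder
names) -- but this writes on every batch, which is not what I am after:

import com.datastax.spark.connector._ // adds saveToCassandra to RDDs

myDStream.foreachRDD(rdd => {
  // save straight from the distributed RDD; no collect/parallelize on the driver
  rdd.map(_._2).saveToCassandra("my_keyspace", "my_table")
})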

So my question is: using the Cassandra Spark driver, can we only write to
Cassandra from an RDD? In my case I want to write only once, after all the
computation has finished, in a single batch from the driver app.
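
Roughly, what I would like is to gather the final results on the driver and
push them in one go, along the lines of this sketch (finalResults and the
INSERT statement are placeholders; CassandraConnector ships with the
connector library):

import com.datastax.spark.connector.cql.CassandraConnector

// everything here runs on the driver, outside any RDD closure
val finalResults: Seq[String] = ??? // whatever the computation produced
CassandraConnector(sc.getConf).withSessionDo { session =>
  finalResults.foreach { v =>
    session.execute("INSERT INTO my_keyspace.my_table (value) VALUES (?)", v)
  }
}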

Thanks in advance.

Rod
