Hi all, I am currently trying to save to Cassandra after some Spark Streaming computation.
I call myDStream.foreachRDD so that I can collect each RDD in the driver app at runtime, and inside it I do something like this:

    myDStream.foreachRDD(rdd => {
      var someCol = Seq[MyType]()
      rdd.foreach(kv => {
        someCol = someCol :+ kv._2  // I only want the RDD value and not the key
      })
      val collectionRDD = sc.parallelize(someCol)  // THIS IS WHY IT FAILS TRYING TO RUN THE WORKER
      collectionRDD.saveToCassandra(...)
    })

I get a NotSerializableException while trying to run the node (I also tried someCol as a shared variable). I believe this happens because myDStream doesn't exist yet when the code is pushed to the node, so the parallelize doesn't have any structure to relate to. Inside this foreachRDD I should probably only make RDD calls that relate to other RDDs. I guess this was just a desperate attempt...

So I have a question: using the Cassandra Spark driver, can we only write to Cassandra from an RDD? In my case I only want to write once, after all the computation is finished, in a single batch from the driver app.

Thanks in advance,
Rod

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cassandra-driver-Spark-question-tp9177.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
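[Editor's note: for context on the pattern being attempted, here is a minimal sketch of the usual way to write each micro-batch with the DataStax spark-cassandra-connector, avoiding the sc.parallelize step that pulls the non-serializable SparkContext into the worker closure. The keyspace and table names are hypothetical placeholders, and myDStream is assumed to be a DStream of key/value pairs as in the post above.]

    // Sketch, assuming the DataStax spark-cassandra-connector is on the classpath.
    // "my_keyspace" / "my_table" are placeholder names, not from the original post.
    import com.datastax.spark.connector._

    myDStream.foreachRDD { rdd =>
      // Drop the key on the workers instead of collecting values on the driver;
      // saveToCassandra writes each partition directly, so no sc.parallelize
      // (and no SparkContext inside the closure) is needed.
      rdd.map(_._2).saveToCassandra("my_keyspace", "my_table")
    }

The connector can write from any RDD this way; accumulating everything on the driver first is only needed if the final batch genuinely has to be assembled there.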