Thanks to both of you for the comments and the debugging suggestion; I will try it out.
Regarding your comment: yes, I agree the current solution was not efficient, but the saveToCassandra method needs an RDD, hence the parallelize call. In the meantime, Piotr pointed me to the CassandraConnector, and that fixed it. Bottom line: I started using the new Cassandra Spark driver with async calls, prepared statements, and batch executions inside the transformation on the nodes, and performance improved greatly.

Thanks,
Rod
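
P.S. For anyone who finds this thread later, here is a rough sketch of the pattern I mean, not my exact code. The keyspace, table, and column names (ks.events, id, payload), the chunk size of 100, and the object and method names are made up for illustration. It assumes a connector version built on the DataStax Java driver 2.x/3.x, where executeAsync returns a ResultSetFuture; newer driver versions expose a different async API. It also submits chunks of async single-row writes rather than building a BatchStatement, which is a simplification.

```scala
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.rdd.RDD

object AsyncWriteSketch {
  // Writes (id, payload) pairs from each partition using one prepared
  // statement and asynchronous executes, waiting on each chunk in turn.
  def writeAsync(rdd: RDD[(String, String)]): Unit = {
    // CassandraConnector is serializable, so the closure below can capture it.
    val connector = CassandraConnector(rdd.sparkContext.getConf)

    rdd.foreachPartition { rows =>
      connector.withSessionDo { session =>
        // Hypothetical table; prepared once per partition.
        val insert = session.prepare(
          "INSERT INTO ks.events (id, payload) VALUES (?, ?)")

        // Submit up to 100 writes at a time (arbitrary cap on in-flight requests).
        rows.grouped(100).foreach { chunk =>
          val pending = chunk.map { case (id, payload) =>
            session.executeAsync(insert.bind(id, payload))
          }
          // Block until every write in this chunk has completed or failed.
          pending.foreach(_.getUninterruptibly())
        }
      }
    }
  }
}
```

The point is that the session and the prepared statement are obtained on the executor, inside the partition, so nothing has to be collected to the driver and parallelized again just to get an RDD for saveToCassandra.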