Hey all, I tried the Spark connector with Cassandra and ran into a problem that blocked me for a couple of weeks. I managed to find a workaround, but I am not sure whether the underlying issue is a bug in the connector/Spark or not.
I have three tables in Cassandra (running Cassandra on a 5-node cluster) and a large Spark cluster (5 worker nodes, each with 32 cores and 240 GB of memory). When I ran my job, which extracts data from S3 and writes to the 3 tables in Cassandra using around 1 TB of memory and 160 cores, it sometimes got stuck on the last few tasks of a stage...

After experimenting for a while, I realised that reducing the number of cores to 2 per machine (10 total) made the job stable. I then gradually increased the core count, and the job hung again once I reached about 50 cores total. Has anyone else experienced this, and is it explainable?

On another note, I would like to know whether people are seeing good performance reading from Cassandra with Spark as opposed to reading data from HDFS. Kind of an open question, but I would like to see how others are using it.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-Connector-Issue-and-performance-tp15005.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
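For anyone wanting to try the same workaround: capping total cores can be done from the Spark config rather than by reconfiguring workers, and the DataStax spark-cassandra-connector also exposes write-throttling settings that may be an alternative to cutting cores outright. A rough sketch of the relevant properties (the values are illustrative guesses for this cluster, not tuned recommendations):

```properties
# Cap the job at ~50 total cores, the point where my job started hanging
spark.cores.max                                 50

# Connector-side write throttling (spark-cassandra-connector settings):
# limit concurrent batches in flight per task...
spark.cassandra.output.concurrent.writes        5
# ...and cap per-core write throughput to Cassandra
spark.cassandra.output.throughput_mb_per_sec    5
```

The idea is that the hang may be Cassandra falling behind on writes rather than Spark itself, so throttling write pressure at the connector might let you keep more cores for the S3 extract.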