Hi Mike,

I developed a solution with Cassandra and Spark, using DataStax Enterprise (DSE). The main difficulty is with Cassandra: you need to understand its data model and its query patterns very well. In my experience Cassandra performs better than HDFS, and it offers disaster recovery (DR) and stronger availability. Keep in mind that HDFS is a filesystem, while Cassandra is a DBMS. Cassandra supports full CRUD operations, but without ACID transactions. HDFS is more flexible than Cassandra.
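As a rough illustration of what "understanding the query patterns" can mean for time-series data, here is a minimal Python sketch. It assumes a hypothetical table partitioned by sensor and UTC day; the table name `readings`, the columns, and the helper functions are invented for this example and are not taken from the DataStax slides or from any real schema. The idea is only that the partition key is chosen from the report queries, so a report over a time range reads a small, predictable set of partitions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Hypothetical CQL schema this sketch has in mind (an assumption, not a real table):
#   CREATE TABLE readings (
#       sensor_id text, day text, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);


def partition_key(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Return the (sensor_id, day) partition a reading would be written to."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


def partitions_for_range(sensor_id: str, start: datetime, end: datetime) -> List[Tuple[str, str]]:
    """List every partition a report over [start, end] would have to read."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((sensor_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


if __name__ == "__main__":
    utc = timezone.utc
    # A report covering a bit more than one day touches three daily partitions.
    for key in partitions_for_range(
        "sensor-1",
        datetime(2015, 2, 10, 23, 0, tzinfo=utc),
        datetime(2015, 2, 12, 1, 0, tzinfo=utc),
    ):
        print(key)
```

With a layout like this, a daily report for one sensor reads a single partition, while a report that needs all sensors for a month fans out to many partitions. That second kind of access pattern is the sort of thing worth checking before committing to a model.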
In my opinion, if you have a real time series, go with Cassandra, but pay close attention to the data access patterns of your reports.

Paolo

________________________________
From: Mike Trienis <mike.trie...@orcsol.com>
Sent: 11/02/2015 05:59
To: user@spark.apache.org
Subject: Datastore HDFS vs Cassandra

Hi,

I am considering implementing Apache Spark on top of a Cassandra database after listening to a related talk and reading through the slides from DataStax. It seems to fit well with our time-series data and reporting requirements.

http://www.slideshare.net/patrickmcfadin/apache-cassandra-apache-spark-for-time-series-data

Does anyone have experience using Apache Spark and Cassandra, including limitations and/or technical difficulties? How does Cassandra compare with HDFS, and what use cases would make HDFS more suitable?

Thanks,
Mike.