Re: Datastore HDFS vs Cassandra
One additional comment I would make is that you should be careful with updates in Cassandra. It does support them, but large numbers of updates (i.e. changing existing keys) tend to cause fragmentation. If you are mostly adding new keys (e.g. new records in the time series), then Cassandra can be excellent.

cheers

On Wed, Feb 11, 2015 at 6:13 PM, Paolo Platter paolo.plat...@agilelab.it wrote:

Hi Mike,

I developed a solution with Cassandra and Spark, using DSE. The main difficulty is with Cassandra: you need to understand its data model and its query patterns very well. Cassandra has better performance than HDFS, and it has DR and stronger availability. HDFS is a filesystem; Cassandra is a DBMS. Cassandra supports full CRUD without ACID. HDFS is more flexible than Cassandra. In my opinion, if you have a real time series, go with Cassandra, paying attention to your reporting data access patterns.

Paolo

Sent from my Windows Phone

--
From: Mike Trienis mike.trie...@orcsol.com
Sent: 11/02/2015 05:59
To: user@spark.apache.org
Subject: Datastore HDFS vs Cassandra

Hi,

I am considering implementing Apache Spark on top of a Cassandra database after listening to a related talk and reading through the slides from DataStax. It seems to fit well with our time-series data and reporting requirements.

http://www.slideshare.net/patrickmcfadin/apache-cassandra-apache-spark-for-time-series-data

Does anyone have any experience using Apache Spark and Cassandra, including limitations and/or technical difficulties? How does Cassandra compare with HDFS, and what use cases would make HDFS more suitable?

Thanks, Mike.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Datastore-HDFS-vs-Cassandra-tp21590.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Franc Carter | Systems Architect | Rozetta Technology
franc.car...@rozettatech.com | www.rozettatechnology.com
Tel: +61 2 8355 2515
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
AUSTRALIA
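The warning about updates can be made concrete with a small sketch. This is a toy model and not Cassandra's actual storage engine: it only assumes the general log-structured behaviour in which writes are buffered in memory and flushed to immutable segments, so an update never overwrites the old value in place. The class name, the `flush_threshold` parameter and the `sensor-1` keys are all hypothetical, and reading "fragmentation" as "versions of one key scattered over many segments" is an interpretation of the comment above rather than something the thread spells out.

```python
class ToyLogStructuredStore:
    """Toy model of append-only, immutable segments (hypothetical, for illustration)."""

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold
        self.memtable = {}   # in-memory buffer of recent writes
        self.segments = []   # flushed, immutable segments, oldest first

    def write(self, key, value):
        # Inserts and updates are the same operation: record the newest value.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the buffer into a new segment; older segments are never edited.
        if self.memtable:
            self.segments.append(dict(self.memtable))
            self.memtable = {}

    def segments_holding(self, key):
        # How many segments a read may have to consult for this key.
        return sum(1 for seg in self.segments if key in seg)

    def stale_versions(self):
        # Superseded versions that still occupy space until segments are merged.
        total = sum(len(seg) for seg in self.segments)
        live = len({k for seg in self.segments for k in seg})
        return total - live


def run(workload, flush_threshold=4):
    store = ToyLogStructuredStore(flush_threshold)
    for key, value in workload:
        store.write(key, value)
    store.flush()
    return store


# Append-only time series: every write uses a new key.
append_only = run([(("sensor-1", t), float(t)) for t in range(16)])

# Update-heavy workload: the same four keys are rewritten over and over.
update_heavy = run([(("sensor-1", t % 4), float(t)) for t in range(16)])

print(append_only.stale_versions(), append_only.segments_holding(("sensor-1", 0)))    # 0 1
print(update_heavy.stale_versions(), update_heavy.segments_holding(("sensor-1", 0)))  # 12 4
```

Both workloads issue 16 writes, but in this model the append-only one leaves no stale data and each key lives in a single segment, while the update-heavy one leaves 12 superseded versions spread over 4 segments that a background merge would have to clean up.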
Re: Datastore HDFS vs Cassandra
I forgot to mention that if you do decide to use Cassandra, I'd highly recommend jumping on the Cassandra mailing list. If we had taken in some of the advice on that list, things would have been considerably smoother.

cheers

--
Franc Carter | Systems Architect | Rozetta Technology
Re: Datastore HDFS vs Cassandra
Hi,

Regarding the Cassandra data model, there's an excellent post on the eBay tech blog: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/. There's also a SlideShare for this somewhere.

Happy hacking,
Chris
Re: Datastore HDFS vs Cassandra
Thanks everyone for your responses. I'll definitely think carefully about the data models, querying patterns and fragmentation side-effects.

Cheers, Mike.
Re: Datastore HDFS vs Cassandra
Hi Mike,

I developed a solution with Cassandra and Spark, using DSE. The main difficulty is with Cassandra: you need to understand its data model and its query patterns very well. Cassandra has better performance than HDFS, and it has DR and stronger availability. HDFS is a filesystem; Cassandra is a DBMS. Cassandra supports full CRUD without ACID. HDFS is more flexible than Cassandra. In my opinion, if you have a real time series, go with Cassandra, paying attention to your reporting data access patterns.

Paolo

Sent from my Windows Phone
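As a rough illustration of what "paying attention to your reporting data access patterns" can mean for a time series, here is a minimal sketch of one common modelling choice: bucketing each series into one partition per day, so that a report over a time range maps to a small, known set of partitions. Everything named here is an assumption made for the example and does not come from the thread: the per-day bucket size, the `station-42` identifier, and the helper functions `partition_key` and `partitions_for_range`. The code is plain Python and does not talk to Cassandra; timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def partition_key(series_id, ts):
    """Hypothetical partition key: one partition per series per UTC day."""
    return (series_id, ts.astimezone(timezone.utc).date().isoformat())


def partitions_for_range(series_id, start, end):
    """List the partitions a report over [start, end] would have to read."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((series_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


# A reading taken on 11 Feb 2015 lands in that day's partition.
reading_time = datetime(2015, 2, 11, 5, 59, tzinfo=timezone.utc)
print(partition_key("station-42", reading_time))

# An overnight report spans exactly two daily partitions.
report_start = datetime(2015, 2, 10, 22, 0, tzinfo=timezone.utc)
report_end = datetime(2015, 2, 11, 6, 0, tzinfo=timezone.utc)
print(partitions_for_range("station-42", report_start, report_end))
```

The design question is the bucket size: it should be small enough that no single partition grows without bound, and large enough that a typical report touches only a handful of partitions. That trade-off depends on the write rate and the reports actually needed, which is why the query patterns have to be understood before the tables are designed.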