Hi

Regarding the Cassandra data model, there's an excellent post on the eBay Tech
Blog:
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/.
There's also a SlideShare deck for this somewhere.

Happy hacking

Chris

From: Franc Carter <franc.car...@rozettatech.com>
Date: Wednesday, 11 February 2015 10:03
To: Paolo Platter <paolo.plat...@agilelab.it>
Cc: Mike Trienis <mike.trie...@orcsol.com>,
"user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Datastore HDFS vs Cassandra


One additional comment I would make is that you should be careful with updates
in Cassandra. It does support them, but large numbers of updates (i.e. changing
existing keys) tend to cause fragmentation. If you are (mostly) adding new
keys (e.g. new records in the time series), then Cassandra can be excellent.
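To make the "add new keys, don't update old ones" pattern concrete, here is a minimal sketch of how a time-series row key could be derived so that every new reading becomes a new key. The table layout (a `readings` table partitioned by sensor and UTC day, clustered by timestamp) and the names `row_key` and `all_keys_new` are hypothetical illustrations, not something described in this thread or taken from any library.

```python
from datetime import datetime, timezone

# Hypothetical CQL table this sketch assumes (not from the thread):
#   CREATE TABLE readings (
#       sensor_id text, day text, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts));


def row_key(sensor_id: str, ts: datetime) -> tuple:
    """Return the hypothetical (partition_key, clustering_key) for one reading.

    Pass timezone-aware datetimes. The partition key buckets readings per
    sensor per UTC day, and the clustering key is the reading timestamp,
    so each new reading lands on a new key instead of overwriting one.
    """
    ts_utc = ts.astimezone(timezone.utc)
    partition_key = (sensor_id, ts_utc.strftime("%Y-%m-%d"))
    return partition_key, ts_utc


def all_keys_new(sensor_id: str, timestamps: list) -> bool:
    """True if a batch of readings maps to distinct keys (pure inserts, no updates)."""
    keys = [row_key(sensor_id, ts) for ts in timestamps]
    return len(keys) == len(set(keys))
```

With this kind of key, a stream of readings with increasing timestamps only ever appends; two readings would collide (and turn into an update) only if they shared both sensor and timestamp.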

cheers


On Wed, Feb 11, 2015 at 6:13 PM, Paolo Platter
<paolo.plat...@agilelab.it> wrote:
Hi Mike,

I developed a solution with Cassandra and Spark, using DSE (DataStax Enterprise).
The main difficulty is Cassandra itself: you need to understand its data model
and its query patterns very well.
Cassandra has better performance than HDFS, and it offers DR (disaster recovery)
and stronger availability.
HDFS is a filesystem; Cassandra is a DBMS.
Cassandra supports full CRUD, but without ACID transactions.
HDFS is more flexible than Cassandra.

In my opinion, if you have a real time series, go with Cassandra, paying
attention to your reporting data access patterns.
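One way to "pay attention to reporting access patterns" is to check how many partitions a typical report would have to read. The sketch below assumes a hypothetical scheme where readings are partitioned by `(sensor_id, UTC day)`; that scheme and the function name `partitions_for_range` are illustrative assumptions, not part of this thread or of any Cassandra API.

```python
from datetime import date, timedelta


def partitions_for_range(sensor_id: str, start: date, end: date) -> list:
    """List the hypothetical (sensor_id, day) partitions a report over
    [start, end] would have to read: one partition per day, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    n_days = (end - start).days + 1
    # One partition key per calendar day in the requested range.
    return [(sensor_id, (start + timedelta(days=i)).isoformat())
            for i in range(n_days)]
```

Under this assumption a one-week report touches seven partitions. If the dominant reports span months rather than days, a coarser bucket (for example per month) might suit the access pattern better, at the cost of larger partitions.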

Paolo

Sent from my Windows Phone
________________________________
From: Mike Trienis <mike.trie...@orcsol.com>
Sent: 11/02/2015 05:59
To: user@spark.apache.org
Subject: Datastore HDFS vs Cassandra

Hi,

I am considering implementing Apache Spark on top of a Cassandra database after
listening to a related talk and reading through the slides from DataStax. It
seems to fit well with our time-series data and reporting requirements.

http://www.slideshare.net/patrickmcfadin/apache-cassandra-apache-spark-for-time-series-data

Does anyone have experience using Apache Spark and Cassandra, including
limitations and/or technical difficulties? How does Cassandra compare with
HDFS, and what use cases would make HDFS more suitable?

Thanks, Mike.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Datastore-HDFS-vs-Cassandra-tp21590.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.





--

Franc Carter | Systems Architect | Rozetta Technology

franc.car...@rozettatech.com |
www.rozettatechnology.com

Tel: +61 2 8355 2515

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215

AUSTRALIA
