Re: Datastore HDFS vs Cassandra

2015-02-11 Thread Franc Carter
One additional comment I would make is that you should be careful with
updates in Cassandra. It does support them, but large numbers of updates
(i.e. changing existing keys) tend to cause fragmentation. If you are
(mostly) adding new keys (e.g. new records in the time series), then
Cassandra can be excellent.
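
One way to read "fragmentation" here, as a minimal sketch: assume a toy
log-structured store in which writes are sealed into immutable segments, so
rewriting an existing key leaves older versions behind in earlier segments,
while brand-new keys do not. `ToyStore`, `flush_every`, and the workload
sizes are made up for illustration; this is not Cassandra code and says
nothing about real-world magnitudes.

```python
# Toy model only: immutable segments, loosely inspired by log-structured storage.
# ToyStore and its parameters are hypothetical, not a Cassandra API.

class ToyStore:
    def __init__(self, flush_every=4):
        self.flush_every = flush_every  # buffered keys before a segment is sealed
        self.buffer = {}                # in-memory writes, newest value per key
        self.segments = []              # sealed, immutable dicts, oldest first

    def write(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        # Seal the buffer into a new segment; earlier segments are never edited.
        if self.buffer:
            self.segments.append(dict(self.buffer))
            self.buffer = {}

    def stale_versions(self):
        """Count values in sealed segments that a newer write has superseded."""
        total = sum(len(seg) for seg in self.segments)
        live = len({key for seg in self.segments for key in seg})
        return total - live

    def segments_holding(self, key):
        """Number of segments a read would have to consult for this key."""
        return sum(1 for seg in self.segments if key in seg)


def append_only(n=16):
    store = ToyStore()
    for t in range(n):
        store.write(("sensor-1", t), t * 1.5)  # a new key per reading
    store.flush()
    return store


def update_heavy(n=16, distinct_keys=4):
    store = ToyStore()
    for t in range(n):
        store.write(("sensor-1", t % distinct_keys), t * 1.5)  # same keys rewritten
    store.flush()
    return store
```

With these defaults, `append_only().stale_versions()` is 0, while
`update_heavy().stale_versions()` is 12 and each rewritten key is spread
over 4 segments until something merges them.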

cheers



-- 

*Franc Carter* | Systems Architect | Rozetta Technology

franc.car...@rozettatech.com | www.rozettatechnology.com

Tel: +61 2 8355 2515

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215

AUSTRALIA


Re: Datastore HDFS vs Cassandra

2015-02-11 Thread Franc Carter
I forgot to mention that if you do decide to use Cassandra, I'd highly
recommend joining the Cassandra mailing list. If we had taken in some of
the advice on that list, things would have been considerably smoother.

cheers


Re: Datastore HDFS vs Cassandra

2015-02-11 Thread Christian Betz
Hi

Regarding the Cassandra data model, there's an excellent post on the eBay
tech blog:
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
There's also a slideshare for this somewhere.

Happy hacking

Chris



Re: Datastore HDFS vs Cassandra

2015-02-11 Thread Mike Trienis
Thanks everyone for your responses. I'll definitely think carefully about
the data models, querying patterns and fragmentation side-effects.

Cheers, Mike.




Re: Datastore HDFS vs Cassandra

2015-02-10 Thread Paolo Platter
Hi Mike,

I developed a solution with Cassandra and Spark, using DSE.
The main difficulty is with Cassandra: you need to understand its data model
and its query patterns very well.
Cassandra has better performance than HDFS, and it has DR (disaster recovery)
and stronger availability.
HDFS is a filesystem; Cassandra is a DBMS.
Cassandra supports full CRUD without ACID.
HDFS is more flexible than Cassandra.

In my opinion, if you have a real time series, go with Cassandra, paying
attention to your reporting data access patterns.
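
To illustrate what designing around the query pattern can mean for a time
series, here is a minimal sketch. It assumes readings are grouped by series
and calendar day and kept ordered by timestamp inside each group, so a
time-range report reads only the buckets it needs. The day-sized bucket and
the names `partition_key`, `insert`, and `read_range` are illustrative
assumptions, and the in-memory dict merely stands in for a table; this is
not Cassandra or DSE code.

```python
# Toy sketch of query-first modelling; names are hypothetical, not a Cassandra API.
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime, timedelta

# partition key -> list of (timestamp, value), kept sorted by timestamp
partitions = defaultdict(list)


def partition_key(series_id, ts):
    """Bucket by series and calendar day so one day's readings sit together."""
    return (series_id, ts.date())


def insert(series_id, ts, value):
    # insort keeps each bucket ordered by timestamp as readings arrive.
    insort(partitions[partition_key(series_id, ts)], (ts, value))


def read_range(series_id, start, end):
    """Return readings with start <= ts < end, touching only the day buckets involved."""
    out = []
    day = start.date()
    while day <= end.date():
        rows = partitions.get((series_id, day), [])
        lo = bisect_left(rows, (start,))  # first row with ts >= start
        hi = bisect_left(rows, (end,))    # first row with ts >= end
        out.extend(rows[lo:hi])
        day += timedelta(days=1)
    return out
```

A report over one evening then calls something like
`read_range("s1", datetime(2015, 2, 10, 18, 0), datetime(2015, 2, 11, 0, 0))`
and never scans other series or other days. A query the layout was not
designed for, such as "all series above some value", would have to visit
every bucket, which is the kind of mismatch worth checking before committing
to a model.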

Paolo

Sent from my Windows Phone

From: Mike Trienis mike.trie...@orcsol.com
Sent: 11/02/2015 05:59
To: user@spark.apache.org
Subject: Datastore HDFS vs Cassandra

Hi,

I am considering implementing Apache Spark on top of a Cassandra database
after listening to a related talk and reading through the slides from
DataStax. It seems to fit well with our time-series data and reporting
requirements.

http://www.slideshare.net/patrickmcfadin/apache-cassandra-apache-spark-for-time-series-data

Does anyone have any experience using Apache Spark and Cassandra, including
limitations and/or technical difficulties? How does Cassandra compare with
HDFS, and what use cases would make HDFS more suitable?

Thanks, Mike.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Datastore-HDFS-vs-Cassandra-tp21590.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
