Re: maximum storage per node

2013-07-29 Thread aaron morton
 Does anyone have opinions on the maximum amount of data reasonable to store 
 on one Cassandra node?
With spinning disk and 1GbE networking, the rule of thumb was 300GB to 500GB per node.

With SSD or very fast local disk, 10GbE networking, optionally JBOD, Cassandra 1.2 
and vnodes, people are talking about multiple TBs per node. 

  If there are limitations, what are the reasons for it?   

The main issues were:

* As discussed, potentially very long compactions.
* As discussed, repair taking a very long time to calculate the Merkle trees. 
* Potentially taking a very long time to rebuild a new node after one totally 
fails. vnodes address this by increasing the number of nodes that can stream 
data to the one bootstrapping. This is really something that has to fit into your 
operations. 
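For reference, vnodes in Cassandra 1.2 are enabled by setting num_tokens in 
cassandra.yaml before a node first joins the ring (256 is the commonly cited 
value; treat this as a sketch, not a tuning recommendation):

```yaml
# cassandra.yaml (Cassandra 1.2)
# Many small token ranges per node instead of one initial_token;
# lets many peers stream to a single bootstrapping/rebuilding node.
num_tokens: 256
```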

Hope that helps. 
 
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com




Re: maximum storage per node

2013-07-26 Thread cem
I don't think it is a good idea to put multiple instances on the same machine.
You may lose multiple instances at the same time if the machine goes
down. You can also specify multiple data directories as storage in 1.2.

I am also not sure bootstrapping will be a big problem, since the number
of keys you will store is relatively small.

Why didn't you partition your data according to time instead of using your
own compactor?
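The multiple-directories support mentioned above is configured in 
cassandra.yaml; a minimal sketch (the mount points are made up):

```yaml
# cassandra.yaml (Cassandra 1.2): spread data over several disks (JBOD)
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
```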

Cem








RE: maximum storage per node

2013-07-26 Thread Romain HARDOUIN
Do you have some fairly complex queries to run against your data?
Or do you just need to store large pieces of data? (In which case object 
storage like OpenStack Swift could be more appropriate, IMHO.)



Re: maximum storage per node

2013-07-26 Thread Robert Coli
On Fri, Jul 26, 2013 at 4:23 AM, Romain HARDOUIN
romain.hardo...@urssaf.fr wrote:

 Do you have some fairly complex queries to run against your data?
 Or do you just need to store large pieces of data? (In which case object
 storage like OpenStack Swift could be more appropriate, IMHO.)


Or distributed blob storage like MogileFS.

https://code.google.com/p/mogilefs/

=Rob


maximum storage per node

2013-07-25 Thread Pruner, Anne (Anne)
Does anyone have opinions on the maximum amount of data reasonable to store on 
one Cassandra node?  If there are limitations, what are the reasons for it?

Thanks,
Anne


Re: maximum storage per node

2013-07-25 Thread cem
Between 500GB and 1TB is recommended.

But it also depends on your hardware, traffic characteristics and
requirements. Can you give some details on that?

Best Regards,
Cem





RE: maximum storage per node

2013-07-25 Thread Pruner, Anne (Anne)
We're storing fairly large files (about 1MB apiece) for a few months and then 
deleting the oldest to get more space to add new ones.  We have large 
requirements (maybe up to 100 TB), so having a 1TB limit would be unworkable.
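For a rough sense of scale, the 100 TB figure works out as follows (the 
replication factor is an assumption, not from this thread; 1 TB per node is 
the upper end of the rule of thumb quoted earlier):

```python
# Back-of-envelope cluster sizing sketch, assumed figures marked below.
import math

raw_tb = 100        # logical data set size, from the thread
rf = 3              # typical replication factor (assumption)
per_node_tb = 1     # upper end of the 500GB-1TB per-node rule of thumb

nodes = math.ceil(raw_tb * rf / per_node_tb)
print(nodes)        # 300 nodes at 1 TB each to hold 100 TB at RF=3
```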

What is the reason for the limit?  Does something fail after that?

If there are hardware issues, what's recommended?

BTW, we're using Cassandra 1.2

Anne




RE: maximum storage per node

2013-07-25 Thread Kanwar Sangha
Issues with large data nodes would be:

* Nodetool repair will be impossible to run
* Your read I/O will suffer, since you will almost always go to disk 
(each read will take 3 IOPS worst case)
* Bootstrapping the node in case of failures will take days/weeks
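The worst-case read cost above translates into a rough throughput ceiling; a 
back-of-envelope sketch with an assumed disk speed:

```python
# How the 3-IOPS-per-read worst case bounds uncached read throughput.
DISK_IOPS = 120        # typical 7200 rpm SATA drive (assumption)
IOPS_PER_READ = 3      # worst case quoted above

reads_per_sec = DISK_IOPS // IOPS_PER_READ
print(reads_per_sec)   # 40 uncached reads/s per data disk
```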






Re: maximum storage per node

2013-07-25 Thread cem
You will suffer from long compactions if you are planning to get rid of
old records by TTL.
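For context, a minimal CQL sketch of expiring data via TTL (the table and 
column names are hypothetical):

```sql
-- Hypothetical table: each inserted row expires 90 days after the write.
INSERT INTO files (file_id, blob_data)
VALUES (1234, 0xcafe)
USING TTL 7776000;  -- 90 days in seconds (90 * 86400)
```

Expiry does not reclaim disk immediately: the expired cells become tombstones 
that still have to be compacted away, which is the cost being pointed at here.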

Best Regards,
Cem.




RE: maximum storage per node

2013-07-25 Thread Pruner, Anne (Anne)
I actually wrote my own compactor that deals with this problem.

Anne




Re: maximum storage per node

2013-07-25 Thread sankalp kohli
Try putting multiple instances per machine, with each instance mapped to its
own disk. This might not work with vnodes.

