Re: question on capacity planning

2011-07-14 Thread Sameer Farooqui
So, in our experience, the storage overhead is much higher than that. If you
plan on storing 120 TB of data, expect it to take roughly 250 TB on disk once
the storage-format overhead is accounted for. And since you have to leave 50%
of your storage space free for compaction, you're looking at needing about
500 TB of total storage.
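
As a rough sketch of the arithmetic being described here (taking the ~2x
on-disk overhead Sameer cites and the 50%-free-for-compaction rule as given;
neither is an exact Cassandra constant, just the figures from this thread):

# Rough capacity-planning sketch for the numbers in this thread.
# Assumptions (illustrative, not exact Cassandra constants): the ~2x
# on-disk overhead Sameer cites, and 50% of disk left free for compaction.

new_data_per_week_tb = 20    # from the original question
retention_weeks = 2          # data expires after 2 weeks via TTL
replication_factor = 3

# Raw replicated data: 20 * 2 * 3 = 120 TB (the figure in the question).
raw_tb = new_data_per_week_tb * retention_weeks * replication_factor

# Storage-format overhead: 250/120 is roughly the ratio Sameer describes.
overhead_factor = 250 / 120
on_disk_tb = raw_tb * overhead_factor        # ~250 TB on disk

# Keep 50% of each disk free so compactions have room to run.
compaction_free_fraction = 0.5
total_disk_tb = on_disk_tb / (1 - compaction_free_fraction)   # ~500 TB

print(f"raw: {raw_tb} TB, on disk: {on_disk_tb:.0f} TB, "
      f"provision: {total_disk_tb:.0f} TB")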


On Wed, Jun 29, 2011 at 9:17 AM, Ryan King r...@twitter.com wrote:

 On Wed, Jun 29, 2011 at 5:36 AM, Jacob, Arun arun.ja...@disney.com
 wrote:
  If I'm planning to store 20TB of new data per week, and expire all data
  every 2 weeks, with a replication factor of 3, do I only need
  approximately 120 TB of disk? I'm going to use TTL in my column values
  to automatically expire data. Or would I need more capacity to handle
  sstable merges? Given this amount of data, would you recommend node
  storage at 2TB per node or more? This application will have a heavy
  write/moderate read use profile.

 You'll need extra space for both compaction and the overhead in the
 storage format.

 As to the amount of storage per node, that depends on your latency and
 throughput requirements.

 -ryan



question on capacity planning

2011-06-29 Thread Jacob, Arun
If I'm planning to store 20TB of new data per week, and expire all data every 2
weeks, with a replication factor of 3, do I only need approximately 120 TB of
disk? I'm going to use TTL in my column values to automatically expire data. Or
would I need more capacity to handle sstable merges? Given this amount of data,
would you recommend node storage at 2TB per node or more? This application will
have a heavy write/moderate read use profile.

-- Arun


Re: question on capacity planning

2011-06-29 Thread Ryan King
On Wed, Jun 29, 2011 at 5:36 AM, Jacob, Arun arun.ja...@disney.com wrote:
 If I'm planning to store 20TB of new data per week, and expire all data
 every 2 weeks, with a replication factor of 3, do I only need approximately
 120 TB of disk? I'm going to use TTL in my column values to automatically
 expire data. Or would I need more capacity to handle sstable merges? Given
 this amount of data, would you recommend node storage at 2TB per node or
 more? This application will have a heavy write/moderate read use profile.

You'll need extra space for both compaction and the overhead in the
storage format.

As to the amount of storage per node, that depends on your latency and
throughput requirements.

-ryan
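
For the per-node sizing part of the question, the division below is only
illustrative: it shows how node count scales at a few candidate densities,
and does not model the latency, throughput, compaction, or repair
considerations Ryan points to.

import math

# Illustrative node-count arithmetic only: latency, throughput, compaction
# and repair times (the factors Ryan mentions) actually drive per-node
# density, and are not modeled here.
total_disk_tb = 500   # provisioned figure estimated earlier in the thread

for per_node_tb in (1, 2, 4):
    nodes = math.ceil(total_disk_tb / per_node_tb)
    print(f"{per_node_tb} TB per node -> about {nodes} nodes")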