Re: Disc size for cluster

2017-01-26 Thread Anuj Wadehra
Adding to what Benjamin said...
It is hard to estimate disk space if you are using STCS for a table where rows
are updated frequently, leading to a lot of fragmentation. STCS may also lead to
scenarios where tombstones are not evicted for a long time. You may go live and
everything goes well for months. Then gradually you realize that large sstables
are holding on to tombstones because they are not getting compacted. It is not
easy to estimate disk space requirements precisely upfront unless you test your
system with realistic data patterns for some time.
Your life can be much easier if you take care of the following points with
STCS:
1. If you can afford some extra IO, go for a slightly more aggressive STCS
configuration using one or more of the following settings: min_threshold=2,
bucket_high=2, unchecked_tombstone_compaction=true. Which of these to use
depends on your use case; study these settings (see the example after this list).
2. Estimate the free disk space required for compactions at any point in time
(see the sketch after this list).
For example, suppose you have 5 tables with 3 TB of data in total and you estimate
that the data will be distributed as follows: A: 800 GB, B: 700 GB, C: 600 GB,
D: 500 GB, E: 400 GB.
If you have concurrent_compactors=3 and 90% of the data of your three largest
tables is being compacted simultaneously, you will need 0.9 x (800+700+600) GB =
1.9 TB of free disk space. So you won't need 6 TB of disk for 3 TB of data; 4.9 TB
would do.
3. Add a 10-15% buffer for future schema changes and calculation errors. Better
safe than sorry :)
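
As an illustration of point 1, here is one way those settings could be applied.
This is only a sketch using the DataStax Python driver; my_ks/my_table and the
contact point are placeholders, and the same ALTER TABLE statement can be run
directly from cqlsh:

from cassandra.cluster import Cluster

# Connect to a local node (placeholder contact point).
session = Cluster(["127.0.0.1"]).connect()

# Make STCS slightly more aggressive for one table (values from point 1).
session.execute("""
    ALTER TABLE my_ks.my_table
    WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': '2',
        'bucket_high': '2',
        'unchecked_tombstone_compaction': 'true'
    }
""")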
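
And a small sketch of the estimation in point 2, in plain Python, using the
numbers from the example above (the 90% figure and concurrent_compactors=3 are
the same assumptions as in the text):

# Rough STCS headroom: assume the largest `concurrent_compactors`
# tables are ~90% compacted at the same time.
def stcs_headroom_gb(table_sizes_gb, concurrent_compactors=3, compacted_fraction=0.9):
    largest = sorted(table_sizes_gb, reverse=True)[:concurrent_compactors]
    return compacted_fraction * sum(largest)

sizes_gb = [800, 700, 600, 500, 400]      # tables A-E from the example
headroom = stcs_headroom_gb(sizes_gb)     # 0.9 * (800 + 700 + 600) = 1890 GB
total = sum(sizes_gb) + headroom          # 3000 + 1890 = 4890 GB, i.e. ~4.9 TB
print(round(headroom / 1000, 2), round(total / 1000, 2))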

Thanks
Anuj 
 

Re: Disc size for cluster

2017-01-26 Thread Benjamin Roth
Hi!

This is basically right, but:
1. How do you know the 3 TB of storage will be 3 TB on Cassandra? This depends
on how the data is serialized and compressed, how often it changes, and on
your compaction settings.
2. 50% free space with STCS is only required if you do a full compaction of a
single CF that takes up all the space. Normally you need as much free space as
the target SSTable of a compaction will take. If you split your data across
more CFs, it's unlikely you'll really hit this value (see the sketch below).

... probably you should do some tests. But in the end it is always good to
have some headroom. I personally would scale out if free space is < 30%, but
that always depends on your model.
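
To make point 2 concrete, a tiny sketch with assumed numbers (the even split
across 5 CFs is an assumption for illustration, not something from the thread):

data_tb = 3.0

# Worst case: all data lives in one CF and STCS compacts it in one go,
# so you need roughly the whole CF's size again as free space.
single_cf_headroom_tb = data_tb        # ~3 TB

# Same data split evenly across 5 CFs: a full compaction of any one CF
# only needs that CF's size free at a time.
split_cf_headroom_tb = data_tb / 5     # ~0.6 TB

print(single_cf_headroom_tb, split_cf_headroom_tb)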


2017-01-26 9:56 GMT+01:00 Raphael Vogel :

> Hi
> Just want to validate my estimation for a C* cluster which should have
> around 3 TB of usable storage.
> Assuming an RF of 3 and the SizeTiered Compaction Strategy.
> Is it correct that the SizeTiered Compaction Strategy needs (in the worst
> case) 50% free disc space during compaction?
>
> So this would then result in a cluster of 3TB x 3 x 2 == 18 TB of raw
> storage?
>
> Thanks and Regards
> Raphael Vogel
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Disc size for cluster

2017-01-26 Thread Raphael Vogel


Hi

Just want to validate my estimation for a C* cluster which should have around 3 TB of usable storage.

Assuming an RF of 3 and the SizeTiered Compaction Strategy.

Is it correct that the SizeTiered Compaction Strategy needs (in the worst case) 50% free disc space during compaction?

 

So this would then result in a cluster of 3TB x 3 x 2 == 18 TB of raw storage?
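
Spelled out, the arithmetic behind that 18 TB figure (only the naive worst-case
sizing; the 2x STCS factor is the pessimistic assumption discussed in the replies):

usable_tb = 3        # target usable data
rf = 3               # replication factor
stcs_factor = 2      # worst-case "50% free" assumption for STCS
print(usable_tb * rf * stcs_factor)   # 18 TB raw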

 

Thanks and Regards

Raphael Vogel