I'm in the middle of increasing the PG count for one of our pools in small 
increments: bump the count, wait for the process to complete, rinse and repeat.  
I'm doing it this way so I can control when all this activity happens and keep 
it away from the busier production traffic times.

I'm expecting some imbalance as PGs get created on already unbalanced OSDs, 
however our monitoring picked up something today that I don't really 
understand.  Our total utilization is just over 50% and about 96% of our 
total data is in this one pool.  Because there aren't enough PGs, the amount 
of data in each is quite large, and since they aren't evenly spread across the 
OSDs, there's a bit of imbalance.  That's all cool and to be expected, and is 
the reason for increasing the PG count in the first place.

However, as some PGs are splitting, the new PGs are sometimes being created on 
OSDs that already have a disproportionate amount of data.  Again, not totally 
unexpected.  Our monitoring detected the usage of this pool to be >85% today as 
I neared the end of another increase in PG count.  What I'm not understanding 
is how this value is determined.  I've read other posts and the calculations 
suggested don't give a result that equals what shows in my %USED column.  I'm 
suspecting that it's somehow related to the MAX AVAIL value (which I believe is 
somewhat indirectly related to the amount available based on the individual OSD 
utilization), but none of the posts I read mention this in their calculations 
and I've been unable to create a formula with any of the values I have to end 
up with the %USED value I have.

For the record, my current total utilization based on a 'ceph osd df' looks 
like this:

              TOTAL 39507G(SIZE) 19931G(USE) 17568G(AVAIL) 50.45(%USE)

My most utilised OSD (currently in the process of moving some data off this 
OSD) is 81.58% used with 188G available and a variance of 1.62.
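
As a quick sanity check on those figures, here is a minimal sketch. It assumes that the variance reported by 'ceph osd df' is the OSD's own utilization divided by the cluster-wide average utilization; the variable names are made up for illustration.

```python
# Values copied from the TOTAL line of 'ceph osd df' (in GB).
total_size_g = 39507
total_used_g = 19931

# Utilization of the fullest OSD, in percent.
fullest_osd_pct = 81.58

# Cluster-wide utilization, rounded the same way the tool prints it.
cluster_pct = round(100 * total_used_g / total_size_g, 2)

# Assumption: variance = OSD utilization / average utilization.
variance = round(fullest_osd_pct / cluster_pct, 2)

print(cluster_pct)  # 50.45
print(variance)     # 1.62
```

Both results agree with the reported 50.45 %USE and the variance of 1.62, so the per-OSD numbers at least look internally consistent.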

A cut-down output of 'ceph df' looks like this:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    39507G     17569G       19930G         50.45
POOLS:
    NAME                          ID     USED       %USED     MAX AVAIL     OBJECTS
    default.rgw.buckets.data      30      9552G     86.05         1548G     36285066

I suspect that as I get the utilization of my over-utilized OSDs down, this 
%USED value will drop.  But I'd love to fully understand how this value is 
calculated.
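
For reference, one relationship that does reproduce the reported figure from the 'ceph df' values above is USED / (USED + MAX AVAIL).  The sketch below only checks the arithmetic; it is an observation that fits these numbers, not a confirmed description of how Ceph computes the column internally, and the function name is hypothetical.

```python
def pool_pct_used(used_g: float, max_avail_g: float) -> float:
    """Candidate formula (an assumption): used / (used + max avail), in percent."""
    return 100.0 * used_g / (used_g + max_avail_g)


# Values copied from the 'ceph df' output for default.rgw.buckets.data (in GB).
pool_used_g = 9552
pool_max_avail_g = 1548
global_size_g = 39507

candidate = pool_pct_used(pool_used_g, pool_max_avail_g)

# For comparison: pool USED as a share of the raw cluster size,
# which does not match the reported %USED.
naive = 100.0 * pool_used_g / global_size_g

print(f"candidate: {candidate:.2f}")  # candidate: 86.05
print(f"naive:     {naive:.2f}")
```

If this relationship holds, %USED would fall as MAX AVAIL rises, which fits the expectation that it drops once the over-utilized OSDs are brought down.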

Thanks,
Mark J
