Thanks Anthony,

Shortly after I made that post, I found a Server Fault post where someone had 
asked the exact same question.  The reply was this - "The 'MAX AVAIL' column 
represents the amount of data that can be used before the first OSD becomes 
full. It takes into account the projected distribution of data across disks 
from the CRUSH map and uses the 'first OSD to fill up' as the target."
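
In other words (my rough reading of it, with made-up numbers rather than the 
exact formula Ceph uses): if CRUSH expects the fullest OSD to receive, say, 5% 
of the pool's data and that OSD only has 100G free, then for a size=2 pool

MAX AVAIL ~= 100G / 0.05 / 2 = 1000G

regardless of how much free space the rest of the cluster has.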

To answer your question, yes, we have a rather unbalanced cluster, which is 
something I'm working on.  When I saw these figures, I got scared that I had 
less time to work on it than I thought.  There are about 10 pools in the 
cluster, but we primarily use one for almost all of our storage, and it only 
has 64 PGs and 1 replica across 20 OSDs.  So, as the data has grown, it works 
out that each PG in that pool accounts for about 148GB, and the OSDs are about 
1.4TB each, so it's easy to see how it's ended up so far out of balance.
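
Putting rough numbers on that: 9472G in the pool / 64 PGs ~= 148G per PG, and 
148G / ~1400G ~= 10% of an OSD, so an OSD that happens to land even one PG 
more than its neighbours ends up roughly 10% fuller, which goes a long way 
towards explaining the low MAX AVAIL figure.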

Anyway, once I've added the OSDs and the data has rebalanced, I'm going to 
incrementally increase the PG count for this pool in stages, to reduce the 
amount of data per PG and (hopefully) even out the data distribution better 
than it is now.
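
For anyone following along, my understanding is that the staged increase is 
just repeated rounds of something like this (pool name and numbers are only 
examples):

ceph osd pool set <pool> pg_num 128
ceph osd pool set <pool> pgp_num 128

waiting for the cluster to settle after each step before bumping to the next 
value.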

This is one big learning process - I just wish I wasn't learning in production 
so much.



On Mon, 2021-01-11 at 15:58 -0800, Anthony D'Atri wrote:

Either you have multiple CRUSH roots or device classes, or you have unbalanced 
OSD utilization.  What version of Ceph?  Do you have any balancing enabled?


Do


ceph osd df | sort -nk8 | head

ceph osd df | sort -nk8 | tail


and I’ll bet you have OSDs way more full than others.  The STDDEV value that 
ceph df reports is, I suspect, accordingly high.


On Jan 11, 2021, at 2:07 PM, Mark Johnson <ma...@iovox.com> wrote:


Can someone please explain to me the difference between the Global "AVAIL" and 
the "MAX AVAIL" in the pools table when I do a "ceph df detail"?  The reason 
being that we have a total of 14 pools, however almost all of our data exists 
in one pool.  A "ceph df detail" shows the following:


GLOBAL:

   SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS

   28219G     6840G       19945G         70.68      36112k


But the POOLS table from the same output shows the MAX AVAIL for each pool as 
498G and the pool with all the data shows 9472G used with a %USED of 95.00.  If 
it matters, the pool size is set to 2 so my guess is the global available 
figure is raw, meaning I should still have approx. 3.4TB available, but that 
95% used has me concerned.  I'm going to be adding some OSDs soon but still 
would like to understand the difference and how much trouble I'm in at this 
point.




_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
