Hello,

On Sun, 28 Aug 2016 14:34:25 -0500 Sean Sullivan wrote:

> I was curious if anyone has filled ceph storage beyond 75%. 

If you (re-)search the ML archives, you will find plenty of cases like
this, albeit most of them involuntary. 
Same goes for uneven distribution.

> Admittedly we
> lost a single host due to power failure and are down 1 host until the
> replacement parts arrive but outside of that I am seeing disparity between
> the most and least full osd::
> 
> ID  WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR
> 559 4.54955  1.00000 3724G 2327G 1396G 62.50 0.84
> 193 2.48537  1.00000 3724G 3406G  317G 91.47 1.23
>                TOTAL 2178T 1625T  552T 74.63
> MIN/MAX VAR: 0/1.26  STDDEV: 7.12
> 
Those extremes, especially with the weights they have, look odd indeed.
Unless OSD 193 is in the rack which lost a node.

> The crush weights are really off right now but even with a default crush
> map I am seeing a similar spread::
> 
> # osdmaptool --test-map-pgs --pool 1 /tmp/osdmap
>  avg 82 stddev 10.54 (0.128537x) (expected 9.05095 0.110377x))
>  min osd.336 55
>  max osd.54 115
> 
> That's with a default weight of 3.000 across all osds. I was wondering if
> anyone can give me any tips on how to reach closer to 80% full.
> 
> We have 630 osds (down one host right now but it will be back in in a week
> or so) spread across 3 racks of 7 hosts (30 osds each). Our data
> replication scheme is by rack and we only use S3 (so 98% of our data is in
> .rgw.buckets pool). We are on hammer (94.7) and using the hammer tunables.
> 
What comes to mind here is that your split into 3 buckets (racks) and
then into 7 (hosts) each is probably not helping the already rather fuzzy
CRUSH algorithm come up with an even distribution.
Meaning that imbalances are likely to be amplified.

And dense storage servers (30 OSDs each) of course amplify things further
when one of them goes down.
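To see how the weights actually wind up per rack and host (and whether
the lost node is skewing one rack), something like this should show it
at a glance:

  # ceph osd tree
  # ceph osd df tree

The second form (if your hammer build has it) combines the hierarchy
with the utilization columns from the output you posted above.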

So how many PGs does the .rgw.buckets pool have, then?
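You can check with the usual:

  # ceph osd pool get .rgw.buckets pg_num
  # ceph osd pool get .rgw.buckets pgp_num

The numbers below are just the generic rule-of-thumb math, not something
verified against your cluster: with 630 OSDs, replication size 3 and a
target of roughly 100 PGs per OSD, you get about 630 * 100 / 3 = 21000,
rounded to a power of two, so 16384 PGs for the pool that holds nearly
all your data.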

With jewel (a backport to hammer exists, check the ML archives) there's an
improved reweight-by-utilization that can help with these things.
That said, I prefer to do this manually with the (persistent) CRUSH
reweight to achieve a more even distribution.
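Roughly along these lines (the threshold and the weight value are only
examples, adjust them to what your cluster actually needs):

  # ceph osd reweight-by-utilization 110
  # ceph osd crush reweight osd.193 2.2

The first one adjusts the temporary (0-1) override weights of the most
over-full OSDs, the second one changes the CRUSH weight itself, which is
persistent and survives the OSD being marked out and in again.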

For example on one cluster here I got the 18 HDD OSDs all within 100GB of
each other.

However, having lost 3 of those OSDs 2 days ago, the spread is now 300GB,
most likely NOT helped by the manual adjustments done earlier.
So a cluster that is nice and evenly distributed in its normal state may
be worse off with custom weights when there is a significant OSD loss.

Christian
-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
