Hello,

On Mon, 08 Sep 2014 11:42:59 -0400 JR wrote:

> Greetings all,
> 
> I have a small ceph cluster (4 nodes, 2 osds per node) which recently
> started showing:
> 
> root@ocd45:~# ceph health
> HEALTH_WARN 1 near full osd(s)
> 
> admin@node4:~$ for i in 2 3 4 5; do sudo ssh osd4$i df -h |egrep
> 'Filesystem|osd/ceph'; done
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       442G  249G  194G  57% /var/lib/ceph/osd/ceph-5
> /dev/sdb1       442G  287G  156G  65% /var/lib/ceph/osd/ceph-1
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       442G  396G   47G  90% /var/lib/ceph/osd/ceph-7
> /dev/sdb1       442G  316G  127G  72% /var/lib/ceph/osd/ceph-3
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-2
> /dev/sdc1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-6
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       442G  238G  205G  54% /var/lib/ceph/osd/ceph-4
> /dev/sdb1       442G  278G  165G  63% /var/lib/ceph/osd/ceph-0
> 
>
See the very recent "Uneven OSD usage" thread for a discussion about this.
What are your PG/PGP values?

> This cluster has been running for weeks, under significant load, and has
> been 100% stable. Unfortunately we have to ship it out of the building
> to another part of our business (where we will have little access to it).
> 
> Based on what I've read about 'ceph osd reweight' I'm a bit hesitant to
> just run it (I don't want to do anything that impacts this cluster's
> stability).
> 
> Is there another, better way to equalize the distribution of the data on
> the osd partitions?
> 
> I'm running dumpling.
> 
As per the thread and my experience, Firefly would solve this. If you can
upgrade during a weekend or whenever there is little to no access, do it.

Another option (of course any and all of these will result in data
movement, so pick an appropriate time) would be to use "ceph osd
reweight" to lower the weight of osd.7 in particular.
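
A minimal sketch of that reweight step, written as a dry run that only
prints the command so it can be reviewed first. The weight 0.9 is purely
an illustrative assumption, not a recommendation; the override weight
ranges from 0 to 1, with 1 being the default.

```shell
# osd.7 is the 90% full OSD in the df output quoted above.
osd_id=7
# Assumption: 0.9 is an illustrative value; pick your own and lower it in small steps.
new_weight=0.9

# Build the command and print it instead of executing it.
cmd="ceph osd reweight $osd_id $new_weight"
echo "$cmd"
```

Once the printed command looks right, run it on a node with admin access
and watch "ceph -w" until the resulting data movement has finished before
adjusting further.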

Lastly, given the utilization of your cluster, you really ought to deploy
more OSDs and/or more nodes. If a node were to go down, you'd easily get
into a "real" near full or full situation.
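
To put rough numbers on that, here is a back-of-the-envelope sketch using
the "Used" column from the df output above. It assumes that all data from
a failed node ends up re-replicated evenly onto the remaining 6 OSDs,
which is a simplification; real placement will be less even.

```shell
# Used GB per OSD, copied from the quoted df output (8 OSDs of 442G each).
used="249 287 396 316 229 229 238 278"
osd_size=442

total_used=0
for u in $used; do
    total_used=$((total_used + u))
done

# Integer percentages (truncated): all 8 OSDs up, then one node (2 OSDs) lost.
pct_now=$((100 * total_used / (8 * osd_size)))
pct_after=$((100 * total_used / (6 * osd_size)))

echo "used=${total_used}G now=${pct_now}% after_node_loss=${pct_after}%"
# prints: used=2222G now=62% after_node_loss=83%
```

An average of roughly 83% sits just under the 85% near-full threshold
(assuming the default ratio has not been changed), and with the current
uneven distribution several individual OSDs would land well above it.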

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
