After doing some testing, I'm even more confused. What I'm trying to achieve is minimal data movement when I have to service a node to replace a failed drive. Since these nodes don't have hot-swap bays, I'll need to power down the box to replace the failed drive, and I don't want Ceph to shuffle data until the new drive comes up and is ready.
My plan was to set norecover and nobackfill, take down the host, replace the drive, bring the host back up, remove the old OSD from the cluster, ceph-disk prepare the new disk, and then unset norecover and nobackfill.

However, in my testing with a 4-node cluster (v0.94.0, 10 OSDs per node, replication 3, min_size 2, chooseleaf firstn host), if I take down a host, I/O becomes blocked even though only one copy is taken down, which should still satisfy min_size. When I unset norecover, I/O proceeds and some backfill activity happens. At some point the backfill stops and everything seems to be "happy" in the degraded state.

I'm really interested to know what is going on with "norecover", as the cluster seems to break while it is set. Unsetting the "norecover" flag causes some degraded objects to recover, but not all. Writing to new blocks in an RBD causes the number of degraded objects to increase, but otherwise works just fine.

Here is an example after taking down one host and removing its OSDs from the CRUSH map (I'm currently reformatting all the drives in that host):
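For reference, the maintenance flow I have in mind looks roughly like the sketch below. The OSD ID (osd.12) and device (/dev/sdc) are placeholder examples, not my actual values; the flag and removal commands are standard Ceph CLI, and ceph-disk is the provisioning tool on this Hammer release. Obviously this only runs against a live cluster.

```shell
# 1. Keep Ceph from starting recovery/backfill while the host is down
ceph osd set norecover
ceph osd set nobackfill

# 2. Power down the host, swap the failed drive, power it back up
#    (done at the console / via IPMI, not shown here)

# 3. Remove the dead OSD: CRUSH entry, auth key, then the OSD ID itself
#    (osd.12 is a placeholder)
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

# 4. Prepare the replacement disk (Hammer-era ceph-disk; /dev/sdc is a placeholder)
ceph-disk prepare /dev/sdc

# 5. Allow recovery/backfill to proceed
ceph osd unset norecover
ceph osd unset nobackfill
```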
# ceph status
    cluster 146c4fe8-7c85-46dc-b8b3-69072d658287
     health HEALTH_WARN
            1345 pgs backfill
            10 pgs backfilling
            2016 pgs degraded
            661 pgs recovery_wait
            2016 pgs stuck degraded
            2016 pgs stuck unclean
            1356 pgs stuck undersized
            1356 pgs undersized
            recovery 40642/167785 objects degraded (24.223%)
            recovery 31481/167785 objects misplaced (18.763%)
            too many PGs per OSD (665 > max 300)
            nobackfill flag(s) set
     monmap e5: 3 mons at {nodea=10.8.6.227:6789/0,nodeb=10.8.6.228:6789/0,nodec=10.8.6.229:6789/0}
            election epoch 2576, quorum 0,1,2 nodea,nodeb,nodec
     osdmap e59031: 30 osds: 30 up, 30 in; 1356 remapped pgs
            flags nobackfill
      pgmap v4723208: 6656 pgs, 4 pools, 330 GB data, 53235 objects
            863 GB used, 55000 GB / 55863 GB avail
            40642/167785 objects degraded (24.223%)
            31481/167785 objects misplaced (18.763%)
                4640 active+clean
                1345 active+undersized+degraded+remapped+wait_backfill
                 660 active+recovery_wait+degraded
                  10 active+undersized+degraded+remapped+backfilling
                   1 active+recovery_wait+undersized+degraded+remapped
  client io 1864 kB/s rd, 8853 kB/s wr, 65 op/s

Any help understanding these flags would be very helpful.

Thanks,
Robert

On Mon, Apr 13, 2015 at 1:40 PM, Robert LeBlanc <rob...@leblancnet.us> wrote:
> I'm looking for documentation about what exactly each of these do and
> I can't find it. Can someone point me in the right direction?
>
> The names seem too ambiguous to come to any conclusion about what
> exactly they do.
>
> Thanks,
> Robert
>
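P.S. As a sanity check on the status output, the degraded/misplaced percentages are just the raw object-copy ratios it reports. A quick awk one-liner reproduces them:

```shell
# Ratios taken from the ceph status output: 40642 of 167785 object
# copies degraded, 31481 of 167785 misplaced.
awk 'BEGIN { printf "degraded:  %.3f%%\n", 40642/167785*100 }'
awk 'BEGIN { printf "misplaced: %.3f%%\n", 31481/167785*100 }'
# degraded:  24.223%
# misplaced: 18.763%
```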
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com