After doing some testing, I'm even more confused.

What I'm trying to achieve is minimal data movement when I have to service
a node to replace a failed drive. Since these nodes don't have hot-swap
bays, I'll need to power down the box to replace the failed drive. I don't
want Ceph to shuffle data until the new drive comes up and is ready.

My thought was to set norecover and nobackfill, take down the host,
replace the drive, start the host, remove the old OSD from the
cluster, run ceph-disk prepare on the new disk, and then unset
norecover and nobackfill.
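
Concretely, the sequence I had in mind was something like this (the
disk name is a placeholder, and the OSD-removal step is spelled out
further below):

# ceph osd set norecover
# ceph osd set nobackfill
  ... power off the node, replace the failed drive, power it back on ...
  ... remove the old OSD from the cluster (commands further below) ...
# ceph-disk prepare /dev/<new-disk>
# ceph osd unset norecover
# ceph osd unset nobackfill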

However, in my testing with a 4-node cluster (v0.94.0, 10 OSDs each,
replication 3, min_size 2, chooseleaf firstn host), if I take down a
host, I/O becomes blocked even though only one copy is taken down,
which should still satisfy min_size. When I unset norecover, I/O
proceeds and some backfill activity happens. At some point the
backfill stops and everything seems to be "happy" in the degraded
state.
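
For reference, the replication settings I mention are what you'd see
from something like this (using my rbd pool as the example):

# ceph osd pool get rbd size
size: 3
# ceph osd pool get rbd min_size
min_size: 2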

I'm really interested to know what is going on with "norecover", as
the cluster seems to break while it is set. Unsetting the "norecover"
flag causes some degraded objects to recover, but not all. Writing to
new blocks in an RBD causes the number of degraded objects to
increase, but I/O otherwise works just fine. Below is an example of
the cluster state after taking down one host and removing its OSDs
from the CRUSH map (I'm currently reformatting all the drives in that
host).
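
For completeness, removing each of that host's OSDs looked roughly
like this (repeated per OSD, with <id> as a placeholder):

# ceph osd crush remove osd.<id>
# ceph auth del osd.<id>
# ceph osd rm <id>

And the resulting status: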

# ceph status
    cluster 146c4fe8-7c85-46dc-b8b3-69072d658287
     health HEALTH_WARN
            1345 pgs backfill
            10 pgs backfilling
            2016 pgs degraded
            661 pgs recovery_wait
            2016 pgs stuck degraded
            2016 pgs stuck unclean
            1356 pgs stuck undersized
            1356 pgs undersized
            recovery 40642/167785 objects degraded (24.223%)
            recovery 31481/167785 objects misplaced (18.763%)
            too many PGs per OSD (665 > max 300)
            nobackfill flag(s) set
     monmap e5: 3 mons at {nodea=10.8.6.227:6789/0,nodeb=10.8.6.228:6789/0,nodec=10.8.6.229:6789/0}
            election epoch 2576, quorum 0,1,2 nodea,nodeb,nodec
     osdmap e59031: 30 osds: 30 up, 30 in; 1356 remapped pgs
            flags nobackfill
      pgmap v4723208: 6656 pgs, 4 pools, 330 GB data, 53235 objects
            863 GB used, 55000 GB / 55863 GB avail
            40642/167785 objects degraded (24.223%)
            31481/167785 objects misplaced (18.763%)
                4640 active+clean
                1345 active+undersized+degraded+remapped+wait_backfill
                 660 active+recovery_wait+degraded
                  10 active+undersized+degraded+remapped+backfilling
                   1 active+recovery_wait+undersized+degraded+remapped
  client io 1864 kB/s rd, 8853 kB/s wr, 65 op/s

Any help understanding these flags would be very helpful.

Thanks,
Robert

On Mon, Apr 13, 2015 at 1:40 PM, Robert LeBlanc <rob...@leblancnet.us>
wrote:

> I'm looking for documentation about what exactly each of these do and
> I can't find it. Can someone point me in the right direction?
>
> The names seem too ambiguous to come to any conclusion about what
> exactly they do.
>
> Thanks,
> Robert
>
