I've been in a state where reweight-by-utilization was deadlocked (not the daemons, but the remap scheduling). After successive ceph osd reweight commands, two OSDs each wanted to swap PGs with the other, but both of those backfills were toofull. I ended up temporarily increasing mon_osd_nearfull_ratio to 0.87. That removed the impediment, the remapping finished smoothly, and I changed the ratio back once it was done.
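For anyone who hits the same wedge, this is roughly the sequence I used. Treat it as a sketch rather than a recipe: 0.87 is just the value that worked for my utilization, injectargs changes are runtime-only, and depending on your release you may also need to raise the OSD-side osd_backfill_full_ratio before the backfill_toofull state clears.

    # see which OSDs are nearfull/toofull and how much headroom is left
    ceph health detail
    ceph df

    # temporarily raise the nearfull ratio on the monitors (runtime only)
    ceph tell mon.* injectargs '--mon_osd_nearfull_ratio 0.87'

    # some releases also gate backfill on the OSD-side ratio
    ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.87'

    # watch the backfill_toofull PGs drain
    ceph -s

    # put everything back once remapping has finished
    ceph tell mon.* injectargs '--mon_osd_nearfull_ratio 0.85'
    ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.85'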
Just be careful if you need to get close to mon_osd_full_ratio. Ceph compares these ratios with greater-than, not greater-than-or-equal, and you really don't want any disk to go over mon_osd_full_ratio, because all external IO will stop until you resolve that. (A rough sketch of the commands involved follows the quoted thread below.)

On Mon, Oct 20, 2014 at 10:18 AM, Leszek Master <keks...@gmail.com> wrote:
> You can set lower weight on full osds, or try changing the
> osd_near_full_ratio parameter in your cluster from 85 to for example 89.
> But i don't know what can go wrong when you do that.
>
> 2014-10-20 17:12 GMT+02:00 Wido den Hollander <w...@42on.com>:
>
>> On 10/20/2014 05:10 PM, Harald Rößler wrote:
>> > yes, tomorrow I will get the replacement of the failed disk, to get a
>> > new node with many disks will take a few days.
>> > No other idea?
>>
>> If the disks are all full, then, no.
>>
>> Sorry to say this, but it came down to poor capacity management. Never
>> let any disk in your cluster fill over 80% to prevent these situations.
>>
>> Wido
>>
>> > Harald Rößler
>> >
>> >> Am 20.10.2014 um 16:45 schrieb Wido den Hollander <w...@42on.com>:
>> >>
>> >> On 10/20/2014 04:43 PM, Harald Rößler wrote:
>> >>> Yes, I had some OSDs which were near full, after that I tried to fix
>> >>> the problem with "ceph osd reweight-by-utilization", but this does not
>> >>> help. After that I set the near full ratio to 88% with the idea that
>> >>> the remapping would fix the issue. Also a restart of the OSD doesn't
>> >>> help. At the same time I had a hardware failure of one disk. :-(.
>> >>> After that failure the recovery process starts at "degraded ~ 13%"
>> >>> and stops at 7%.
>> >>> Honestly I am scared in the moment that I am doing the wrong operation.
>> >>
>> >> Any chance of adding a new node with some fresh disks? Seems like you
>> >> are operating on the storage capacity limit of the nodes and that your
>> >> only remedy would be adding more spindles.
>> >>
>> >> Wido
>> >>
>> >>> Regards
>> >>> Harald Rößler
>> >>>
>> >>>> Am 20.10.2014 um 14:51 schrieb Wido den Hollander <w...@42on.com>:
>> >>>>
>> >>>> On 10/20/2014 02:45 PM, Harald Rößler wrote:
>> >>>>> Dear All
>> >>>>>
>> >>>>> I have at the moment an issue with my cluster. The recovery
>> >>>>> process stops.
>> >>>>
>> >>>> See this: 2 active+degraded+remapped+backfill_toofull
>> >>>>
>> >>>> 156 pgs backfill_toofull
>> >>>>
>> >>>> You have one or more OSDs which are too full and that causes
>> >>>> recovery to stop.
>> >>>>
>> >>>> If you add more capacity to the cluster recovery will continue
>> >>>> and finish.
>> >>>>
>> >>>>> ceph -s
>> >>>>>     health HEALTH_WARN 188 pgs backfill; 156 pgs backfill_toofull; 4 pgs backfilling; 55 pgs degraded; 49 pgs recovery_wait; 297 pgs stuck unclean; recovery 111487/1488290 degraded (7.491%)
>> >>>>>     monmap e2: 3 mons at {0=10.99.10.10:6789/0,12=10.99.10.22:6789/0,6=10.99.10.16:6789/0}, election epoch 332, quorum 0,1,2 0,12,6
>> >>>>>     osdmap e6748: 24 osds: 23 up, 23 in
>> >>>>>     pgmap v43314672: 3328 pgs: 3031 active+clean, 43 active+remapped+wait_backfill, 3 active+degraded+wait_backfill, 96 active+remapped+wait_backfill+backfill_toofull, 31 active+recovery_wait, 19 active+degraded+wait_backfill+backfill_toofull, 36 active+remapped, 3 active+remapped+backfilling, 18 active+remapped+backfill_toofull, 6 active+degraded+remapped+wait_backfill, 15 active+recovery_wait+remapped, 21 active+degraded+remapped+wait_backfill+backfill_toofull, 1 active+recovery_wait+degraded, 1 active+degraded+remapped+backfilling, 2 active+degraded+remapped+backfill_toofull, 2 active+recovery_wait+degraded+remapped; 1698 GB data, 5206 GB used, 971 GB / 6178 GB avail; 24382B/s rd, 12411KB/s wr, 320op/s; 111487/1488290 degraded (7.491%)
>> >>>>>
>> >>>>> I have tried to restart all OSD in the cluster, but does not help
>> >>>>> to finish the recovery of the cluster.
>> >>>>>
>> >>>>> Have someone any idea
>> >>>>>
>> >>>>> Kind Regards
>> >>>>> Harald Rößler
>> >>>>
>> >>>> --
>> >>>> Wido den Hollander
>> >>>> Ceph consultant and trainer
>> >>>> 42on B.V.
>> >>>>
>> >>>> Phone: +31 (0)20 700 9902
>> >>>> Skype: contact42on
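Coming back to the note above about mon_osd_full_ratio, and to Leszek's suggestion of lowering the weight of the full OSDs: here is a rough sketch of the knobs involved. The values are examples, not recommendations, and "ceph osd df" only exists on newer releases; on older ones "ceph pg dump osds" shows per-OSD usage.

    # per-OSD utilization, to see who is closest to the full ratio
    ceph osd df            # newer releases
    ceph pg dump osds      # older releases: kb_used / kb per OSD

    # push data off the fullest OSDs by lowering their override weight
    # (0.0 - 1.0; this is the same knob reweight-by-utilization adjusts)
    ceph osd reweight <osd-id> 0.9

    # or let Ceph pick the overloaded OSDs itself, e.g. everything more
    # than 20% above the average utilization
    ceph osd reweight-by-utilization 120

Keep an eye on the fullest single OSD rather than the cluster average: once any one OSD crosses mon_osd_full_ratio the cluster is flagged full and client writes are blocked until you free up space or raise the ratio.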