[Openstack-operators] Ceph recovery going unusually slow
Hi All,

I wonder if anyone could help. We were doing some routine maintenance on our Ceph cluster, and after running "service ceph-all restart" on one of our nodes we noticed that something wasn't quite right. The cluster has gone into an error state: we have multiple stuck PGs, and object recovery is taking a strangely long time.

At first about 46% of objects were misplaced; we now have roughly 16%. However, it has taken about 36 hours to get this far, and with a possible 16 to go we are looking at a fairly major issue. As a lot of the system is now blocked for reads and writes, customers cannot access their VMs.

I think the main issue at the moment is that we have 210 PGs stuck inactive, and nothing we do seems to get them to peer.

Below is the output of ceph status. Can anyone help, or does anyone have ideas on how to speed up the recovery process? We have tried turning down logging on the OSDs, but some are so slow that they won't let us injectargs into them.
     health HEALTH_ERR
            210 pgs are stuck inactive for more than 300 seconds
            298 pgs backfill_wait
            3 pgs backfilling
            1 pgs degraded
            200 pgs peering
            1 pgs recovery_wait
            1 pgs stuck degraded
            210 pgs stuck inactive
            512 pgs stuck unclean
            3310 requests are blocked > 32 sec
            recovery 2/11094405 objects degraded (0.000%)
            recovery 1785063/11094405 objects misplaced (16.090%)
            nodown,noout,noscrub,nodeep-scrub flag(s) set
            election epoch 16314, quorum 0,1,2,3,4,5,6,7,8 storage-1,storage-2,storage-3,storage-4,storage-5,storage-6,storage-7,storage-8,storage-9
     osdmap e213164: 54 osds: 54 up, 54 in; 329 remapped pgs
            flags nodown,noout,noscrub,nodeep-scrub
      pgmap v41030942: 2036 pgs, 14 pools, 14183 GB data, 3309 kobjects
            43356 GB used, 47141 GB / 90498 GB avail
            2/11094405 objects degraded (0.000%)
            1785063/11094405 objects misplaced (16.090%)
                1524 active+clean
                 298 active+remapped+wait_backfill
                 153 peering
                  47 remapped+peering
                  10 inactive
                   3 active+remapped+backfilling
                   1 active+recovery_wait+degraded+remapped

Many thanks,
Grant

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
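As a sanity check on the figures in the message above, the PG state counts in the pgmap section can be tallied against the health summary, and the remaining recovery time can be roughly extrapolated from the percentages quoted. The following is a minimal Python sketch and not part of the original post. The helper names (parse_pg_states, count_inactive, hours_remaining) are hypothetical, "inactive" is assumed here to mean any state string without an "active" component, and the time estimate assumes a constant recovery rate, which is unlikely to hold while PGs are stuck peering.

```python
# PG state summary copied from the pgmap section of the status output above.
PG_STATES = """
1524 active+clean
298 active+remapped+wait_backfill
153 peering
47 remapped+peering
10 inactive
3 active+remapped+backfilling
1 active+recovery_wait+degraded+remapped
"""


def parse_pg_states(text):
    """Return {state: count} from lines of the form '<count> <state>'."""
    states = {}
    for line in text.strip().splitlines():
        count, state = line.split()
        states[state] = int(count)
    return states


def count_inactive(states):
    """Sum PGs whose state has no 'active' component (assumed definition)."""
    return sum(n for state, n in states.items()
               if "active" not in state.split("+"))


def hours_remaining(start_pct, now_pct, elapsed_hours):
    """Linear extrapolation; assumes the recovery rate stays constant."""
    rate = (start_pct - now_pct) / elapsed_hours  # percentage points per hour
    return now_pct / rate


states = parse_pg_states(PG_STATES)
total = sum(states.values())
inactive = count_inactive(states)
unclean = total - states["active+clean"]
# 46% misplaced at the start, 16% now, 36 hours elapsed (figures from the post).
eta = hours_remaining(46.0, 16.0, 36.0)
print(total, inactive, unclean, round(eta, 1))
```

With the values from this post the sketch prints 2036 210 512 19.2. The first three numbers agree with the "2036 pgs", "210 pgs stuck inactive" and "512 pgs stuck unclean" lines of the status output. The last is only a naive estimate in hours under the constant-rate assumption.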
Re: [Openstack-operators] [dev] [doc] Operations Guide future
Blair – correct, it was the majority in the room. I just wanted to reach out and ensure that operators had a chance to voice opinions and see where we were going.

Sounds like option 3 is still the favorable direction. This is going to be a really big exercise, lifting the content out of the repos. Are people able to help?

Thanks everyone for getting on board.

On 6/2/17, 2:44 AM, "Blair Bethwaite" wrote:

Hi Alex,

Likewise for option 3. If I recall correctly from the summit session that was also the main preference in the room?

On 2 June 2017 at 11:15, George Mihaiescu wrote:
> +1 for option 3
>
> On Jun 1, 2017, at 11:06, Alexandra Settle wrote:
>
> Hi everyone,
>
> I haven’t had any feedback regarding moving the Operations Guide to the
> OpenStack wiki. I’m not taking silence as compliance. I would really like to
> hear people’s opinions on this matter.
>
> To recap:
>
> Option one: Kill the Operations Guide completely and move the Administration
> Guide to project repos.
> Option two: Combine the Operations and Administration Guides (and then this
> will be moved into the project-specific repos)
> Option three: Move Operations Guide to OpenStack wiki (for ease of
> operator-specific maintainability) and move the Administration Guide to
> project repos.
>
> Personally, I think that option 3 is more realistic. The idea for the last
> option is that operators are maintaining operator-specific documentation and
> updating it as they go along and we’re not losing anything by combining or
> deleting. I don’t want to lose what we have by going with option 1, and I
> think option 2 is just a workaround without fixing the problem – we are not
> getting contributions to the project.
>
> Thoughts?
>
> Alex
>
> From: Alexandra Settle
> Date: Friday, May 19, 2017 at 1:38 PM
> To: Melvin Hillsman, OpenStack Operators
> Subject: Re: [Openstack-operators] Fwd: [openstack-dev] [openstack-doc]
> [dev] What's up doc? Summit recap edition
>
> Hi everyone,
>
> Adding to this, I would like to draw your attention to the last dot point of
> my email:
>
> “One of the key takeaways from the summit was the session that I jointly
> moderated with Melvin Hillsman regarding the Operations and Administration
> Guides. You can find the etherpad with notes here:
> https://etherpad.openstack.org/p/admin-ops-guides The session was really
> helpful – we were able to discuss with the operators present the current
> situation of the documentation team, and how they could help us maintain the
> two guides, aimed at the same audience. The operators present at the
> session agreed that the Administration Guide was important, and could be
> maintained upstream. However, they voted and agreed that the best course of
> action for the Operations Guide was for it to be pulled down and put into a
> wiki that the operators could manage themselves. We will be looking at
> actioning this item as soon as possible.”
>
> I would like to go ahead with this, but I would appreciate feedback from
> operators who were not able to attend the summit. In the etherpad you will
> see the three options that the operators in the room recommended as being
> viable, and the voted option being moving the Operations Guide out of
> docs.openstack.org into a wiki. The aim of this was to empower the
> operations community to take more control of the updates in an environment
> they are more familiar with (and available to others).
>
> What does everyone think of the proposed options? Questions? Other thoughts?
>
> Alex
>
> From: Melvin Hillsman
> Date: Friday, May 19, 2017 at 1:30 PM
> To: OpenStack Operators
> Subject: [Openstack-operators] Fwd: [openstack-dev] [openstack-doc] [dev]
> What's up doc? Summit recap edition
>
> -- Forwarded message --
> From: Alexandra Settle
> Date: Fri, May 19, 2017 at 6:12 AM
> Subject: [openstack-dev] [openstack-doc] [dev] What's up doc? Summit recap
> edition
> To: "openstack-d...@lists.openstack.org"
> Cc: "OpenStack Development Mailing List (not for usage questions)"
>
> Hi everyone,
>
> The OpenStack