[Openstack-operators] Ceph recovery going unusually slow

2017-06-02 Thread Grant Morley
Hi All,

I wonder if anyone could help at all.

We were doing some routine maintenance on our Ceph cluster, and after
running "service ceph-all restart" on one of our nodes we noticed that
something wasn't quite right. The cluster has gone into an error state,
we have multiple stuck PGs, and recovery of the misplaced objects is taking a
strangely long time. At first about 46% of objects were misplaced, and we
now have roughly 16%.

However, it has taken about 36 hours to do the recovery so far, and with a
possible 16 to go we are looking at a fairly major issue. As a lot of the
system is now blocked for reads and writes, customers cannot access their VMs.
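
As a rough cross-check of the figures above, here is a minimal Python sketch that extrapolates the remaining recovery time from the numbers quoted (46% misplaced at the start, roughly 16% now, about 36 hours elapsed). It assumes the recovery rate stays constant, which backfill often does not, and the function name is purely illustrative.

```python
def hours_remaining(start_pct, current_pct, elapsed_hours):
    """Linear extrapolation of remaining recovery time.

    Assumption: misplaced objects keep draining at the average rate
    observed so far. Real backfill rates vary, so treat the result as a
    rough guide only.
    """
    recovered = start_pct - current_pct
    if recovered <= 0 or elapsed_hours <= 0:
        raise ValueError("no measurable progress to extrapolate from")
    rate_pct_per_hour = recovered / elapsed_hours
    return current_pct / rate_pct_per_hour


# Figures quoted in the message: 46% -> 16% misplaced over about 36 hours.
estimate = hours_remaining(46.0, 16.0, 36.0)
print(f"about {estimate:.1f} hours at the observed rate")
```

With these inputs the average rate is 30 percentage points in 36 hours, so the remaining 16% works out to about 19 hours if nothing changes.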

I think the main issue at the moment is that we have 210 PGs stuck inactive
and nothing we do seems to get them to peer.

Below is the output of ceph status. Can anyone help, or does anyone have any
ideas on how to speed up the recovery process? We have tried turning down
logging on the OSDs, but some are going so slowly that they won't let us
injectargs into them.
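
For anyone wanting to script the logging change, the following is a hedged Python sketch of how "ceph tell osd.N injectargs" could be driven with a per-OSD timeout, so that a slow OSD is skipped instead of blocking the loop. The helper names, the 10 second timeout and the choice of debug-osd and debug-ms as the settings to lower are assumptions for illustration, not what was actually run on this cluster.

```python
import subprocess


def injectargs_command(osd_id, options):
    """Build the argv for `ceph tell osd.<id> injectargs '<options>'`."""
    args = " ".join(f"--{name} {value}" for name, value in options.items())
    return ["ceph", "tell", f"osd.{osd_id}", "injectargs", args]


def quiet_osd_logging(osd_ids, timeout_s=10):
    """Try to lower debug logging on each OSD and return the ids that failed.

    Assumption: debug-osd and debug-ms are the log settings of interest.
    An OSD that does not answer within timeout_s is recorded and skipped.
    """
    options = {"debug-osd": "0/0", "debug-ms": "0/0"}
    unresponsive = []
    for osd_id in osd_ids:
        try:
            subprocess.run(
                injectargs_command(osd_id, options),
                check=True,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except (subprocess.TimeoutExpired,
                subprocess.CalledProcessError,
                FileNotFoundError):
            unresponsive.append(osd_id)
    return unresponsive
```

Calling quiet_osd_logging(range(54)) on a node with admin credentials would attempt all 54 OSDs and return the ones that did not accept the change.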

     health HEALTH_ERR
            210 pgs are stuck inactive for more than 300 seconds
            298 pgs backfill_wait
            3 pgs backfilling
            1 pgs degraded
            200 pgs peering
            1 pgs recovery_wait
            1 pgs stuck degraded
            210 pgs stuck inactive
            512 pgs stuck unclean
            3310 requests are blocked > 32 sec
            recovery 2/11094405 objects degraded (0.000%)
            recovery 1785063/11094405 objects misplaced (16.090%)
            nodown,noout,noscrub,nodeep-scrub flag(s) set

            election epoch 16314, quorum 0,1,2,3,4,5,6,7,8 storage-1,storage-2,storage-3,storage-4,storage-5,storage-6,storage-7,storage-8,storage-9
     osdmap e213164: 54 osds: 54 up, 54 in; 329 remapped pgs
            flags nodown,noout,noscrub,nodeep-scrub
      pgmap v41030942: 2036 pgs, 14 pools, 14183 GB data, 3309 kobjects
            43356 GB used, 47141 GB / 90498 GB avail
            2/11094405 objects degraded (0.000%)
            1785063/11094405 objects misplaced (16.090%)
                1524 active+clean
                 298 active+remapped+wait_backfill
                 153 peering
                  47 remapped+peering
                  10 inactive
                   3 active+remapped+backfilling
                   1 active+recovery_wait+degraded+remapped
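
The per-state PG counts above can be cross-checked against the health summary. This is a small Python sketch, with the counts copied from the status output; the parsing helper is hypothetical and only handles the simple "count state" lines shown here. It treats any PG without the "active" flag as inactive and any PG that is not exactly active+clean as unclean, which is a simplification of how Ceph classifies them.

```python
PG_STATES = """\
1524 active+clean
 298 active+remapped+wait_backfill
 153 peering
  47 remapped+peering
  10 inactive
   3 active+remapped+backfilling
   1 active+recovery_wait+degraded+remapped
"""


def parse_pg_states(text):
    """Map each PG state string to its count from 'count state' lines."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1]] = int(parts[0])
    return counts


counts = parse_pg_states(PG_STATES)
total = sum(counts.values())
# Simplification: a PG counts as inactive if none of its flags is "active".
inactive = sum(n for state, n in counts.items()
               if "active" not in state.split("+"))
peering = sum(n for state, n in counts.items()
              if "peering" in state.split("+"))
unclean = total - counts.get("active+clean", 0)
misplaced_pct = 100 * 1785063 / 11094405

print(f"total={total} inactive={inactive} peering={peering} unclean={unclean}")
print(f"misplaced={misplaced_pct:.3f}%")
```

The totals agree with the summary lines: 2036 PGs overall, 210 inactive, 200 peering, 512 unclean, and 16.090% of object copies misplaced.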

Many thanks,

Grant
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [dev] [doc] Operations Guide future

2017-06-02 Thread Alexandra Settle
Blair – correct, it was the majority in the room. I just wanted to reach out
and ensure that operators had a chance to voice opinions and see where we were
going.

Sounds like option 3 is still the favorable direction. This is going to be a 
really big exercise, lifting the content out of the repos. Are people able to 
help?

Thanks everyone for getting on board.

On 6/2/17, 2:44 AM, "Blair Bethwaite" wrote:

Hi Alex,

Likewise for option 3. If I recall correctly from the summit session
that was also the main preference in the room?

On 2 June 2017 at 11:15, George Mihaiescu wrote:
> +1 for option 3
>
>
>
> On Jun 1, 2017, at 11:06, Alexandra Settle wrote:
>
> Hi everyone,
>
>
>
> I haven’t had any feedback regarding moving the Operations Guide to the
> OpenStack wiki. I’m not taking silence as compliance. I would really like to
> hear people’s opinions on this matter.
>
>
>
> To recap:
>
>
>
> Option one: Kill the Operations Guide completely and move the Administration
> Guide to project repos.
> Option two: Combine the Operations and Administration Guides (and then this
> will be moved into the project-specific repos)
> Option three: Move Operations Guide to OpenStack wiki (for ease of
> operator-specific maintainability) and move the Administration Guide to
> project repos.
>
>
>
> Personally, I think that option 3 is more realistic. The idea for the last
> option is that operators are maintaining operator-specific documentation and
> updating it as they go along and we’re not losing anything by combining or
> deleting. I don’t want to lose what we have by going with option 1, and I
> think option 2 is just a workaround without fixing the problem – we are not
> getting contributions to the project.
>
>
>
> Thoughts?
>
>
>
> Alex
>
>
>
> From: Alexandra Settle 
> Date: Friday, May 19, 2017 at 1:38 PM
> To: Melvin Hillsman, OpenStack Operators
> Subject: Re: [Openstack-operators] Fwd: [openstack-dev] [openstack-doc]
> [dev] What's up doc? Summit recap edition
>
>
>
> Hi everyone,
>
>
>
> Adding to this, I would like to draw your attention to the last dot point of
> my email:
>
>
>
> “One of the key takeaways from the summit was the session that I jointly
> moderated with Melvin Hillsman regarding the Operations and Administration
> Guides. You can find the etherpad with notes here:
> https://etherpad.openstack.org/p/admin-ops-guides  The session was really
> helpful – we were able to discuss with the operators present the current
> situation of the documentation team, and how they could help us maintain the
> two guides, aimed at the same audience. The operators present at the
> session agreed that the Administration Guide was important, and could be
> maintained upstream. However, they voted and agreed that the best course of
> action for the Operations Guide was for it to be pulled down and put into a
> wiki that the operators could manage themselves. We will be looking at
> actioning this item as soon as possible.”
>
>
>
> I would like to go ahead with this, but I would appreciate feedback from
> operators who were not able to attend the summit. In the etherpad you will
> see the three options that the operators in the room recommended as being
> viable, and the voted option being moving the Operations Guide out of
> docs.openstack.org into a wiki. The aim of this was to empower the
> operations community to take more control of the updates in an environment
> they are more familiar with (and available to others).
>
>
>
> What does everyone think of the proposed options? Questions? Other thoughts?
>
>
>
> Alex
>
>
>
> From: Melvin Hillsman 
> Date: Friday, May 19, 2017 at 1:30 PM
> To: OpenStack Operators 
> Subject: [Openstack-operators] Fwd: [openstack-dev] [openstack-doc] [dev]
> What's up doc? Summit recap edition
>
>
>
>
>
> -- Forwarded message --
> From: Alexandra Settle 
> Date: Fri, May 19, 2017 at 6:12 AM
> Subject: [openstack-dev] [openstack-doc] [dev] What's up doc? Summit recap
> edition
> To: "openstack-d...@lists.openstack.org"
> 
> Cc: "OpenStack Development Mailing List (not for usage questions)"
> 
>
>
> Hi everyone,
>
>
> The OpenStack