I have seen something like this happen twice. The first time there were
unclean PGs because Ceph was down to one replica of a PG. When that
happens, Ceph blocks IO to the remaining replicas once their number falls
below the 'min_size' parameter, and that manifests as blocked ops.
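
To make the min_size rule concrete, here is a minimal sketch of the check
as I described it above. The function name and the pool values (size 3,
min_size 2) are made up for illustration and are not taken from your
cluster.

```python
def io_blocked(available_replicas: int, min_size: int) -> bool:
    """Return True when a PG has too few replicas left to serve IO."""
    # Rule as described above: IO is blocked once the number of
    # available replicas falls below min_size.
    return available_replicas < min_size


# Hypothetical pool with size=3 and min_size=2 (illustrative values only).
for replicas in (3, 2, 1):
    state = "IO blocked" if io_blocked(replicas, 2) else "serving IO"
    print(f"{replicas} replica(s): {state}")
```

With those assumed values, only the single-replica case reports blocked
IO, which is the situation I ran into the first time.
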

The second time the disk was 'soft-failing': gaining many bad sectors while
SMART still reported the drive as OK. Maybe check osd.5 and osd.7 for
low-level media errors with a tool like MegaCli, or whatever controller
management tool comes with your hardware.

At any rate, restarting the problem-child OSD is probably troubleshooting
step #1, which you have done.
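
If you want to watch which OSD the blocked ops pile up on across successive
"ceph health detail" calls, something like the following sketch might help.
It assumes the per-OSD line format shown in the output quoted below (other
Ceph releases may word it differently), and the names BLOCKED_RE,
blocked_ops_by_osd and SAMPLE are my own, not part of any Ceph tooling.

```python
import re
from collections import defaultdict

# Matches per-OSD lines such as "18 ops are blocked > 536871 sec on osd.7".
# The cluster-wide summary line (without "on osd.N") is deliberately skipped.
BLOCKED_RE = re.compile(r"^(\d+) ops are blocked > ([\d.]+) sec on (osd\.\d+)$")


def blocked_ops_by_osd(health_detail: str) -> dict:
    """Sum the blocked-op counts per OSD from 'ceph health detail' text."""
    counts = defaultdict(int)
    for line in health_detail.splitlines():
        match = BLOCKED_RE.match(line.strip())
        if match:
            counts[match.group(3)] += int(match.group(1))
    return dict(counts)


# First sample from the quoted output in this thread.
SAMPLE = """\
HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests
20 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
18 ops are blocked > 536871 sec on osd.7
2 osds have slow requests
"""

print(blocked_ops_by_osd(SAMPLE))  # {'osd.5': 2, 'osd.7': 18}
```

Running it against each snapshot makes it easier to see whether one OSD
(osd.7 in the samples) consistently holds most of the blocked ops.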

On 7/14/15, 6:45 AM, "Deneau, Tom" <tom.den...@amd.com> wrote:

>I don't think there were any stale or unclean PGs,  (when there are,
>I have seen "health detail" list them and it did not in this case).
>I have since restarted the 2 osds and the health went immediately to
>HEALTH_OK.
>
>-- Tom
>
>> -----Original Message-----
>> From: Will.Boege [mailto:will.bo...@target.com]
>> Sent: Monday, July 13, 2015 10:19 PM
>> To: Deneau, Tom; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] slow requests going up and down
>> 
>> Does the ceph health detail show anything about stale or unclean PGs, or
>> are you just getting the blocked ops messages?
>> 
>> On 7/13/15, 5:38 PM, "Deneau, Tom" <tom.den...@amd.com> wrote:
>> 
>> >I have a cluster where over the weekend something happened and
>>successive
>> >calls to ceph health detail show things like below.
>> >What does it mean when the number of blocked requests goes up and down
>> >like this?
>> >Some clients are still running successfully.
>> >
>> >-- Tom Deneau, AMD
>> >
>> >
>> >
>> >HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests
>> >20 ops are blocked > 536871 sec
>> >2 ops are blocked > 536871 sec on osd.5
>> >18 ops are blocked > 536871 sec on osd.7
>> >2 osds have slow requests
>> >
>> >HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
>> >4 ops are blocked > 536871 sec
>> >2 ops are blocked > 536871 sec on osd.5
>> >2 ops are blocked > 536871 sec on osd.7
>> >2 osds have slow requests
>> >
>> >HEALTH_WARN 27 requests are blocked > 32 sec; 2 osds have slow requests
>> >27 ops are blocked > 536871 sec
>> >2 ops are blocked > 536871 sec on osd.5
>> >25 ops are blocked > 536871 sec on osd.7
>> >2 osds have slow requests
>> >
>> >HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests
>> >34 ops are blocked > 536871 sec
>> >9 ops are blocked > 536871 sec on osd.5
>> >25 ops are blocked > 536871 sec on osd.7
>> >2 osds have slow requests
>> >_______________________________________________
>> >ceph-users mailing list
>> >ceph-users@lists.ceph.com
>> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
