Load on all nodes is 1.04 to 1.07.
I am updating now to Jewel 10.2 (from 9.2).
This is CephFS with SSD journals.

Hopefully the update to Jewel fixes a lot of this.


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238



From: Lincoln Bryant [mailto:linco...@uchicago.edu]
Sent: Thursday, April 28, 2016 12:56 PM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

OK, a few more questions.

What does the load look like on the OSDs with ‘iostat’ during the rsync?

What version of Ceph? Are you using RBD, CephFS, something else?

SSD journals or no?

—Lincoln

On Apr 28, 2016, at 2:53 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

Lincoln,

That was the odd thing to me. Ceph health detail listed all 4 OSDs, so I 
checked all the systems.
I have since let it settle until it was OK again and restarted the rsync. Within a
couple of minutes, it started showing blocked requests, and they are indeed on all 4 OSDs.




From: Lincoln Bryant [mailto:linco...@uchicago.edu]
Sent: Thursday, April 28, 2016 12:31 PM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

Hi Brian,

The first thing you can do is “ceph health detail”, which should give you some 
more information about which OSD(s) have blocked requests.

If it’s isolated to one OSD in particular, perhaps use iostat to check 
utilization and/or smartctl to check health.
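
As a rough illustration of that first step, here is a minimal Python sketch that pulls the blocked-request count out of health output. The function name parse_blocked and the regular expression are my own assumptions, written against the one summary line quoted in this thread ("80 requests are blocked > 32 sec"). Other Ceph releases may word the line differently, so treat it as a starting point rather than a tested tool.

```python
import re

# Assumption: the summary line looks like the one quoted in this thread,
# e.g. "80 requests are blocked > 32 sec". Other releases may differ.
BLOCKED_RE = re.compile(r"(\d+) requests are blocked > (\d+(?:\.\d+)?) sec")


def parse_blocked(text):
    """Return (count, threshold_seconds) for each blocked-request line in text."""
    return [(int(m.group(1)), float(m.group(2))) for m in BLOCKED_RE.finditer(text)]


# Sample input copied from the health output quoted in this thread.
sample = """health HEALTH_WARN
            80 requests are blocked > 32 sec
"""

for count, seconds in parse_blocked(sample):
    print(f"{count} requests blocked for more than {seconds:g} sec")
```

On a live cluster the text would come from running "ceph health detail" instead of the sample string, for example with subprocess.run(["ceph", "health", "detail"], capture_output=True, text=True).stdout.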

—Lincoln

On Apr 28, 2016, at 2:26 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

All,

I have a small Ceph cluster with 4 OSDs and 3 MONs on 4 systems.
I was rsyncing about 50TB of files and things got very slow, to the point that I
stopped the rsync. Even with everything stopped, I see:

health HEALTH_WARN
            80 requests are blocked > 32 sec

The number was as high as 218, but they seem to be draining down.
I see no issues on any of the systems: CPU load is low and memory usage is low.

How do I go about finding why a request is blocked for so long? Some requests have
been blocked for more than 500 seconds.


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
