I had this same sort of thing with Hammer. 
Looking forward to your results. 
Please post your configuration when done. 
I am contemplating doing a similar action to resolve my issues, and it would be 
interesting to know your outcome first. 

//Tu

On Thu, Apr 28, 2016 at 1:18 PM -0700, "Andrus, Brian Contractor" <bdand...@nps.edu> wrote:

Load on all nodes is 1.04 to 1.07.

I am updating now to Jewel 10.2 (from 9.2).

This is CephFS with SSD journals.

Hopefully the update to Jewel fixes a lot.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

From: Lincoln Bryant [mailto:linco...@uchicago.edu]
Sent: Thursday, April 28, 2016 12:56 PM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

OK, a few more questions.

What does the load look like on the OSDs with ‘iostat’ during the rsync?

What version of Ceph? Are you using RBD, CephFS, something else?

SSD journals or no?

—Lincoln
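[For the iostat check above, a minimal sketch would be to run something like this on each OSD host while the rsync is going; the 5-second interval is just an example:]

```shell
# Extended per-device statistics, refreshed every 5 seconds.
# High %util or large await on the OSD data disks, while the journal
# SSD stays comparatively idle, would point at spindle saturation
# from the rsync write load.
iostat -x 5
```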

On Apr 28, 2016, at 2:53 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

Lincoln,

That was the odd thing to me. Ceph health detail listed all 4 OSDs, so I checked all the systems.

I have since let it settle until it was OK again, then started again. Within a couple of minutes, it started showing blocked requests, and they are indeed on all 4 OSDs.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

From: Lincoln Bryant [mailto:linco...@uchicago.edu]
Sent: Thursday, April 28, 2016 12:31 PM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

Hi Brian,

The first thing you can do is “ceph health detail”, which should give you some more information about which OSD(s) have blocked requests.

If it’s isolated to one OSD in particular, perhaps use iostat to check utilization and/or smartctl to check health.

—Lincoln
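[As a rough sketch of that workflow; the OSD id and device name below are placeholders, not taken from this cluster:]

```shell
# Which OSDs currently have blocked requests?
ceph health detail

# Then, on the host carrying a suspect OSD (osd.3 and /dev/sdb are examples):
iostat -x 5            # watch %util and await on the OSD's data disk
smartctl -a /dev/sdb   # drive SMART status and error counters
```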

On Apr 28, 2016, at 2:26 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

All,

I have a small Ceph cluster with 4 OSDs and 3 MONs on 4 systems.

I was rsyncing about 50TB of files and things got very slow, to the point that I stopped the rsync. But even with everything stopped, I see:

     health HEALTH_WARN
            80 requests are blocked > 32 sec

The number was as high as 218, but they seem to be draining down.

I see no issues on any of the systems; CPU load is low and memory usage is low.

How do I go about finding why a request is blocked for so long? These have been hitting >500 seconds for block time.
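[One way to see what an individual blocked request is actually waiting on is to query the OSD daemon's admin socket on its host; osd.0 below is a placeholder id:]

```shell
# Ops currently in flight, with their age and the stage each is stuck in
ceph daemon osd.0 dump_ops_in_flight

# Slowest recently completed ops, with a per-stage timeline
ceph daemon osd.0 dump_historic_ops
```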
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com