There are several ways to get more information about the health state.  The
CLI gives you these two options:

    ceph health detail
    ceph status

I also like using ceph-dash (https://github.com/Crapworks/ceph-dash).  It
has an associated Nagios check that scrapes the ceph-dash page.

I personally run `watch ceph status` when I'm monitoring the cluster closely.
It will show you things like blocked requests, OSDs flapping, mon clock skew,
or whatever else is causing the HEALTH_WARN state.  The most likely cause of
an intermittent HEALTH_WARN is blocked requests.  Those can have any number
of causes, which you would need to diagnose further if they turn out to be
what is triggering the warning.
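
To catch the reason for a short-lived warning, it helps to record the detail at the same moment as the status. Below is a minimal sketch, not a tested monitoring tool. It assumes the pre-Luminous JSON layout of `ceph health detail --format json`, i.e. an `overall_status` string plus a `summary` list of entries with `severity` and `summary` keys; check the output on your own release before relying on those field names. The function names and the 60-second interval are only illustrative.

```python
import json
import subprocess
import time
from datetime import datetime


def describe_health(health):
    """Return 'STATUS: reason; reason' from a parsed health dict.

    Assumes an "overall_status" string and a "summary" list of
    {"severity": ..., "summary": ...} entries (pre-Luminous layout).
    """
    status = health.get("overall_status", "UNKNOWN")
    reasons = [item.get("summary", "") for item in health.get("summary", [])]
    if not reasons:
        return status
    return status + ": " + "; ".join(reasons)


def fetch_health():
    """Run `ceph health detail` with JSON output and parse the result."""
    out = subprocess.check_output(
        ["ceph", "health", "detail", "--format", "json"]
    )
    return json.loads(out.decode("utf-8"))


def poll(interval_seconds=60):
    """Print one timestamped line per sample, reasons included."""
    while True:
        stamp = datetime.now().strftime("%m/%d/%Y %I:%M:%S %p")
        print(stamp, describe_health(fetch_health()))
        time.sleep(interval_seconds)
```

Calling `poll()` on a node with a working `ceph` client prints the status together with whatever summary messages were present at that sample, so a HEALTH_WARN line carries its own explanation.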

________________________________

David Turner | Cloud Operations Engineer | StorageCraft Technology
Corporation <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943

________________________________________
From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of John Spray 
[jsp...@redhat.com]
Sent: Thursday, February 23, 2017 3:47 PM
To: Scottix
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Random Health_warn

On Thu, Feb 23, 2017 at 9:49 PM, Scottix <scot...@gmail.com> wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> We are seeing some weird behavior and are not sure how to diagnose what
> could be going on. We started monitoring the overall_status from the json
> query and every once in a while we get a HEALTH_WARN for a minute or two.
>
> Monitoring logs.
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
>
> When I check the mon leader logs there is no indication of an error or
> issues that could be occurring. Is there a way to find what is causing the
> HEALTH_WARN?

Possibly not without grabbing more than just the overall status at the
same time as you're grabbing the OK/WARN status.

Internally, the OK/WARN/ERROR health state is generated on-demand by
applying a bunch of checks to the state of the system when the user
runs the health command -- the system doesn't know it's in a warning
state until it's asked.  Often you will see a corresponding log
message, but not necessarily.
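
To make the on-demand idea concrete, here is a toy model. It is not Ceph's code: the check functions, state keys and skew limit are all hypothetical. The only point is that no status is stored anywhere; it is derived when `evaluate_health` is called, so a condition that appears and clears between two calls is never reported.

```python
def check_blocked_requests(state):
    # Hypothetical check: warn when any request is reported as blocked.
    count = state.get("blocked_requests", 0)
    if count > 0:
        return "%d requests are blocked" % count
    return None


def check_clock_skew(state):
    # Hypothetical check: warn when clock skew exceeds an assumed limit.
    limit = state.get("skew_limit_seconds", 0.05)
    if state.get("clock_skew_seconds", 0.0) > limit:
        return "clock skew detected"
    return None


CHECKS = [check_blocked_requests, check_clock_skew]


def evaluate_health(state):
    """Apply every check to the current state and derive a status now."""
    warnings = [msg for msg in (check(state) for check in CHECKS) if msg]
    status = "HEALTH_WARN" if warnings else "HEALTH_OK"
    return status, warnings
```

In this model, if `blocked_requests` goes from 0 to 3 and back to 0 without `evaluate_health` being called in between, nothing ever observes the warning.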

John

> Best,
> Scott
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>