Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hello,

On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
> Hi,
>
> I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much
> data, network also fine:
> Ceph 0.72.2.
>
> When I issue the "ceph status" command, I randomly get HEALTH_OK, and
> immediately afterwards, repeating the command, I get HEALTH_WARN.
>
> Example given below - these commands were issued less than 1 sec apart.
> There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
> /var/log/ceph) on any of the servers...
> I get false alerts from my status monitoring script for this reason...
>
If I recall correctly, the logs will show INF, WRN and ERR, so grep for WRN.

Regards,

Christian

> Any help would be greatly appreciated.
>
> Thanks,
>
> [root@cs3 ~]# ceph status
>     cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>      health HEALTH_OK
>      monmap e2: 3 mons at {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
>             election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>      osdmap e890: 6 osds: 6 up, 6 in
>       pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
>             2576 GB used, 19732 GB / 22309 GB avail
>                  448 active+clean
>   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
>
> [root@cs3 ~]# ceph status
>     cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>      health HEALTH_WARN
>      monmap e2: 3 mons at {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
>             election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>      osdmap e890: 6 osds: 6 up, 6 in
>       pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
>             2576 GB used, 19732 GB / 22309 GB avail
>                  448 active+clean
>   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
>
> [root@cs3 ~]# ceph status
>     cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>      health HEALTH_OK
>      monmap e2: 3 mons at {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
>             election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>      osdmap e890: 6 osds: 6 up, 6 in
>       pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
>             2576 GB used, 19732 GB / 22309 GB avail
>                  448 active+clean
>   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s

-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi Christian,

that seems right, thanks.

But again, the only occurrences are in the gzipped log files (which were already logrotated), not in the current log files.

Example:

[root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140614.gz matches
Binary file /var/log/ceph/ceph.log-20140611.gz matches
Binary file /var/log/ceph/ceph.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140613.gz matches

Thanks,
Andrija

On 17 June 2014 10:48, Christian Balzer wrote:
> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
> WRN.
>
> Regards,
>
> Christian
>
> [...]

-- 
Andrija Panić
-- 
http://admintweets.com
-- 
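A side note on the grep output above: plain grep only reports "Binary file ... matches" for the rotated .gz logs and cannot show the matching lines. zgrep (shipped with gzip) decompresses on the fly and searches plain and compressed files alike. A minimal sketch, using a throwaway directory in place of /var/log/ceph (the log lines below are made-up samples, not from the thread):

```shell
# Demo: zgrep finds WRN lines inside rotated, gzip-compressed ceph logs
# where plain grep would only say "Binary file ... matches".
set -e
tmpdir=$(mktemp -d)

# A rotated, compressed log containing a WRN entry (sample line):
echo "2014-06-12 10:30:00 mon.cs2 [WRN] sample warning entry" > "$tmpdir/ceph.log-20140612"
gzip "$tmpdir/ceph.log-20140612"

# The current, uncompressed log with no warnings:
echo "2014-06-12 10:31:00 mon.cs2 [INF] pgmap v2379904" > "$tmpdir/ceph.log"

# zgrep searches both the .gz and the plain file and prints the WRN line
# (prefixed with the file name, since more than one file matched the glob):
zgrep -i "WRN" "$tmpdir"/ceph.log*

rm -r "$tmpdir"
```

On the real servers the equivalent would be something like: zgrep -i "WRN" /var/log/ceph/ceph.log*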
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Try grep on cs1 and cs3 as well; it could be a disk space issue.

Regards,

Stanislav Yanchev
Core System Administrator

Mobile: +359 882 549 441
s.yanc...@maxtelecom.bg
www.maxtelecom.bg

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Andrija Panic
Sent: Tuesday, June 17, 2014 11:57 AM
To: Christian Balzer
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

> Hi Christian,
>
> that seems right, thanks.
>
> But again, the only occurrences are in the gzipped log files (which were
> already logrotated), not in the current log files.
>
> [...]
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi,

thanks for that, but it is not a space issue:

The OSD drives are only 12% full, and the /var drive the MON lives on is over 70% full only on the CS3 server. I have increased the alert thresholds in ceph.conf (mon data avail warn = 15, mon data avail crit = 5), and since I increased them those alerts are gone. (In any case, the alerts for /var being over 70% full could normally be seen in the logs and in "ceph -w" output.)

Here I get no normal/visible warning in either the logs or the "ceph -w" output...

Thanks,
Andrija

On 17 June 2014 11:00, Stanislav Yanchev wrote:
> Try grep on cs1 and cs3 as well; it could be a disk space issue.
>
> [...]

-- 
Andrija Panić
-- 
http://admintweets.com
-- 
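Since the WARN state here flaps within a second, one general way to stop a monitoring script from paging on transient flaps is to require several consecutive non-OK samples before alerting. A minimal sketch (the function name and the threshold are made up for illustration; the thread's actual monitoring script is not shown):

```shell
# Debounce helper: reads one health status per line on stdin and only
# raises an alert after N consecutive non-OK samples, so a one-off
# HEALTH_WARN between two "ceph status" calls is ignored.
alert_after_consecutive_warn() {
    threshold=$1
    streak=0
    while read -r health; do
        if [ "$health" = "HEALTH_OK" ]; then
            streak=0                      # any OK sample resets the counter
        else
            streak=$((streak + 1))
            if [ "$streak" -ge "$threshold" ]; then
                echo "ALERT: $health ($streak consecutive samples)"
            fi
        fi
    done
}

# In production the input would come from polling, e.g. (not run here):
#   while sleep 10; do ceph health; done | alert_after_consecutive_warn 3
# Demo with a flapping sequence like the one in the thread:
printf 'HEALTH_OK\nHEALTH_WARN\nHEALTH_OK\nHEALTH_WARN\nHEALTH_WARN\nHEALTH_WARN\n' \
    | alert_after_consecutive_warn 3
# prints: ALERT: HEALTH_WARN (3 consecutive samples)
```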
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Try running "ceph health detail" on each of the monitors. Your disk space thresholds probably aren't configured correctly or something.
-Greg

Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic wrote:
> Hi,
>
> thanks for that, but it is not a space issue:
>
> The OSD drives are only 12% full, and the /var drive the MON lives on is
> over 70% full only on the CS3 server. I have increased the alert
> thresholds in ceph.conf (mon data avail warn = 15, mon data avail crit =
> 5), and since I increased them those alerts are gone.
>
> [...]
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi Gregory,

indeed - I still get warnings about 20% free space on the CS3 server, where that MON lives... What is strange is that I don't get these warnings in prolonged "ceph -w" output...

[root@cs2 ~]# ceph health detail
HEALTH_WARN
mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk space!

I don't understand how I can still be getting warnings - I have the following in each ceph.conf file, under the general section:

mon data avail warn = 15
mon data avail crit = 5

I found these settings on the ceph mailing list...

Thanks a lot,
Andrija

On 17 June 2014 19:22, Gregory Farnum wrote:
> Try running "ceph health detail" on each of the monitors. Your disk space
> thresholds probably aren't configured correctly or something.
> -Greg
>
> [...]
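For reference, the settings quoted above would look like this in /etc/ceph/ceph.conf (a sketch using the thread's values; these are monitor options, so a [mon] section works as well as [global]):

```ini
[global]
        ; warn when a monitor's data disk drops below 15% free
        mon data avail warn = 15
        ; go critical below 5% free
        mon data avail crit = 5
```

One caveat worth stating explicitly: ceph.conf is only read when a daemon starts, so a running monitor keeps its old thresholds until it is restarted (or until the values are injected at runtime with "ceph tell").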
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
As stupid a mistake as I could make... After lowering the "mon data avail warn" threshold from 20% to 15%, it seems I forgot to restart the MON service on this one node...

I apologize for the noise, and thanks again, everybody.

Andrija

On 18 June 2014 09:49, Andrija Panic wrote:
> Hi Gregory,
>
> indeed - I still get warnings about 20% free space on the CS3 server,
> where that MON lives... What is strange is that I don't get these
> warnings in prolonged "ceph -w" output...
>
> [root@cs2 ~]# ceph health detail
> HEALTH_WARN
> mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
> space!
>
> [...]
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
The lack of warnings in "ceph -w" for this issue is a bug in Emperor. It's resolved in Firefly.
-Greg

On Wed, Jun 18, 2014 at 3:49 AM, Andrija Panic wrote:
> Hi Gregory,
>
> indeed - I still get warnings about 20% free space on the CS3 server,
> where that MON lives...
>
> [...]
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Thanks Greg, seems like I'm going to update soon...

Thanks again,
Andrija

On 18 June 2014 14:06, Gregory Farnum wrote:
> The lack of warnings in "ceph -w" for this issue is a bug in Emperor.
> It's resolved in Firefly.
> -Greg
>
> [...]