Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Christian Balzer

Hello,

On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:

> Hi,
> 
> I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much data,
> network also fine:
> Ceph 0.72.2.
>
> When I issue the "ceph status" command, I randomly get HEALTH_OK, and
> immediately afterwards, when repeating the command, I get HEALTH_WARN.
>
> Example given below - these commands were issued less than 1 sec apart.
> There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
> /var/log/ceph) on any of the servers...
> My status monitoring script gives false alerts for this reason...
> 
If I recall correctly, the logs will show INF, WRN and ERR, so grep for
WRN. 
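
For example (a minimal sketch, assuming the /var/log/ceph layout mentioned
above):

  # the cluster log tags entries INF/WRN/ERR, so search case-sensitively
  grep -r "WRN" /var/log/ceph/*.log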

Regards,

Christian

> Any help would be greatly appreciated.
> 
> Thanks,
> 
> [root@cs3 ~]# ceph status
> cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>  health HEALTH_OK
>  monmap e2: 3 mons at
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>  osdmap e890: 6 osds: 6 up, 6 in
>   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> 2576 GB used, 19732 GB / 22309 GB avail
>  448 active+clean
>   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
> 
> [root@cs3 ~]# ceph status
> cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>  health HEALTH_WARN
>  monmap e2: 3 mons at
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>  osdmap e890: 6 osds: 6 up, 6 in
>   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> 2576 GB used, 19732 GB / 22309 GB avail
>  448 active+clean
>   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
> 
> [root@cs3 ~]# ceph status
> cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>  health HEALTH_OK
>  monmap e2: 3 mons at
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>  osdmap e890: 6 osds: 6 up, 6 in
>   pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> 2576 GB used, 19732 GB / 22309 GB avail
>  448 active+clean
>   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi Christian,

that seems true, thanks.

But again, the only occurrences are in the gzipped log files (the ones that
were logrotated), not in the current log files:
Example:

[root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140614.gz matches
Binary file /var/log/ceph/ceph.log-20140611.gz matches
Binary file /var/log/ceph/ceph.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140613.gz matches
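
Since grep only prints "Binary file ... matches" for the compressed, rotated
logs, zgrep can show the matching lines themselves (a small sketch, using the
file names listed above):

  # search inside the rotated, gzip-compressed cluster logs
  zgrep "WRN" /var/log/ceph/ceph.log-*.gz | tail -n 20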

Thanks,
Andrija


On 17 June 2014 10:48, Christian Balzer  wrote:

>
> Hello,
>
> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>
> > Hi,
> >
> > I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much data,
> > network also fine:
> > Ceph 0.72.2.
> >
> > When I issue the "ceph status" command, I randomly get HEALTH_OK, and
> > immediately afterwards, when repeating the command, I get HEALTH_WARN.
> >
> > Example given below - these commands were issued less than 1 sec apart.
> > There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
> > /var/log/ceph) on any of the servers...
> > My status monitoring script gives false alerts for this reason...
> >
> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
> WRN.
>
> Regards,
>
> Christian
>
> > Any help would be greatly appreciated.
> >
> > Thanks,
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_WARN
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>



-- 

Andrija Panić
--
  http://admintweets.com
--


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Stanislav Yanchev
Try the grep on cs1 and cs3 as well; it could be a disk space issue.
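
A quick way to check both the disk space and the WRN count on all three nodes
(a sketch only; it assumes root ssh access and the default mon data path under
/var/lib/ceph/mon):

  for h in cs1 cs2 cs3; do
      echo "== $h =="
      ssh root@$h 'df -h /var /var/lib/ceph/mon 2>/dev/null; grep -c WRN /var/log/ceph/ceph.log'
  done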


Regards,
Stanislav Yanchev
Core System Administrator


Mobile: +359 882 549 441
s.yanc...@maxtelecom.bg
www.maxtelecom.bg

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Andrija Panic
Sent: Tuesday, June 17, 2014 11:57 AM
To: Christian Balzer
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

Hi Christian,

that seems true, thanks.

But again, the only occurrences are in the gzipped log files (the ones that
were logrotated), not in the current log files:
Example:

[root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140614.gz matches
Binary file /var/log/ceph/ceph.log-20140611.gz matches
Binary file /var/log/ceph/ceph.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140613.gz matches

Thanks,
Andrija

On 17 June 2014 10:48, Christian Balzer <ch...@gol.com> wrote:

Hello,

On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:

> Hi,
>
> I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much data,
> network also fine:
> Ceph 0.72.2.
>
> When I issue the "ceph status" command, I randomly get HEALTH_OK, and
> immediately afterwards, when repeating the command, I get HEALTH_WARN.
>
> Example given below - these commands were issued less than 1 sec apart.
> There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
> /var/log/ceph) on any of the servers...
> My status monitoring script gives false alerts for this reason...
>
If I recall correctly, the logs will show INF, WRN and ERR, so grep for
WRN.

Regards,

Christian

> Any help would be greatly appreciated.
>
> Thanks,
>
> [root@cs3 ~]# ceph status
> cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>  health HEALTH_OK
>  monmap e2: 3 mons at
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>  osdmap e890: 6 osds: 6 up, 6 in
>   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> 2576 GB used, 19732 GB / 22309 GB avail
>  448 active+clean
>   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
>
> [root@cs3 ~]# ceph status
> cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>  health HEALTH_WARN
>  monmap e2: 3 mons at
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>  osdmap e890: 6 osds: 6 up, 6 in
>   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> 2576 GB used, 19732 GB / 22309 GB avail
>  448 active+clean
>   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
>
> [root@cs3 ~]# ceph status
> cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>  health HEALTH_OK
>  monmap e2: 3 mons at
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>  osdmap e890: 6 osds: 6 up, 6 in
>   pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> 2576 GB used, 19732 GB / 22309 GB avail
>  448 active+clean
>   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
>


--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/


--

Andrija Panić
--
  http://admintweets.com
--






Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi,

thanks for that, but it is not a disk space issue:

The OSD drives are only 12% full,
and the /var drive on which the MON lives is over 70% full only on the CS3
server, but I have increased the alert thresholds in ceph.conf (mon data
avail warn = 15, mon data avail crit = 5), and since I increased them those
alerts are gone (anyway, the alerts for /var being more than 70% full could
normally be seen in the logs and in the ceph -w output).

Here I get no normal/visible warning in either the logs or the ceph -w output...

Thanks,
Andrija




On 17 June 2014 11:00, Stanislav Yanchev  wrote:

> Try the grep on cs1 and cs3 as well; it could be a disk space issue.
>
>
>
>
>
> Regards,
>
> *Stanislav Yanchev*
> Core System Administrator
>
> [image: MAX TELECOM]
>
> Mobile: +359 882 549 441
> s.yanc...@maxtelecom.bg
> www.maxtelecom.bg
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Andrija Panic
> *Sent:* Tuesday, June 17, 2014 11:57 AM
> *To:* Christian Balzer
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
> HEALTH_WARN
>
>
>
> Hi Christian,
>
>
>
> that seems true, thanks.
>
>
>
> But again, the only occurrences are in the gzipped log files (the ones that
> were logrotated), not in the current log files:
>
> Example:
>
>
>
> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>
> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>
>
>
> Thanks,
>
> Andrija
>
>
>
> On 17 June 2014 10:48, Christian Balzer  wrote:
>
>
> Hello,
>
>
> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>
> > Hi,
> >
> > I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much data,
> > network also fine:
> > Ceph 0.72.2.
> >
> > When I issue the "ceph status" command, I randomly get HEALTH_OK, and
> > immediately afterwards, when repeating the command, I get HEALTH_WARN.
> >
> > Example given below - these commands were issued less than 1 sec apart.
> > There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
> > /var/log/ceph) on any of the servers...
> > My status monitoring script gives false alerts for this reason...
> >
>
> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
> WRN.
>
> Regards,
>
> Christian
>
>
> > Any help would be greatly appreciated.
> >
> > Thanks,
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_WARN
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
> >
>
>
> --
>
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>
>
>
>
>
> --
>
>
>
> Andrija Panić
>
> --
>
>

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Gregory Farnum
Try running "ceph health detail" on each of the monitors. Your disk space
thresholds probably aren't configured correctly or something.
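
Since "ceph health detail" names the monitor that is complaining, it is also
worth comparing that with the value the running monitor actually has loaded
(a sketch; the admin socket path below is the default and may differ):

  ceph health detail
  # on the monitor that reports low disk space, e.g. cs3:
  ceph --admin-daemon /var/run/ceph/ceph-mon.cs3.asok config show | grep mon_data_avail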
-Greg

Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
wrote:

> Hi,
>
> thanks for that, but it is not a disk space issue:
>
> The OSD drives are only 12% full,
> and the /var drive on which the MON lives is over 70% full only on the CS3
> server, but I have increased the alert thresholds in ceph.conf (mon data
> avail warn = 15, mon data avail crit = 5), and since I increased them those
> alerts are gone (anyway, the alerts for /var being more than 70% full could
> normally be seen in the logs and in the ceph -w output).
>
> Here I get no normal/visible warning in either the logs or the ceph -w output...
>
> Thanks,
> Andrija
>
>
>
>
> On 17 June 2014 11:00, Stanislav Yanchev  wrote:
>
>> Try the grep on cs1 and cs3 as well; it could be a disk space issue.
>>
>>
>>
>>
>>
>> Regards,
>>
>> *Stanislav Yanchev*
>> Core System Administrator
>>
>> [image: MAX TELECOM]
>>
>> Mobile: +359 882 549 441
>> s.yanc...@maxtelecom.bg
>> www.maxtelecom.bg
>>
>>
>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
>> Of *Andrija Panic
>> *Sent:* Tuesday, June 17, 2014 11:57 AM
>> *To:* Christian Balzer
>> *Cc:* ceph-users@lists.ceph.com
>> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
>> HEALTH_WARN
>>
>>
>>
>> Hi Christian,
>>
>>
>>
>> that seems true, thanks.
>>
>>
>>
>> But again, the only occurrences are in the gzipped log files (the ones that
>> were logrotated), not in the current log files:
>>
>> Example:
>>
>>
>>
>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>>
>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>>
>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>>
>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>>
>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>>
>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>>
>>
>>
>> Thanks,
>>
>> Andrija
>>
>>
>>
>> On 17 June 2014 10:48, Christian Balzer  wrote:
>>
>>
>> Hello,
>>
>>
>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>>
>> > Hi,
>> >
>> > I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much
>> > data, network also fine:
>> > Ceph 0.72.2.
>> >
>> > When I issue the "ceph status" command, I randomly get HEALTH_OK, and
>> > immediately afterwards, when repeating the command, I get HEALTH_WARN.
>> >
>> > Example given below - these commands were issued less than 1 sec apart.
>> > There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
>> > /var/log/ceph) on any of the servers...
>> > My status monitoring script gives false alerts for this reason...
>> >
>>
>> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
>> WRN.
>>
>> Regards,
>>
>> Christian
>>
>>
>> > Any help would be greatly appreciated.
>> >
>> > Thanks,
>> >
>> > [root@cs3 ~]# ceph status
>> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>> >  health HEALTH_OK
>> >  monmap e2: 3 mons at
>> >
>> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
>> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>> >  osdmap e890: 6 osds: 6 up, 6 in
>> >   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
>> > 2576 GB used, 19732 GB / 22309 GB avail
>> >  448 active+clean
>> >   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
>> >
>> > [root@cs3 ~]# ceph status
>> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>> >  health HEALTH_WARN
>> >  monmap e2: 3 mons at
>> >
>> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
>> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
>> >  osdmap e890: 6 osds: 6 up, 6 in
>> >   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
>> > 2576 GB used, 19732 GB / 22309 GB avail
>> >  448 active+clean
>> >   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
>> >
>> > [root@cs3 

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
Hi Gregory,

indeed - I still have warnings about 20% free space on the CS3 server, where
the MON lives... what is strange is that I don't get these warnings in
prolonged "ceph -w" output...
[root@cs2 ~]# ceph health detail
HEALTH_WARN
mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk space!

I don't understand how it is possible to still get these warnings - I have
the following in each ceph.conf file, under the general section:

mon data avail warn = 15
mon data avail crit = 5

I found these settings on the ceph mailing list...
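
For reference, a minimal sketch of how these settings are usually laid out
(the values are the ones quoted in this thread; placing them under [global]
rather than [mon] is an assumption):

  [global]
  # warn when a monitor's data disk drops below 15% free,
  # go critical below 5% free
  mon data avail warn = 15
  mon data avail crit = 5

Note that a running monitor only picks up ceph.conf changes after it has been
restarted, which turns out to be the explanation later in this thread.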

Thanks a lot,
Andrija


On 17 June 2014 19:22, Gregory Farnum  wrote:

> Try running "ceph health detail" on each of the monitors. Your disk space
> thresholds probably aren't configured correctly or something.
> -Greg
>
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
> wrote:
>
>> Hi,
>>
>> thanks for that, but it is not a disk space issue:
>>
>> The OSD drives are only 12% full,
>> and the /var drive on which the MON lives is over 70% full only on the CS3
>> server, but I have increased the alert thresholds in ceph.conf (mon data
>> avail warn = 15, mon data avail crit = 5), and since I increased them those
>> alerts are gone (anyway, the alerts for /var being more than 70% full could
>> normally be seen in the logs and in the ceph -w output).
>>
>> Here I get no normal/visible warning in either the logs or the ceph -w output...
>>
>> Thanks,
>> Andrija
>>
>>
>>
>>
>> On 17 June 2014 11:00, Stanislav Yanchev  wrote:
>>
>>> Try the grep on cs1 and cs3 as well; it could be a disk space issue.
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>>
>>> *Stanislav Yanchev*
>>> Core System Administrator
>>>
>>> [image: MAX TELECOM]
>>>
>>> Mobile: +359 882 549 441
>>> s.yanc...@maxtelecom.bg
>>> www.maxtelecom.bg
>>>
>>>
>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>> Behalf Of *Andrija Panic
>>> *Sent:* Tuesday, June 17, 2014 11:57 AM
>>> *To:* Christian Balzer
>>> *Cc:* ceph-users@lists.ceph.com
>>> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
>>> HEALTH_WARN
>>>
>>>
>>>
>>> Hi Christian,
>>>
>>>
>>>
>>> that seems true, thanks.
>>>
>>>
>>>
>>> But again, the only occurrences are in the gzipped log files (the ones that
>>> were logrotated), not in the current log files:
>>>
>>> Example:
>>>
>>>
>>>
>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>>>
>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Andrija
>>>
>>>
>>>
>>> On 17 June 2014 10:48, Christian Balzer  wrote:
>>>
>>>
>>> Hello,
>>>
>>>
>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>>>
>>> > Hi,
>>> >
>>> > I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much
>>> > data, network also fine:
>>> > Ceph 0.72.2.
>>> >
>>> > When I issue the "ceph status" command, I randomly get HEALTH_OK, and
>>> > immediately afterwards, when repeating the command, I get HEALTH_WARN.
>>> >
>>> > Example given below - these commands were issued less than 1 sec apart.
>>> > There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
>>> > /var/log/ceph) on any of the servers...
>>> > My status monitoring script gives false alerts for this reason...
>>> >
>>>
>>> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
>>> WRN.
>>>
>>> Regards,
>>>
>>> Christian
>>>
>>>
>>> > Any help would be greatly appreciated.
>>> >
>>> > Thanks,
>>> >
>>> > [root@cs3 ~]# ceph status
>>> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>>> >  health HEALTH_OK
>>> >  monmap e2: 3 mons at
>

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
As stupid a mistake as I could make...
After lowering the mon data avail warn threshold from 20% to 15%, it seems I
forgot to restart the MON service on this one node...
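
For the record, on a sysvinit-based install of this vintage the monitor can be
restarted with something like the following (the daemon id cs3 is taken from
the hostnames in this thread):

  # restart only the monitor daemon on this node
  service ceph restart mon.cs3
  # then confirm the warning clears
  ceph health detail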

I apologize for bugging you all, and thanks again, everybody.

Andrija


On 18 June 2014 09:49, Andrija Panic  wrote:

> Hi Gregory,
>
> indeed - I still have warnings about 20% free space on the CS3 server, where
> the MON lives... what is strange is that I don't get these warnings in
> prolonged "ceph -w" output...
> [root@cs2 ~]# ceph health detail
> HEALTH_WARN
> mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
> space!
>
> I don't understand how it is possible to still get these warnings - I have
> the following in each ceph.conf file, under the general section:
>
> mon data avail warn = 15
> mon data avail crit = 5
>
> I found these settings on the ceph mailing list...
>
> Thanks a lot,
> Andrija
>
>
> On 17 June 2014 19:22, Gregory Farnum  wrote:
>
>> Try running "ceph health detail" on each of the monitors. Your disk space
>> thresholds probably aren't configured correctly or something.
>> -Greg
>>
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
>> wrote:
>>
>>> Hi,
>>>
>>> thanks for that, but it is not a disk space issue:
>>>
>>> The OSD drives are only 12% full,
>>> and the /var drive on which the MON lives is over 70% full only on the CS3
>>> server, but I have increased the alert thresholds in ceph.conf (mon data
>>> avail warn = 15, mon data avail crit = 5), and since I increased them those
>>> alerts are gone (anyway, the alerts for /var being more than 70% full could
>>> normally be seen in the logs and in the ceph -w output).
>>>
>>> Here I get no normal/visible warning in either the logs or the ceph -w output...
>>>
>>> Thanks,
>>> Andrija
>>>
>>>
>>>
>>>
>>> On 17 June 2014 11:00, Stanislav Yanchev 
>>> wrote:
>>>
>>>> Try the grep on cs1 and cs3 as well; it could be a disk space issue.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> *Stanislav Yanchev*
>>>> Core System Administrator
>>>>
>>>> [image: MAX TELECOM]
>>>>
>>>> Mobile: +359 882 549 441
>>>> s.yanc...@maxtelecom.bg
>>>> www.maxtelecom.bg
>>>>
>>>>
>>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>>> Behalf Of *Andrija Panic
>>>> *Sent:* Tuesday, June 17, 2014 11:57 AM
>>>> *To:* Christian Balzer
>>>> *Cc:* ceph-users@lists.ceph.com
>>>> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
>>>> HEALTH_WARN
>>>>
>>>>
>>>>
>>>> Hi Christian,
>>>>
>>>>
>>>>
>>>> that seems true, thanks.
>>>>
>>>>
>>>>
>>>> But again, the only occurrences are in the gzipped log files (the ones that
>>>> were logrotated), not in the current log files:
>>>>
>>>> Example:
>>>>
>>>>
>>>>
>>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>>>>
>>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Andrija
>>>>
>>>>
>>>>
>>>> On 17 June 2014 10:48, Christian Balzer  wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>>
>>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much
>>>> > data, network also fine:
>>>> > Ceph 0.72.2.
>>>> >
>>>> > When I issue the "ceph status" command, I randomly get HEALTH_OK, and
>>>> > immediately afterwards, when repeating the command, I get HEALTH_WARN.
>>>> >
>>>> > Example given below - these commands were issued less than

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Gregory Farnum
The lack of warnings in ceph -w for this issue is a bug in Emperor.
It's resolved in Firefly.
-Greg
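
Before planning the Emperor-to-Firefly upgrade it may be worth confirming what
is actually installed on each node (a trivial sketch, assuming root ssh access):

  for h in cs1 cs2 cs3; do ssh root@$h ceph --version; done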

On Wed, Jun 18, 2014 at 3:49 AM, Andrija Panic  wrote:
>
> Hi Gregory,
>
> indeed - I still have warnings about 20% free space on the CS3 server, where
> the MON lives... what is strange is that I don't get these warnings in
> prolonged "ceph -w" output...
> [root@cs2 ~]# ceph health detail
> HEALTH_WARN
> mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk space!
>
> I don't understand how it is possible to still get these warnings - I have
> the following in each ceph.conf file, under the general section:
>
> mon data avail warn = 15
> mon data avail crit = 5
>
> I found these settings on the ceph mailing list...
>
> Thanks a lot,
> Andrija
>
>
> On 17 June 2014 19:22, Gregory Farnum  wrote:
>>
>> Try running "ceph health detail" on each of the monitors. Your disk space 
>> thresholds probably aren't configured correctly or something.
>> -Greg
>>
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic  
>> wrote:
>>>
>>> Hi,
>>>
>>> thanks for that, but it is not a disk space issue:
>>>
>>> The OSD drives are only 12% full,
>>> and the /var drive on which the MON lives is over 70% full only on the CS3
>>> server, but I have increased the alert thresholds in ceph.conf (mon data
>>> avail warn = 15, mon data avail crit = 5), and since I increased them those
>>> alerts are gone (anyway, the alerts for /var being more than 70% full could
>>> normally be seen in the logs and in the ceph -w output).
>>>
>>> Here I get no normal/visible warning in either the logs or the ceph -w output...
>>>
>>> Thanks,
>>> Andrija
>>>
>>>
>>>
>>>
>>> On 17 June 2014 11:00, Stanislav Yanchev  wrote:
>>>>
>>>> Try the grep on cs1 and cs3 as well; it could be a disk space issue.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Stanislav Yanchev
>>>> Core System Administrator
>>>>
>>>>
>>>>
>>>> Mobile: +359 882 549 441
>>>> s.yanc...@maxtelecom.bg
>>>> www.maxtelecom.bg
>>>>
>>>>
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>>>> Andrija Panic
>>>> Sent: Tuesday, June 17, 2014 11:57 AM
>>>> To: Christian Balzer
>>>> Cc: ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
>>>>
>>>>
>>>>
>>>> Hi Christian,
>>>>
>>>>
>>>>
>>>> that seems true, thanks.
>>>>
>>>>
>>>>
>>>> But again, the only occurrences are in the gzipped log files (the ones that
>>>> were logrotated), not in the current log files:
>>>>
>>>> Example:
>>>>
>>>>
>>>>
>>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>>>>
>>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Andrija
>>>>
>>>>
>>>>
>>>> On 17 June 2014 10:48, Christian Balzer  wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>>
>>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I have a 3-node (2 OSDs per node) Ceph cluster, running fine, not much
>>>> > data, network also fine:
>>>> > Ceph 0.72.2.
>>>> >
>>>> > When I issue the "ceph status" command, I randomly get HEALTH_OK, and
>>>> > immediately afterwards, when repeating the command, I get HEALTH_WARN.
>>>> >
>>>> > Example given below - these commands were issued less than 1 sec apart.
>>>> > There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
>&g

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
Thanks Greg, seems like I'm going to update soon...

Thanks again,
Andrija
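
As a closing note on the monitoring angle that started this thread, here is a
minimal sketch of a check script that alerts on the detailed health output
rather than the bare status (the alerting hook is left as a placeholder):

  #!/bin/bash
  # cron-able Ceph health check: print details and exit non-zero on any non-OK state
  status=$(ceph health)
  if [ "$status" != "HEALTH_OK" ]; then
      echo "Ceph reports: $status" >&2
      ceph health detail >&2   # names the offending mon/osd/pg
      exit 1                   # hook this up to whatever alerting is in use
  fi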


On 18 June 2014 14:06, Gregory Farnum  wrote:

> The lack of warnings in ceph -w for this issue is a bug in Emperor.
> It's resolved in Firefly.
> -Greg
>
> On Wed, Jun 18, 2014 at 3:49 AM, Andrija Panic 
> wrote:
> >
> > Hi Gregory,
> >
> > indeed - I still have warnings about 20% free space on the CS3 server, where
> > the MON lives... what is strange is that I don't get these warnings in
> > prolonged "ceph -w" output...
> > [root@cs2 ~]# ceph health detail
> > HEALTH_WARN
> > mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
> space!
> >
> > I don't understand how it is possible to still get these warnings - I have
> > the following in each ceph.conf file, under the general section:
> >
> > mon data avail warn = 15
> > mon data avail crit = 5
> >
> > I found these settings on the ceph mailing list...
> >
> > Thanks a lot,
> > Andrija
> >
> >
> > On 17 June 2014 19:22, Gregory Farnum  wrote:
> >>
> >> Try running "ceph health detail" on each of the monitors. Your disk
> space thresholds probably aren't configured correctly or something.
> >> -Greg
> >>
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >>
> >> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> thanks for that, but it is not a disk space issue:
> >>>
> >>> The OSD drives are only 12% full,
> >>> and the /var drive on which the MON lives is over 70% full only on the
> >>> CS3 server, but I have increased the alert thresholds in ceph.conf (mon
> >>> data avail warn = 15, mon data avail crit = 5), and since I increased
> >>> them those alerts are gone (anyway, the alerts for /var being more than
> >>> 70% full could normally be seen in the logs and in the ceph -w output).
> >>>
> >>> Here I get no normal/visible warning in either the logs or the ceph -w
> >>> output...
> >>>
> >>> Thanks,
> >>> Andrija
> >>>
> >>>
> >>>
> >>>
> >>> On 17 June 2014 11:00, Stanislav Yanchev 
> wrote:
> >>>>
> >>>> Try the grep on cs1 and cs3 as well; it could be a disk space issue.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Stanislav Yanchev
> >>>> Core System Administrator
> >>>>
> >>>>
> >>>>
> >>>> Mobile: +359 882 549 441
> >>>> s.yanc...@maxtelecom.bg
> >>>> www.maxtelecom.bg
> >>>>
> >>>>
> >>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> Behalf Of Andrija Panic
> >>>> Sent: Tuesday, June 17, 2014 11:57 AM
> >>>> To: Christian Balzer
> >>>> Cc: ceph-users@lists.ceph.com
> >>>> Subject: Re: [ceph-users] Cluster status reported wrongly as
> HEALTH_WARN
> >>>>
> >>>>
> >>>>
> >>>> Hi Christian,
> >>>>
> >>>>
> >>>>
> >>>> that seems true, thanks.
> >>>>
> >>>>
> >>>>
> >>>> But again, the only occurrences are in the gzipped log files (the ones
> >>>> that were logrotated), not in the current log files:
> >>>>
> >>>> Example:
> >>>>
> >>>>
> >>>>
> >>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
> >>>>
> >>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
> >>>>
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Andrija
> >>>>
> >>>>
> >>>>
> >>>> On 17 June 2014 10:48, Christian Balzer  wrote:
> >>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>>
> >>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
> >>>>
> >>