Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Dumpling

ceph-0.67.9-16.g69a99e6

I guess it shouldn’t be logging it at all?


Thanks
Jan


> On 02 Jun 2015, at 20:42, Somnath Roy  wrote:
> 
> Which code base are you using ?
> 
> -Original Message-
> From: Jan Schermer [mailto:j...@schermer.cz] 
> Sent: Tuesday, June 02, 2015 11:41 AM
> To: Somnath Roy
> Cc: ceph-users
> Subject: Re: [ceph-users] ceph-mon logging like crazy because?
> 
> We actually have
> "debug mon = 0”
> 
> It was always spammy, but this is too spammy - on one mon the log size is 
> 500MB since morning. on other node it’s 17GB and about 16.5GB of that is 
> within one hour - something’s not right there and this is likely just a 
> symptom…
> 
> Jan
> 
> 
> On 02 Jun 2015, at 20:36, Somnath Roy  wrote:
>> 
>> I think with the latest version of code it is printing only for log level 5, 
>> earlier it was 1. Here is the link where I had some conversation about this 
>> earlier with Sage.
>> 
>> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
>> 
>> So, IMO nothing to worry about other than log spam here which is fixed 
>> in the latest build or you can fix it with debug mon = 0/0
>> 
>> Thanks & Regards
>> Somnath
>> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
>> Of Jan Schermer
>> Sent: Tuesday, June 02, 2015 11:33 AM
>> To: ceph-users
>> Subject: Re: [ceph-users] ceph-mon logging like crazy because?
>> 
>> Another follow-up.
>> The whole madness started with “mon compact” which we run from cron (else 
>> leveldb eats all space). It’s been running for about 14 days now with no 
>> incident.
>> 
>> 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
>> handle_command mon_command({"prefix": "compact"} v 0) v1
>> 2015-06-02 16:40:23.646514 7f430a746700  1 
>> mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
>> lease_timeout -- calling new election
>> 2015-06-02 16:40:23.646947 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646953 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646960 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646963 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646968 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646971 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646976 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646979 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has 
>> v0 lc 3659
>> 7321
>> 
>> 
>> The sequence that follows is
>> probing recovering
>> electing recovering
>> peon recovering
>> peon active (and this is the madness)
>> 
>> It logs much less now, but the issue is still here…
>> 
>> Jan
>> 
>>> On 02 Jun 2015, at 20:17, Jan Schermer  wrote:
>>> 
>>> Actually looks like it stopped, but here’s a more representative 
>>> sample (notice how often it logged this!)
>>> 
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.865833 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 1

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Somnath Roy
Which code base are you using ?

-Original Message-
From: Jan Schermer [mailto:j...@schermer.cz] 
Sent: Tuesday, June 02, 2015 11:41 AM
To: Somnath Roy
Cc: ceph-users
Subject: Re: [ceph-users] ceph-mon logging like crazy because?

We actually have
"debug mon = 0”

It was always spammy, but this is too spammy - on one mon the log size is 500MB 
since morning. on other node it’s 17GB and about 16.5GB of that is within one 
hour - something’s not right there and this is likely just a symptom…

Jan


On 02 Jun 2015, at 20:36, Somnath Roy  wrote:
> 
> I think with the latest version of code it is printing only for log level 5, 
> earlier it was 1. Here is the link where I had some conversation about this 
> earlier with Sage.
> 
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
> 
> So, IMO nothing to worry about other than log spam here which is fixed 
> in the latest build or you can fix it with debug mon = 0/0
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Jan Schermer
> Sent: Tuesday, June 02, 2015 11:33 AM
> To: ceph-users
> Subject: Re: [ceph-users] ceph-mon logging like crazy because?
> 
> Another follow-up.
> The whole madness started with “mon compact” which we run from cron (else 
> leveldb eats all space). It’s been running for about 14 days now with no 
> incident.
> 
> 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
> handle_command mon_command({"prefix": "compact"} v 0) v1
> 2015-06-02 16:40:23.646514 7f430a746700  1 
> mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
> lease_timeout -- calling new election
> 2015-06-02 16:40:23.646947 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646953 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646960 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646963 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646968 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646971 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646976 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646979 7f4309d45700  1 
> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
> is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has 
> v0 lc 3659
> 7321
> 
> 
> The sequence that follows is
> probing recovering
> electing recovering
> peon recovering
> peon active (and this is the madness)
> 
> It logs much less now, but the issue is still here…
> 
> Jan
> 
>> On 02 Jun 2015, at 20:17, Jan Schermer  wrote:
>> 
>> Actually looks like it stopped, but here’s a more representative 
>> sample (notice how often it logged this!)
>> 
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865833 7f4309d45700  1 
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865860 7f4309d45700  1 
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>> is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865886 7f4309d45700  1 
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>> is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865944 7f4309d45700  1 
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>> is_readable

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
We actually have
"debug mon = 0”

It was always spammy, but this is too spammy - on one mon the log size is 500MB 
since morning. on other node it’s 17GB and about 16.5GB of that is within one 
hour - something’s not right there and this is likely just a symptom…

Jan


On 02 Jun 2015, at 20:36, Somnath Roy  wrote:
> 
> I think with the latest version of code it is printing only for log level 5, 
> earlier it was 1. Here is the link where I had some conversation about this 
> earlier with Sage.
> 
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
> 
> So, IMO nothing to worry about other than log spam here which is fixed in the 
> latest build or you can fix it with debug mon = 0/0
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
> Schermer
> Sent: Tuesday, June 02, 2015 11:33 AM
> To: ceph-users
> Subject: Re: [ceph-users] ceph-mon logging like crazy because?
> 
> Another follow-up.
> The whole madness started with “mon compact” which we run from cron (else 
> leveldb eats all space). It’s been running for about 14 days now with no 
> incident.
> 
> 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
> handle_command mon_command({"prefix": "compact"} v 0) v1
> 2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
> updating c 36596805..36597321) lease_timeout -- calling new election
> 2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
> recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
> lease_expire=0.00 has v0 lc 3659
> 7321
> 
> 
> The sequence that follows is
> probing recovering
> electing recovering
> peon recovering
> peon active (and this is the madness)
> 
> It logs much less now, but the issue is still here…
> 
> Jan
> 
>> On 02 Jun 2015, at 20:17, Jan Schermer  wrote:
>> 
>> Actually looks like it stopped, but here’s a more representative
>> sample (notice how often it logged this!)
>> 
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865833 7f4309d45700  1
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
>> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865860 7f4309d45700  1
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
>> is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865886 7f4309d45700  1
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
>> is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865944 7f4309d45700  1
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
>> is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
>> 17:40:04.221316 has
>> v0 lc 36602135
>> 2015-06-02 17:39:59.865989 7f4309d45700  1
>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
>> is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
>>

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Somnath Roy
I think with the latest version of code it is printing only for log level 5, 
earlier it was 1. Here is the link where I had some conversation about this 
earlier with Sage.

http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881

So, IMO nothing to worry about other than log spam here which is fixed in the 
latest build or you can fix it with debug mon = 0/0

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
Schermer
Sent: Tuesday, June 02, 2015 11:33 AM
To: ceph-users
Subject: Re: [ceph-users] ceph-mon logging like crazy because?

Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.00 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

> On 02 Jun 2015, at 20:17, Jan Schermer  wrote:
>
> Actually looks like it stopped, but here’s a more representative
> sample (notice how often it logged this!)
>
> v0 lc 36602135
> 2015-06-02 17:39:59.865833 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865860 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865886 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865944 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865989 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866025 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866072 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866121 7f4309d45700  1
> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
> is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02
> 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866164 7f4309d45700  1
> mon.node-14@2(peon)

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.00 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

> On 02 Jun 2015, at 20:17, Jan Schermer  wrote:
> 
> Actually looks like it stopped, but here’s a more representative sample
> (notice how often it logged this!)
> 
> v0 lc 36602135
> 2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Actually looks like it stopped, but here’s a more representative sample
(notice how often it logged this!)

 v0 lc 36602135
2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866325 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135



> On 02 Jun 2015, at 20:14, Jan Schermer  wrote:
> 
> Our mons just went into a logging frenzy.
> 
> We have 3 mons in the cluster, and they mostly log stuff like this
> 
> 2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
> active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 
> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
> 2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 
> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
> 2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 
> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
> 2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 
> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
> 2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 
> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
> 
> 
> There are few lines every second, sometimes more, sometimes less (tell me if 
> that’s normal. I’m not sure)
> 
> Two of them went completely haywire, one log is 17GB now and rising. It’s 
> still mostly the same content, just more frequent:
> 
> 2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 
> lease_expire=0.00 has v0 lc 36604772
> 2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 
> lease_expire=0.00 has v0 lc 36604772
> 2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 
> lease_expire=0.00 has v0 lc 36604772
> 2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).pa