Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Actually looks like it stopped, but here’s a more representative sample
(notice how often it logged this!)

 v0 lc 36602135
2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866325 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135



 On 02 Jun 2015, at 20:14, Jan Schermer j...@schermer.cz wrote:
 
 Our mons just went into a logging frenzy.
 
 We have 3 mons in the cluster, and they mostly log stuff like this
 
 2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 
 
 There are few lines every second, sometimes more, sometimes less (tell me if 
 that’s normal. I’m not sure)
 
 Two of them went completely haywire, one log is 17GB now and rising. It’s 
 still mostly the same content, just more frequent:
 
 2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 
 lease_expire=0.00 has v0 lc 36604772
 2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 
 lease_expire=0.00 has v0 lc 36604772
 2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 
 lease_expire=0.00 has v0 lc 36604772
 2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Somnath Roy
I think with the latest version of code it is printing only for log level 5, 
earlier it was 1. Here is the link where I had some conversation about this 
earlier with Sage.

http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881

So, IMO nothing to worry about other than log spam here which is fixed in the 
latest build or you can fix it with debug mon = 0/0

Thanks  Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
Schermer
Sent: Tuesday, June 02, 2015 11:33 AM
To: ceph-users
Subject: Re: [ceph-users] ceph-mon logging like crazy because?

Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({prefix: compact} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.00 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:

 Actually looks like it stopped, but here’s a more representative
 sample (notice how often it logged this!)

 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866072 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866121 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866164 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866205 7f4309d45700  1
 mon.node-14@2

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Dumpling

ceph-0.67.9-16.g69a99e6

I guess it shouldn’t be logging it at all?


Thanks
Jan


 On 02 Jun 2015, at 20:42, Somnath Roy somnath@sandisk.com wrote:
 
 Which code base are you using ?
 
 -Original Message-
 From: Jan Schermer [mailto:j...@schermer.cz] 
 Sent: Tuesday, June 02, 2015 11:41 AM
 To: Somnath Roy
 Cc: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 We actually have
 debug mon = 0”
 
 It was always spammy, but this is too spammy - on one mon the log size is 
 500MB since morning. on other node it’s 17GB and about 16.5GB of that is 
 within one hour - something’s not right there and this is likely just a 
 symptom…
 
 Jan
 
 
 On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote:
 
 I think with the latest version of code it is printing only for log level 5, 
 earlier it was 1. Here is the link where I had some conversation about this 
 earlier with Sage.
 
 http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
 
 So, IMO nothing to worry about other than log spam here which is fixed 
 in the latest build or you can fix it with debug mon = 0/0
 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Jan Schermer
 Sent: Tuesday, June 02, 2015 11:33 AM
 To: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 Another follow-up.
 The whole madness started with “mon compact” which we run from cron (else 
 leveldb eats all space). It’s been running for about 14 days now with no 
 incident.
 
 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
 handle_command mon_command({prefix: compact} v 0) v1
 2015-06-02 16:40:23.646514 7f430a746700  1 
 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
 lease_timeout -- calling new election
 2015-06-02 16:40:23.646947 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646953 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646960 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646963 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646968 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646971 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646976 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646979 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has 
 v0 lc 3659
 7321
 
 
 The sequence that follows is
 probing recovering
 electing recovering
 peon recovering
 peon active (and this is the madness)
 
 It logs much less now, but the issue is still here…
 
 Jan
 
 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative 
 sample (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({prefix: compact} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.00 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative sample
 (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866325 7f4309d45700  1 

[ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Our mons just went into a logging frenzy.

We have 3 mons in the cluster, and they mostly log stuff like this

2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063


There are few lines every second, sometimes more, sometimes less (tell me if 
that’s normal. I’m not sure)

Two of them went completely haywire, one log is 17GB now and rising. It’s still 
mostly the same content, just more frequent:

2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954861 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:01.249648 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249668 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
2015-06-02 18:09:01.249697 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249699 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
2015-06-02 18:09:01.249708 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249709 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
2015-06-02 18:09:01.249736 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249736 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773


Any idea what it might be? Clocks look synced, no other aparent problem that I 
can see, the cluster is working.
I’d like to know why this happened before I restart the unhealthy mons which (I 
hope) will fix this.

Thanks

Jan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Somnath Roy
Which code base are you using ?

-Original Message-
From: Jan Schermer [mailto:j...@schermer.cz] 
Sent: Tuesday, June 02, 2015 11:41 AM
To: Somnath Roy
Cc: ceph-users
Subject: Re: [ceph-users] ceph-mon logging like crazy because?

We actually have
debug mon = 0”

It was always spammy, but this is too spammy - on one mon the log size is 500MB 
since morning. on other node it’s 17GB and about 16.5GB of that is within one 
hour - something’s not right there and this is likely just a symptom…

Jan


On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote:
 
 I think with the latest version of code it is printing only for log level 5, 
 earlier it was 1. Here is the link where I had some conversation about this 
 earlier with Sage.
 
 http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
 
 So, IMO nothing to worry about other than log spam here which is fixed 
 in the latest build or you can fix it with debug mon = 0/0
 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Jan Schermer
 Sent: Tuesday, June 02, 2015 11:33 AM
 To: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 Another follow-up.
 The whole madness started with “mon compact” which we run from cron (else 
 leveldb eats all space). It’s been running for about 14 days now with no 
 incident.
 
 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
 handle_command mon_command({prefix: compact} v 0) v1
 2015-06-02 16:40:23.646514 7f430a746700  1 
 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
 lease_timeout -- calling new election
 2015-06-02 16:40:23.646947 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646953 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646960 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646963 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646968 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646971 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646976 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646979 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has 
 v0 lc 3659
 7321
 
 
 The sequence that follows is
 probing recovering
 electing recovering
 peon recovering
 peon active (and this is the madness)
 
 It logs much less now, but the issue is still here…
 
 Jan
 
 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative 
 sample (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
We actually have
debug mon = 0”

It was always spammy, but this is too spammy - on one mon the log size is 500MB 
since morning. on other node it’s 17GB and about 16.5GB of that is within one 
hour - something’s not right there and this is likely just a symptom…

Jan


On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote:
 
 I think with the latest version of code it is printing only for log level 5, 
 earlier it was 1. Here is the link where I had some conversation about this 
 earlier with Sage.
 
 http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
 
 So, IMO nothing to worry about other than log spam here which is fixed in the 
 latest build or you can fix it with debug mon = 0/0
 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
 Schermer
 Sent: Tuesday, June 02, 2015 11:33 AM
 To: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 Another follow-up.
 The whole madness started with “mon compact” which we run from cron (else 
 leveldb eats all space). It’s been running for about 14 days now with no 
 incident.
 
 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
 handle_command mon_command({prefix: compact} v 0) v1
 2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36596805..36597321) lease_timeout -- calling new election
 2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
 lease_expire=0.00 has v0 lc 3659
 7321
 
 
 The sequence that follows is
 probing recovering
 electing recovering
 peon recovering
 peon active (and this is the madness)
 
 It logs much less now, but the issue is still here…
 
 Jan
 
 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative
 sample (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866072 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866121 7f4309d45700  1
 mon.node-14@2(peon