We actually have "debug mon = 0” It was always spammy, but this is too spammy - on one mon the log size is 500MB since morning. on other node it’s 17GB and about 16.5GB of that is within one hour - something’s not right there and this is likely just a symptom…
Jan On 02 Jun 2015, at 20:36, Somnath Roy <somnath....@sandisk.com> wrote: > > I think with the latest version of code it is printing only for log level 5, > earlier it was 1. Here is the link where I had some conversation about this > earlier with Sage. > > http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881 > > So, IMO nothing to worry about other than log spam here which is fixed in the > latest build or you can fix it with debug mon = 0/0 > > Thanks & Regards > Somnath > > -----Original Message----- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan > Schermer > Sent: Tuesday, June 02, 2015 11:33 AM > To: ceph-users > Subject: Re: [ceph-users] ceph-mon logging like crazy because....? > > Another follow-up. > The whole madness started with “mon compact” which we run from cron (else > leveldb eats all space). It’s been running for about 14 days now with no > incident. > > 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 > handle_command mon_command({"prefix": "compact"} v 0) v1 > 2015-06-02 16:40:23.646514 7f430a746700 1 mon.node-14@2(peon).paxos(paxos > updating c 36596805..36597321) lease_timeout -- calling new election > 2015-06-02 16:40:23.646947 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646953 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646960 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646963 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646968 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646971 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646976 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 > lease_expire=0.000000 has v0 lc 3659 > 7321 > 2015-06-02 16:40:23.646979 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos > recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 > lease_expire=0.000000 has v0 lc 3659 > 7321 > > > The sequence that follows is > probing recovering > electing recovering > peon recovering > peon active (and this is the madness) > > It logs much less now, but the issue is still here… > > Jan > >> On 02 Jun 2015, at 20:17, Jan Schermer <j...@schermer.cz> wrote: >> >> Actually looks like it stopped, but here’s a more representative >> sample (notice how often it logged this!) >> >> v0 lc 36602135 >> 2015-06-02 17:39:59.865833 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.865860 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.865886 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.865944 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.865989 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866025 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866072 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866121 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866164 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866205 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866244 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866285 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> 2015-06-02 17:39:59.866325 7f4309d45700 1 >> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >> is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 >> 17:40:04.221316 has >> v0 lc 36602135 >> >> >> >>> On 02 Jun 2015, at 20:14, Jan Schermer <j...@schermer.cz> wrote: >>> >>> Our mons just went into a logging frenzy. >>> >>> We have 3 mons in the cluster, and they mostly log stuff like this >>> >>> 2015-06-02 18:00:48.749386 7f1c08c0d700 1 >>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>> is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02 >>> 18:00:53.507837 has v0 lc 36604063 >>> 2015-06-02 18:00:49.025179 7f1c08c0d700 1 >>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>> is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02 >>> 18:00:53.507837 has v0 lc 36604063 >>> 2015-06-02 18:00:49.025640 7f1c08c0d700 1 >>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>> is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02 >>> 18:00:53.507837 has v0 lc 36604063 >>> 2015-06-02 18:00:49.026132 7f1c08c0d700 1 >>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>> is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02 >>> 18:00:53.507837 has v0 lc 36604063 >>> 2015-06-02 18:00:49.028388 7f1c08c0d700 1 >>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>> is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02 >>> 18:00:53.507837 has v0 lc 36604063 >>> >>> >>> There are few lines every second, sometimes more, sometimes less >>> (tell me if that’s normal. I’m not sure) >>> >>> Two of them went completely haywire, one log is 17GB now and rising. It’s >>> still mostly the same content, just more frequent: >>> >>> 2015-06-02 18:09:00.879950 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>> is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.000000 has >>> v0 lc 36604772 >>> 2015-06-02 18:09:00.879968 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>> is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.000000 has >>> v0 lc 36604772 >>> 2015-06-02 18:09:00.954835 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>> is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.000000 has >>> v0 lc 36604772 >>> 2015-06-02 18:09:00.954860 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>> is_readable now=2015-06-02 18:09:00.954861 lease_expire=0.000000 has >>> v0 lc 36604772 >>> 2015-06-02 18:09:01.249648 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>> is_readable now=2015-06-02 18:09:01.249668 lease_expire=2015-06-02 >>> 18:09:06.091738 has v0 lc 36604773 >>> 2015-06-02 18:09:01.249697 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>> is_readable now=2015-06-02 18:09:01.249699 lease_expire=2015-06-02 >>> 18:09:06.091738 has v0 lc 36604773 >>> 2015-06-02 18:09:01.249708 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>> is_readable now=2015-06-02 18:09:01.249709 lease_expire=2015-06-02 >>> 18:09:06.091738 has v0 lc 36604773 >>> 2015-06-02 18:09:01.249736 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>> is_readable now=2015-06-02 18:09:01.249736 lease_expire=2015-06-02 >>> 18:09:06.091738 has v0 lc 36604773 >>> >>> >>> Any idea what it might be? Clocks look synced, no other aparent problem >>> that I can see, the cluster is working. >>> I’d like to know why this happened before I restart the unhealthy mons >>> which (I hope) will fix this. >>> >>> Thanks >>> >>> Jan >>> >> > > _______________________________________________ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ________________________________ > > PLEASE NOTE: The information contained in this electronic mail message is > intended only for the use of the designated recipient(s) named above. If the > reader of this message is not the intended recipient, you are hereby notified > that you have received this message in error and that any review, > dissemination, distribution, or copying of this message is strictly > prohibited. If you have received this communication in error, please notify > the sender by telephone or e-mail (as shown above) immediately and destroy > any and all copies of this message in your possession (whether hard copies or > electronically stored copies). > _______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com