Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.000000 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.000000 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

> On 02 Jun 2015, at 20:17, Jan Schermer <j...@schermer.cz> wrote:
> 
> Actually looks like it stopped, but here’s a more representative sample
> (notice how often it logged this!)
> 
> v0 lc 36602135
> 2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 2015-06-02 17:39:59.866325 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
> active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 
> lease_expire=2015-06-02 17:40:04.221316 has
> v0 lc 36602135
> 
> 
> 
>> On 02 Jun 2015, at 20:14, Jan Schermer <j...@schermer.cz> wrote:
>> 
>> Our mons just went into a logging frenzy.
>> 
>> We have 3 mons in the cluster, and they mostly log stuff like this
>> 
>> 2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
>> active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 
>> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
>> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 
>> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
>> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 
>> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
>> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 
>> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
>> active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 
>> lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
>> 
>> 
>> There are few lines every second, sometimes more, sometimes less (tell me if 
>> that’s normal. I’m not sure)
>> 
>> Two of them went completely haywire, one log is 17GB now and rising. It’s 
>> still mostly the same content, just more frequent:
>> 
>> 2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 
>> lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 
>> lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 
>> lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954861 
>> lease_expire=0.000000 has v0 lc 36604772
>> 2015-06-02 18:09:01.249648 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249668 
>> lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249697 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249699 
>> lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249708 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249709 
>> lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 2015-06-02 18:09:01.249736 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
>> active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249736 
>> lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
>> 
>> 
>> Any idea what it might be? Clocks look synced, no other aparent problem that 
>> I can see, the cluster is working.
>> I’d like to know why this happened before I restart the unhealthy mons which 
>> (I hope) will fix this.
>> 
>> Thanks
>> 
>> Jan
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to