On 12/10/25 6:25 AM, 张祖建 via discuss wrote:
> FYI, I'm using OVN 24.03 and OVS 3.3.
>
> 张祖建 <[email protected]> wrote on Wed, 10 Dec 2025 at 13:21:
>
> > Hi, all:
> >
> > In my 3-node OVN NB cluster, two leaders were elected at the same time:
> >
> > Name: OVN_Northbound
> > Cluster ID: 42f9 (42f9d799-8be7-43d8-a0c3-7d27bad687f2)
> > Server ID: 4400 (44006f96-ec88-41d5-be54-8a36dca481a3)
> > Address: tcp:[10.81.50.33]:6643
> > Status: cluster member
> > Role: leader
> > Term: 12157
> > Leader: self
> > Vote: self
> >
> > Last Election started 43508320 ms ago, reason: timeout
> > Last Election won: 43501443 ms ago
> > Election timer: 5000
> > Log: [761097, 762117]
> > Entries not yet committed: 0
> > Entries not yet applied: 0
> > Connections: ->f871 ->6e32 <-6e32 <-f871
> > Disconnections: 1412579
> > Servers:
> >     f871 (f871 at tcp:[10.81.50.34]:6643) next_index=761097 match_index=0 last msg 10 ms ago
> >     6e32 (6e32 at tcp:[10.81.50.32]:6643) next_index=762117 match_index=762116 last msg 987 ms ago
> >     4400 (4400 at tcp:[10.81.50.33]:6643) (self) next_index=761710 match_index=762116
> >
> > Name: OVN_Northbound
> > Cluster ID: 42f9 (42f9d799-8be7-43d8-a0c3-7d27bad687f2)
> > Server ID: f871 (f87101dc-fcaf-4849-b630-46ad005849a5)
> > Address: tcp:[10.81.50.34]:6643
> > Status: cluster member
> > Role: leader
> > Term: 12157
> > Leader: self
> > Vote: self
> >
> > Last Election started 43502055 ms ago, reason: timeout
> > Last Election won: 43501652 ms ago
> > Election timer: 5000
> > Log: [761087, 761724]
> > Entries not yet committed: 0
> > Entries not yet applied: 0
> > Connections: ->6e32 ->4400 <-6e32 <-4400
> > Disconnections: 815567
> > Servers:
> >     6e32 (6e32 at tcp:[10.81.50.32]:6643) next_index=761724 match_index=761723 last msg 1525 ms ago
> >     f871 (f871 at tcp:[10.81.50.34]:6643) (self) next_index=761710 match_index=761723
> >     4400 (4400 at tcp:[10.81.50.33]:6643) next_index=761087 match_index=0 last msg 2 ms ago
> >
> > Name: OVN_Northbound
> > Cluster ID: 42f9 (42f9d799-8be7-43d8-a0c3-7d27bad687f2)
> > Server ID: 6e32 (6e322210-900e-406d-b9c7-be449d71fe0c)
> > Address: tcp:[10.81.50.32]:6643
> > Status: cluster member
> > Role: follower
> > Term: 12157
> > Leader: f871
> > Vote: 4400
> >
> > Last Election started 43504174 ms ago, reason: timeout
> > Election timer: 5000
> > Log: [761093, 762117]
> > Entries not yet committed: 0
> > Entries not yet applied: 0
> > Connections: ->f871 ->4400 <-4400 <-f871
> > Disconnections: 602205
> > Servers:
> >     6e32 (6e32 at tcp:[10.81.50.32]:6643) (self)
> >     f871 (f871 at tcp:[10.81.50.34]:6643) last msg 67 ms ago
> >     4400 (4400 at tcp:[10.81.50.33]:6643) last msg 1413 ms ago
> >
> > Finally, server 4400 transferred leadership in order to write a snapshot, and the cluster returned to normal.
> >
> > Does anyone have an idea why this happened? Thanks.
> >
> > Attached are the ovsdb-server logs.
Hi. Thanks for the report and the logs!

This is a very interesting issue. It definitely should not be happening, and at the moment I can't see from the code how we could end up in this situation.

The main concern in the logs is the messages about vote changes, like this one:

  server f871 changed its vote from f871 to 4400

If one server changes its vote within a single term, this can indeed cause two servers to be elected leader. But it's not clear to me why the vote change would happen in the first place. It should not be possible.

I'll try to have a deeper look at this problem a bit later.

Best regards, Ilya Maximets.
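To make the point about vote changes concrete, here is a minimal Python sketch of vote counting in one Raft term. It is a toy model, not OVSDB's Raft implementation, and the helper names and ballot sequence are hypothetical; they do not reconstruct what happened in this cluster. With three servers a candidate needs 2 votes, so if a server's vote ends up counted for two different candidates in the same term, both can reach a majority. Enforcing "at most one vote per server per term" leaves a single winner.

```python
from collections import Counter

# Toy model of vote counting in a single Raft term. This is NOT the
# OVSDB Raft implementation; names and ballots are hypothetical.
SERVERS = ["4400", "f871", "6e32"]
MAJORITY = len(SERVERS) // 2 + 1  # a candidate needs 2 of 3 votes


def leaders(ballots):
    """Return candidates holding a majority of the counted ballots.

    `ballots` is a list of (voter, candidate) pairs cast in one term.
    Every pair is counted, so a voter that changes its vote is counted
    once for each candidate it voted for.
    """
    counts = Counter(candidate for _voter, candidate in ballots)
    return sorted(c for c, n in counts.items() if n >= MAJORITY)


def one_vote_per_term(ballots):
    """Keep only the first ballot of each voter (Raft's rule: at most one
    vote per server per term); later 'changed' votes are dropped."""
    seen = set()
    kept = []
    for voter, candidate in ballots:
        if voter not in seen:
            seen.add(voter)
            kept.append((voter, candidate))
    return kept


# Hypothetical sequence within one term, not a reconstruction of the
# incident in the report.
ballots = [
    ("4400", "4400"),  # 4400 times out and votes for itself
    ("f871", "f871"),  # f871 times out and votes for itself
    ("6e32", "f871"),  # 6e32 grants its vote to f871
    ("f871", "4400"),  # f871 then changes its vote to 4400 in the same term
]

print("every ballot counted:", leaders(ballots))
print("one vote per term:   ", leaders(one_vote_per_term(ballots)))
```

When every ballot is counted, both 4400 and f871 collect two votes and both consider themselves leader; with the one-vote-per-term rule applied, only f871 does.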
