[
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968562#comment-13968562
]
Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------
Hi Flavio, Raul,
Before ZOOKEEPER-107 the line was
"for (QuorumServer server : self.getVotingView().values()) {"
This patch basically restores that line, so, if I understand correctly, the
code was not sending notifications to observers before that change either.
However, everyone still sends notifications to followers, and a follower that
receives a message responds directly to the sender, even if the sender is an
observer. My reasoning is that FLE terminates once we have a quorum of the
last committed config, so the only votes we could possibly need are from
followers in the last committed config, not from observers. Observers may
contact followers through the same logic and get updated, but this is not
enforced by the termination rule of FLE. In the attached test, the observer
finds out that it really is a follower (whose vote is needed).
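
To make the difference concrete, here is a minimal, self-contained sketch of
the two ways of picking notification targets. It is not the actual
FastLeaderElection code: the Server class, the sample view, and all method
names below are hypothetical stand-ins, and the quorum check assumes a simple
majority verifier.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NotificationTargets {
    public enum LearnerType { PARTICIPANT, OBSERVER }

    // Hypothetical stand-in for a server entry in the quorum view.
    public static final class Server {
        final long id;
        final LearnerType type;

        public Server(long id, LearnerType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Behaviour the issue complains about: every known server is a target,
    // so observers also try to reach other observers.
    public static Set<Long> allKnownServerIds(Collection<Server> view) {
        Set<Long> ids = new TreeSet<>();
        for (Server s : view) {
            ids.add(s.id);
        }
        return ids;
    }

    // Behaviour restored by the patch: only voting members are targets.
    public static Set<Long> votingServerIds(Collection<Server> view) {
        Set<Long> ids = new TreeSet<>();
        for (Server s : view) {
            if (s.type == LearnerType.PARTICIPANT) {
                ids.add(s.id);
            }
        }
        return ids;
    }

    // Assumed majority rule: only acks from voters count towards a quorum.
    public static boolean hasVotingMajority(Set<Long> voters, Set<Long> acks) {
        int count = 0;
        for (long sid : acks) {
            if (voters.contains(sid)) {
                count++;
            }
        }
        return count > voters.size() / 2;
    }

    // Made-up view: three participants and two observers.
    public static List<Server> sampleView() {
        return Arrays.asList(
            new Server(1, LearnerType.PARTICIPANT),
            new Server(2, LearnerType.PARTICIPANT),
            new Server(3, LearnerType.PARTICIPANT),
            new Server(13, LearnerType.OBSERVER),
            new Server(14, LearnerType.OBSERVER));
    }

    public static void main(String[] args) {
        List<Server> view = sampleView();
        System.out.println("all known: " + allKnownServerIds(view));
        System.out.println("voting:    " + votingServerIds(view));
    }
}
```

Since observer acks never count in hasVotingMajority, sending notifications
to observers cannot help the election terminate, which is the reasoning above.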
Alex
> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-1807
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Raul Gutierrez Segales
> Assignee: Alexander Shraer
> Priority: Blocker
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807.patch, notifications-loop.png
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open
> connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a
> connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a
> connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a
> connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a
> connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a
> connection already for server 14
> {noformat}
> and so on, ad nauseam.
> Now, looking around I found this inside FastLeaderElection.java from when you
> committed ZOOKEEPER-107:
> {noformat}
> private void sendNotifications() {
> - for (QuorumServer server : self.getVotingView().values()) {
> - long sid = server.id;
> -
> + for (long sid : self.getAllKnownServerIds()) {
> + QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to
> connect to each other (as opposed to just connecting to participants). I'll
> give it a try now and let you know. (Also, we use observer ids that are > 0,
> and I saw some parts of the code that might not deal with that assumption,
> so it could be that too.)