GitHub user hanm commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/92#discussion_r97451281
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -468,31 +469,33 @@ synchronized private boolean connectOne(long sid, InetSocketAddress electionAddr
*/
synchronized void connectOne(long sid){
+ connectOne(sid, self.getLastSeenQuorumVerifier());
+ }
+
+ synchronized void connectOne(long sid, QuorumVerifier lastSeenQV){
if (senderWorkerMap.get(sid) != null) {
- LOG.debug("There is a connection already for server " + sid);
- return;
+ LOG.debug("There is a connection already for server " + sid);
+ return;
}
- synchronized(self) {
- boolean knownId = false;
- // Resolve hostname for the remote server before attempting to
- // connect in case the underlying ip address has changed.
- self.recreateSocketAddresses(sid);
- if (self.getView().containsKey(sid)) {
- knownId = true;
- if (connectOne(sid, self.getView().get(sid).electionAddr))
- return;
- }
- if (self.getLastSeenQuorumVerifier()!=null && self.getLastSeenQuorumVerifier().getAllMembers().containsKey(sid)
- && (!knownId || (self.getLastSeenQuorumVerifier().getAllMembers().get(sid).electionAddr !=
- self.getView().get(sid).electionAddr))) {
- knownId = true;
- if (connectOne(sid, self.getLastSeenQuorumVerifier().getAllMembers().get(sid).electionAddr))
- return;
- }
- if (!knownId) {
- LOG.warn("Invalid server id: " + sid);
+ boolean knownId = false;
+ // Resolve hostname for the remote server before attempting to
+ // connect in case the underlying ip address has changed.
+ self.recreateSocketAddresses(sid);
+ if (self.getView().containsKey(sid)) {
--- End diff --
@shralex Thanks for the review comments! I made two changes:
* Refactored the code to reuse the result of getView. The view is not passed
in as a parameter, since I thought that keeps the call sites simpler.
* The code block inside connectOne is now synchronized on the same lock that
protects the other view / quorum verifier state of the same QuorumPeer. I
think this makes the block semantically equivalent to the code before this
change, which synchronized on the whole QuorumPeer 'self' with the intention
that accesses to the configs are protected for the entire execution of
connectOne. I did not add any comments because, with the explicit
synchronized block, the semantics should be self-explanatory.
My stress tests look good so far with the latest changes.
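
To illustrate the second point, here is a minimal sketch of the locking idea,
not the actual ZooKeeper code. The class PeerSketch, the field configLock, and
the method candidateAddrs are hypothetical names, and addresses are plain
strings instead of InetSocketAddress. The view and the last seen members are
guarded by one dedicated lock, and the reader holds that lock for the whole
lookup so it never sees one config updated without the other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a peer that owns its configuration state.
class PeerSketch {
    // Dedicated lock guarding both config fields (name is an assumption).
    private final Object configLock = new Object();
    private final Map<Long, String> view = new HashMap<>();
    private Map<Long, String> lastSeenMembers = null;

    void setViewAddr(long sid, String electionAddr) {
        synchronized (configLock) {
            view.put(sid, electionAddr);
        }
    }

    void setLastSeenMembers(Map<Long, String> members) {
        synchronized (configLock) {
            lastSeenMembers = new HashMap<>(members);
        }
    }

    // Returns the addresses to try for sid, current view first, then the
    // last seen config if it names a different address. Both reads happen
    // under the same lock, so they reflect one consistent snapshot.
    List<String> candidateAddrs(long sid) {
        synchronized (configLock) {
            List<String> out = new ArrayList<>();
            String current = view.get(sid);
            if (current != null) {
                out.add(current);
            }
            String lastSeen =
                (lastSeenMembers == null) ? null : lastSeenMembers.get(sid);
            // Compare by value, not by reference.
            if (lastSeen != null && !lastSeen.equals(current)) {
                out.add(lastSeen);
            }
            return out;
        }
    }
}
```

An empty result corresponds to the "Invalid server id" case in the diff.
Compared with synchronizing on the whole peer object, a dedicated lock only
blocks code that touches the configs, which is the trade-off this change aims
for.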