[ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964879#action_12964879 ]
HBase Review Board commented on HBASE-3282:
-------------------------------------------

Message from: "Jonathan Gray" <jg...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1259/
-----------------------------------------------------------

Review request for hbase and stack.


Summary
-------

We currently let go of dead servers once we finish their shutdown. We should hang on to them longer to deal with things like network partitions.

I'm not a fan of SoftReferences so I decided on another approach. DeadServers now has a maximum number of servers to hold on to in the set (default 100). Once it reaches the max, it evicts the oldest.

More code than I had hoped but nothing too crazy.


This addresses bug HBASE-3282.
    http://issues.apache.org/jira/browse/HBASE-3282


Diffs
-----

  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java 1040221
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1040221
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1040221
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1040221

Diff: http://review.cloudera.org/r/1259/diff


Testing
-------

Running unit tests now.


Thanks,

Jonathan


> Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3282
>                 URL: https://issues.apache.org/jira/browse/HBASE-3282
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>   Affects Versions: 0.90.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>             Fix For: 0.90.0, 0.92.0
>
>
> Currently we clear a server from the deadserver set once we finish processing its shutdown.
> However, certain circumstances (network partitions, race conditions) could lead to the RS not doing a check-in until after the shutdown has been processed. As-is, this RS will now be let back in to the cluster rather than rejected with YouAreDeadException.
> We should hang on to the dead servers so we always reject them.
> One concern is that the set will grow indefinitely. One recommendation by stack is to use SoftReferences.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
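The approach described in the Summary (a dead-server set capped at a maximum size that evicts its oldest entry when full) could look roughly like the sketch below. This is an illustration only, not the code from the patch: the class name BoundedDeadServers, its method names, and the use of plain String server names are all assumptions made for the example.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a size-bounded dead-server set.
 * Not the actual org.apache.hadoop.hbase.master.DeadServer implementation.
 */
class BoundedDeadServers {
    private final int maxDeadServers;

    // LinkedHashSet iterates in insertion order, so the first element
    // returned by the iterator is always the oldest entry.
    private final Set<String> deadServers = new LinkedHashSet<>();

    BoundedDeadServers(int maxDeadServers) {
        if (maxDeadServers < 1) {
            throw new IllegalArgumentException("maxDeadServers must be >= 1");
        }
        this.maxDeadServers = maxDeadServers;
    }

    /** Records a dead server, evicting the oldest entry if the set is full. */
    synchronized boolean add(String serverName) {
        if (deadServers.contains(serverName)) {
            return false; // already tracked; its position is not refreshed
        }
        if (deadServers.size() >= maxDeadServers) {
            Iterator<String> it = deadServers.iterator();
            it.next();
            it.remove(); // drop the oldest entry to make room
        }
        return deadServers.add(serverName);
    }

    /** True if the server is still remembered as dead and should be rejected. */
    synchronized boolean isDeadServer(String serverName) {
        return deadServers.contains(serverName);
    }

    synchronized int size() {
        return deadServers.size();
    }
}
```

With a cap like this, a region server that checks in late is still rejected as long as fewer than the maximum number of other servers have died since it did. Once evicted, it would be let back in, which is the trade-off of bounding the set instead of letting it grow indefinitely.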