[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938880#comment-13938880 ]

Todd Lipcon commented on HDFS-6089:
-----------------------------------

Yep, everything Andrew said makes sense to me. Maybe we should just have a shorter timeout on the rollEditLog call? Or somehow interrupt the RPC more quickly during a transitionToActive call (given that in that case we know the previous active is likely dead)?

> Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6089
>                 URL: https://issues.apache.org/jira/browse/HDFS-6089
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>         Attachments: HDFS-6089.000.patch, HDFS-6089.001.patch
>
>
> The following scenario was tested:
> * Determine the Active NN and suspend the process (kill -19)
> * Wait about 60s to let the standby transition to active
> * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.
> It was noticed that sometimes the call to get the service state of nn2 got a socket timeout exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)