[ https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797465#comment-15797465 ]
Akira Ajisaka commented on HDFS-11180:
--------------------------------------

bq. I also found the reason that the test is currently succeeding. It appears that the JMX cache is being populated before the lock is taken on the FSEditLog, then once the lock is taken the metrics are able to be read because they are cached (and so the original method requiring synchronization is not used). I confirmed this using the logs available, and also if you add a Thread.sleep(10000) (equivalent to the default JMX cache TTL) at the start of the synchronization block in branch-2.7, the test will fail.

I couldn't reproduce this, but I agree with you. Let's update the tests in a separate jira.

> Intermittent deadlock in NameNode when failover happens.
> --------------------------------------------------------
>
>                 Key: HDFS-11180
>                 URL: https://issues.apache.org/jira/browse/HDFS-11180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Abhishek Modi
>            Assignee: Akira Ajisaka
>            Priority: Blocker
>              Labels: high-availability
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6
>
>         Attachments: HDFS-11180-branch-2.01.patch,
> HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch,
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch,
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover
> is happening. Please find attached jstack at that point of time.
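
For readers following the thread, the masking effect described in the quoted comment can be sketched in isolation. This is a minimal illustration, not Hadoop code: `CachedMetricDemo`, `editLogLock`, `readMetric`, and `runScenario` are hypothetical names, a `ReentrantLock` stands in for the FSEditLog monitor, and a timed `tryLock` is used so that the sketch reports a failed read instead of actually hanging. While the cached value is within its TTL, the reader never touches the lock, so a test that reads metrics right after the cache is warmed passes even though another thread holds the lock. Once the TTL has elapsed, the same read needs the lock and cannot proceed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class CachedMetricDemo {
    // Hypothetical stand-in for the FSEditLog monitor; not the Hadoop class.
    private final ReentrantLock editLogLock = new ReentrantLock();
    private final long ttlMillis;
    private long cachedValue;
    private long cachedAtMillis;
    private boolean cachePopulated = false;

    CachedMetricDemo(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns the metric value, or -1 if a refresh was needed and the lock
     * could not be acquired within waitMillis.
     */
    synchronized long readMetric(long waitMillis) throws InterruptedException {
        long now = System.currentTimeMillis();
        if (cachePopulated && now - cachedAtMillis < ttlMillis) {
            return cachedValue; // served from the cache; the lock is never touched
        }
        // Cache is empty or stale: refreshing requires the lock.
        if (!editLogLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
            return -1; // a plain lock() here would block until the holder releases
        }
        try {
            cachedValue = 42; // placeholder metric value
            cachedAtMillis = System.currentTimeMillis();
            cachePopulated = true;
            return cachedValue;
        } finally {
            editLogLock.unlock();
        }
    }

    /**
     * Warms the cache, lets another thread take the lock, optionally waits,
     * then reads again. Returns {valueBeforeLock, valueWhileLockHeld}.
     */
    static long[] runScenario(long ttlMillis, long sleepBeforeReadMillis)
            throws InterruptedException {
        CachedMetricDemo demo = new CachedMetricDemo(ttlMillis);
        long warm = demo.readMetric(100); // cache populated before the lock is taken

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            demo.editLogLock.lock();
            try {
                held.countDown();
                release.await(); // keep holding the lock until told to release
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                demo.editLogLock.unlock();
            }
        });
        holder.start();
        held.await(); // the lock is now held by the other thread

        Thread.sleep(sleepBeforeReadMillis); // analogous to the Thread.sleep in the comment
        long during = demo.readMetric(100);

        release.countDown();
        holder.join();
        return new long[] {warm, during};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] fresh = runScenario(10_000, 0); // read well within the TTL
        long[] stale = runScenario(50, 200);   // read after the TTL has elapsed
        System.out.println("cache fresh: " + fresh[1]);   // prints "cache fresh: 42"
        System.out.println("cache expired: " + stale[1]); // prints "cache expired: -1"
    }
}
```

The second scenario uses a short TTL and a short sleep only to keep the run quick; the comment above describes the same idea with a 10-second sleep matching the default JMX cache TTL.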