[ https://issues.apache.org/jira/browse/HDFS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-3901:
------------------------------

    Attachment: hdfs-3901.txt

Oops, it was just a missing import of com.google.common.base.Stopwatch, which I had in my local tree from a different patch. Here's an updated diff against the tip of the branch.

> QJM: send 'heartbeat' messages to JNs even when they are out-of-sync
> --------------------------------------------------------------------
>
>                 Key: HDFS-3901
>                 URL: https://issues.apache.org/jira/browse/HDFS-3901
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3901.txt, hdfs-3901.txt
>
>
> Currently, if one of the JNs has fallen out of sync with the writer (e.g.
> because it went down), it will be marked as such until the next log roll.
> This causes the writer to stop sending any RPCs to it, which means the
> JN's metrics no longer reflect up-to-date information on how far behind
> it is.
> This patch will introduce a heartbeat() RPC that has no effect except to
> update the JN's view of the latest committed txid. When the writer is talking
> to an out-of-sync logger, it will send these heartbeat messages once a second.
> In a future patch we can extend the heartbeat functionality so that NNs
> periodically check their connections to JNs if no edits arrive, such that a
> fenced NN won't accidentally continue to serve reads indefinitely.
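The heartbeat behaviour described in the issue can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in hdfs-3901.txt: the `JournalRpc` interface, the `LoggerChannel` class, its methods, and the injectable clock are all assumptions made for the example. It assumes the writer calls `sendEdits()` for every batch. While the JN is in sync the batch is journaled normally; once it is marked out of sync, edits are withheld until the next log roll, and at most one heartbeat per second carries the latest committed txid so the JN's lag metrics stay current. A nanosecond `LongSupplier` stands in for a Guava `Stopwatch` so the sketch is self-contained and testable.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class HeartbeatSketch {

    /** Hypothetical RPC surface of one JN (not the real QJM protocol). */
    interface JournalRpc {
        void journal(long firstTxId, int numTxns, long committedTxId);
        void heartbeat(long committedTxId);
    }

    /** Hypothetical writer-side channel to a single JN. */
    static final class LoggerChannel {
        private static final long HEARTBEAT_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(1);

        private final JournalRpc rpc;
        private final LongSupplier nanoClock; // e.g. System::nanoTime in production
        private boolean outOfSync = false;
        private boolean heartbeatSent = false;
        private long lastHeartbeatNanos = 0L;

        LoggerChannel(JournalRpc rpc, LongSupplier nanoClock) {
            this.rpc = rpc;
            this.nanoClock = nanoClock;
        }

        /** The JN missed edits (e.g. it went down); stop journaling to it. */
        void markOutOfSync() {
            outOfSync = true;
            heartbeatSent = false; // send the first heartbeat right away
        }

        /** A log roll starts a new segment, which lets the JN rejoin. */
        void startLogSegment() {
            outOfSync = false;
        }

        void sendEdits(long firstTxId, int numTxns, long committedTxId) {
            if (!outOfSync) {
                rpc.journal(firstTxId, numTxns, committedTxId);
                return;
            }
            // Out of sync: skip the edits, but heartbeat at most once a second.
            long now = nanoClock.getAsLong();
            // Subtraction keeps the comparison correct even if nanoTime wraps.
            if (!heartbeatSent || now - lastHeartbeatNanos >= HEARTBEAT_INTERVAL_NANOS) {
                rpc.heartbeat(committedTxId);
                lastHeartbeatNanos = now;
                heartbeatSent = true;
            }
        }
    }
}
```

Because heartbeats here are driven by `sendEdits()`, an idle writer sends none; covering that case is roughly what the future patch mentioned in the issue (NNs checking their JN connections when no edits arrive) would add.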