[ 
https://issues.apache.org/jira/browse/HDFS-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4656:
---------------------------------

    Attachment: HDFS-4656.patch

Trivial patch attached. To test this I added some extra logging both to the 
heartbeat loop itself and inside the {{if}} block mentioned in the issue 
description. Without the patch I saw output like this:

{noformat}
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:20:37 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_2854604910635513445_147059
13/04/01 17:20:38 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-7126490035827800141_147054
13/04/01 17:20:39 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-7998009805522854247_147131
13/04/01 17:20:40 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:40 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:40 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:40 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:20:40 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-7289451447776161689_147106
13/04/01 17:20:41 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-3895917754682539115_147098
13/04/01 17:20:42 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_1182110057394126316_147123
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> actually performing heartbeat
{noformat}

With the patch I see output like this:
{noformat}
13/04/01 17:22:45 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:45 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:46 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-5664468450728548846_148227
13/04/01 17:22:47 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_5505984176030032691_148190
13/04/01 17:22:48 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-847450075981641452_148208
13/04/01 17:22:48 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:48 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:49 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-5755422789655200141_148209
13/04/01 17:22:50 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-4066581005605116272_148148
13/04/01 17:22:51 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-3643658690040746645_148232
13/04/01 17:22:51 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:51 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:52 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_7451729659994128063_148176
13/04/01 17:22:53 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_7633597047916157767_148158
13/04/01 17:22:54 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_8386436294983826146_148194
13/04/01 17:22:54 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:54 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:55 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-4489059851090749136_148224
13/04/01 17:22:56 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-5331285809567165401_148151
13/04/01 17:22:57 INFO datanode.BlockPoolSliceScanner: Verification succeeded 
for BP-193129671-172.21.3.106-1357591728674:blk_-4611051254925118595_148196
13/04/01 17:22:57 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:57 INFO datanode.DataNode: ====> actually performing heartbeat
{noformat}

No tests are included since the patch is trivial and would be  difficult to 
test in a non-convoluted manner.
                
> DN heartbeat loop can be briefly tight
> --------------------------------------
>
>                 Key: HDFS-4656
>                 URL: https://issues.apache.org/jira/browse/HDFS-4656
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>            Priority: Minor
>         Attachments: HDFS-4656.patch
>
>
> The DN hearbeat loop looks roughly like this:
> {code}
> if (now - timeOfLastHeartbeat > configuredHeartbeatInterval) {
>   // do heartbeat
> }
> timeToWait = configuredHeartbeatInterval - (now - timeOfLastHeartbeat)
> sleep(timeToWait)
> {code}
> The trouble is that since we sleep for exactly the heartbeat interval, and 
> then check to see if we have waited _more_ than that heartbeat interval, we 
> will very often have waited exactly the heartbeat interval (in millis), and 
> not more than it. In this case we will skip actually performing the heartbeat 
> and will calculcate timeToWait as being 0ms. The DN heartbeat loop will then 
> loop tightly for 1ms. The solution is just to change the "{{>}}" in the code 
> above to "{{>=}}".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to