[ https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042578#comment-13042578 ]
Todd Lipcon commented on HDFS-1149:
-----------------------------------

A few nits:
- For DataNode.setHeartbeatsEnabled, I think it would be better to make it package-private, and then bounce through the "DataNodeAdapter" class to get at it. I also think it would be clearer if we inverted its meaning and renamed it to {{heartbeatsDisabledForTests}} - that way when reading the code later it will be clear that this is always false in normal operation.
- Same goes for all of the new public members in LeaseManager/Lease -- I think you can just move the getLeaseByPath function into NameNodeAdapter, then it can all stay package-private, right?
- In the test case, I think it's better to call {{stm.hflush()}} after the writer has lost its lease -- this is a DN-only operation, which means that it's verifying that the lease recovery has gone all the way through, not just a NN state change. The fact that you check isUnderConstruction should already do that as well, but this serves as a double-check. Then you can close the stream as well and check for the same exception.
- I think the new NAMENODE_LEASE_MANAGER_SLEEP_TIME is probably better named NAMENODE_LEASE_RECHECK_INTERVAL (more consistent with other variables like {{heartbeatRecheckInterval}} and {{replicationRecheckInterval}})

Other concern:
- Does this interact correctly with lease maintenance on rename/delete? I think so...
but it would be good to add the following tests:

Test A:
1) client creates file /dir_a/file and leaves it open
2) client renames /dir_a to /dir_b (this calls LeaseManager.changeLease)
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: NN_Recovery should own the lease on the new location of the file
[ this tests that on edit log replay, the lease is properly tracked to the new name of the file ]

Test B:
1) client creates file /file and leaves it open
2) client deletes file /file
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: no NPEs or anything

I'm also wondering if we have an issue with regards to safeMode. In theory we should never write anything to the edit log while in safemode, but I don't see safemode checks in internalReleaseLease. This is similar to the bugs seen in HDFS-988 if you want some background.

> Lease reassignment is not persisted to edit log
> -----------------------------------------------
>
>                 Key: HDFS-1149
>                 URL: https://issues.apache.org/jira/browse/HDFS-1149
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This
> is not currently persisted to the edit log, which means that after an NN
> restart, the original leaseholder could end up allocating more blocks or
> completing a file that has already started recovery.
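
The replay expectations in Test A and Test B from the comment can be illustrated with a toy model. This is only a sketch: `LeaseReplaySketch`, `Op`, `replay`, and the string op kinds are hypothetical names invented here, not HDFS classes or the real edit log format, and it is an assumption of the sketch that a reassignment record may follow a delete in the log. It shows the two properties the tests look for: after replay the lease is tracked under the renamed path and owned by NN_Recovery, and a reassignment for a deleted file is tolerated without an exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are real HDFS classes. */
public class LeaseReplaySketch {

    /** One persisted namespace operation (hypothetical edit-log record). */
    public static final class Op {
        final String kind; // "CREATE", "RENAME", "DELETE" or "REASSIGN"
        final String a;    // file path, or source directory for RENAME
        final String b;    // lease holder, or destination directory for RENAME

        public Op(String kind, String a, String b) {
            this.kind = kind;
            this.a = a;
            this.b = b;
        }
    }

    /** Replays the log from scratch and returns open-file path -> lease holder. */
    public static Map<String, String> replay(List<Op> editLog) {
        Map<String, String> holderByPath = new HashMap<>();
        for (Op op : editLog) {
            switch (op.kind) {
                case "CREATE":
                    holderByPath.put(op.a, op.b);
                    break;
                case "RENAME":
                    // Move every lease under the old prefix to the new prefix.
                    for (String path : new ArrayList<>(holderByPath.keySet())) {
                        if (path.equals(op.a) || path.startsWith(op.a + "/")) {
                            String holder = holderByPath.remove(path);
                            holderByPath.put(op.b + path.substring(op.a.length()), holder);
                        }
                    }
                    break;
                case "DELETE":
                    holderByPath.remove(op.a);
                    break;
                case "REASSIGN":
                    // Tolerate a path that no longer exists instead of failing.
                    holderByPath.computeIfPresent(op.a, (path, oldHolder) -> op.b);
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + op.kind);
            }
        }
        return holderByPath;
    }

    /** Edits corresponding to steps 1, 2 and 4 of Test A. */
    public static List<Op> testAEdits() {
        List<Op> log = new ArrayList<>();
        log.add(new Op("CREATE", "/dir_a/file", "client-1"));
        log.add(new Op("RENAME", "/dir_a", "/dir_b"));
        log.add(new Op("REASSIGN", "/dir_b/file", "NN_Recovery"));
        return log;
    }

    /** Edits corresponding to steps 1, 2 and 4 of Test B. */
    public static List<Op> testBEdits() {
        List<Op> log = new ArrayList<>();
        log.add(new Op("CREATE", "/file", "client-1"));
        log.add(new Op("DELETE", "/file", null));
        log.add(new Op("REASSIGN", "/file", "NN_Recovery"));
        return log;
    }

    public static void main(String[] args) {
        // Step 5 of each test: a "restart" is modelled as a fresh replay.
        System.out.println("Test A leases after replay: " + replay(testAEdits()));
        System.out.println("Test B leases after replay: " + replay(testBEdits()));
    }
}
```

A real test would drive a MiniDFSCluster and restart the NameNode rather than replaying a list in memory; the sketch only pins down what the post-restart assertions would check.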