[ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042578#comment-13042578
 ] 

Todd Lipcon commented on HDFS-1149:
-----------------------------------

A few nits:

- for DataNode.setHeartbeatsEnabled, I think it would be better to make it 
package-private, and then bounce through the "DataNodeAdapter" class to get at 
it. I also think it would be clearer if we inverted its meaning and renamed it 
to {{heartbeatsDisabledForTests}} - that way when reading the code later it 
will be clear that this is always false in normal operation.
- Same goes for all of the new public members in LeaseManager/Lease -- I think 
you can just move the getLeaseByPath function into NameNodeAdapter, then it can 
all stay package-private, right?
- In the test case, I think it's better to call {{stm.hflush()}} after the 
writer has lost its lease -- this is a DN-only operation, so it verifies that 
lease recovery has gone all the way through, not just that NN state has 
changed. The isUnderConstruction check should already cover that, but this 
serves as a double-check. Then you can close the stream as well and check for 
the same exception.
- I think the new NAMENODE_LEASE_MANAGER_SLEEP_TIME is probably better named 
NAMENODE_LEASE_RECHECK_INTERVAL (more consistent with other variables like 
{{heartbeatRecheckInterval}} and {{replicationRecheckInterval}}).
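
To make the first nit concrete, here is a minimal sketch of the shape I have in 
mind. It is a toy, not the real code: the {{DataNode}} class below is a 
stand-in with nothing but the flag, and the adapter method name and 
{{shouldSendHeartbeat}} are assumptions for illustration. The only point is 
that the flag is package-private, defaults to false, and is reached from tests 
through an adapter in the same package.

```java
// Toy stand-in for the real DataNode; only the visibility pattern matters.
class DataNode {
    // Package-private, and always false in normal operation.
    volatile boolean heartbeatsDisabledForTests = false;

    // Hypothetical helper showing how the flag would be consulted.
    boolean shouldSendHeartbeat() {
        return !heartbeatsDisabledForTests;
    }
}

// Test-only helper that lives in the same package as DataNode, so
// nothing on DataNode itself has to become public.
class DataNodeAdapter {
    // Hypothetical method name.
    static void setHeartbeatsDisabledForTests(DataNode dn, boolean disabled) {
        dn.heartbeatsDisabledForTests = disabled;
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        DataNode dn = new DataNode();
        System.out.println(dn.shouldSendHeartbeat()); // true
        DataNodeAdapter.setHeartbeatsDisabledForTests(dn, true);
        System.out.println(dn.shouldSendHeartbeat()); // false
    }
}
```

The same approach would apply to getLeaseByPath and NameNodeAdapter.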

Other concern:
- Does this interact correctly with lease maintenance on rename/delete? I think 
so... but it would be good to add the following tests:

Test A:
1) client creates file /dir_a/file and leaves it open
2) client renames /dir_a to /dir_b   (this calls LeaseManager.changeLease)
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: NN_Recovery should own the lease on the new 
location of the file

[ this tests that on edit log replay, the lease is properly tracked to the new 
name of the file ]

Test B:
1) client creates file /file and leaves it open
2) client deletes file /file
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: no NPEs or anything
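
To spell out what the two tests are checking, here is a toy model of edit log 
replay. None of this is HDFS code: {{ToyEditLog}}, its op names, and the way 
leases are keyed by path are all assumptions made for the sketch. It only shows 
the expected outcome, namely that after replay the reassigned lease follows the 
renamed file (Test A) and that a reassignment for a deleted file is ignored 
rather than blowing up (Test B).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of edit-log replay; every name here is hypothetical.
class ToyEditLog {
    enum Op { OPEN, RENAME, DELETE, REASSIGN_LEASE }

    // One logged edit. 'arg' is the lease holder for OPEN and
    // REASSIGN_LEASE, the destination for RENAME, unused for DELETE.
    static final class Edit {
        final Op op; final String path; final String arg;
        Edit(Op op, String path, String arg) {
            this.op = op; this.path = path; this.arg = arg;
        }
    }

    private final List<Edit> edits = new ArrayList<>();

    void log(Op op, String path, String arg) {
        edits.add(new Edit(op, path, arg));
    }

    private static boolean isUnder(String path, String dir) {
        return path.equals(dir) || path.startsWith(dir + "/");
    }

    // Rebuilds the path -> lease holder map from scratch, as a restart would.
    Map<String, String> replay() {
        Map<String, String> leases = new HashMap<>();
        for (Edit e : edits) {
            switch (e.op) {
                case OPEN:
                    leases.put(e.path, e.arg);
                    break;
                case RENAME: {
                    // Move every lease under the old name to the new name.
                    Map<String, String> renamed = new HashMap<>();
                    for (Map.Entry<String, String> l : leases.entrySet()) {
                        String p = l.getKey();
                        String np = isUnder(p, e.path)
                            ? e.arg + p.substring(e.path.length()) : p;
                        renamed.put(np, l.getValue());
                    }
                    leases = renamed;
                    break;
                }
                case DELETE:
                    leases.keySet().removeIf(p -> isUnder(p, e.path));
                    break;
                case REASSIGN_LEASE:
                    // Ignore reassignment for a file that no longer exists.
                    if (leases.containsKey(e.path)) {
                        leases.put(e.path, e.arg);
                    }
                    break;
            }
        }
        return leases;
    }

    public static void main(String[] args) {
        ToyEditLog a = new ToyEditLog(); // Test A
        a.log(Op.OPEN, "/dir_a/file", "client");
        a.log(Op.RENAME, "/dir_a", "/dir_b");
        a.log(Op.REASSIGN_LEASE, "/dir_b/file", "NN_Recovery");
        System.out.println(a.replay()); // {/dir_b/file=NN_Recovery}

        ToyEditLog b = new ToyEditLog(); // Test B
        b.log(Op.OPEN, "/file", "client");
        b.log(Op.DELETE, "/file", null);
        b.log(Op.REASSIGN_LEASE, "/file", "NN_Recovery");
        System.out.println(b.replay()); // {}
    }
}
```

The real tests would of course go through a mini cluster and an actual NN 
restart; the sketch only pins down the state each one should end in.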


I'm also wondering if we have an issue with regard to safemode. In theory we 
should never write anything to the edit log while in safemode, but I don't see 
safemode checks in internalReleaseLease. This is similar to the bugs seen in 
HDFS-988 if you want some background.
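
As a rough illustration of the kind of guard I mean, here is a toy sketch. The 
class and method names are hypothetical and this is not how the namesystem is 
actually structured; whether the real code should throw, skip, or defer the 
reassignment while in safemode is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; these names are hypothetical, not HDFS classes.
class ToyNamesystem {
    private boolean inSafeMode = true;
    private final List<String> editLog = new ArrayList<>();

    void leaveSafeMode() {
        inSafeMode = false;
    }

    int editCount() {
        return editLog.size();
    }

    // Any path that writes to the edit log checks safemode first.
    void reassignLease(String path, String newHolder) {
        if (inSafeMode) {
            throw new IllegalStateException(
                "cannot reassign lease for " + path + " in safemode");
        }
        editLog.add("REASSIGN_LEASE " + path + " " + newHolder);
    }
}
```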


> Lease reassignment is not persisted to edit log
> -----------------------------------------------
>
>                 Key: HDFS-1149
>                 URL: https://issues.apache.org/jira/browse/HDFS-1149
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira