[ 
https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051887#comment-14051887
 ] 

Kihwal Lee commented on HDFS-6618:
----------------------------------

[~cmccabe], thanks for the review.
bq. What happens if a QuotaExceededException is thrown here .....
This is indeed problematic, but it is also the case for the existing code and 
for what you are suggesting.  If an exception is thrown in the middle of 
deleting, the partial delete is not undone. The inode at the top of the tree 
being deleted, and potentially more, will have already been unlinked; the rest 
will remain linked, but unreachable.  If inodes are removed altogether at the 
end, none of the inodes will get removed from inodeMap when an exception is 
thrown. This will cause inodes and blocks to leak.  If we remove inodes as we 
go, at least some inodes will get removed in the same situation. Either way 
things will leak, but to a lesser degree in the latter case. I wouldn't say the 
latter is superior because of this difference; I'm just saying it's no worse.
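To make the difference concrete, here is a toy model (not HDFS code; the names 
and the flat "tree" are illustrative only) of the two strategies, with an 
exception simulating a QuotaExceededException thrown mid-delete:

```java
import java.util.HashMap;
import java.util.Map;

public class DeleteLeakSketch {

    // Strategy A: unlink everything first, remove from the map only at the
    // end. An exception before the final step leaves every unlinked inode
    // stranded in the map.
    static int removeAtEnd(Map<Integer, String> inodeMap, int[] tree, int failAt) {
        int unlinked = 0;
        try {
            for (int id : tree) {
                if (id == failAt) throw new RuntimeException("quota exceeded");
                unlinked++;                       // inode unlinked from its parent
            }
            for (int id : tree) inodeMap.remove(id); // batch removal; never reached on failure
        } catch (RuntimeException e) { /* partial delete is not undone */ }
        return unlinked;
    }

    // Strategy B: remove from the map as we go. The inodes processed before
    // the exception are at least gone from the map.
    static int removeInline(Map<Integer, String> inodeMap, int[] tree, int failAt) {
        int unlinked = 0;
        try {
            for (int id : tree) {
                if (id == failAt) throw new RuntimeException("quota exceeded");
                unlinked++;
                inodeMap.remove(id);              // removed immediately
            }
        } catch (RuntimeException e) { /* partial delete is not undone */ }
        return unlinked;
    }

    public static void main(String[] args) {
        int[] tree = {1, 2, 3, 4, 5};
        Map<Integer, String> a = new HashMap<>(), b = new HashMap<>();
        for (int id : tree) { a.put(id, "inode"); b.put(id, "inode"); }
        removeAtEnd(a, tree, 4);   // throws on inode 4
        removeInline(b, tree, 4);  // throws on inode 4
        System.out.println("leaked: at-end=" + a.size() + " inline=" + b.size());
    }
}
```

Both leak on an exception; inline removal just leaks fewer entries.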

One of the key motivations for removing inodes inline was to avoid the 
overhead of building up a large data structure when deleting a large tree. 
Although it is now backed by {{ChunkedArrayList}}, there will be a lot of 
reallocation and quite a bit of memory consumption. All or part of it may be 
promoted and remain in the heap until the next old-gen collection.  This might 
be acceptable if we were doing deferred removal outside the lock. But since we 
are trying to do it inside both the FSNamesystem and FSDirectory locks, 
building the list is just wasted work.
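The locking argument can be sketched as follows (a simplified model, not the 
actual FSDirectory code; the method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RemovalTiming {
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Deferred removal: the collected list earns its keep only because the
    // actual map removal happens AFTER the write lock is released,
    // shortening the critical section.
    static List<Integer> collectUnderLock(int[] ids) {
        List<Integer> collected = new ArrayList<>();
        lock.writeLock().lock();
        try {
            for (int id : ids) collected.add(id); // unlink and collect
        } finally {
            lock.writeLock().unlock();
        }
        return collected; // caller removes these from the map without the lock
    }

    // Inline removal: if removal stays under the lock anyway, building an
    // intermediate list is pure allocation overhead, so remove directly.
    static int removeUnderLock(int[] ids, Map<Integer, String> inodeMap) {
        lock.writeLock().lock();
        try {
            int removed = 0;
            for (int id : ids) { inodeMap.remove(id); removed++; }
            return removed;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        int[] ids = {1, 2, 3};
        for (int id : ids) map.put(id, "inode");
        System.out.println("collected=" + collectUnderLock(ids).size()
            + " removedInline=" + removeUnderLock(ids, map));
    }
}
```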

About leaking inodes and blocks: 
- Inodes were removed from inodeMap but blocks weren't removed from blocksMap. 
This includes the case where a block is added after the inode is deleted, due 
to the delete-addBlock race.  The block still holds a reference to its block 
collection (i.e. the inode), so this causes a memory leak, which disappears 
when the namenode is restarted.
- Unlinked/deleted inodes were not deleted from inodeMap.  The deleted inodes 
will remain in memory. If their blocks were also not removed from blocksMap, 
the blocks will remain in memory too.  If the blocks were collected, but not 
removed from blocksMap, they will disappear after a restart. When saving the 
fsimage, the orphaned inodes will be saved in the inode section. The way the 
INodeDirectorySection is saved also causes all leaked (still linked) children 
and blocks to be saved, so when the fsimage is loaded, the leak is recreated 
in memory.

I am a bit depressed after writing this. Let's fix things one at a time...

> Remove deleted INodes from INodeMap right away
> ----------------------------------------------
>
>                 Key: HDFS-6618
>                 URL: https://issues.apache.org/jira/browse/HDFS-6618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6618.AbstractList.patch, 
> HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch
>
>
> After HDFS-6527, we have not seen the edit log corruption for weeks on 
> multiple clusters until yesterday. Previously, we would see it within 30 
> minutes on a cluster.
> But the same condition was reproduced even with HDFS-6527.  The only 
> explanation is that the RPC handler thread serving {{addBlock()}} was 
> accessing a stale {{parent}} value.  Although nulling out {{parent}} is done 
> inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no 
> memory barrier because there is no "synchronized" block involved in the 
> process.
> I suggest making {{parent}} volatile.
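The visibility fix suggested in the description can be sketched as follows (a 
simplified illustration, not the actual HDFS INode class): a volatile 
{{parent}} field establishes a happens-before edge between the deleting 
thread's write of null and a concurrent reader such as an addBlock() handler, 
even without a "synchronized" block.

```java
public class VolatileParentSketch {
    static class INode {
        // volatile: a null written by the delete path is guaranteed to be
        // visible to other threads without additional locking
        volatile INode parent;
        boolean isUnlinked() { return parent == null; }
    }

    public static void main(String[] args) {
        INode root = new INode();
        INode child = new INode();
        child.parent = root;
        child.parent = null; // delete path nulls out the parent
        System.out.println("unlinked=" + child.isUnlinked());
    }
}
```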



--
This message was sent by Atlassian JIRA
(v6.2#6252)
