Wei-Chiu Chuang created HDFS-14492:
--------------------------------------

             Summary: Snapshot memory leak
                 Key: HDFS-14492
                 URL: https://issues.apache.org/jira/browse/HDFS-14492
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 2.6.0
         Environment: CDH5.14.4
            Reporter: Wei-Chiu Chuang


We recently examined the NameNode heap dump of a big, heavy snapshot user, 
trying to trim some fat, and sure enough we found a memory leak in it: when 
snapshots are removed, the corresponding data structures are not removed.

This cluster has 586 million file system objects (286 million files, 287 
million blocks, 13 million directories), using around 132 GB of heap.

While only 44.5 million files have snapshotted copies 
(INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have 
FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies 
at some point in the past, but after the snapshots were removed, those data 
structures are still kept in the heap.

INode$Feature[] = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, 
FileDiffList = 24 bytes. It may not sound like a lot, but they add up quickly 
in large clusters like this. In this cluster, a whopping 13.8 GB of memory 
could have been saved if not for this bug: 
(32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ 13.8 GB. 
That is more than 10% of the heap size.
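
To make the arithmetic above easy to re-check, here is a small sketch that recomputes the estimate from the instance counts and byte totals in the heap histogram. The class and method names are made up for illustration, and it assumes (as the estimate above does) that every inode carrying a FileWithSnapshotFeature but no SnapshotCopy is holding stale structures.

```java
/** Illustrative only: recomputes the leak estimate from the heap histogram figures. */
final class SnapshotLeakEstimate {
    // Instance counts and byte totals copied from the heap histogram in this report.
    static final long FEATURE_ARRAY_COUNT = 235_632_329L;    // INode$Feature[] instances
    static final long FEATURE_ARRAY_BYTES = 7_658_462_416L;
    static final long SNAPSHOT_FEATURE_COUNT = 211_997_769L; // FileWithSnapshotFeature
    static final long SNAPSHOT_FEATURE_BYTES = 6_783_928_608L;
    static final long DIFF_LIST_BYTES = 5_087_946_456L;      // FileDiffList (same count)
    static final long SNAPSHOT_COPY_COUNT = 44_572_380L;     // INodeFileAttributes$SnapshotCopy

    /** Average shallow size of one instance, in bytes. */
    static double avg(long bytes, long count) {
        return (double) bytes / count;
    }

    /** Inodes that still carry snapshot structures but have no snapshotted copy. */
    static long staleInodes() {
        return SNAPSHOT_FEATURE_COUNT - SNAPSHOT_COPY_COUNT;
    }

    /** Estimated bytes held by stale snapshot structures. */
    static double wastedBytes() {
        double perInode = avg(FEATURE_ARRAY_BYTES, FEATURE_ARRAY_COUNT)
                + avg(SNAPSHOT_FEATURE_BYTES, SNAPSHOT_FEATURE_COUNT)
                + avg(DIFF_LIST_BYTES, SNAPSHOT_FEATURE_COUNT);
        return perInode * staleInodes();
    }

    public static void main(String[] args) {
        double bytes = wastedBytes();
        System.out.printf("stale inodes: %d%n", staleInodes());
        System.out.printf("wasted: %.2f GiB (%.2f GB decimal)%n",
                bytes / (1L << 30), bytes / 1e9);
    }
}
```

Note that the 13.8 figure corresponds to binary gigabytes (GiB); in decimal units the same byte count is about 14.8 GB.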

Heap histogram for reference:
{noformat}
num #instances #bytes class name
 ----------------------------------------------
 1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile
 2: 287322227 18388622528 org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
 3: 227899550 17144816120 [B
 4: 287324031 13769408616 [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
 5: 71352116 12353841568 [Ljava.lang.Object;
 6: 286322650 9170335840 [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
 7: 235632329 7658462416 [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
 8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
 9: 211997769 6783928608 org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
 10: 211997769 5087946456 org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
 11: 76586261 3780468856 [I
 12: 44572380 3209211360 org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
 13: 58634517 2345380680 java.util.ArrayList
 14: 44572380 2139474240 org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
 15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature
 16: 12907668 1135874784 org.apache.hadoop.hdfs.server.namenode.INodeDirectory
{noformat}
[~szetszwo] [~arpaga] [~smeng] [~shashikant]  any thoughts?

I am thinking that INodeFile#destroyAndCollectBlocks(), in addition to 
cleaning up file diffs, should also remove FileWithSnapshotFeature. I am not 
familiar with the snapshot implementation, so any guidance is greatly 
appreciated.
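
To illustrate the idea, here is a toy model, not HDFS code. ToyInodeFile, ToySnapshotFeature and all method names are hypothetical stand-ins; the sketch only contrasts "remove the diff" with "remove the diff and drop the feature once no diffs remain". A real fix would likely have to handle cases this toy ignores (for example, a file that was deleted from the current tree but is still referenced by a snapshot).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an inode; none of these classes are the real HDFS types. */
final class ToyInodeFile {
    /** Hypothetical per-file snapshot state: one entry per snapshot that recorded a diff. */
    static final class ToySnapshotFeature {
        final List<Integer> diffSnapshotIds = new ArrayList<>();
    }

    // null when the file carries no snapshot state at all
    private ToySnapshotFeature snapshotFeature;

    /** Records that the given snapshot holds a diff for this file. */
    void recordDiff(int snapshotId) {
        if (snapshotFeature == null) {
            snapshotFeature = new ToySnapshotFeature();
        }
        snapshotFeature.diffSnapshotIds.add(snapshotId);
    }

    /** Leaky variant: removes the diff but keeps the (possibly empty) feature. */
    void deleteSnapshotLeaky(int snapshotId) {
        if (snapshotFeature != null) {
            // Integer.valueOf selects remove(Object), not remove(int index)
            snapshotFeature.diffSnapshotIds.remove(Integer.valueOf(snapshotId));
        }
    }

    /** Proposed variant: additionally drops the feature once no diffs remain. */
    void deleteSnapshotAndPrune(int snapshotId) {
        deleteSnapshotLeaky(snapshotId);
        if (snapshotFeature != null && snapshotFeature.diffSnapshotIds.isEmpty()) {
            snapshotFeature = null; // lets the feature and its diff list be collected
        }
    }

    boolean hasSnapshotFeature() {
        return snapshotFeature != null;
    }
}
```

In this toy, a file whose only snapshot is deleted through deleteSnapshotLeaky() still reports hasSnapshotFeature() as true, which mirrors what the heap dump shows; deleteSnapshotAndPrune() leaves it false.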



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
