I'm running 0.19.0 on a 10-node cluster (8 cores, 16GB RAM, 4x1.5TB disks per node). The current state of my FS is approximately 1 million files and directories, 950k blocks, and a heap size of 7GB (16GB reserved). Average block replication is 3.8. I'm concerned that the heap size is steadily climbing... a 7GB heap is substantially higher per file than on a similar 0.18.2 cluster, which has closer to a 1GB heap.

My typical usage model is to 1) write a number of small files into HDFS (tens or hundreds of thousands at a time), 2) archive those files, and 3) delete the originals, roughly as sketched below. I've tried dropping the replication factor of the _index and _masterindex files without much effect on overall heap size. While I had trash enabled at one point, I've since disabled it and deleted the .Trash folders.
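Concretely, one cycle looks roughly like this (paths and archive names here are placeholders, not my real layout):

    # 1) write a batch of small files into a staging directory
    hadoop fs -put local_batch/ /staging/batch_0042

    # 2) archive the batch into a HAR
    hadoop archive -archiveName batch_0042.har /staging/batch_0042 /archives

    # 3) delete the originals (trash is disabled, so this is permanent)
    hadoop fs -rmr /staging/batch_0042

    # the setrep attempt on the archive index files
    hadoop fs -setrep 1 /archives/batch_0042.har/_index
    hadoop fs -setrep 1 /archives/batch_0042.har/_masterindex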
On namenode startup, I get a massive number of lines like the following in my log file:

    2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-2389330910609345428_7332878 on 172.16.129.33:50010 size 798080 does not belong to any file.
    2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-2389330910609345428 is added to invalidSet of 172.16.129.33:50010

I suspect the original files may be left behind and causing the heap size bloat. Is there any accounting mechanism to determine what is contributing to my heap size?

Thanks,
Sean
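P.S. The closest thing to accounting I've found so far is a class histogram of the namenode JVM with jmap, something like (PID is a placeholder):

    # histogram of object counts and sizes by class for the namenode process
    jmap -histo <namenode_pid> | head -30

That at least shows which classes dominate the heap (the inode and block objects, presumably), but not which files or directories they belong to, which is what I'm really after.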