Hi, I am running Hadoop DFS on a cluster of 5 data nodes, with one name node and one secondary name node.
I have 1788874 files and directories and 1465394 blocks = 3254268 total objects. The name node's maximum heap size is 3.47 GB.

My problem is that we produce many small files, so I have a cron job that runs daily over the new files, copies their contents into bigger files, and then deletes the small files. As soon as I start this program, the name node's heap usage reaches 100%. Apart from this program, even an fsck brings the cluster down. There are not that many small files right now, and still it doesn't work. I believe we have had this problem since the upgrade to 0.17. What could be the problem?

Here is some additional data about the DFS:

Capacity      : 2 TB
DFS Remaining : 1.19 TB
DFS Used      : 719.35 GB
DFS Used%     : 35.16 %

Thanks for any hints,
Gert
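P.S. For concreteness, here is a simplified sketch of the kind of consolidation the cron job performs; the class name and paths are illustrative, not the actual program:

// Sketch only: merge the small files in one directory into a single
// larger file, then delete the small files. Paths are illustrative.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ConsolidateSmallFiles {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path("/data/incoming");      // directory holding the small files
        Path merged   = new Path("/data/merged/part-0"); // one big output file

        FSDataOutputStream out = fs.create(merged);
        FileStatus[] files = fs.listStatus(inputDir);

        for (FileStatus stat : files) {
            if (stat.isDir()) {
                continue; // skip subdirectories
            }
            FSDataInputStream in = fs.open(stat.getPath());
            // append this small file's bytes to the merged file,
            // keeping the output stream open for the next file
            IOUtils.copyBytes(in, out, conf, false);
            in.close();
            // remove the small file once its contents have been copied
            fs.delete(stat.getPath(), false);
        }
        out.close();
        fs.close();
    }
}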