Dennis Kubes wrote:

From time to time a message pops up on the mailing list about OOM errors on the namenode because of too many files. Most recently there was a 1.7 million file installation that was failing. I know the simple solution is to give the namenode a larger Java heap. The non-simple way would be to store the NameNode's BlocksMap on disk and query and update it there for operations. This would eliminate memory problems for large file installations, though it might also degrade performance slightly. Questions:

1) Is there any current work to allow the namenode to store its metadata on disk versus in memory? This could be a configurable option.

2) Besides a possible slight degradation in performance, is there a reason why the BlocksMap shouldn't or couldn't be stored on disk?

As Doug mentioned, the main worry is that this would drastically reduce performance. Part of the reason is that a large chunk of the work in the NameNode happens under a single global lock, so if there is a disk seek under this lock, it stalls everything else.
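To make the concern concrete, here is a minimal sketch (not Hadoop code; class and method names are made up) of why a seek under one global lock hurts: every operation serializes on the lock, so a ~10 ms seek taken while holding it is paid by all waiting callers, whereas an in-memory lookup under the same lock costs microseconds.

```java
// Hypothetical sketch: all namespace operations share one global lock.
// If any operation does a disk seek while holding the lock, every other
// operation waits behind it.
public class GlobalLockSketch {
    private final Object fsLock = new Object(); // stands in for the namenode's global lock

    // Fast in-memory lookup: microseconds under the lock.
    public long fastLookup() {
        synchronized (fsLock) {
            return 42L; // pretend: read from the in-memory BlocksMap
        }
    }

    // The same lookup if the BlocksMap lived on disk: a ~10 ms seek under the lock.
    public long diskBackedLookup() throws InterruptedException {
        synchronized (fsLock) {
            Thread.sleep(10); // simulated disk seek while holding the global lock
            return 42L;
        }
    }

    public static void main(String[] args) throws Exception {
        GlobalLockSketch ns = new GlobalLockSketch();
        int ops = 50;

        long t0 = System.nanoTime();
        for (int i = 0; i < ops; i++) ns.fastLookup();
        long inMemMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        for (int i = 0; i < ops; i++) ns.diskBackedLookup();
        long onDiskMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("in-memory: " + inMemMs + " ms, disk-backed: " + onDiskMs + " ms");
    }
}
```

With 50 operations the disk-backed variant takes at least 500 ms of lock-held time; at namenode request rates that latency compounds across every queued client.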

One good long-term fix for this is to make it easy to split the namespace between multiple namenodes. There was some work done on supporting "volumes". Also, the fact that HDFS now supports symbolic links might make it easier for someone adventurous to use them as a quick hack to get around this.

If you have a rough prototype implementation, I am sure there will be a lot of interest in evaluating it. If Java has any disk-based or memory-mapped data structures, that might be the quickest way to try its effects.
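Java does have memory-mapped files via java.nio (`FileChannel.map` returning a `MappedByteBuffer`), which could be the quickest way to prototype this. Below is a hedged sketch, not Hadoop code: a fixed-record "block map" where each record is a single long, so the OS pages the data in and out and the JVM heap stays small. The class and record layout are illustrative assumptions only.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch: a fixed-record block map backed by a memory-mapped
// file. Each record is one long (e.g. a block id slot). The mapping stays
// valid after the channel is closed.
public class MappedBlockMap {
    private static final int RECORD = Long.BYTES;
    private final MappedByteBuffer map;

    public MappedBlockMap(File file, int capacity) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel ch = raf.getChannel()) {
            map = ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) capacity * RECORD);
        }
    }

    public void put(int index, long value) { map.putLong(index * RECORD, value); }
    public long get(int index) { return map.getLong(index * RECORD); }
}
```

A real BlocksMap entry holds variable-length datanode lists rather than one long, so an actual prototype would need a more involved record layout, but this is enough to measure the paging behavior Raghu suggests trying.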

Raghu.
