Dennis Kubes wrote:
From time to time a message pops up on the mailing list about OOM
errors in the namenode because of too many files. Most recently there
was a 1.7-million-file installation that was failing. I know the simple
solution to this is to give the namenode a larger Java heap. The
non-simple approach would be to store the NameNode's BlocksMap on disk
and then query and update it there for operations. This would
eliminate memory problems for large file installations but might also
degrade performance slightly. Questions:
1) Is there any current work to allow the namenode to store on disk
versus in memory? This could be a configurable option.
2) Besides possible slight degradation in performance, is there a reason
why the BlocksMap shouldn't or couldn't be stored on disk?
As Doug mentioned, the main worry is that this would drastically reduce
performance. Part of the reason is that a large chunk of the work on the
NameNode happens under a single global lock, so a disk seek taken under
that lock stalls everything else.
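The single-lock concern can be sketched like this (a hypothetical `GlobalLockMap` class, not the actual FSNamesystem code): every operation synchronizes on one monitor, so any blocking I/O done while holding it serializes all namespace traffic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a namenode-style map guarded by one global lock.
// Names (GlobalLockMap, fsLock) are illustrative, not Hadoop's actual code.
class GlobalLockMap {
    private final Object fsLock = new Object();           // the single global lock
    private final Map<String, long[]> blocksMap = new HashMap<>();

    long[] getBlocks(String file) {
        synchronized (fsLock) {
            // If BlocksMap lived on disk, a seek would happen here,
            // inside the lock, stalling every concurrent operation.
            return blocksMap.get(file);
        }
    }

    void putBlocks(String file, long[] blocks) {
        synchronized (fsLock) {
            blocksMap.put(file, blocks);
        }
    }
}
```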
One good long-term fix for this is to make it easy to split the
namespace between multiple namenodes. There was some work done on
supporting "volumes". The fact that HDFS now supports symbolic links
might also make it easier for someone adventurous to use them as a
quick hack to get around this.
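One way the splitting idea could look is a mount table that routes each path to the namenode owning the longest matching prefix. This is only a sketch of the concept; the `NamespaceRouter` name and mount-table shape are assumptions, not the actual "volumes" design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: split the namespace by path prefix across namenodes.
class NamespaceRouter {
    private final Map<String, String> mounts = new HashMap<>();
    private final String defaultNamenode;

    NamespaceRouter(String defaultNamenode) {
        this.defaultNamenode = defaultNamenode;
    }

    void mount(String prefix, String namenode) {
        mounts.put(prefix, namenode);
    }

    // Route a path to the namenode owning the longest matching prefix.
    String namenodeFor(String path) {
        String best = defaultNamenode;
        int bestLen = -1;
        for (Map.Entry<String, String> e : mounts.entrySet()) {
            if (path.startsWith(e.getKey()) && e.getKey().length() > bestLen) {
                best = e.getValue();
                bestLen = e.getKey().length();
            }
        }
        return best;
    }
}
```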
If you have a rough prototype implementation, I am sure there will be a
lot of interest in evaluating it. If Java has any disk-based or
memory-mapped data structures, that might be the quickest way to try
the effects.
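Java does offer memory-mapped files via java.nio: FileChannel.map returns a MappedByteBuffer backed by the OS page cache, which is one quick way to prototype an on-disk structure. A minimal sketch (the one-long-per-slot record layout is an illustrative assumption, not a proposed BlocksMap format):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: a fixed-size array of block IDs kept in a
// memory-mapped file instead of the Java heap.
class MappedBlockStore implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buf;

    MappedBlockStore(Path file, int slots) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
        // The OS pages this region in and out; the JVM heap stays small.
        buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 8L * slots);
    }

    void putBlockId(int slot, long blockId) {
        buf.putLong(slot * 8, blockId);
    }

    long getBlockId(int slot) {
        return buf.getLong(slot * 8);
    }

    public void close() throws IOException {
        channel.close();
    }
}
```

Whether the page cache hides enough of the seek latency under the global lock is exactly the question a prototype like this would answer.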
Raghu.