2M files is excessive, but there is no reason block reports should break. My preference is to make block reports handle this better, since DNs dropping in and out of the cluster causes too many other problems.

Raghu.

Konstantin Shvachko wrote:
Hi Jason,

2 million blocks per data-node is not going to work.
There were discussions about this previously;
please check the mail archives.

This means you have a lot of very small files, which
HDFS is not designed to support. A general recommendation
is to group small files into large ones, introducing
some kind of record structure to delimit the small files,
and to manage it at the application level.
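For example, here is a minimal sketch of that approach using
Hadoop's SequenceFile API, packing a local directory of small
files into one large HDFS file keyed by file name (the output
path and class name here are illustrative, not from this thread):

  import java.io.File;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // Hypothetical output path: one big file instead of many tiny ones.
      Path out = new Path("/user/example/packed.seq");
      // Key = original file name, value = raw bytes, so each small
      // file remains addressable as a record inside the large file.
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, out, Text.class, BytesWritable.class);
      try {
        for (File f : new File(args[0]).listFiles()) {
          byte[] data = java.nio.file.Files.readAllBytes(f.toPath());
          writer.append(new Text(f.getName()), new BytesWritable(data));
        }
      } finally {
        writer.close();
      }
    }
  }

The packed records can then be read back with SequenceFile.Reader,
or fed to MapReduce jobs via SequenceFileInputFormat, without the
name-node ever seeing the individual small files.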

Thanks,
--Konstantin
