2M files is excessive, but that is no reason for block reports to break. My preference is to make block reports handle this better; DNs dropping in and out of the cluster causes too many other problems.
Raghu

Konstantin Shvachko wrote:
Hi Jason,

2 million blocks per data-node is not going to work. There were discussions about this previously; please check the mail archives.

It means you have a lot of very small files, which HDFS is not designed to support. A general recommendation is to group small files into large ones, introducing some kind of record structure to delimit the small files, and to handle this at the application level.

Thanks,
--Konstantin
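
A minimal sketch of the grouping Konstantin describes, using Hadoop's SequenceFile as the container format; the output path and class name here are illustrative, not from the thread. Each small file becomes one record, keyed by its name, which provides the delimiting record structure:

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path packed = new Path("/data/packed.seq"); // hypothetical output

        // One SequenceFile on HDFS holds many small files as records.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, packed, Text.class, BytesWritable.class);
        try {
            for (String name : args) { // each arg: a local small file
                byte[] contents = Files.readAllBytes(new File(name).toPath());
                // Key = original file name, value = raw bytes; the key/value
                // framing is the record structure delimiting the small files.
                writer.append(new Text(name), new BytesWritable(contents));
            }
        } finally {
            writer.close();
        }
    }
}

Readers recover individual files by scanning with SequenceFile.Reader and matching on the key, so the name-to-contents mapping survives the packing.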