The file size itself does not matter, because each file is split into blocks, and the blocks are what the NameNode (NN) tracks.
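To make the block arithmetic concrete, here is a rough sketch in Python. The helper name `block_count` and the sizes used are illustrative assumptions, not anything from Hadoop itself. It counts blocks only and ignores any other per-file metadata the NN may keep.

```python
MB = 1024 * 1024

def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Hypothetical helper: number of blocks a single file occupies.

    A file always rounds up to a whole number of blocks, so this is
    a ceiling division done in integer arithmetic.
    """
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

# One 10 GB file (an assumed example size) at several block sizes.
ten_gb = 10 * 1024 * MB
for block_mb in (64, 128, 256, 512):
    print(block_mb, "MB blocks ->", block_count(ten_gb, block_mb * MB), "blocks")

# The same data volume as many small files: each small file still
# costs at least one block, regardless of the configured block size.
small_files = 1000 * block_count(1 * MB, 256 * MB)   # 1000 files of 1 MB each
one_big_file = block_count(1000 * MB, 256 * MB)      # one 1000 MB file
print("1000 small files:", small_files, "blocks; one large file:", one_big_file, "blocks")
```

Under these assumptions, rolling the file over less often (or using a larger block size) reduces the number of blocks the NN has to track for the same amount of data.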
For larger deployments you can go with a large block size such as 256 MB or even 512 MB. Generally, the bigger the file the better; split calculation, however, depends heavily on the input format.

On Wed, Jun 6, 2012 at 10:00 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> We have continuous flow of data into the sequence file. I am wondering what
> would be the ideal file size before file gets rolled over. I know too many
> small files are not good but could someone tell me what would be the ideal
> size such that it doesn't overload NameNode.