Try CombineFileInputFormat.
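CombineFileInputFormat is abstract, so you have to subclass it and supply a
record reader yourself. Below is a rough, untested sketch against the new
mapreduce API (depending on your Hadoop version the class may live under
org.apache.hadoop.mapred.lib instead); the names CombinedTextInputFormat and
FileLineRecordReader are just placeholders I made up:

  import java.io.IOException;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.RecordReader;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
  import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
  import org.apache.hadoop.mapreduce.lib.input.FileSplit;
  import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

  // Packs many small files into each split, so one map task reads several files.
  public class CombinedTextInputFormat
      extends CombineFileInputFormat<LongWritable, Text> {

    public CombinedTextInputFormat() {
      // Cap each combined split; tune this so you end up with at least
      // as many splits as you have cores.
      setMaxSplitSize(64 * 1024 * 1024);
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
        InputSplit split, TaskAttemptContext context) throws IOException {
      // One delegate reader is instantiated per file in the combined split.
      return new CombineFileRecordReader<LongWritable, Text>(
          (CombineFileSplit) split, context, FileLineRecordReader.class);
    }

    // Adapter: CombineFileRecordReader requires a reader with a
    // (CombineFileSplit, TaskAttemptContext, Integer) constructor.
    public static class FileLineRecordReader
        extends RecordReader<LongWritable, Text> {

      private final LineRecordReader delegate = new LineRecordReader();
      private final int index;

      public FileLineRecordReader(CombineFileSplit split,
          TaskAttemptContext context, Integer index) {
        this.index = index;
      }

      @Override
      public void initialize(InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        CombineFileSplit cs = (CombineFileSplit) split;
        // Expose the index-th file of the combined split as a plain FileSplit.
        delegate.initialize(new FileSplit(cs.getPath(index),
            cs.getOffset(index), cs.getLength(index), cs.getLocations()),
            context);
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        return delegate.nextKeyValue();
      }

      @Override
      public LongWritable getCurrentKey() { return delegate.getCurrentKey(); }

      @Override
      public Text getCurrentValue() { return delegate.getCurrentValue(); }

      @Override
      public float getProgress() throws IOException {
        return delegate.getProgress();
      }

      @Override
      public void close() throws IOException { delegate.close(); }
    }
  }

The number of map tasks then falls out of setMaxSplitSize() rather than the
file count. Rough sizing from your numbers: at ~1 minute per record, a single
30000-record file is already ~500 CPU-hours of work, so per-record cost
dominates and task startup overhead is negligible; mainly make sure you get
at least as many splits as your 10 x 16 = 160 cores so they all stay busy.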
Thanks,
Yongqiang

On 11/26/09 4:02 AM, "Cubic" <cubicdes...@gmail.com> wrote:

> Hi list.
>
> I have small files containing data that has to be processed. A file
> can be small, even down to 10MB (but it can also be 100-600MB large),
> and contains at least 30000 records to be processed.
> Processing one record can take 30 seconds to 2 minutes. My cluster is
> about 10 nodes. Each node has 16 cores.
>
> Can anybody give an idea about how to deal with these small files? It
> is not quite a common Hadoop task, I know. For example, how many map
> tasks should I set in this case?