Try CombineFileInputFormat.
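A rough driver-side sketch of the wiring (the class names and the split size are illustrative, not from this thread; in the 0.20-era mapred API, CombineFileInputFormat is abstract, so you still subclass it and supply a record reader):

```java
// Sketch only: plug a CombineFileInputFormat subclass into the job so that
// many small files get packed into fewer, larger map splits.
JobConf conf = new JobConf(MyJob.class);           // MyJob is hypothetical
conf.setInputFormat(MyCombineInputFormat.class);   // your CombineFileInputFormat subclass
// Cap each combined split (e.g. ~256 MB) so one map task processes a batch
// of small files instead of spawning one map per file.
conf.setLong("mapred.max.split.size", 256L * 1024 * 1024);
```

With a cap like that, a 10 MB file no longer costs a whole map task of JVM startup overhead; several files ride in one split.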

Thanks
Yongqiang
On 11/26/09 4:02 AM, "Cubic" <cubicdes...@gmail.com> wrote:

> Hi list.
> 
> I have small files containing data that has to be processed. A file
> can be small, even down to 10 MB (but it can also be 100-600 MB large),
> and contains at least 30000 records to be processed.
> Processing one record can take 30 seconds to 2 minutes. My cluster has
> about 10 nodes, and each node has 16 cores.
> 
> Can anybody give an idea about how to deal with these small files? I
> know it is not quite a common Hadoop task. For example, how many map
> tasks should I set in this case?
> 
> 
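As a back-of-the-envelope check on the numbers in the question above (the two-waves-per-slot rule of thumb is an assumption, not something from this thread):

```java
// Sizing estimate from the figures stated in the question.
public class SizingEstimate {
    public static void main(String[] args) {
        int records = 30_000;
        int secMin = 30, secMax = 120;       // per-record processing time
        int nodes = 10, coresPerNode = 16;   // cluster from the question
        int cores = nodes * coresPerNode;    // at most 160 concurrent map slots

        double coreHoursMin = records * (double) secMin / 3600; // 250 core-hours
        double coreHoursMax = records * (double) secMax / 3600; // 1000 core-hours
        double wallMin = coreHoursMin / cores;  // ~1.6 h if perfectly parallel
        double wallMax = coreHoursMax / cores;  // ~6.25 h

        // Rule of thumb (an assumption): schedule a couple of map "waves"
        // per slot so stragglers even out near the end of the job.
        int maps = cores * 2;                // e.g. 320 map tasks
        int recordsPerMap = records / maps;  // ~93 records each
        System.out.println(cores + " cores, " + maps + " maps, "
            + recordsPerMap + " records/map, "
            + wallMin + "-" + wallMax + " h wall time");
    }
}
```

So the job is CPU-bound, not I/O-bound: with roughly 320 maps of ~90 records each, every map runs long enough (tens of minutes) that small-file overhead stops mattering once the splits are combined.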

