Hi list.

I have small files containing data that has to be processed. A file
can be small, even down to 10 MB (but it can also be 100-600 MB),
and contains at least 30,000 records to be processed.
Processing one record can take 30 seconds to 2 minutes. My cluster
has about 10 nodes, and each node has 16 cores.
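
If my arithmetic is right, the worst case is roughly 30,000 records x
2 min = 60,000 minutes (about 1,000 CPU-hours), so with 10 x 16 = 160
cores the job should take on the order of 6-7 hours of wall-clock time,
provided I can keep all the cores busy.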

Can anybody give me an idea of how to deal with these small files? I
know it is not quite a common Hadoop task. For example, how many map
tasks should I set in this case?
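
To make the question concrete, here is a minimal sketch of the kind of
map-only job I have in mind (new mapreduce API). The class names
SmallFileJob and RecordMapper are just placeholders, I am assuming one
record per line, and the 200 lines per split is only a guess to get
~150 map tasks out of ~30,000 records -- not tested:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFileJob {

  // Stub: the real per-record work (30 s - 2 min) would go in map().
  public static class RecordMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text record, Context context)
        throws IOException, InterruptedException {
      // process the record here, then write whatever result is needed
      context.write(new Text(offset.toString()), record);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "process small files");
    job.setJarByClass(SmallFileJob.class);

    // Split the input by line count instead of HDFS block size, so even
    // a 10 MB file yields enough map tasks to keep ~160 cores busy.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 200);
    NLineInputFormat.addInputPath(job, new Path(args[0]));

    job.setMapperClass(RecordMapper.class);
    job.setNumReduceTasks(0);            // map-only job
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Does something along these lines make sense, or is there a better way
to control the number of map tasks for files this small?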
