Hi list. I have small files containing data that has to be processed. A file can be quite small, down to 10 MB (but it can also be 100-600 MB), and contains at least 30,000 records to be processed. Processing one record takes 30 seconds to 2 minutes. My cluster has about 10 nodes, each with 16 cores.
Can anybody give me an idea of how to deal with these small files? I know it is not quite a typical Hadoop workload. For example, how many map tasks should I set in this case? A sketch of the kind of driver I had in mind is below.
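
To make the question concrete, here is roughly the map-only driver I was considering, assuming each record is one line of the input file. It uses NLineInputFormat so the number of map tasks is driven by record count rather than HDFS block size (a 10 MB file would otherwise become a single split). RecordMapper and processRecord are placeholders for the actual per-record work, and the 200 lines-per-split number is just my guess based on 30,000 records over 10 nodes x 16 cores; that choice is exactly what I am unsure about.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SmallFileDriver {

    // Placeholder mapper: NLineInputFormat hands each map task a small batch of lines;
    // processRecord() stands in for the real 30s-2min per-record processing.
    public static class RecordMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String result = processRecord(line.toString());
            ctx.write(NullWritable.get(), new Text(result));
        }

        private String processRecord(String record) {
            return record; // stand-in for the actual heavy processing
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "small-file-record-processing");
        job.setJarByClass(SmallFileDriver.class);

        // Split by line count instead of block size:
        // 30,000 records / 200 lines per split = ~150 map tasks,
        // roughly one wave over 10 nodes x 16 cores.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 200);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        job.setMapperClass(RecordMapper.class);
        job.setNumReduceTasks(0); // map-only job, no aggregation needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Does this approach make sense, or is there a better way to size the splits (or a better input format) for this kind of CPU-bound, small-input job?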