Hi, I have a use case: I have files on the local disk of every node in my cluster. I want to write a mapper-only MapReduce job that reads the file from the local disk on each machine, applies some transformation, and writes the result to HDFS.
Specifically:
1. The job shouldn't have any input/output paths, and should use null key/value pairs.
2. It should be mapper-only (no reducers).
3. I want to be able to control the number of mappers, depending on the size of my cluster.

What's the best way to do this? I would appreciate any example code.

Deepak