Hi,

I have a use case: I have files lying on the local disk of every node in
my cluster. I want to write a mapper-only MapReduce job that reads the file
off the local disk on every machine, applies some transformation, and writes
the result to HDFS.
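
To make the per-node step concrete, this is roughly what I'd expect each map task to do, sketched in plain Java without any Hadoop classes. The class name, the transform() body and the temp files in main() are placeholders of mine; in the real job the output would go to HDFS rather than a local path.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

public class LocalFileTransform {

    // Placeholder transformation: stands in for whatever the real job
    // would apply to each record.
    static String transform(String line) {
        return line.trim().toUpperCase(Locale.ROOT);
    }

    // Reads the local file line by line, transforms each line, and writes
    // it to the output path. Returns the number of lines written.
    static long process(Path localInput, Path output) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(localInput, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(transform(line));
                out.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Temp files stand in for the node-local input and the HDFS output.
        Path in = Files.createTempFile("node-local", ".txt");
        Path out = Files.createTempFile("transformed", ".txt");
        Files.write(in, Arrays.asList(" foo ", "bar"), StandardCharsets.UTF_8);
        long n = process(in, out);
        System.out.println(n + " lines written to " + out);
    }
}
```

What I don't know is how to wrap this in a job so that exactly one such task runs on each node.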

Specifically,

1. The job shouldn't have any input/output paths, and should use null key/value pairs.
2. It should be mapper-only (no reducers).
3. I want to be able to control the number of Mappers, depending on the
size of my cluster.

What's the best way to do this? I would appreciate any example code.

Deepak
