I have been playing with mapreduce.tasktracker.map.tasks.maximum to reduce the load on my Cassandra cluster (using the Cassandra ColumnFamilyInputFormat). I'd like to find ways of throttling the map operations in case they are affecting OLTP activity on the cluster.
What parameters can I use to limit the number of map tasks running concurrently across the whole cluster? mapreduce.tasktracker.map.tasks.maximum limits the number of concurrent maps per task tracker, but can I do this at the job level? Should I look at the "fair" scheduler?

Regards,
Michael