Hello,

I have deployed UIMA on Hadoop and am currently writing my thesis on this topic. Each map task now receives exactly one file as its input. One thing still confuses me. The Hadoop Wiki (How many maps and reduces) says: “The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks.” When I vary the value of NumMapTasks, I do see differences in the performance results. Usually the number of maps controls how evenly the work is distributed, but which mechanism is at work in detail if the input size per map is fixed (determined by the size of the files)? Is there some kind of input and output queue that gets filled?
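To make the question concrete, here is a minimal sketch of how the hint from conf.setNumMapTasks(int num) is commonly understood to enter the split calculation. It assumes the job uses the classic FileInputFormat from org.apache.hadoop.mapred; a custom input format may treat the hint differently or ignore it. The class SplitSizeSketch, its method names, and the file and block sizes are hypothetical, and the code does not call Hadoop itself.

```java
public class SplitSizeSketch {
    // Tolerance for the last split (assumption: mirrors the 10% slop of the classic FileInputFormat).
    static final double SPLIT_SLOP = 1.1;

    // Split size as the classic formula is understood: max(minSize, min(goalSize, blockSize)).
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits (and therefore map tasks) produced for a single file.
    // numMapTasksHint plays the role of the value passed to setNumMapTasks.
    public static long countSplits(long fileLen, long totalSize, int numMapTasksHint,
                                   long minSize, long blockSize, boolean splittable) {
        if (!splittable || fileLen == 0) {
            return 1; // a non-splittable file always becomes exactly one split
        }
        long goalSize = totalSize / Math.max(1, numMapTasksHint);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long count = 0;
        long bytesRemaining = fileLen;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            count++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            count++; // the remainder forms the last split
        }
        return count;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // One hypothetical 100 MB file, 64 MB blocks, minimum split size of 1 byte.
        System.out.println(countSplits(100 * mb, 100 * mb, 1, 1, 64 * mb, true));
        System.out.println(countSplits(100 * mb, 100 * mb, 10, 1, 64 * mb, true));
        System.out.println(countSplits(100 * mb, 100 * mb, 10, 1, 64 * mb, false));
    }
}
```

Under these assumptions the hint only changes the number of splits when files are splittable. With one whole file per map task, as in my setup, the sketch yields one split per file regardless of the hint, which is why the performance differences I observe puzzle me.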

Thanks in advance,

Marc
