Hello,
I have deployed UIMA on Hadoop and am currently writing my thesis on
this topic. Each map task now receives exactly one file as input. One
thing still confuses me. The Hadoop Wiki (How many maps and reduces)
says:
“The number of map tasks can also be increased manually using the
JobConf's conf.setNumMapTasks(int num). This can be used to increase the
number of map tasks.”
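To make the question concrete, here is a minimal sketch of how, as far
as I understand it, the old-API FileInputFormat turns the NumMapTasks
hint into a split size. The class and method names are my own and not
Hadoop's, and the sketch assumes splittable files, which differs from
my setup where each map gets a whole file.

```java
// Illustrative sketch only: class and method names are my own, not Hadoop's.
final class SplitSizeSketch {
    private SplitSizeSketch() {}

    /** Bytes per split if the input were divided evenly among the requested maps. */
    static long goalSize(long totalSize, int numMapTasks) {
        // Guard against a hint of zero so the division is always defined.
        return totalSize / Math.max(1, numMapTasks);
    }

    /** The goal is capped at one block, then raised to minSize if needed. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }
}
```

With 1 GiB of input and 64 MiB blocks, a hint of 4 maps gives a goal of
256 MiB, but the split size stays at 64 MiB. A hint of 32 maps gives
32 MiB splits. If this is right, the hint can only raise the number of
maps above what the block size already yields.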
By varying the value of NumMapTasks, I have also noticed differences in
performance. Usually the number of maps controls how evenly the work is
distributed, but which mechanism takes effect in detail when the input
size of each map is fixed (determined by the size of the files)? Is
there some kind of input and output queue that gets filled?
Thanks in advance,
Marc