UIMA scale-out using Hadoop, number of map tasks

Marc Hofer Sat, 16 Aug 2008 02:42:26 -0700

Hello,

I have deployed UIMA on Hadoop and am currently writing my thesis about this topic. Each map task now receives exactly one file as its input.

There is one thing that is still confusing me. The Hadoop Wiki (How many maps and reduces) says: “The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks.”

By varying the value of NumMapTasks, I have also noticed differences in the performance results. Usually the number of maps controls how evenly the work is distributed, but which mechanism is at work in detail when the map size is fixed (determined by the size of the files)? Is there some kind of input and output queue that gets filled?
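To make the question concrete, here is a minimal sketch of how, as far as I understand it, the old org.apache.hadoop.mapred.FileInputFormat treats the value passed to setNumMapTasks: only as a hint that feeds into a per-split size, not as a hard task count. The class SplitSizeSketch, the method countSplits and the file sizes are hypothetical and chosen only for illustration; this is plain Java that imitates the arithmetic and does not call Hadoop itself.

```java
public class SplitSizeSketch {
    // Assumed slack factor: the last chunk of a file may be up to 10% larger
    // than the split size before another split is cut.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the goal size, capped by the block size and floored by
    // the configured minimum split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts the splits (and therefore map tasks) for a set of input files.
    // numMapTasksHint plays the role of the value given to setNumMapTasks.
    static int countSplits(long[] fileSizes, int numMapTasksHint,
                           long minSize, long blockSize, boolean splittable) {
        long totalSize = 0;
        for (long size : fileSizes) {
            totalSize += size;
        }
        long goalSize = totalSize / Math.max(1, numMapTasksHint);
        long splitSize = computeSplitSize(goalSize, Math.max(1, minSize), blockSize);

        int splits = 0;
        for (long size : fileSizes) {
            if (!splittable || size == 0) {
                splits++; // a file that cannot be split yields exactly one split
                continue;
            }
            long remaining = size;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                splits++;
                remaining -= splitSize;
            }
            if (remaining != 0) {
                splits++; // trailing chunk of the file
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {10 * mb, 10 * mb, 10 * mb, 10 * mb}; // illustrative sizes
        System.out.println(countSplits(files, 2, 1, 64 * mb, true));
        System.out.println(countSplits(files, 8, 1, 64 * mb, true));
        System.out.println(countSplits(files, 8, 1, 64 * mb, false));
    }
}
```

Under these assumptions, a hint smaller than the number of files has no effect, a larger hint cuts splittable files into smaller pieces, and with a non-splittable input format (one whole file per map task, as in my setup) the hint would not change the number of map tasks at all.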


Thanks in advance,

Marc
