1. Performance tuning/optimization: any good suggestions or links?
Take a look at
http://wiki.datameer.com/documentation/current/Hadoop+Cluster+Configuration+Tips

2. Logging - If I do any logging in a map/reduce class, where will the
logging or System.out output be written?
Be careful when doing so: on large amounts of data you can fill up the
disks on the datanodes very quickly. You can find the logs through the
jobtracker page by clicking on specific map and reduce tasks.
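
One way to keep task logs from filling the disk is to log only a sample of
records. The sketch below is a minimal illustration in plain Java, not part of
Hadoop: the class name SampledLogger and its two parameters are hypothetical.
In a real job the shouldLog() check would sit inside map() or reduce().

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: logs the first few records, then only every Nth one. */
class SampledLogger {
    private final long firstN;
    private final long everyNth;
    private final AtomicLong seen = new AtomicLong();

    SampledLogger(long firstN, long everyNth) {
        if (everyNth <= 0) {
            throw new IllegalArgumentException("everyNth must be positive");
        }
        this.firstN = firstN;
        this.everyNth = everyNth;
    }

    /** Counts one record and returns true if a log line should be written for it. */
    boolean shouldLog() {
        long n = seen.incrementAndGet();
        return n <= firstN || n % everyNth == 0;
    }

    public static void main(String[] args) {
        SampledLogger sampler = new SampledLogger(3, 1000);
        int written = 0;
        for (int record = 1; record <= 10_000; record++) {
            if (sampler.shouldLog()) {
                // Inside a task, this output lands in that task attempt's stdout log.
                System.out.println("processing record " + record);
                written++;
            }
        }
        System.out.println("log lines written: " + written);
    }
}
```

With these settings, only a small fraction of the 10,000 records produce a log
line, so log volume stays bounded as the input grows.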

3. How do we reuse the JVM? Map task creation takes time.
Look at mapred.job.reuse.jvm.num.tasks, which sets how many tasks of a job
may run in one JVM (the default is 1; -1 means no limit).
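
If the job driver goes through ToolRunner, so that generic options are parsed,
the property can be passed on the command line as "-D key=value" ahead of the
job's own arguments. The sketch below only builds that argument list in plain
Java. The class and method names are hypothetical. In driver code, the
equivalent call on the old API is JobConf.setNumTasksToExecutePerJvm(int).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that prepends the JVM-reuse generic option to job arguments. */
class JvmReuseArgs {
    static final String REUSE_KEY = "mapred.job.reuse.jvm.num.tasks";

    /**
     * @param tasksPerJvm tasks to run per JVM; -1 means no limit, 1 disables reuse
     * @param jobArgs     the job's own arguments (for example, input and output paths)
     */
    static List<String> withJvmReuse(int tasksPerJvm, List<String> jobArgs) {
        if (tasksPerJvm == 0 || tasksPerJvm < -1) {
            throw new IllegalArgumentException("tasksPerJvm must be -1 or positive");
        }
        List<String> out = new ArrayList<>();
        // Generic options have to come before the application arguments.
        out.add("-D");
        out.add(REUSE_KEY + "=" + tasksPerJvm);
        out.addAll(jobArgs);
        return out;
    }

    public static void main(String[] args) {
        // "in" and "out" are placeholder paths for illustration.
        System.out.println(withJvmReuse(-1, List.of("in", "out")));
    }
}
```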

4. Different types of spills - how do we avoid them?
It depends on what is causing the spills. You can have spills on both the
map and the reduce side. Adjusting config properties such as "io.sort.mb"
and "io.sort.factor", plus a few others on the reduce side, can reduce
them. Tom White's book has a good explanation of these.
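
As a rough back-of-the-envelope aid for the map side, the sketch below
estimates how many spill files one map task writes for a given output size.
It assumes a spill starts when the sort buffer (io.sort.mb) reaches the
io.sort.spill.percent threshold, and it ignores record metadata overhead and
combiners, so treat the result as an approximation only. The class and method
names are hypothetical.

```java
/** Hypothetical estimator for map-side spills; an approximation, not Hadoop code. */
class SpillEstimate {

    /** Approximate number of spill files for one map task's output. */
    static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillPercent) {
        double usableBytes = ioSortMb * 1024.0 * 1024.0 * spillPercent;
        // Map output is written to disk at least once, so the minimum is 1.
        return Math.max(1, (long) Math.ceil(mapOutputBytes / usableBytes));
    }

    /** More spill files than io.sort.factor means the merge needs more than one pass. */
    static boolean needsMultiPassMerge(long spills, int ioSortFactor) {
        return spills > ioSortFactor;
    }

    public static void main(String[] args) {
        long mapOutput = 500L * 1024 * 1024; // 500 MB of map output, for illustration

        long withDefaults = estimateSpills(mapOutput, 100, 0.80);
        long withBigBuffer = estimateSpills(mapOutput, 640, 0.80);

        System.out.println("io.sort.mb=100: " + withDefaults + " spills");
        System.out.println("io.sort.mb=640: " + withBigBuffer + " spill(s)");
        System.out.println("multi-pass merge at io.sort.factor=10: "
                + needsMultiPassMerge(withDefaults, 10));
    }
}
```

The sort buffer is allocated from the task's heap, so raising io.sort.mb only
helps if the child JVM heap is large enough to hold it.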

Thanks,
Prashant Kommireddi


On Thu, Jan 12, 2012 at 8:10 AM, screen <satish.se...@hcl.in> wrote:

>
> Thanks. Separate files for the line items created 10 map tasks, of which
> only some are running (limited by the max map tasks setting); the rest are
> waiting. So with 8 CPUs and max_map_tasks set to 7, 3 tasks are in the
> wait state.
> I can see 7 CPUs at 90-95% utilization.
>
> 1. Performance tuning/optimization: any good suggestions or links?
>
> 2. Logging - If I do any logging in map/reduce class where will be logging
> or system.out information written?
>
> 3. How do we reuse jvm? map tasks creation takes time.
>
> 4. Different types of spills - how do we avoid them?
>
> --
> View this message in context:
> http://old.nabble.com/increase-number-of-map-tasks-tp33107775p33128748.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
