Hi,
So you have 3 boxes, since you run 3 reduce tasks?
What happens is that 3 splits of your data are sorted. In the end you will get as many output files as you have reduce tasks.
The sorting itself happens in memory.
Check hadoop-default.xml (it may be inside the hadoop jar) for:
<name>io.sort.factor</name>
and
<name>io.sort.mb</name>
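If you want to give the sort more room, you could override those two properties in your hadoop-site.xml. The values below are only illustrative, not recommendations; tune them to your machines:

  <!-- hadoop-site.xml: overrides the defaults from hadoop-default.xml -->
  <configuration>
    <property>
      <name>io.sort.factor</name>
      <!-- how many streams are merged at once while sorting files -->
      <value>25</value>
    </property>
    <property>
      <name>io.sort.mb</name>
      <!-- buffer memory, in MB, used while sorting -->
      <value>200</value>
    </property>
  </configuration>

The descriptions next to these properties in hadoop-default.xml explain both values in more detail.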
HTH
Stefan
On 24.05.2006 at 11:13, Hamza Kaya wrote:
Hi,
I'm trying to crawl approx. 500.000 URLs. After inject and generate I started the fetchers using 6 map tasks and 3 reduce tasks. All the map tasks completed successfully, while all the reduce tasks got an OutOfMemory exception. This exception was thrown after the append phase (during the sort phase).

As far as I observed, during a fetch operation all the map tasks write their output to a temporary sequence file. During the reduce operation, each reducer copies all map outputs to its local disk and appends them to a single sequence file. After this operation the reducer tries to sort that file and writes the sorted file to its local disk. Then a record writer is opened to write the sorted file to the segment, which is in DFS.

If this scenario is correct, then all the reduce tasks are supposed to do the same job: all of them try to sort the whole map output, and the winner of this operation gets to write to DFS. So only one reducer is expected to write to DFS. If this is the case, then an OutOfMemory exception is not surprising for 500.000+ URLs, since the reducers will try to sort a file bigger than 1 GB.

Any comments on this scenario are welcome. And how can I avoid these exceptions?
Thanx,
--
Hamza KAYA