fixed, thanks.
On Sun, Aug 16, 2009 at 8:38 PM, Andrzej Bialecki <[email protected]> wrote:
> MoD wrote:
>>
>> Julien,
>>
>> I did try with 2048M per task child,
>> no luck, I still have two reduces that don't go through.
>>
>> Is it somehow related to the number of reduces?
>> On this cluster I have 4 servers:
>> - dual Xeon, dual core (8 cores)
>> - 8 GB RAM
>> - 4 disks
>>
>> I set mapred.reduce.tasks and mapred.map.tasks to 16,
>> because: 4 servers with 4 disks each. (What do you think?)
>>
>> Maybe this job is too big for my cluster; would adding reduce tasks
>> subdivide the problem into smaller reduces?
>> Actually I think not, because I guess the input keys are for the same
>> domain?
>>
>> So my two last reduce tasks are the biggest domains in my DB?
>
> This is likely caused by a large number of inlinks for certain urls - the
> updatedb reduce collects this list in memory, and this sometimes leads to
> memory exhaustion. Please try limiting the max. number of inlinks per url
> (see nutch-default.xml for details).
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
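For the archives, here is roughly what I ended up overriding in conf/nutch-site.xml. I believe the property Andrzej means is db.max.inlinks, but please double-check the exact name and default against your own nutch-default.xml, since it can differ between Nutch versions; the value of 5000 below is just an example cap, not a recommendation:

  <property>
    <name>db.max.inlinks</name>
    <value>5000</value>
    <!-- cap the number of inlinks kept per URL so the updatedb reduce
         does not have to hold a huge list in memory -->
  </property>

With that cap in place (child heap still at 2048M), the job went through.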
