fixed, thanks.
On Sun, Aug 16, 2009 at 8:38 PM, Andrzej Bialecki <[email protected]> wrote:
> MoD wrote:
>>
>> Julien,
>>
>> I did try with 2048M per task child,
>> no luck, I still have two reduces that don't go through.
>>
>> Is it somehow related to the number of reduces?
>> On this cluster I have 4 servers:
>> - dual Xeon, dual core (8 cores)
>> - 8 GB RAM
>> - 4 disks
>>
>> I set mapred.reduce.tasks and mapred.map.tasks to 16,
>> because: 4 servers with 4 disks each. (What do you think?)
>>
>> Maybe this job is too big for my cluster; would adding reduce tasks
>> subdivide the problem into smaller reduces?
>> Actually I think not, because I guess the input keys are for the same
>> domain?
>>
>> So my two last reduce tasks are the biggest domains in my DB?
>
> This is likely caused by a large number of inlinks for certain urls - the
> updatedb reduce collects this list in memory, and this sometimes leads to
> memory exhaustion. Please try limiting the max. number of inlinks per url
> (see nutch-default.xml for details).
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
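For the archives, here is roughly what I ended up overriding in conf/nutch-site.xml. I believe the property Andrzej means is db.max.inlinks, but please double-check the exact name and default against your own nutch-default.xml, since it can differ between Nutch versions; the value of 5000 below is just an example cap, not a recommendation:

  <property>
    <name>db.max.inlinks</name>
    <value>5000</value>
    <!-- cap the number of inlinks kept per URL so the updatedb reduce
         does not have to hold a huge list in memory -->
  </property>

With that cap in place (child heap still at 2048M), the job went through.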
