On Tue, 05 Feb 2008, John Mendenhall wrote:

> -----
> Merging 14 segments to /var/nutch/crawl/mergesegs_dir/20080201220906
> SegmentMerger: adding /var/nutch/crawl/segments/20080128132506
> SegmentMerger: adding ...
> SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
> task_0001_m_000075_0: Exception in thread "main" java.net.SocketTimeoutException: timed out waiting for rpc response
> task_0001_m_000075_0:   at org.apache.hadoop.ipc.Client.call(Client.java:473)
> task_0001_m_000075_0:   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
> task_0001_m_000075_0:   at org.apache.hadoop.mapred.$Proxy0.reportDiagnosticInfo(Unknown Source)
> task_0001_m_000075_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1454)
> task_0001_m_000080_0: Exception in thread "main" java.net.SocketException: Socket closed
> task_0001_m_000080_0:   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
> task_0001_m_000080_0:   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> task_0001_m_000080_0:   at org.apache.hadoop.ipc.Client$Connection$2.write(Client.java:189)
> task_0001_m_000080_0:   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> task_0001_m_000080_0:   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> task_0001_m_000080_0:   at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> task_0001_m_000080_0:   at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:324)
> task_0001_m_000080_0:   at org.apache.hadoop.ipc.Client.call(Client.java:461)
> task_0001_m_000080_0:   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
> task_0001_m_000080_0:   at org.apache.hadoop.mapred.$Proxy0.reportDiagnosticInfo(Unknown Source)
> task_0001_m_000080_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1454)
> task_0001_m_000072_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Client).
> task_0001_m_000072_1: log4j:WARN Please initialize the log4j system properly.
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:590)
>         at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:638)
> -----
>
> nutch mergesegs returns with a status code of 1.
>
> I have tried looking at why the log4j warning is happening.
> All other runs seem fine. Log4j seems to be set up for all
> other instances where it is needed.
>
> Where do I need to look to find out why nutch mergesegs is
> crashing?
>
> Why is log4j not finding the log4j.properties file?
> The nutch script in nutch/bin already adds the conf
> dir to the class path.
>
> Thanks in advance for any assistance you can provide.
>
> JohnM
I modified the configuration to use less memory and rebooted all of the
servers. Then I reran the index and it worked.

I currently have 3 servers, one serving as both master and slave. Each
has a different amount of memory available, and each has a different
processor type. What is the rule of thumb for setting the heap size,
and the child process heap sizes, on each server?

Thanks!

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
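
P.S. For the archives, the settings I believe are involved here (in the
Hadoop config files that Nutch keeps in its conf dir) are HADOOP_HEAPSIZE
in hadoop-env.sh, which sizes the Hadoop daemons, and mapred.child.java.opts
in hadoop-site.xml, which sizes each map/reduce child JVM. The values below
are only placeholders to show the shape of the config, not what I am running:

-----
# conf/hadoop-env.sh (read on each node, so slaves with different RAM can differ)
# Heap for the Hadoop daemons (namenode, datanode, jobtracker, tasktracker), in MB.
export HADOOP_HEAPSIZE=1000
-----

-----
<!-- conf/hadoop-site.xml (overrides hadoop-default.xml, also read per node) -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- JVM options passed to each map/reduce child task; -Xmx is the per-task heap. -->
  <value>-Xmx200m</value>
</property>
-----

Since each tasktracker reads its own copy of these files, the values can be
tuned per machine to match the memory actually available on that box.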