Hi,
This problem is a killer! I've been struggling with it for about a month.
It doesn't happen every time, but because of it the largest crawl I have
ever completed is about 1 million pages. I have three machines: 3
datanodes, 1 data replica, and 1 job tracker. Here is what I get:
Tasktracker log file on the namenode machine:
060219 142405 task_r_125kgt 0.14583334% reduce > copy >
060219 142406 task_r_125kgt 0.14583334% reduce > copy >
060219 142407 task_m_grycae Error running child
060219 142407 task_m_grycae java.io.IOException: timed out waiting for response
060219 142407 task_m_grycae at org.apache.hadoop.ipc.Client.call(Client.java:303)
060219 142407 task_m_grycae at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
060219 142407 task_m_grycae at org.apache.hadoop.mapred.$Proxy0.progress(Unknown Source)
060219 142407 task_m_grycae at org.apache.hadoop.mapred.Task.reportProgress(Task.java:112)
060219 142407 task_m_grycae at org.apache.hadoop.mapred.Task$1.setStatus(Task.java:93)
060219 142407 task_m_grycae at org.apache.nutch.fetcher.Fetcher.reportStatus(Fetcher.java:276)
060219 142407 task_m_grycae at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:325)
060219 142407 task_m_grycae at org.apache.hadoop.mapred.MapTask.run(MapTask.java:129)
060219 142407 task_m_grycae at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:637)
060219 142407 task_m_grycae 0.825607% 108745 pages, 5259 errors, 15.6pages/s, 2418 kb/s,
060219 142407 task_r_125kgt 0.14583334% reduce > copy >
060219 142408 task_m_grycae Parent died. Exiting task_m_grycae
060219 142408 task_r_125kgt 0.14583334% reduce > copy >
060219 142408 Server connection on port 50050 from xxxxxx: exiting
060219 142408 Server connection on port 50050 from xxxxxx: exiting
060219 142408 task_m_grycae Child Error
java.io.IOException: Task process exit with nonzero status.
at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
060219 142411 task_m_grycae done; removing files.
060219 142413 task_r_125kgt 0.14583334% reduce > copy >
Tasktracker log file from one of the datanodes:
060219 142611 task_m_2yfbgf fetching http://codex.wordpress.org/Managing_Plugins
060219 142611 task_m_2yfbgf fetching http://www.scubaboard.com/cms/search.php
060219 142611 task_m_2yfbgf Error reading child output
java.io.IOException: Bad file descriptor
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:194)
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:170)
at org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:29)
at org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:137)
060219 142611 task_m_2yfbgf 0.019530244% 2170 pages, 61 errors, 12.3pages/s, 1975 kb/s,
060219 142612 Server connection on port 50051 from xxxxxx: exiting
060219 142612 Server connection on port 50051 from xxxxxx: exiting
060219 142612 task_m_2yfbgf Child Error
java.io.IOException: Task process exit with nonzero status.
at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
060219 142615 task_m_2yfbgf done; removing files.
The other datanode looks fine.
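For anyone who wants to pull the same context out of their own tasktracker logs (the lines reported just before each "Child Error"), here is a minimal Python sketch. It is not part of Nutch or Hadoop: the function name `context_before_child_error`, the `before` parameter, and the task-id pattern are assumptions based only on the log lines pasted above.

```python
import re
from collections import deque

# Assumed task-id shape, inferred from ids like task_m_grycae / task_r_125kgt.
TASK_ID = re.compile(r"\btask_[mr]_\w+")


def context_before_child_error(lines, before=5):
    """Yield (task_id, preceding_lines) for every line containing 'Child Error'.

    `lines` is any iterable of log lines (for example an open file).
    `preceding_lines` holds up to `before` lines seen just before the match.
    """
    recent = deque(maxlen=before)  # sliding window of the most recent lines
    for raw in lines:
        line = raw.rstrip("\n")
        if "Child Error" in line:
            match = TASK_ID.search(line)
            yield (match.group(0) if match else None), list(recent)
        recent.append(line)


if __name__ == "__main__":
    # Three lines copied from the first log excerpt in this message.
    sample = [
        "060219 142407 task_m_grycae Error running child",
        "060219 142408 task_m_grycae Parent died. Exiting task_m_grycae",
        "060219 142408 task_m_grycae Child Error",
    ]
    for task_id, context in context_before_child_error(sample, before=2):
        print(task_id)
        for entry in context:
            print("    " + entry)
```

To run it on a real log, pass an open file object instead of the sample list; the file is read line by line, so a large tasktracker log does not need to fit in memory.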
Thanks, Mike
On 2/16/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Gal Nitzan wrote:
> > During fetch all tasktrackers aborting the fetch with:
> >
> > task_m_b45ma2 Child Error
> > java.io.IOException: Task process exit with nonzero status.
> > at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
> > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
> >
>
> What's reported just before this in this tasktracker's log?
>
> What's reported around this time in the jobtracker's log?
>
> Doug
>