It looks like the child JVM is silently exiting. The "Error reading child output" message just shows that the child's standard output has been closed, and the "Child Error" message says the JVM exited with a non-zero status.
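
As a rough illustration of where those two messages come from, here is a minimal sketch of a parent process reading a child's output and then checking its exit status. This is not Hadoop's actual TaskRunner code: the class name ChildExitDemo, the method runChild, and the "sh -c" test command are made up for the example, and it assumes a POSIX shell is on the path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Hadoop or Nutch.
class ChildExitDemo {

    /** Runs a command, appends its output lines to outputLines, and returns its exit status. */
    static int runChild(List<String> command, List<String> outputLines)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge the child's stderr into its stdout
        Process child = builder.start();

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            // readLine() returns null once the child's output stream reaches end of file.
            while ((line = reader.readLine()) != null) {
                outputLines.add(line);
            }
        } catch (IOException e) {
            // Reading failed, for example because the stream was closed underneath us.
            // Report it and still collect the exit status below.
            System.err.println("Error reading child output: " + e);
        }

        // Blocks until the child has exited, then returns its exit status.
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        // Assumed test command: print one line, then exit with status 3.
        int status = runChild(Arrays.asList("sh", "-c", "echo fetching; exit 3"), lines);
        System.out.println("output=" + lines + " status=" + status);
        if (status != 0) {
            System.out.println("Child error: process exited with non-zero status " + status);
        }
    }
}
```

The point of the sketch is that the read error and the non-zero status are both symptoms seen by the parent. Neither says why the child died, which is why a core dump or the child's own logs are the next place to look.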

Perhaps you can get a core dump by setting 'ulimit -c' to something big. JVM core dumps can be informative.

This doesn't look like something that should kill a crawl, though. Are you using a tasktracker and a jobtracker, or running things with a "local" jobtracker? With a tasktracker this task would be retried. Are you seeing retries? Does a given task consistently fail when retried?

Doug

Mike Smith wrote:
I have been getting this exception during fetching for almost a month. This
exception stops the whole crawl. It happens on and off. Any idea? We are
really stuck with this problem.

I am using 3 data nodes and 1 name server.

060223 173809 task_m_b8ibww  fetching http://www.heartcenter.com/94fall.pdf
060223 173809 task_m_b8ibww  fetching http://www.medinfo.co.uk/conditions/tenosynovitis.html
060223 173809 task_m_b8ibww  fetching http://www.boncholesterol.com/whatsnew/index.shtml
060223 173809 task_m_b8ibww  fetching http://www.drcranton.com/hrt/promise_of_longevity.htm
060223 173809 task_m_b8ibww  fetching http://www.drcranton.com/hrt/promise_of_longevity.htm
060223 173809 task_m_b8ibww Error reading child output
java.io.IOException: Bad file descriptor
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:170)
        at org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:29)
        at org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:137)
060223 173809 task_r_3h1pex 0.16666667% reduce > copy >
060223 173809 Server connection on port 50050 from xxxxxx: exiting
060223 173809 Server connection on port 50050 from xxxxxx: exiting
060223 173809 task_m_b8ibww Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
060223 173812 task_m_b8ibww done; removing files.
