Hi Doug, I did some more testing using the latest svn. Children still die without any clear log after a while.
I used two machines with Hadoop; both are datanodes and tasktrackers, and one is also the namenode and jobtracker. I started with 2,000 seed nodes and it went fine until the 4th cycle, reaching about 600,000 pages; the next round had 3,000,000 pages to fetch. It failed again with this exception in the middle of fetching:

060302 232934 task_m_7lbv7e fetching http://www.findarticles.com/p/articles/mi_m0KJI/is_9_115/ai_107836357
060302 232934 task_m_7lbv7e fetching http://www.wholehealthmd.com/hc/resourceareas_supp/1,1442,544,00.html
060302 232934 task_m_7lbv7e fetching http://www.dow.com/haltermann/products/d-petro.htm
060302 232934 task_m_7lbv7e 0.7877368% 700644 pages, 24594 errors, 14.0 pages/s, 2254 kb/s,
060302 232934 task_m_7lbv7e fetching http://www.findarticles.com/p/articles/mi_hb3594/is_199510/ai_n8541042
060302 232934 task_m_7lbv7e Error reading child output
java.io.IOException: Bad file descriptor
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:299)
        at org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:32)
        at org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:266)
060302 232934 task_m_7lbv7e 0.7877451% 700644 pages, 24594 errors, 14.0 pages/s, 2254 kb/s,
060302 232934 task_m_7lbv7e 0.7877451% 700644 pages, 24594 errors, 14.0 pages/s, 2254 kb/s,
060302 232934 Server connection on port 50050 from 164.67.195.27: exiting
060302 232934 Server connection on port 50050 from 164.67.195.27: exiting
060302 232934 task_m_7lbv7e Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)
060302 232937 task_m_7lbv7e done; removing files.

And this is the console output:

060303 010945 map 86% reduce 0%
060303 012033 map 86% reduce 6%
060303 012223 map 87% reduce 6%
060303 014623 map 88% reduce 6%
060303 021304 map 89% reduce 6%
060303 022921 map 50% reduce 0%
060303 022921 SEVERE error, caught Exception in main()
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:366)
        at org.apache.nutch.fetcher.Fetcher.doMain(Fetcher.java:400)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:411)

This error has been around for large-scale crawls since a couple of months ago. I was wondering if anybody else has had the same issue with large-scale crawls.

Thanks, Mike.

On 2/26/06, Gal Nitzan <[EMAIL PROTECTED]> wrote:
>
> Still got the same...
>
> I'm not sure if it is relevant to this issue, but the call you added to
> Fetcher.java:
>
> job.setBoolean("mapred.speculative.execution", false);
>
> doesn't work. All task trackers still fetch together, though I have only
> 3 sites in the fetchlist.
>
> The task trackers fetch the same pages...
>
> I have used the latest build from hadoop trunk.
>
> Gal.
>
>
> On Fri, 2006-02-24 at 14:15 -0800, Doug Cutting wrote:
> > Mike Smith wrote:
> > > 060219 142408 task_m_grycae Parent died. Exiting task_m_grycae
> >
> > This means the child process, executing the task, was unable to ping its
> > parent process (the task tracker).
> >
> > > 060219 142408 task_m_grycae Child Error
> > > java.io.IOException: Task process exit with nonzero status.
> > >         at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
> > >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
> >
> > And this means that the parent was really still alive, and has noticed
> > that the child killed itself.
> >
> > It would be good to know how the child failed to contact its parent. We
> > should probably log a stack trace when this happens. I just made that
> > change in Hadoop and will propagate it to Nutch.
> >
> > Doug
> >
> >
>
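P.S. For anyone reproducing this: the per-job call quoted above sets the job configuration property of the same name, so the toggle can also be sketched as a site-wide config entry. This is only a sketch assuming the property name shown in the quoted job.setBoolean(...) call; I haven't verified it changes the behavior Gal describes:

<!-- Sketch (hadoop-site.xml): disable speculative execution for all jobs.
     Property name taken from the job.setBoolean(...) call quoted above. -->
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
</property>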