Hello Team,
I am using a cluster of seven Dell SC1425s (1 master, 6 slaves), all 3 GHz
Xeons with Gigabit Ethernet and 2 x WD4000KD drives, running CentOS 4.3 and
Sun JDK 1.5.0_06-b05.
Nutch/Hadoop versions:
nutch-0.8 (revision 395259)
hadoop-0.2-dev (revision 395539)
My fetches (1.25M URLs) are failing during the reduce phase of an otherwise
successful run. Fetch was invoked as follows:
# nohup bin/nutch fetch /user/root/crawlA/segments/20060419162433 -threads 150
>> logs/logall.txt &
The errors listed below were displayed in the web UI.
First question: updatedb won't run against the segment, so what can I do to
salvage it? Is the segment salvageable at all? The output of
"bin/hadoop dfs -du" fills me with hope :)
/user/root/crawlA/segments/20060419162433/content 2577482693
/user/root/crawlA/segments/20060419162433/crawl_fetch 76587779
/user/root/crawlA/segments/20060419162433/crawl_generate 91242415
/user/root/crawlA/segments/20060419162433/crawl_parse 1122074034
/user/root/crawlA/segments/20060419162433/parse_data 706562716
/user/root/crawlA/segments/20060419162433/parse_text 707691237
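For what it's worth, summing the byte counts above shows that nearly 5 GB of
fetch output made it onto DFS, which is why I'm reluctant to throw the segment
away. A quick sanity check (plain shell; sizes copied from the "dfs -du"
listing above):

```shell
# Sum the per-directory byte counts reported by "bin/hadoop dfs -du"
# for segment 20060419162433 (values copied from the listing above).
total=0
for bytes in 2577482693 76587779 91242415 1122074034 706562716 707691237; do
  total=$((total + bytes))
done
echo "segment total: $total bytes (~$((total / 1048576)) MB)"
# prints: segment total: 5281640874 bytes (~5036 MB)
```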
Second question: should I raise an issue in JIRA quoting the errors below?
Many thanks,
Monu Ogbe
*** EXHIBITS
*** Excerpt from hadoop-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/nutch/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/nutch/hadoop/dfs/data</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/nutch/hadoop/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/nutch/hadoop/mapred/system</value>
</property>
<property>
<name>mapred.temp.dir</name>
<value>/home/nutch/hadoop/mapred/temp</value>
</property>
*** Errors displayed in the webapp
tip_epgdt0 0.0 reduce > copy >
Timed out. java.io.IOException: Task process exit with nonzero status of 143.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)
Timed out. java.io.IOException: Task process exit with nonzero status of 143.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)
tip_hu8h3m 0.0
Timed out. java.io.IOException: Task process exit with nonzero status of 143.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)
java.rmi.RemoteException: java.io.IOException: Cannot create file
/user/root/crawlA/segments/20060419162433/parse_text/part-00005/data on client
DFSClient_task_r_poobc6
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:156)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
        at org.apache.hadoop.ipc.Client.call(Client.java:303)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:691)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:864)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:952)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:83)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:144)
        at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:117)
        at org.apache.nutch.parse.ParseOutputFormat$1.close(ParseOutputFormat.java:128)
        at org.apache.nutch.fetcher.FetcherOutputFormat$1.close(FetcherOutputFormat.java:96)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:290)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:743)
(The same "Cannot create file
/user/root/crawlA/segments/20060419162433/parse_text/part-00005/data"
RemoteException, with an identical stack trace, was reported twice more, for
clients DFSClient_task_r_qklnfm and DFSClient_task_r_vu6z9t.)
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general