Hello Team,

I am using a cluster of 7 Dell SC1425s, 1 master and 6 slaves, all 3 GHz Xeons
with Gigabit Ethernet and 2 x WD4000KD drives, running CentOS 4.3 and Sun
jdk1.5.0_06-b05.

Nutch/Hadoop versions:
nutch-0.8 (revision 395259)
hadoop-0.2-dev (revision 395539)

My fetch (1.25M URLs) is failing during the reduce phase of an otherwise
successful run.  The fetch was invoked as follows:

# nohup bin/nutch fetch /user/root/crawlA/segments/20060419162433 -threads 150
>> logs/logall.txt &

The errors listed below were then displayed in the web UI.

First question: updatedb won't run against the segment, so what can I do to
salvage it?  Is the segment salvageable at all?  "bin/hadoop dfs -du" fills me
with hope :)

/user/root/crawlA/segments/20060419162433/content       2577482693
/user/root/crawlA/segments/20060419162433/crawl_fetch   76587779
/user/root/crawlA/segments/20060419162433/crawl_generate        91242415
/user/root/crawlA/segments/20060419162433/crawl_parse   1122074034
/user/root/crawlA/segments/20060419162433/parse_data    706562716
/user/root/crawlA/segments/20060419162433/parse_text    707691237
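For what it's worth, those directories add up to roughly 4.9 GiB of data on
DFS, which is why I'm hopeful the segment is largely intact. A quick tally of
the "dfs -du" figures above:

```python
# Sizes in bytes, as reported by "bin/hadoop dfs -du" for the segment.
sizes = {
    "content":        2577482693,
    "crawl_fetch":    76587779,
    "crawl_generate": 91242415,
    "crawl_parse":    1122074034,
    "parse_data":     706562716,
    "parse_text":     707691237,
}

total = sum(sizes.values())
print(f"total: {total} bytes (~{total / 2**30:.1f} GiB)")
```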

Second question: should I raise an issue in JIRA quoting the errors below?

Many thanks,

Monu Ogbe


*** EXHIBITS

*** Excerpt from hadoop-site.xml

<property>
  <name>dfs.name.dir</name>
  <value>/home/nutch/hadoop/dfs/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/nutch/hadoop/dfs/data</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/home/nutch/hadoop/mapred/local</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/home/nutch/hadoop/mapred/system</value>
</property>

<property>
  <name>mapred.temp.dir</name>
  <value>/home/nutch/hadoop/mapred/temp</value>
</property>
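One more thought: given the "Timed out" errors below, would raising the task
timeout help?  I haven't overridden any timeout settings; my guess at the
relevant knob is mapred.task.timeout (value in milliseconds — untested, so
please correct me if this is the wrong property):

```xml
<!-- Guess: raise the per-task timeout from the default 10 minutes to 30. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>
```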


*** Errors displayed in the webapp

tip_epgdt0      0.0     reduce > copy >

Timed out.java.io.IOException: Task process exit with nonzero status of 143.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)



Timed out.java.io.IOException: Task process exit with nonzero status of 143.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)

tip_hu8h3m      0.0

Timed out.java.io.IOException: Task process exit with nonzero status of 143.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)



java.rmi.RemoteException: java.io.IOException: Cannot create file
/user/root/crawlA/segments/20060419162433/parse_text/part-00005/data on client
DFSClient_task_r_poobc6
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:156)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)

        at org.apache.hadoop.ipc.Client.call(Client.java:303)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:691)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:864)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:952)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at
org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:83)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:144)
        at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:117)
        at 
org.apache.nutch.parse.ParseOutputFormat$1.close(ParseOutputFormat.java:128)
        at
org.apache.nutch.fetcher.FetcherOutputFormat$1.close(FetcherOutputFormat.java:96)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:290)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:743)



java.rmi.RemoteException: java.io.IOException: Cannot create file
/user/root/crawlA/segments/20060419162433/parse_text/part-00005/data on client
DFSClient_task_r_qklnfm
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:156)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)

        at org.apache.hadoop.ipc.Client.call(Client.java:303)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:691)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:864)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:952)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at
org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:83)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:144)
        at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:117)
        at 
org.apache.nutch.parse.ParseOutputFormat$1.close(ParseOutputFormat.java:128)
        at
org.apache.nutch.fetcher.FetcherOutputFormat$1.close(FetcherOutputFormat.java:96)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:290)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:743)



java.rmi.RemoteException: java.io.IOException: Cannot create file
/user/root/crawlA/segments/20060419162433/parse_text/part-00005/data on client
DFSClient_task_r_vu6z9t
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:156)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)

        at org.apache.hadoop.ipc.Client.call(Client.java:303)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:691)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:864)
        at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:952)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at
org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:83)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:144)
        at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:117)
        at 
org.apache.nutch.parse.ParseOutputFormat$1.close(ParseOutputFormat.java:128)
        at
org.apache.nutch.fetcher.FetcherOutputFormat$1.close(FetcherOutputFormat.java:96)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:290)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:743)



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
