Thank you, Doug,

Quoting Doug Cutting <[EMAIL PROTECTED]>:

[EMAIL PROTECTED] wrote:
First question: updatedb won't run against the segment, so what can I do to
salvage it?  Is the segment salvageable?

Probably. I think you're hitting some current bugs in DFS & MapReduce. Once these are fixed, your updatedb runs should succeed!

Actually, I think that updatedb won't run because the fetched segment didn't
complete correctly.  I don't know whether the instructions in the 0.7 FAQ apply:

% touch /index/segments/2005somesegment/fetcher.done
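After touching fetcher.done, re-running updatedb should show whether the segment
is now usable. In 0.7 the invocation is something like this (paths adjusted to
your layout; the db path here is just illustrative):

% bin/nutch updatedb /index/db /index/segments/2005somesegment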

Second question: should I raise an issue in JIRA quoting the errors below?

Yes, please.


Will do.

*** Excerpt from hadoop-site.xml
<property>
  <name>mapred.system.dir</name>
  <value>/home/nutch/hadoop/mapred/system</value>
</property>

Unlike the other paths, mapred.system.dir is not a local path but a path in the default filesystem (DFS, in your case). Your setting is fine; I just thought I'd mention it.
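If it helps to picture the distinction, a sketch of how the two kinds of paths
might sit side by side in hadoop-site.xml (the mapred.local.dir value here is
illustrative, not from your config):

<property>
  <name>mapred.local.dir</name>
  <value>/home/nutch/hadoop/mapred/local</value>
  <!-- resolved on each node's local disk -->
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/nutch/hadoop/mapred/system</value>
  <!-- resolved in the default filesystem (DFS here) -->
</property>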

Timed out. java.io.IOException: Task process exit with nonzero status of 143.

These 143s are a mystery to me. We really need to figure out what is causing them! One suggestion I found on the net was to pass '-Xrs' to java, i.e., to set mapred.child.java.opts to include it. Another idea is to put 'ulimit -c unlimited' in one's conf/hadoop-env.sh, so that these crashes produce core dumps; then, hopefully, we can use gdb to see where the JVM crashed. I have not had time recently to try either of these on a cluster, the only place where this problem has been seen.
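Concretely, those two suggestions would look something like this (the -Xmx value
is illustrative; keep whatever heap size you already use) in hadoop-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Xrs</value>
</property>

and in conf/hadoop-env.sh:

ulimit -c unlimited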

java.rmi.RemoteException: java.io.IOException: Cannot create file
/user/root/crawlA/segments/20060419162433/parse_text/part-00005/data on client
DFSClient_task_r_poobc6

This bug is triggered by the previous one. First the output file is created, then the task JVM crashes. But DFS waits a minute before it will let another task create a file with the same name (to time out the previous writer), so if the replacement task starts within that minute, this error is thrown. I think Owen is working on a patch that will make DFSClient keep trying to create the file for at least a minute before throwing an exception. We should have that committed today. This won't fix the 143s, but it should allow your jobs to complete in spite of them.
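If it helps to picture it, a minimal sketch of that retry logic (hypothetical
names; not the actual DFSClient patch):

import java.io.IOException;

// Rough sketch of the retry idea: keep retrying the create until the
// previous writer's lease times out (about one minute) before giving up.
// 'Create' is a hypothetical stand-in for the real DFS create call.
public class CreateRetry {
  interface Create { void run() throws IOException; }

  static void createWithRetry(Create create) throws IOException {
    long deadline = System.currentTimeMillis() + 60 * 1000; // ~lease timeout
    while (true) {
      try {
        create.run();                              // attempt the create
        return;                                    // success
      } catch (IOException e) {
        if (System.currentTimeMillis() >= deadline)
          throw e;                                 // still failing after a minute
        try {
          Thread.sleep(5 * 1000);                  // wait, then retry
        } catch (InterruptedException ie) {
          throw new IOException("interrupted");
        }
      }
    }
  }
}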

Thanks for your patience,

My patience!?  I am in awe of the commitment and relentless effort of you and
the development team.

Many thanks,

Monu



