[EMAIL PROTECTED] wrote:
First question. Updatedb won't run against the segment, so what can I do to salvage it? Is the segment salvageable?
Probably. I think you're hitting some current bugs in DFS and MapReduce. Once these are fixed, your updatedb runs should succeed!
Second question: should I raise an issue in JIRA quoting the errors below?
Yes, please.
*** Excerpt from hadoop-site.xml
<property>
  <name>mapred.system.dir</name>
  <value>/home/nutch/hadoop/mapred/system</value>
</property>
Unlike the other paths, mapred.system.dir is not a local path but a path in the default filesystem, DFS in your case. Your setting is fine; I just thought I'd mention it.
Timed out.
java.io.IOException: Task process exit with nonzero status of 143.
These 143s are a mystery to me. We really need to figure out what is causing them! One suggestion I found on the net was to pass '-Xrs' to java, i.e., to include it in mapred.child.java.opts. Another idea is to put 'ulimit -c unlimited' in conf/hadoop-env.sh, so that these failures produce core dumps. Then, hopefully, we can use gdb to see where the JVM crashed. I have not recently had time to try either of these on a cluster, which is the only place this problem has been seen.
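For reference when reading these statuses: Unix shells report a process killed by signal N as exit status 128 + N, and Java's Process.exitValue() on Unix generally follows the same convention, so 143 would correspond to signal 15 (SIGTERM). The helper below is a minimal sketch of that decoding only; ExitStatus is a hypothetical name, not part of Hadoop, and since a program can also call exit(143) itself, the number alone does not prove a signal was delivered.

```java
/**
 * Hypothetical helper (not part of Hadoop): interprets a child process
 * exit status using the Unix shell convention, where a process killed
 * by signal N is reported as 128 + N. Heuristic only: a process may
 * also exit with such a code on its own.
 */
public final class ExitStatus {

    private ExitStatus() {}

    /** Returns a short human-readable description of an exit status. */
    public static String describe(int status) {
        if (status > 128 && status < 256) {
            int signal = status - 128;
            return "killed by signal " + signal + signalName(signal);
        }
        return "exited with code " + status;
    }

    // Names for a few common signals, using the usual Linux numbering.
    private static String signalName(int signal) {
        switch (signal) {
            case 6:  return " (SIGABRT)";
            case 9:  return " (SIGKILL)";
            case 11: return " (SIGSEGV)";
            case 15: return " (SIGTERM)";
            default: return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(143)); // killed by signal 15 (SIGTERM)
        System.out.println(describe(1));   // exited with code 1
    }
}
```

This only reads the number; it says nothing about what sent the signal, which is the open question above.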
java.rmi.RemoteException: java.io.IOException: Cannot create file /user/root/crawlA/segments/20060419162433/parse_text/part-00005/data on client DFSClient_task_r_poobc6
This bug is triggered by the previous one. First, the original task starts writing its output, and then its JVM crashes. But DFS waits a minute before it will let another task create a file with the same name (to time out the other writer). So if the replacement task starts within that minute, this error is thrown. I think Owen is working on a patch that will make DFSClient keep trying to open the file for at least a minute before throwing an exception. We should have that committed today. This won't fix the 143s, but it should allow your jobs to complete in spite of them.
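The retry behaviour described above can be sketched generically as "keep retrying an operation that fails with IOException until a deadline passes, then rethrow the last failure". This is a minimal illustration under that assumption, not Owen's actual DFSClient patch; RetryUntilDeadline and IOAction are hypothetical names, and the simulated create() in main stands in for a real DFS call.

```java
import java.io.IOException;

/**
 * Hypothetical sketch (not the actual DFSClient change): retry an
 * operation that throws IOException until a deadline has passed.
 */
public final class RetryUntilDeadline {

    /** An action that may fail with an IOException. */
    @FunctionalInterface
    public interface IOAction<T> {
        T run() throws IOException;
    }

    private RetryUntilDeadline() {}

    /**
     * Runs op, retrying every pauseMillis after an IOException, until
     * timeoutMillis have elapsed; then the last IOException is rethrown.
     */
    public static <T> T retry(IOAction<T> op, long timeoutMillis, long pauseMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                return op.run();
            } catch (IOException e) {
                // Out of time: surface the most recent failure to the caller.
                if (System.currentTimeMillis() >= deadline) {
                    throw e;
                }
                Thread.sleep(pauseMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated create(): refuses twice, as if the crashed writer
        // still held the file name, then succeeds.
        String result = retry(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IOException("Cannot create file: name still held");
            }
            return "created after " + attempts[0] + " attempts";
        }, 60_000L, 10L);
        System.out.println(result); // created after 3 attempts
    }
}
```

A one-minute timeout matches the window DFS uses to expire the previous writer, so a replacement task that starts early would simply wait it out rather than fail.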
Thanks for your patience,
Doug

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
