Thank you Doug,
Quoting Doug Cutting <[EMAIL PROTECTED]>:
> [EMAIL PROTECTED] wrote:
>> First question: updatedb won't run against the segment, so what can I do
>> to salvage it? Is the segment salvageable?
> Probably. I think you're hitting some current bugs in DFS &
> MapReduce. Once these are fixed, your updatedbs should succeed!
Actually, I think updatedb won't run because the fetched segment didn't
complete correctly. I don't know whether the instructions in the 0.7 FAQ apply:
% touch /index/segments/2005somesegment/fetcher.done
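A concrete version of that workaround, sketched against a temporary directory
(substitute the real segment path, e.g. /index/segments/2005somesegment):

```shell
# Sketch of the 0.7 FAQ workaround: create the fetcher.done marker so that
# updatedb will accept the segment. A temp dir stands in for the real
# segment directory here.
SEGMENT=$(mktemp -d)
touch "$SEGMENT/fetcher.done"   # the marker the 0.7 FAQ says to create
ls "$SEGMENT"                   # fetcher.done
# then re-run updatedb against the segment, e.g.:
#   bin/nutch updatedb <db-dir> "$SEGMENT"
```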
>> Second question: should I raise an issue in JIRA quoting the errors below?
> Yes, please.
Will do.
>> *** Excerpt from hadoop-site.xml
>> <property>
>>   <name>mapred.system.dir</name>
>>   <value>/home/nutch/hadoop/mapred/system</value>
>> </property>
> Unlike the other paths, mapred.system.dir is not a local path, but a
> path in the default filesystem, dfs in your case. Your setting is
> fine, I just thought I'd mention that.
>> Timed out. java.io.IOException: Task process exit with nonzero status of 143.
> These 143's are a mystery to me. We really need to figure out what
> is causing these! One suggestion I found on the net was to try
> passing '-Xrs' to java, i.e., setting mapred.child.java.opts to
> include it. Another idea is to put 'ulimit -c unlimited' in one's
> conf/hadoop-env.sh, so that these will cause core dumps. Then,
> hopefully, we can use gdb to see where the JVM crashed. I have not
> had time recently to try either of these on a cluster, the only place
> where this problem has been seen.
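For reference, the first suggestion would look roughly like this in
hadoop-site.xml (the property name is from the excerpt above; any other JVM
flags you already pass would go in the same value):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xrs</value>
</property>
```

and the second is a single line added to conf/hadoop-env.sh:
ulimit -c unlimited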
>> java.rmi.RemoteException: java.io.IOException: Cannot create file
>> /user/root/crawlA/segments/20060419162433/parse_text/part-00005/data
>> on client DFSClient_task_r_poobc6
> This bug is triggered by the previous bug. In the first case the
> output is started, then the task jvm crashes. But DFS waits a minute
> before it will let another task create a file with the same name (to
> time out the other writer). So if the replacement task starts within
> a minute, then this error is thrown. I think Owen is working on a
> patch for this which will make DFSClient try to open the file for at
> least a minute before throwing an exception. We should have that
> committed today. This won't fix the 143's, but should allow your
> jobs to complete in spite of them.
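If I understand Doug correctly, the patch amounts to a retry loop around the
create call. A hedged sketch of that idea (not the real DFSClient code; the
names createWithRetry and Create, and the timing parameters, are invented):

```java
import java.io.IOException;

public class CreateRetry {
    // Stand-in for the file-create operation that can fail while DFS
    // waits out the previous writer's lease.
    interface Create { void run() throws IOException; }

    // Retry `create` until it succeeds or `timeoutMillis` elapses;
    // rethrow the last IOException once the deadline passes.
    static void createWithRetry(Create create, long timeoutMillis, long sleepMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                create.run();
                return;                     // create succeeded
            } catch (IOException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e;                // lease never expired; give up
                }
                Thread.sleep(sleepMillis);  // wait for the old writer to time out
            }
        }
    }
}
```

With a roughly one-minute timeout and a short sleep, a replacement task would
keep retrying until the crashed task's minute-long lease expires, which is
why this should let jobs finish despite the 143's.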
> Thanks for your patience,
My patience!? I am in awe of the commitment and relentless effort of you and
the development team.
Many thanks,
Monu
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general