Hi,
I am also facing this same problem. Have you figured out a solution to this
yet?
Also, I keep getting the following error every time I recrawl:
DeleteDuplicates: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:491)
at org.apache.nutch.indexer.DeleteDuplicates.run(DeleteDuplicates.java:515)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:499)
Can anyone please help me out with these problems?
Thanks,
-Chris
On 6/21/07, Phạm Hải Thanh <[EMAIL PROTECTED]> wrote:
Hi all,
After recrawling several times, I have a problem with the directory
merge-output. I dug into the mail archive and found a clue: you
should use a new dir name for the new merge, e.g., merge-output_new, then
mv merge-output_new to merge-output.
Can anyone show me exactly how to do this?
Thanks a lot
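
The clue quoted above (merge into a new directory name, then mv it over merge-output) might look roughly like the sketch below. It is only an illustration under assumptions: the function name merge_then_swap and the run_merge hook are hypothetical, and the actual Nutch merge invocation is left to the caller because the thread does not show it. The only names taken from the thread are merge-output and merge-output_new.

```python
import shutil
from pathlib import Path
from typing import Callable


def merge_then_swap(index_dir: Path, run_merge: Callable[[Path], None]) -> Path:
    """Merge into a fresh directory, then move it over merge-output.

    `run_merge` is a hypothetical hook supplied by the caller. It must
    write the merged index into the directory it is given (for example
    by invoking the Nutch index merge with that directory as its output).
    """
    target = index_dir / "merge-output"
    fresh = index_dir / "merge-output_new"

    # A leftover merge-output_new from an earlier failed run would hit the
    # same "already exists" problem, so clear it before merging.
    if fresh.exists():
        shutil.rmtree(fresh)

    run_merge(fresh)

    if not fresh.is_dir():
        raise RuntimeError(f"merge step did not create {fresh}")

    # Remove the old index only after the new one exists, then do the
    # equivalent of: mv merge-output_new merge-output
    if target.exists():
        shutil.rmtree(target)
    fresh.rename(target)
    return target
```

The old merge-output is deleted only once the new merge has produced a directory, so a failed merge leaves the previous index in place.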
============================================================================
After refetching the database, I get the following error during index merging.
2007-04-27 15:58:37,787 FATAL indexer.IndexMerger - IndexMerger: java.io.IOException: Target /usr/local/nutch/nutchdb/index/merge-output already exists
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:230)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:70)
at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:49)
at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:750)
at org.apache.hadoop.fs.ChecksumFileSystem.completeLocalOutput(ChecksumFileSystem.java:622)
at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:104)
at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)