Re: [Nutch-general] Problem with merge-output

chris sleeman Sun, 08 Jul 2007 13:42:22 -0700

My apologies. The dedup error was "mea culpa": I was using an older
version of the Lucene jar.


However, I am still getting this exception:
"IndexMerger: java.io.IOException: Target crawl-test/index/merge-output
already exists"

Once a crawl is completed successfully, can I simply delete the merge-output
dir for my next crawl, or is merge-output used elsewhere?

Regards,
Chris


On 7/9/07, chris sleeman <[EMAIL PROTECTED]> wrote:


Hi,
I am also facing this same problem. Have you figured out a solution to
this yet?

Also, I keep getting the following error every time I recrawl:

DeleteDuplicates: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:491)
    at org.apache.nutch.indexer.DeleteDuplicates.run(DeleteDuplicates.java:515)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
    at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:499)

Can anyone please help me out with these problems?

Thanks,
-Chris




On 6/21/07, Phạm Hải Thanh <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> After recrawling several times, I have a problem with the directory
> merge-output. I dug into the mail archive and found a clue: you
> should use a new dir name for the new merge, e.g., merge-output_new,
> then mv merge-output_new to merge-output.
>
>
>
> Can anyone show me exactly how to do this?
>
> Thanks a lot
>
>
>
>
> ============================================================================
>
> After refetching the database, I get the following error during index merging.
>
>
>
> 2007-04-27 15:58:37,787 FATAL indexer.IndexMerger - IndexMerger: java.io.IOException: Target /usr/local/nutch/nutchdb/index/merge-output already exists
>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:230)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:70)
>         at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:49)
>         at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:750)
>         at org.apache.hadoop.fs.ChecksumFileSystem.completeLocalOutput(ChecksumFileSystem.java:622)
>         at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:104)
>         at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
>
>
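The workaround quoted in this thread (merge into a fresh directory such as
merge-output_new, then mv it over merge-output) could be sketched roughly as
follows. This is only a minimal illustration, not something confirmed by
anyone in the thread: the helper name swap_in_merged_index is hypothetical, it
assumes the merge step has already written its result to
<index dir>/merge-output_new on a local filesystem, and it assumes nothing
else is reading the old merge-output while it is being replaced.

```python
import shutil
from pathlib import Path


def swap_in_merged_index(index_dir, new_name="merge-output_new",
                         live_name="merge-output"):
    """Replace index_dir/live_name with index_dir/new_name.

    Hypothetical helper: the directory names follow the hint in the thread;
    nothing here calls Nutch itself.
    """
    index_dir = Path(index_dir)
    new_dir = index_dir / new_name
    live_dir = index_dir / live_name

    # Refuse to touch the old index if there is no fresh merge to swap in.
    if not new_dir.is_dir():
        raise FileNotFoundError(f"no fresh merge output at {new_dir}")

    # Remove the previous merge result so the rename target is free.
    if live_dir.exists():
        shutil.rmtree(live_dir)

    # Equivalent of `mv merge-output_new merge-output`.
    new_dir.rename(live_dir)
    return live_dir
```

It would be called once per recrawl, after the merge has finished, for
example swap_in_merged_index("crawl-test/index"). Checking for the new
directory first means a failed merge leaves the old merge-output in place.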
