Crawling our internal development sites with Nutch, I've seen
the error shown below.  Is this a known issue?  If so, are there
any workarounds?  Checking the Nutch issues in JIRA didn't
turn up anything.

-Samrobb

Exception in thread "main" java.io.IOException: key out of order: Version: 5
ID: 7f8e32b425e96680fb3345b8df5cc0d2
DomainID: -2727267178038559659
URL: http://xyz.timesys.com/mirrors/debian/oldstable/main/diskS-i386/3.0.23-2002-05-21/doc/it/footnotes.it.html
AnchorText: footnotes.it.html
targetHasOutlink: false
 after Version: 5
ID: 7f8e32b425e96680fb3345b8df5cc0d2
DomainID: -2727267178038559659
URL: http://xyz.timesys.com/mirrors/debian/oldstable/main/disks-i386/3.0.23-2002-05-21/doc/it/fdisk.txt
AnchorText: fdisk.txt
targetHasOutlink: false

        at org.apache.nutch.io.MapFile$Writer.checkKey(MapFile.java:134)
        at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:120)
        at org.apache.nutch.db.WebDBWriter$LinksByMD5Processor.mergeEdits(WebDBWriter.java:1042)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:557)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1612)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
        at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
