While crawling our internal development sites with Nutch, I hit the error shown below. Is this a known issue? If so, are there any workarounds? I checked the Nutch issues in JIRA, but nothing relevant turned up.
-Samrobb

Exception in thread "main" java.io.IOException: key out of order: Version: 5
ID: 7f8e32b425e96680fb3345b8df5cc0d2
DomainID: -2727267178038559659
URL: http://xyz.timesys.com/mirrors/debian/oldstable/main/diskS-i386/3.0.23-2002-05-21/doc/it/footnotes.it.html
AnchorText: footnotes.it.html
targetHasOutlink: false
 after Version: 5
ID: 7f8e32b425e96680fb3345b8df5cc0d2
DomainID: -2727267178038559659
URL: http://xyz.timesys.com/mirrors/debian/oldstable/main/disks-i386/3.0.23-2002-05-21/doc/it/fdisk.txt
AnchorText: fdisk.txt
targetHasOutlink: false
        at org.apache.nutch.io.MapFile$Writer.checkKey(MapFile.java:134)
        at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:120)
        at org.apache.nutch.db.WebDBWriter$LinksByMD5Processor.mergeEdits(WebDBWriter.java:1042)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:557)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1612)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
        at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
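
One detail in the trace may be relevant: the two URLs have the same ID and differ in case only at "diskS-i386" versus "disks-i386". I have not confirmed that this is the cause, but the sketch below shows how an order check of this kind could fail if keys were sorted with one comparison and verified with another. Everything in it is hypothetical: `OrderCheckingWriter` is a stand-in written for illustration, not Nutch's `MapFile.Writer`, and the case-insensitive sort is an assumption about what might be happening upstream.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class KeyOrderSketch {

    // The two URLs from the stack trace; they differ in case at "disks" vs "diskS".
    static final String FDISK =
        "http://xyz.timesys.com/mirrors/debian/oldstable/main/disks-i386/3.0.23-2002-05-21/doc/it/fdisk.txt";
    static final String FOOTNOTES =
        "http://xyz.timesys.com/mirrors/debian/oldstable/main/diskS-i386/3.0.23-2002-05-21/doc/it/footnotes.it.html";

    /** Hypothetical stand-in for a sorted-file writer; NOT Nutch's MapFile.Writer. */
    static class OrderCheckingWriter {
        private final Comparator<String> comparator;
        private String lastKey;

        OrderCheckingWriter(Comparator<String> comparator) {
            this.comparator = comparator;
        }

        // Rejects any key that sorts before the previously appended key.
        void append(String key) throws IOException {
            if (lastKey != null && comparator.compare(lastKey, key) > 0) {
                throw new IOException("key out of order: " + key + " after " + lastKey);
            }
            lastKey = key;
        }
    }

    /**
     * Sorts the keys with sortOrder, then appends them to a writer that
     * checks order with plain case-sensitive String comparison.
     * Returns false if the writer rejects a key.
     */
    static boolean writesCleanly(List<String> keys, Comparator<String> sortOrder) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(sortOrder);
        OrderCheckingWriter writer = new OrderCheckingWriter(Comparator.<String>naturalOrder());
        try {
            for (String key : sorted) {
                writer.append(key);
            }
            return true;
        } catch (IOException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(FOOTNOTES, FDISK);
        // Assumed mismatch: sorted ignoring case, checked case-sensitively.
        System.out.println(writesCleanly(urls, String.CASE_INSENSITIVE_ORDER));
        // Consistent comparison on both sides.
        System.out.println(writesCleanly(urls, Comparator.<String>naturalOrder()));
    }
}
```

Under a case-insensitive sort, fdisk.txt comes before footnotes.it.html, but a case-sensitive check sees "diskS" as smaller than "disks" (uppercase letters sort first), so the second append is rejected in the same order the trace reports. When the sort and the check use the same comparison, both keys are accepted.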
