Dennis Kubes wrote:
> Ok, I ran some bigger test crawls > 150K with the 0.9RC. Everything
> worked fine (inject, generate, fetch, updatedb, readdb, linkdb,
> mergesegs, mergdb, merge, index) except delete duplicates on which I am
> getting this error when running against segment indexes on the DFS.
>
> Because of the way I am automating some of my crawls (sorting names by
> alpha and only running part of the list), only one segment part-xxxxx
> had results and then others had 0 results. I don't know if that would
> cause this and I don't think this bug is critical for the 0.9 release
> but I wanted to bring it up.

Please try the patch included at the end.

>
> My guess would be that this is a small bug within the lucene libraries
> when the directories have 0 results. What is everyone's opinion on this
> in terms of the release? My vote would be to move forward with the
> release.

I think we should move forward.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Index: DeleteDuplicates.java
===================================================================
--- DeleteDuplicates.java	(revision 521176)
+++ DeleteDuplicates.java	(working copy)
@@ -158,19 +158,28 @@
   public class DDRecordReader implements RecordReader {
 
     private IndexReader indexReader;
-    private int maxDoc;
-    private int doc;
+    private int maxDoc = 0;
+    private int doc = 0;
     private Text index;
 
     public DDRecordReader(FileSplit split, JobConf job,
         Text index) throws IOException {
-      indexReader = IndexReader.open(new FsDirectory(FileSystem.get(job), split.getPath(), false, job));
-      maxDoc = indexReader.maxDoc();
+      try {
+        indexReader = IndexReader.open(new FsDirectory(FileSystem.get(job), split.getPath(), false, job));
+        maxDoc = indexReader.maxDoc();
+      } catch (IOException ioe) {
+        LOG.warn("Can't open index at " + split + ", skipping. (" + ioe.getMessage() + ")");
+        indexReader = null;
+      }
       this.index = index;
     }
 
     public boolean next(Writable key, Writable value)
       throws IOException {
+
+      // skip empty indexes
+      if (indexReader == null || maxDoc <= 0)
+        return false;
 
       // skip deleted documents
       while (indexReader.isDeleted(doc) && doc < maxDoc) doc++;
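
For anyone who wants to see the guard in isolation, below is a minimal standalone sketch of the same idea, with no Lucene or Hadoop dependency. It is an illustration under stated assumptions, not the Nutch code: GuardedReaderSketch, FakeIndex, Opener and Reader are hypothetical names made up for this example, and returning -1 stands in for next() returning false. Unlike the context lines in the patch, the sketch tests the bound before calling isDeleted; that ordering is a choice of the sketch only.

```java
import java.io.IOException;
import java.util.BitSet;

// Hypothetical, self-contained illustration of the "skip empty or
// unopenable index" guard. None of these types exist in Nutch or Lucene.
class GuardedReaderSketch {

  // Stand-in for an index: a document count plus per-document deletion flags.
  static class FakeIndex {
    final int maxDoc;
    final BitSet deleted = new BitSet();
    FakeIndex(int maxDoc) { this.maxDoc = maxDoc; }
    boolean isDeleted(int doc) { return deleted.get(doc); }
  }

  // Stand-in for "open the index", which may fail with an IOException.
  interface Opener {
    FakeIndex open() throws IOException;
  }

  static class Reader {
    private FakeIndex index;   // stays null when opening fails
    private int maxDoc = 0;    // explicit defaults, as in the patch
    private int doc = 0;

    Reader(Opener opener) {
      try {
        index = opener.open();
        maxDoc = index.maxDoc;
      } catch (IOException ioe) {
        // Log and carry on instead of failing the whole job.
        System.err.println("Can't open index, skipping. (" + ioe.getMessage() + ")");
        index = null;
      }
    }

    // Returns the next live document number, or -1 when there is none.
    int next() {
      // skip empty or unopenable indexes
      if (index == null || maxDoc <= 0) return -1;
      // skip deleted documents (bound checked first in this sketch)
      while (doc < maxDoc && index.isDeleted(doc)) doc++;
      if (doc >= maxDoc) return -1;
      return doc++;
    }
  }

  public static void main(String[] args) {
    Reader broken = new Reader(() -> { throw new IOException("no segments file"); });
    System.out.println(broken.next());   // prints -1

    Reader empty = new Reader(() -> new FakeIndex(0));
    System.out.println(empty.next());    // prints -1

    FakeIndex idx = new FakeIndex(3);
    idx.deleted.set(0);
    idx.deleted.set(2);
    Reader partial = new Reader(() -> idx);
    System.out.println(partial.next());  // prints 1
    System.out.println(partial.next());  // prints -1
  }
}
```

The point of the sketch is only that a split whose index cannot be opened, or that holds zero documents, yields no records rather than an exception, so the remaining splits can still be processed.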
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
