[ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891707#action_12891707 ]
Kyle L. commented on LUCENE-2553:
---------------------------------

Gotcha. Thanks for the info. I will make the changes to the docId and let you know if it comes up again.

I do have some questions relating to your comments:

# You say it's not performant (the documentation says the same, but gives no explanation as to why). What I find unclear is that the API for {{IndexSearcher}} only provides doc(...) methods for pulling documents out one at a time. If I were to store the rebased ids and only load the documents after all the ids have been collected, I would expect there to be a batch {{doc(Set<Integer>)}} method, to which I would ascribe performance improvements over iterating over every collected document id. What exactly makes loading documents by id faster outside of the {{Collector}}? Is there perhaps a risk that the same rebased document id may be collected twice during a search?
# It would be great if the documentation for {{Collector}} were enhanced to answer this question and to provide some pointers for other people who may need a bare-bones, simple {{Collector}} like the one I mentioned above. Would you like me to create a JIRA task for this?

Anyhoo, thanks for your help!

> IOException: read past EOF
> --------------------------
>
>                 Key: LUCENE-2553
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2553
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Kyle L.
>
> We have been getting an {{IOException}} with the following stack trace:
> \\
> \\
> {noformat}
> java.io.IOException: read past EOF
> at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
> at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
> at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
> at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
> at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
> at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
> at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
> at org.apache.lucene.search.Searcher.search(Searcher.java:67)
> ...
> {noformat}
> \\
> \\
> We have implemented a basic custom collector that collects all hits in an
> unordered manner:
> {code}
> private class AllHitsUnsortedCollector extends Collector {
>     private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class);
>     private IndexReader reader;
>     private int baselineDocumentId;
>     private List<Document> matchingDocuments = new ArrayList<Document>();
>
>     @Override
>     public boolean acceptsDocsOutOfOrder() {
>         return true;
>     }
>
>     @Override
>     public void collect(int docId) throws IOException {
>         int documentId = baselineDocumentId + docId;
>         Document document = reader.document(documentId, getFieldSelector());
>
>         if (document == null) {
>             logger.info("Null document from search results!");
>         } else {
>             matchingDocuments.add(document);
>         }
>     }
>
>     @Override
>     public void setNextReader(IndexReader segmentReader, int baseDocId)
>             throws IOException {
>         this.reader = segmentReader;
>         this.baselineDocumentId = baseDocId;
>     }
>
>     @Override
>     public void setScorer(Scorer scorer) throws IOException {
>         // do nothing
>     }
>
>     public List<Document> getMatchingDocuments() {
>         return matchingDocuments;
>     }
> }
> {code}
> The exception arises when users perform searches while indexing/optimization
> is occurring. Our {{IndexReader}} is read-only. From the documentation I have
> read, a read-only {{IndexReader}} instance should be immune from any
> uncommitted index changes and should return consistent results during
> indexing and optimization. As this exception occurs during
> indexing/optimization, it seems to me that the read-only {{IndexReader}} is
> somehow stumbling upon the uncommitted content?
>
> The problem is difficult to replicate as it is sporadic in nature and so far
> has only occurred in Production.
>
> We have rebuilt the indexes a number of times, but that does not seem to
> alleviate the issue.
>
> Any other information I can provide that will help isolate the issue?
>
> The most likely other possibility is that the {{Collector}} we have written
> is doing something it shouldn't. Any pointers?
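
For anyone following the docId discussion in the comment above, here is a rough, self-contained sketch of the two id spaces involved and of the "collect rebased ids first, load documents afterwards" approach. It is a toy model only and does not use the Lucene API: {{Segment}}, {{documentByGlobalId}}, {{collectAll}} and {{loadAll}} are hypothetical names invented for illustration, and stored documents are plain strings. The assumption it encodes is that a per-segment reader expects segment-local ids, while a rebased id (base + local id) is only meaningful to the top-level reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of segment-local vs. rebased doc ids. Hypothetical names; not the Lucene API. */
class RebaseSketch {

    /** Stand-in for a per-segment reader; documents are addressed by segment-local id. */
    static final class Segment {
        final int docBase;     // index-wide id of this segment's first document
        final String[] stored; // stored documents, indexed by segment-local id

        Segment(int docBase, String... stored) {
            this.docBase = docBase;
            this.stored = stored;
        }

        /** Loads by segment-local id and fails for ids beyond the end of the segment. */
        String document(int localId) {
            if (localId < 0 || localId >= stored.length) {
                throw new IndexOutOfBoundsException("read past end of segment: " + localId);
            }
            return stored[localId];
        }
    }

    /** Stand-in for the top-level reader: resolves an index-wide (rebased) id. */
    static String documentByGlobalId(List<Segment> segments, int globalId) {
        for (Segment s : segments) {
            int local = globalId - s.docBase;
            if (local >= 0 && local < s.stored.length) {
                return s.document(local);
            }
        }
        throw new IndexOutOfBoundsException("no such document: " + globalId);
    }

    /** Collect phase: record rebased ids only (every document "matches" in this toy). */
    static List<Integer> collectAll(List<Segment> segments) {
        List<Integer> ids = new ArrayList<>();
        for (Segment s : segments) {
            for (int local = 0; local < s.stored.length; local++) {
                ids.add(s.docBase + local); // rebase: segment-local id -> index-wide id
            }
        }
        return ids;
    }

    /** Load phase: resolve the collected ids once the search has finished. */
    static List<String> loadAll(List<Segment> segments, List<Integer> globalIds) {
        List<String> docs = new ArrayList<>();
        for (int id : globalIds) {
            docs.add(documentByGlobalId(segments, id));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment(0, "a", "b"),
                new Segment(2, "c", "d", "e"));

        List<Integer> ids = collectAll(segments);
        System.out.println("collected ids: " + ids);
        System.out.println("loaded docs: " + loadAll(segments, ids));

        // Passing a rebased id to a per-segment reader mixes the two id spaces.
        Segment second = segments.get(1);
        try {
            second.document(second.docBase + 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("mixing id spaces fails: " + e.getMessage());
        }
    }
}
```

In the toy, the failure only shows up for segments whose base is non-zero, which may be one reason a problem of this kind would look sporadic. Whether deferring the load is actually faster in Lucene is a separate question that this sketch does not try to answer.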