[ 
https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891707#action_12891707
 ] 

Kyle L. commented on LUCENE-2553:
---------------------------------

Gotcha. Thanks for the info. I will change how the docId is handled and let you 
know if the problem comes up again. I do have some questions about your comments:

# You say that loading documents inside the {{Collector}} is not performant (the 
documentation says the same, but gives no explanation as to why). What I find 
unclear is that the API for {{IndexSearcher}} only provides {{doc(...)}} methods 
for pulling documents out one at a time. If I were to store the re-based ids and 
load the documents only after all the ids have been collected, I would expect 
there to be a batch {{doc(Set<Integer>)}} method, which I would expect to 
perform better than iterating over every collected document id. What exactly 
makes loading the documents faster outside of the {{Collector}}? Or is there a 
risk that the same re-based document id may be collected twice during a search?
# It would be great if the documentation for {{Collector}} were enhanced to 
answer this question and to give some pointers to other people who need a 
bare-bones {{Collector}} like the one I described above. Would you like me to 
create a JIRA task for this?
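
To make the re-basing question in the first point concrete, here is a minimal
sketch in plain Java. It does not use Lucene's API: {{Segment}},
{{CompositeIndex}}, {{IdOnlyCollector}} and {{RebaseSketch}} are hypothetical
stand-ins invented for illustration. It assumes that ids passed to {{collect}}
are relative to the current segment, so the collector stores base + local id
and the documents are loaded through the top-level index after the search.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one segment; documents use segment-local ids. */
final class Segment {
    private final String[] docs;

    Segment(String... docs) { this.docs = docs; }

    int maxDoc() { return docs.length; }

    String document(int localId) {
        if (localId < 0 || localId >= docs.length) {
            // Roughly analogous to reading past the end of the stored data.
            throw new IndexOutOfBoundsException("past end of segment: " + localId);
        }
        return docs[localId];
    }
}

/** Hypothetical stand-in for a top-level reader spanning several segments. */
final class CompositeIndex {
    private final List<Segment> segments = new ArrayList<>();
    private final List<Integer> docBases = new ArrayList<>();
    private int maxDoc = 0;

    void add(Segment segment) {
        segments.add(segment);
        docBases.add(maxDoc);          // first global id of this segment
        maxDoc += segment.maxDoc();
    }

    int segmentCount() { return segments.size(); }
    Segment segment(int i) { return segments.get(i); }
    int docBase(int i) { return docBases.get(i); }

    /** Resolves a global (re-based) id to the owning segment and local id. */
    String document(int globalId) {
        for (int i = segments.size() - 1; i >= 0; i--) {
            if (globalId >= docBases.get(i)) {
                return segments.get(i).document(globalId - docBases.get(i));
            }
        }
        throw new IndexOutOfBoundsException("no such document: " + globalId);
    }
}

/** Records re-based ids only; no documents are loaded while collecting. */
final class IdOnlyCollector {
    private final List<Integer> globalIds = new ArrayList<>();
    private int docBase;

    void setNextSegment(int docBase) { this.docBase = docBase; }
    void collect(int localId) { globalIds.add(docBase + localId); }
    List<Integer> ids() { return globalIds; }
}

final class RebaseSketch {
    /** Pretends every document matches, then loads them after collection. */
    static List<String> collectThenLoad(CompositeIndex index) {
        IdOnlyCollector collector = new IdOnlyCollector();
        for (int s = 0; s < index.segmentCount(); s++) {
            collector.setNextSegment(index.docBase(s));
            for (int local = 0; local < index.segment(s).maxDoc(); local++) {
                collector.collect(local);
            }
        }
        List<String> loaded = new ArrayList<>();
        for (int globalId : collector.ids()) {
            loaded.add(index.document(globalId)); // global id, top-level lookup
        }
        return loaded;
    }

    public static void main(String[] args) {
        CompositeIndex index = new CompositeIndex();
        index.add(new Segment("a", "b"));
        index.add(new Segment("c"));
        System.out.println(collectThenLoad(index));
    }
}
```

In this model, passing a re-based id to a single {{Segment}} (rather than to
the {{CompositeIndex}}) runs past the end of that segment, which appears to be
the situation in the collector quoted below.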

Anyhoo, thanks for your help!

> IOException: read past EOF
> --------------------------
>
>                 Key: LUCENE-2553
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2553
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Kyle L.
>
> We have been getting an {{IOException}} with the following stack trace:
> \\
> \\
> {noformat}
> java.io.IOException: read past EOF
>       at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>       at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>       at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>       at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
>       at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
>       at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
>       at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
>       at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
>       at org.apache.lucene.search.Searcher.search(Searcher.java:67)
>         ...
> {noformat}
> \\
> \\
> We have implemented a basic custom collector that collects all hits in an 
> unordered manner:
> {code}
>     private class AllHitsUnsortedCollector extends Collector {
>         private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class);
>         private IndexReader reader;
>         private int baselineDocumentId;
>         private List<Document> matchingDocuments = new ArrayList<Document>();
>         
>         @Override
>         public boolean acceptsDocsOutOfOrder() {
>             return true;
>         }
>         @Override
>         public void collect(int docId) throws IOException {
>             int documentId = baselineDocumentId + docId;
>             Document document = reader.document(documentId, getFieldSelector());
>             
>             if (document == null) {
>                 logger.info("Null document from search results!");
>             } else {
>                 matchingDocuments.add(document);
>             }
>         }
>         @Override
>         public void setNextReader(IndexReader segmentReader, int baseDocId) throws IOException {
>             this.reader = segmentReader;
>             this.baselineDocumentId = baseDocId;
>         }
>         @Override
>         public void setScorer(Scorer scorer) throws IOException {
>             // do nothing
>         }
>         public List<Document> getMatchingDocuments() {
>             return matchingDocuments;
>         }
>     }
> {code}
> The exception arises when users perform searches while indexing/optimization 
> is occurring. Our {{IndexReader}} is read-only. From the documentation I have 
> read, a read-only {{IndexReader}} instance should be immune from any 
> uncommitted index changes and should return consistent results during 
> indexing and optimization. As this exception occurs during 
> indexing/optimization, it seems to me that the read-only {{IndexReader}} is 
> somehow stumbling upon the uncommitted content? 
> The problem is difficult to replicate as it is sporadic in nature and so far 
> has only occurred in Production.
> We have rebuilt the indexes a number of times, but that does not seem to 
> alleviate the issue.
> Any other information I can provide that will help isolate the issue? 
> The most likely other possibility is that the {{Collector}} we have written 
> is doing something it shouldn't. Any pointers?
