[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464012 ]
Chuck Williams commented on LUCENE-769:
---------------------------------------

I have this same issue with a constantly changing large index where users need a current view. The first search after each frequent IndexReader reopen is slow, primarily because the FieldCache must be rebuilt for the sort fields.

I don't believe this patch, or any continuation along these lines, will help my issue. Documents are large and queries frequently return large result sets, say 20% or more of the entire multi-million-document index. Hundreds of thousands of document() retrievals, even with a fast LOAD_AND_BREAK FieldSelector finding sort fields at the beginning of each Document, are not going to beat FieldCache's single traversal of the postings for the sort fields.

Another approach I've looked at is Robert Engel's IndexReader.reopen(). I think this direction is more promising. Artem, you might want to look at this. At least the version I've seen is not integrated with FieldCache, but it seems this would be feasible. Segments to the left of the first changed segment maintain their doc-ids, so an improved FieldCache could iterate over just the postings in the first changed segment and those to its right. Unless somebody else does this first, it's on my list to improve IndexReader.reopen() with this optimization and to make other enhancements my app needs (e.g., support for ParallelReader; the current implementation fails in this case).

A specific comment on the new patch: the introduction of FieldSelectors is too restrictive. The same doc-id may be retrieved using multiple FieldSelectors in different calls to IndexReader.document(). Any implementation of the cache needs to support this.
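To make the segment-prefix idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and not the reopen() patch: SegmentSketch and IncrementalSortCache are hypothetical names, the per-segment values array merely stands in for a traversal of that segment's postings, and a segment is assumed to be unchanged when its name and version both match. Deletions and merges are ignored.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical segment descriptor for illustration; not a Lucene class. */
final class SegmentSketch {
    final String name;      // segment name, e.g. "_a"
    final long version;     // assumed to change whenever the segment is rewritten
    final String[] values;  // sort-field value per local doc-id (stands in for postings)

    SegmentSketch(String name, long version, String... values) {
        this.name = name;
        this.version = version;
        this.values = Arrays.copyOf(values, values.length); // defensive copy
    }

    boolean sameAs(SegmentSketch other) {
        return name.equals(other.name) && version == other.version;
    }
}

/** Hypothetical helper that rebuilds a sort cache after a reopen. */
final class IncrementalSortCache {

    /** Number of leading segments that are identical in both segment lists. */
    static int unchangedPrefix(List<SegmentSketch> oldSegs, List<SegmentSketch> newSegs) {
        int limit = Math.min(oldSegs.size(), newSegs.size());
        int i = 0;
        while (i < limit && oldSegs.get(i).sameAs(newSegs.get(i))) {
            i++;
        }
        return i;
    }

    /**
     * Builds the cache for newSegs. Entries for the unchanged prefix are copied
     * from oldCache, since those doc-ids are stable; only the first changed
     * segment and everything to its right are loaded again.
     */
    static String[] rebuild(List<SegmentSketch> oldSegs, String[] oldCache,
                            List<SegmentSketch> newSegs) {
        int prefix = unchangedPrefix(oldSegs, newSegs);

        int total = 0;
        for (SegmentSketch seg : newSegs) {
            total += seg.values.length;
        }

        String[] cache = new String[total];
        int docBase = 0; // global doc-id of the current segment's first document
        for (int i = 0; i < newSegs.size(); i++) {
            SegmentSketch seg = newSegs.get(i);
            if (i < prefix) {
                // Unchanged segment: reuse the previously cached values.
                System.arraycopy(oldCache, docBase, cache, docBase, seg.values.length);
            } else {
                // Changed or new segment: reload (a postings traversal in practice).
                System.arraycopy(seg.values, 0, cache, docBase, seg.values.length);
            }
            docBase += seg.values.length;
        }
        return cache;
    }
}
```

With this shape, the cost of a rebuild is proportional to the documents in the changed tail rather than to the whole index, which is the property that would matter for a frequently reopened index where most changes land in the newest segments.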
> [PATCH] Performance improvement for some cases of sorted search
> ---------------------------------------------------------------
>
>                 Key: LUCENE-769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-769
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Artem Vasiliev
>         Attachments: DocCachingSorting.patch, DocCachingSorting.patch, StoredFieldSorting.patch
>
>
> It's a small addition to Lucene that significantly lowers memory consumption
> and improves performance for sorted searches in the scenario of frequent index
> updates and relatively big indexes (>1 million docs). This solution currently
> supports only single-field sorting (which seems to be quite a popular use case).
> Multiple-field support can be added without much trouble.
> The solution is this: documents from the sorting set (instead of the given
> field's values from the whole index, which is the current FieldCache approach)
> are cached in a WeakHashMap, so the cached items are candidates for GC. Their
> field values are then fetched from the cache and compared while sorting.