[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

Chuck Williams (JIRA) Thu, 11 Jan 2007 13:00:49 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464012 ]


Chuck Williams commented on LUCENE-769:
---------------------------------------

I have this same issue with a constantly changing large index where users need 
a current view.  The first search after each frequent IndexReader reopen is 
slow, due primarily to the requirement to rebuild the FieldCache for sort fields.

I don't believe this patch, or any continuation along these lines, will help my 
issue.  Documents are large and queries frequently return large result sets, 
say 20% of the entire multi-million document index or more.  Hundreds of 
thousands of document() retrievals, even with a fast LOAD_AND_BREAK 
FieldSelector finding sort fields at the beginning of each Document, are not 
going to beat FieldCache's single traversal of the postings for the sort fields.

Another approach I've looked at is Robert Engel's IndexReader.reopen().  I 
think this direction is more promising.  Artem, you might want to look at this. 
At least the version I've seen is not integrated with FieldCache, but it seems 
this would be feasible.  Segments to the left of the first changed segment 
maintain their doc-ids, so an improved FieldCache could iterate just the 
postings in the first changed segment and those to the right.  Unless somebody 
else does this first, it's on my list to improve IndexReader.reopen() with this 
optimization and to make other enhancements my app needs (e.g., support for 
ParallelReader -- the current implementation fails in this case).
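
To make the segment-wise idea concrete, here is a rough sketch in plain Java. 
It does not use any Lucene API: SegmentSortCache, Segment, load() and 
rebuildCount() are hypothetical names, segments are assumed to be immutable and 
identified by name, and copying an int[] stands in for walking the postings of 
the sort field.  It only illustrates reusing cached values for the unchanged 
leading segments and rebuilding from the first changed segment onward.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a sort-value cache that is rebuilt only from the first changed segment. */
class SegmentSortCache {

    /** Stand-in for an index segment: a name plus one sort value per document. */
    static final class Segment {
        final String name;
        final int[] sortValues;

        Segment(String name, int[] sortValues) {
            this.name = name;
            this.sortValues = sortValues;
        }
    }

    private List<String> prevNames = new ArrayList<>();
    private List<int[]> prevValues = new ArrayList<>();
    private int rebuilds = 0;

    /** Returns sort values indexed by global doc-id for the given segment list. */
    int[] load(List<Segment> segments) {
        List<String> names = new ArrayList<>();
        List<int[]> values = new ArrayList<>();
        boolean prefixUnchanged = true;
        int total = 0;

        for (int i = 0; i < segments.size(); i++) {
            Segment s = segments.get(i);
            // Once one segment differs, every segment to its right is treated as changed,
            // because its doc-ids may have shifted.
            prefixUnchanged = prefixUnchanged
                    && i < prevNames.size()
                    && prevNames.get(i).equals(s.name);

            int[] v;
            if (prefixUnchanged) {
                v = prevValues.get(i);        // reuse: doc-ids in this segment are stable
            } else {
                v = s.sortValues.clone();     // assumption: stands in for a postings traversal
                rebuilds++;
            }
            names.add(s.name);
            values.add(v);
            total += v.length;
        }

        // Concatenate per-segment arrays so the result is addressable by global doc-id.
        int[] merged = new int[total];
        int docBase = 0;
        for (int[] v : values) {
            System.arraycopy(v, 0, merged, docBase, v.length);
            docBase += v.length;
        }

        prevNames = names;
        prevValues = values;
        return merged;
    }

    /** Number of per-segment rebuilds performed so far. */
    int rebuildCount() {
        return rebuilds;
    }
}
```

Calling load() again after a reopen in which only the last segment was replaced 
rebuilds just that one segment; the leading segments are served from the 
previous arrays.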

A specific comment on the new patch:  the introduction of FieldSelectors is too 
restrictive.  The same doc-id may be retrieved using multiple FieldSelectors in 
different calls to IndexReader.document().  Any implementation of the cache 
needs to support this.
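
As an illustration of that requirement, a cache could key its entries on the 
(doc-id, selector) pair rather than on the doc-id alone.  The sketch below is 
plain Java, not the patch's code: SelectorAwareDocCache and its methods are 
hypothetical names, the selector is represented by an arbitrary Object compared 
with equals() (identity unless the selector class overrides it), and the loader 
stands in for a call such as IndexReader.document().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical sketch: a document cache keyed by doc-id AND the selector used to load it. */
class SelectorAwareDocCache<D> {

    /** Composite key, so the same doc-id loaded with different selectors gets separate entries. */
    private static final class Key {
        final int docId;
        final Object selector;

        Key(int docId, Object selector) {
            this.docId = docId;
            this.selector = selector;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return docId == other.docId && Objects.equals(selector, other.selector);
        }

        @Override
        public int hashCode() {
            return 31 * docId + Objects.hashCode(selector);
        }
    }

    private final Map<Key, D> cache = new HashMap<>();

    /** Returns the cached document for (docId, selector), invoking the loader only on a miss. */
    D get(int docId, Object selector, Supplier<D> loader) {
        return cache.computeIfAbsent(new Key(docId, selector), k -> loader.get());
    }

    /** Number of cached (docId, selector) entries. */
    int size() {
        return cache.size();
    }
}
```

With this keying, a document loaded with a sort-fields-only selector does not 
shadow a later full load of the same doc-id, and vice versa.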


> [PATCH] Performance improvement for some cases of sorted search
> ---------------------------------------------------------------
>
>                 Key: LUCENE-769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-769
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Artem Vasiliev
>         Attachments: DocCachingSorting.patch, DocCachingSorting.patch, 
> StoredFieldSorting.patch
>
>
> It's a small addition to Lucene that significantly lowers memory consumption 
> and improves performance for sorted searches in the scenario of frequent index 
> updates and relatively big indexes (>1 million docs). This solution currently 
> supports only single-field sorting (which seems to be quite a popular use case). 
> Multiple-field support can be added without much trouble.
> The solution is this: documents from the sorting set (instead of the given 
> field's values from the whole index, the current FieldCache approach) are cached 
> in a WeakHashMap so the cached items are candidates for GC.  Their field 
> values are then fetched from the cache and compared while sorting.
