[jira] Commented: (JCR-974) Manage Lucene FieldCaches per index segment

Christoph Kiehl (JIRA) Wed, 20 Jun 2007 05:13:46 -0700

    [ https://issues.apache.org/jira/browse/JCR-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506506 ]


Christoph Kiehl commented on JCR-974:
-------------------------------------

Regarding your ItemStateManagerBasedSortComparator.patch: This patch doesn't 
work well in our scenario because we have fairly large result sets. I think 
your patch might handle small result sets better than mine, but with large 
result sets there are too many documents from different index segments. With 
your patch my query takes about 100000ms, while with my patch it takes between 
200ms and 1000ms.

Another feature of my patch is that it creates the caches lazily per index 
segment. We also played around with a global term cache, so that if the same 
term is returned by different index segments the same String object is used in 
each FieldCache. This reduces the FieldCache size when a term is contained in 
multiple index segments. In our case the default FieldCache was about 4MB for a 
certain field while the patched FieldCache was about 2.5MB.
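
To illustrate the two ideas above (lazy per-segment caches plus a shared term 
pool), here is a minimal sketch in plain Java. It is not the code from 
patch.txt and does not use Lucene's FieldCache API; the class 
SegmentTermCaches, its methods, and the loader callback are hypothetical names 
chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: per-segment term arrays that share String instances.
class SegmentTermCaches {
    // Global pool holding one canonical String instance per distinct term.
    private final Map<String, String> termPool = new HashMap<>();
    // One term array per index segment, created on first access only.
    private final Map<String, String[]> cachesBySegment = new HashMap<>();

    // Returns the pooled instance for a term, adding it if it is new.
    private String canonical(String term) {
        String existing = termPool.putIfAbsent(term, term);
        return existing != null ? existing : term;
    }

    // Lazily builds the cache for one segment. The loader stands in for
    // whatever reads the field's terms from that segment (an assumption).
    String[] cacheFor(String segmentName, Function<String, String[]> loader) {
        return cachesBySegment.computeIfAbsent(segmentName, name -> {
            String[] raw = loader.apply(name);
            String[] shared = new String[raw.length];
            for (int i = 0; i < raw.length; i++) {
                // Documents without a value for the field stay null.
                shared[i] = raw[i] == null ? null : canonical(raw[i]);
            }
            return shared;
        });
    }

    // Number of distinct term values currently held in the pool.
    int distinctTerms() {
        return termPool.size();
    }
}
```

With this layout a term that occurs in several segments is stored once and 
referenced from each segment's array, which is where the memory saving would 
come from. The sketch is not thread-safe and never evicts pooled terms; a real 
implementation would need to handle both.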

> Manage Lucene FieldCaches per index segment
> -------------------------------------------
>
>                 Key: JCR-974
>                 URL: https://issues.apache.org/jira/browse/JCR-974
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3
>            Reporter: Christoph Kiehl
>         Attachments: ItemStateManagerBasedSortComparator.patch, patch.txt
>
>
> Jackrabbit uses an IndexSearcher which searches on a single IndexReader which 
> is most likely to be an instance of CachingMultiReader. On every search that 
> does sorting or range queries a FieldCache is populated and associated with 
> this instance of a CachingMultiReader. On successive queries which operate on 
> this CachingMultiReader you will get a tremendous speedup for queries which 
> can reuse those associated FieldCache instances.
> The problem is that Jackrabbit creates a new CachingMultiReader _every time_ 
> one of the underlying indexes is modified. This means if you just change 
> _one_ item in the repository you will need to rebuild all those FieldCaches 
> because the existing FieldCaches are associated with the old instance of 
> CachingMultiReader.
> This not only leads to slow search response times for queries which 
> contain range queries or are sorted by a field but also leads to massive 
> memory consumption (depending on the size of your indexes) because there 
> might be multiple instances of CachingMultiReaders in use if you have a 
> scenario where a lot of queries and item modifications are executed 
> concurrently.
> The goal is to keep those FieldCaches as long as possible.
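
The goal stated in the issue description can be sketched as follows: if the 
cache is keyed by the per-segment reader rather than by the composite reader, 
recreating the composite reader after a modification only forces a rebuild for 
segments that actually changed. This is a simplified illustration in plain 
Java, not Jackrabbit or Lucene code; PerSegmentFieldCache and its loader 
callback are hypothetical names.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch: cached values are tied to segment objects, not to
// the composite reader that happens to wrap them.
class PerSegmentFieldCache<S, V> {
    // Weak keys let an entry be collected once its segment is no longer
    // referenced elsewhere. Lookups use equals/hashCode, which is object
    // identity for key types that do not override them. Cached values must
    // not hold a strong reference to their key, or the entry never clears.
    private final Map<S, V> bySegment = new WeakHashMap<>();
    private int loads;

    // Returns the cached value for a segment, loading it on first use.
    synchronized V get(S segment, Function<S, V> loader) {
        V value = bySegment.get(segment);
        if (value == null) {
            value = loader.apply(segment);
            bySegment.put(segment, value);
            loads++;
        }
        return value;
    }

    // How many times a segment had to be loaded from scratch.
    synchronized int loadCount() {
        return loads;
    }
}
```

In this sketch a composite reader built over segments A and B, and a later one 
built over A and a new segment C, would trigger three loads in total rather 
than four, because the entry for A survives the rebuild.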

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
