[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Jason Rutherglen (JIRA) Sun, 19 Dec 2010 16:25:34 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973060#action_12973060 ]


Jason Rutherglen commented on LUCENE-2312:
------------------------------------------

I thought about opening a separate issue for RT FieldCache. In trunk it looks
like we need to add a grow method to CachedArray. However, this doesn't quite
solve the problem of filling in the field cache values, either on demand or as
documents are added. I'm not yet sure how the on-demand case would work, though
it is probably the most logical one to implement.
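
A minimal sketch of what the grow-plus-on-demand idea could look like, in plain
Java. GrowableIntCache, its loader callback, and capacity() are hypothetical
names made up for illustration; this is not Lucene's CachedArray or its API,
and it sidesteps real concurrency concerns by synchronizing everything.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

/** Hypothetical growable per-document int cache; NOT Lucene's CachedArray. */
final class GrowableIntCache {
    private int[] values = new int[0];
    private boolean[] filled = new boolean[0]; // which doc ids have been loaded
    private final IntUnaryOperator loader;     // assumed way to read a doc's value

    GrowableIntCache(IntUnaryOperator loader) {
        this.loader = loader;
    }

    /** Ensure capacity for doc ids in [0, maxDoc); existing entries are kept. */
    synchronized void grow(int maxDoc) {
        if (maxDoc <= values.length) {
            return;
        }
        // Over-allocate by 50% so frequent small additions do not copy each time.
        int newSize = Math.max(maxDoc, values.length + (values.length >> 1));
        values = Arrays.copyOf(values, newSize);
        filled = Arrays.copyOf(filled, newSize);
    }

    /** Return the cached value for docId, loading it on first access. */
    synchronized int get(int docId) {
        grow(docId + 1);
        if (!filled[docId]) {
            values[docId] = loader.applyAsInt(docId);
            filled[docId] = true;
        }
        return values[docId];
    }

    synchronized int capacity() {
        return values.length;
    }
}
```

The separate filled array is one way to tell "not loaded yet" apart from a
legitimate value of 0; filling values as documents are added would instead
write into the array at indexing time.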

The difficult use case is the terms index, as it returns the term for a docid
via an ord value. I don't think maintaining a terms index is possible on a
rapidly changing index, e.g., efficiently keeping an ordered terms array.
Additionally, we can't easily tap into the CSLM (ConcurrentSkipListMap) terms
dictionary, as it will be changing and doesn't offer ord access.
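
To illustrate why an ordered terms array is costly to maintain, here is a small
sketch using a sorted list of strings as a stand-in for the terms array.
OrdShiftDemo and insertAndCountShifted are hypothetical names for illustration
only. Inserting a new term shifts the ord of every term that sorts after it, so
any per-document ord array pointing at those terms would have to be rewritten.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo: ords are positions in a sorted terms list. */
final class OrdShiftDemo {
    /**
     * Inserts term into the sorted list if absent and returns how many
     * existing terms had their ord (list position) changed by the insert.
     */
    static int insertAndCountShifted(List<String> sortedTerms, String term) {
        int pos = Collections.binarySearch(sortedTerms, term);
        if (pos >= 0) {
            return 0; // term already present, no ord changes
        }
        int insertAt = -pos - 1;                     // insertion point per binarySearch contract
        int shifted = sortedTerms.size() - insertAt; // all later terms move up by one
        sortedTerms.add(insertAt, term);
        return shifted;
    }
}
```

A term that sorts first invalidates every existing ord, while one that sorts
last invalidates none, so the cost depends on where new terms land.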

Perhaps we'd need to hardcode the terms index to return the term for a docid,
i.e., force calls to getTermsIndex to return DocTerms. The underlying doc terms
field cache can be built iteratively, on demand. I wonder if there are
compatibility issues with returning a DocTermsIndex that effectively
implements only DocTerms?
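
A rough sketch of the compatibility question, using simplified stand-in
interfaces. SimpleDocTerms, SimpleDocTermsIndex, and DocTermsOnlyIndex are
hypothetical and do not reproduce the real FieldCache signatures. The class
satisfies the index type but only supports per-document term lookup, filled on
demand; any caller that relies on ords would fail at runtime.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Simplified stand-in for a docid -> term lookup (hypothetical). */
interface SimpleDocTerms {
    String getTerm(int docId);
}

/** Simplified stand-in for an ord-based terms index (hypothetical). */
interface SimpleDocTermsIndex extends SimpleDocTerms {
    int getOrd(int docId);
    String lookup(int ord);
}

/** Implements the index type but only the doc-terms part of its contract. */
final class DocTermsOnlyIndex implements SimpleDocTermsIndex {
    private final Map<Integer, String> cache = new HashMap<>();
    private final IntFunction<String> loader; // assumed way to read a doc's term

    DocTermsOnlyIndex(IntFunction<String> loader) {
        this.loader = loader;
    }

    @Override
    public String getTerm(int docId) {
        // Built iteratively: each docid is loaded the first time it is asked for.
        return cache.computeIfAbsent(docId, loader::apply);
    }

    @Override
    public int getOrd(int docId) {
        throw new UnsupportedOperationException("ords not available on a realtime index");
    }

    @Override
    public String lookup(int ord) {
        throw new UnsupportedOperationException("ords not available on a realtime index");
    }
}
```

Code that only calls getTerm keeps working, but anything that sorts or
compares by ord would hit the exception, which is the kind of compatibility
issue in question.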

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2312.patch
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
> Michael Busch has good suggestions regarding how to handle deletes using max
> doc ids.
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


