[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

Michael McCandless (JIRA) Thu, 28 Jan 2010 02:56:02 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael McCandless updated LUCENE-2111:
---------------------------------------

    Attachment: LUCENE-2111.patch

Attached patch with a major reworking of some parts of flex:

  * Simplified how the StandardTermsDictReader/Writer interacts with
    the postings impl.  The PostingsReader for the codec is now
    stateless, capturing all state for a given term in a dedicated
    TermState class (which also works well w/ caching, since we needed
    to capture state for that anyway).

  * Merged docs & positions readers, in the codec's impl and in the
    exposed flex API.  It was just too hairy before, with separate
    classes for reading docs & positions.  This is a step back towards
    current trunk API, ie, up front you ask for either a DocsEnum or a
    DocsAndPositionsEnum.

  * Modified API semantics: if a field or term does not exist, then
    IndexReader.termDocs/PositionsEnum may now return null (previously
    they returned a fake empty enum).  This means more Weight.scorer()
    implementations may return null.

  * I added IndexReader.getSubReaderDocBase (there is a separate jira
    issue open for this) -- this is now more important because a
    filter can no longer guess its doc base by adding up docCount of
    all readers it sees: if the scorer for a segment is null,
    Filter.getDocIdSet will not be called for that segment.

  * Changed the reuse of Docs/AndPositionsEnum to be explicit.
    Previously the Terms or TermsEnum instance was holding a private
    reused instance... but that was no good because typically we can
    share the TermsEnum but cannot share postings enums.

  * Likewise, changed the public flex reading API, so that you don't
    separately ask for positions enum at each doc.  Instead, up front
    you either ask for a DocsEnum or a DocsAndPositionsEnum.  This
    matches how the current Lucene APIs work.

  * Terms dict cache is now at the top level, not per field (this
    matches how trunk works, ie all fields share the 1024 sized cache).
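
To illustrate the doc base point above, here is a minimal, self-contained
sketch. It does not use any Lucene classes: Segment, guessedBases and
actualBases are hypothetical names invented for this example, and the
hasTerm flag simply stands in for "the scorer for this segment is
non-null". It only shows why a filter that sums docCount over the
segments it is called for ends up with the wrong base once a segment is
skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocBaseSketch {
    /** Hypothetical stand-in for a segment reader; not a Lucene class. */
    static final class Segment {
        final int docCount;     // number of docs in this segment
        final boolean hasTerm;  // false models a null scorer for this segment

        Segment(int docCount, boolean hasTerm) {
            this.docCount = docCount;
            this.hasTerm = hasTerm;
        }
    }

    /** Bases a filter would guess by summing docCount of only the segments it sees. */
    static List<Integer> guessedBases(List<Segment> segments) {
        List<Integer> bases = new ArrayList<Integer>();
        int running = 0;
        for (Segment s : segments) {
            if (!s.hasTerm) {
                continue;  // null scorer: the filter is never called here
            }
            bases.add(running);
            running += s.docCount;
        }
        return bases;
    }

    /** True bases, computed over every segment, reported for the segments the filter sees. */
    static List<Integer> actualBases(List<Segment> segments) {
        List<Integer> bases = new ArrayList<Integer>();
        int running = 0;
        for (Segment s : segments) {
            if (s.hasTerm) {
                bases.add(running);
            }
            running += s.docCount;  // skipped segments still occupy doc ids
        }
        return bases;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment(100, true),
            new Segment(50, false),   // term missing here, so no scorer
            new Segment(30, true));
        System.out.println("guessed: " + guessedBases(segments));
        System.out.println("actual:  " + actualBases(segments));
    }
}
```

In this sketch the guessed base for the third segment is too low by the
size of the skipped segment, which is the reason an explicit lookup
along the lines of getSubReaderDocBase is needed.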

I cut over all codecs to the new API... all tests pass if you switch
the default codec (in oal.index.codec.Codecs.getWriter) to any of the
four.


> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search 
> performance testing looks good, it survived several visits from the Unicode 
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny 
> especially on the "emulate old API on flex index" and vice versa code paths, 
> and still needs some more performance testing.  I'll do these under this 
> issue, and we should open separate issues for other self contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


