[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-2111:
---------------------------------------
Attachment: LUCENE-2111.patch
Attached patch with a major reworking of some parts of flex:
* Simplified how the StandardTermsDictReader/Writer interacts with
the postings impl. The PostingsReader for the codec is now
stateless, capturing all state for a given term in a dedicated
TermState class (which also works well w/ caching, since we needed
to capture state for that anyway).
* Merged docs & positions readers, in the codec's impl and in the
exposed flex API. It was just too hairy before, with separate
classes for reading docs & positions. This is a step back towards
current trunk API, ie, up front you ask for either a DocsEnum or a
DocsAndPositionsEnum.
* Modified API semantics: if a field or term does not exist, then
IndexReader.termDocs/PositionsEnum may now return null (previously
they returned a fake empty enum). This means Weight.scorer() may
now return null in more cases.
* I added IndexReader.getSubReaderDocBase (there is a separate jira
issue open for this) -- this is now more important because a
filter can no longer guess its doc base by adding up docCount of
all readers it sees: if the scorer for a segment is null,
Filter.getDocIdSet will not be called for that segment.
* Changed the reuse of Docs/AndPositionsEnum to be explicit.
Previously the Terms or TermsEnum instance was holding a private
reused instance... but that was no good because typically we can
share the TermsEnum but cannot share postings enums.
* Likewise, changed the public flex reading API, so that you don't
separately ask for positions enum at each doc. Instead, up front
you either ask for a DocsEnum or a DocsAndPositionsEnum. This
matches how the current Lucene APIs work.
* Terms dict cache is now at the top level, not per field (this
matches how trunk works, ie, all fields share the 1024 sized cache).
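To illustrate the stateless reader and the shared terms dict cache described
in the list above, here is a minimal sketch. It is not code from the patch:
SketchTermState, SketchPostingsReader and SketchTermsCache are hypothetical
names, the postings are a plain int array, and the cache is assumed to be a
simple LRU map keyed by field plus term.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-term state: all the reader needs to locate one term's postings. */
final class SketchTermState {
    final int docFreq;       // number of docs containing the term
    final int postingsStart; // offset of the term's first doc ID

    SketchTermState(int docFreq, int postingsStart) {
        this.docFreq = docFreq;
        this.postingsStart = postingsStart;
    }
}

/** Stateless reader (assumed shape): holds only immutable data, never a "current term". */
final class SketchPostingsReader {
    private final int[] postings;

    SketchPostingsReader(int[] postings) {
        this.postings = postings.clone();
    }

    /** Decodes the doc IDs for whichever term the caller's state describes. */
    List<Integer> docs(SketchTermState state) {
        List<Integer> out = new ArrayList<>(state.docFreq);
        for (int i = 0; i < state.docFreq; i++) {
            out.add(postings[state.postingsStart + i]);
        }
        return out;
    }
}

/** One bounded LRU cache shared by all fields, keyed by (field, term). */
final class SketchTermsCache {
    private final Map<String, SketchTermState> lru;

    SketchTermsCache(final int maxSize) {
        // accessOrder=true keeps the least recently used entry first.
        this.lru = new LinkedHashMap<String, SketchTermState>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, SketchTermState> eldest) {
                return size() > maxSize;
            }
        };
    }

    private static String key(String field, String term) {
        return field + '\0' + term;
    }

    SketchTermState get(String field, String term) {
        return lru.get(key(field, term));
    }

    void put(String field, String term, SketchTermState state) {
        lru.put(key(field, term), state);
    }
}
```

Because the term state is a separate immutable object, the same instance can
be handed to the reader and stored in the cache without copying.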
I cut over all codecs to the new API... all tests pass if you switch
the default codec (in oal.index.codec.Codecs.getWriter) to any of the
four.
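For callers, the null-return and doc base changes listed above might look
roughly like the sketch below. This is an illustration under stated
assumptions, not the flex API: SketchSegment, SketchSearcher and docBase(int)
are hypothetical stand-ins, loosely modelled on the idea behind
IndexReader.getSubReaderDocBase, and postings are plain int arrays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical segment: maps a term to its segment-local doc IDs. */
final class SketchSegment {
    final int maxDoc;
    private final Map<String, int[]> postings = new HashMap<>();

    SketchSegment(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    SketchSegment add(String term, int... docs) {
        postings.put(term, docs);
        return this;
    }

    /** Returns null when the term does not exist, instead of an empty result. */
    int[] docs(String term) {
        return postings.get(term);
    }
}

/** Hypothetical top-level searcher that knows each segment's doc base up front. */
final class SketchSearcher {
    private final List<SketchSegment> segments;
    private final int[] docBases;

    SketchSearcher(List<SketchSegment> segments) {
        this.segments = new ArrayList<>(segments);
        this.docBases = new int[segments.size()];
        int base = 0;
        for (int i = 0; i < segments.size(); i++) {
            docBases[i] = base;
            base += segments.get(i).maxDoc;
        }
    }

    /** Doc base of segment i, computed from all segments, visited or not. */
    int docBase(int i) {
        return docBases[i];
    }

    /** Collects top-level doc IDs for a term, skipping segments that lack it. */
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            int[] docs = segments.get(i).docs(term);
            if (docs == null) {
                // Segment is skipped entirely, so a per-segment callback that
                // summed maxDoc of the segments it saw would miss this one.
                continue;
            }
            for (int doc : docs) {
                hits.add(docBase(i) + doc);
            }
        }
        return hits;
    }
}
```

The point of the design is that the doc base comes from the top-level reader,
so it stays correct even when a segment in the middle is never visited.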
> Wrapup flexible indexing
> ------------------------
>
> Key: LUCENE-2111
> URL: https://issues.apache.org/jira/browse/LUCENE-2111
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: Flex Branch
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search
> performance testing looks good, it survived several visits from the Unicode
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny
> especially on the "emulate old API on flex index" and vice versa code paths,
> and still needs some more performance testing. I'll do these under this
> issue, and we should open separate issues for other self contained fixes.
> The end is in sight!