[
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972759#action_12972759
]
Michael McCandless commented on LUCENE-2694:
--------------------------------------------
I love seeing cacheCurrentTerm removed!!
OK I think we are close! A bunch of smallish things:
* I think we should remove TermsEnum.docFreq and .ord? Ie replace
with .termState().docFreq() and .ord()?
* At first I was thinking we should merge up TermStateBase into
TermState but actually there are cases (eg PulsingCodec, which )
where you want the separation.
* Maybe rename TermStateBase -> PrefixCodedTermState? Ie this is
really the TermState impl used by any codec using
PrefixCodedTerms? EG the fact that it stores the filePointer into
a _X.tis file is particular to it...
* Maybe rename MockTermState -> BasicTermState? At first I was
thinking the codec should return null if it cannot seek by
TermState... (I generally don't like mock returns that hide/lose
information...) but then it's convenient to always have something
to hold the docFreq for the term to avoid lots of special cased
code... so I think it's OK?
* We lost the "clone using new" in StandardTermState...
* Maybe revert changes to AppendingCodec? (Ie let it pass its terms
dict cache size again)
* I wonder if we can somehow make PerReaderTermState use an array
(keyed by sub reader index) instead... seems like a new HashMap
per Term in an MTQ could be heavy. It's tricky because we don't
store enough information (ie to quickly map parent reader + sub
reader -> sub index). But I don't think this should hold up
committing... since our defaults don't typically allow for *that*
many terms in-flight it should be fine...
* It's a little spooky that TermQuery.scorer calls .take()
(destructive), eg it means if you ask for scorer again on same
reader you get diff't behavior? Can we make that a .get() instead
of .take()? (This may also bite us if we use diff't threads to
score each segment, ie suddenly this .take() must be thread safe).
In fact, same deal w/ nulling out the TQ.perReaderTermState?
* The comment on top of TermStateByteStart looks wrong?
* Small whitespace issue -- missing space on "if(". Also, our
generics are not supposed to have whitespace inside, eg we
shouldn't have the space in "new DoubleBarrelLRUCache<FieldAndTerm,
TermStateBase>(termsCacheSize);"
* I think the TQ ctor that takes both docFreq and states can drop
the docFreq? Ie it can ask the states for it?
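
To make the .take() vs .get() concern concrete, here is a minimal sketch. It is not the code in the patch: SegmentStates, its methods, and the demo class are hypothetical names made up for illustration. It assumes the sub reader index is already known, which is exactly the mapping problem noted above, so it does not solve that part. The states live in an array keyed by sub reader index; get() is repeatable, while take() clears the slot, so a second request for the same segment sees null. An AtomicReferenceArray is used only so that take() stays safe if segments were scored from different threads.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical holder: one cached state per sub reader, keyed by sub reader index. */
final class SegmentStates<T> {
    private final AtomicReferenceArray<T> slots;

    SegmentStates(int numSubReaders) {
        slots = new AtomicReferenceArray<>(numSubReaders);
    }

    void set(int subIndex, T state) {
        slots.set(subIndex, state);
    }

    /** Non-destructive: repeated calls for the same segment return the same state. */
    T get(int subIndex) {
        return slots.get(subIndex);
    }

    /** Destructive: returns the state and atomically clears the slot. */
    T take(int subIndex) {
        return slots.getAndSet(subIndex, null);
    }
}

final class TakeVsGetDemo {
    public static void main(String[] args) {
        SegmentStates<String> states = new SegmentStates<>(2);
        states.set(0, "state-for-segment-0");

        // get() can be called any number of times with the same result.
        System.out.println(states.get(0));  // state-for-segment-0
        System.out.println(states.get(0));  // state-for-segment-0

        // take() hands the state out once; asking again yields null,
        // so a second "scorer" on the same segment would behave differently.
        System.out.println(states.take(0)); // state-for-segment-0
        System.out.println(states.take(0)); // null
    }
}
```

With get(), asking for a scorer twice on the same reader would see the same cached state both times; with take(), the second caller would have to fall back to a fresh lookup.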
> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
> Key: LUCENE-2694
> URL: https://issues.apache.org/jira/browse/LUCENE-2694
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch,
> LUCENE-2694.patch
>
>
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
> Once we fix MTQ rewrite to be per-segment, we should take it further and make
> weight/scorer init also run in the same single pass as rewrite.