[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

Michael McCandless (JIRA) Sat, 18 Dec 2010 01:36:25 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972759#action_12972759 ]


Michael McCandless commented on LUCENE-2694:
--------------------------------------------

I love seeing cacheCurrentTerm removed!!

OK I think we are close!  A bunch of smallish things:

  * I think we should remove TermsEnum.docFreq and .ord?  Ie replace
    with .termState().docFreq() and .ord()?

  * At first I was thinking we should merge up TermStateBase into
    TermState but actually there are cases (eg PulsingCodec, which )
    where you want the separation.

  * Maybe rename TermStateBase -> PrefixCodedTermState?  Ie this is
    really the TermState impl used by any codec using
    PrefixCodedTerms?  EG the fact that it stores the filePointer into
    a _X.tis file is particular to it...

  * Maybe rename MockTermState -> BasicTermState?  At first I was
    thinking the codec should return null if it cannot seek by
    TermState... (I generally don't like mock returns that hide/lose
    information...) but then it's convenient to always have something
    to hold the docFreq for the term to avoid lots of special cased
    code... so I think it's OK?

  * We lost the "clone using new" in StandardTermState...

  * Maybe revert changes to AppendingCodec?  (Ie let it pass its terms
    dict cache size again)

  * I wonder if we can somehow make PerReaderTermState use an array
    (keyed by sub reader index) instead... seems like a new HashMap
    per Term in an MTQ could be heavy.  It's tricky because we don't
    store enough information (ie to quickly map parent reader + sub
    reader -> sub index). But I don't think this should hold up
    committing... since our defaults don't typically allow for *that*
    many terms in-flight it should be fine...

  * It's a little spooky that TermQuery.scorer calls .take()
    (destructive), eg it means if you ask for a scorer again on the
    same reader you get diff't behavior?  Can we make that a .get()
    instead of .take()?  (This may also bite us if we use diff't
    threads to score each segment, ie suddenly this .take() must be
    thread safe).  In fact, same deal w/ nulling out the
    TQ.perReaderTermState?

  * The comment on top of TermStateByteStart looks wrong?

  * Small whitespace issue -- missing space on "if(".  Also, our
    generics are not supposed to have whitespace inside, eg we
    shouldn't have the space in
    "new DoubleBarrelLRUCache<FieldAndTerm, TermStateBase>(termsCacheSize);"

  * I think the TQ ctor that takes both docFreq and states can drop
    the docFreq?  Ie it can ask the states for it?
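
To make the .take() vs .get() point above concrete, here is a rough
sketch in plain Java.  Holder, put, take and get are hypothetical
stand-ins invented for illustration; this is not the actual
PerReaderTermState API in the patch, only the shape of the concern:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: contrasts a destructive take() with a repeatable get(). */
final class TakeVsGetSketch {

    /** Hypothetical per-reader holder of cached state; not Lucene's API. */
    static final class Holder<R, S> {
        private final Map<R, S> states = new HashMap<>();

        void put(R reader, S state) {
            states.put(reader, state);
        }

        /** Destructive: removes the entry, so a second call returns null. */
        S take(R reader) {
            return states.remove(reader);
        }

        /** Non-destructive: repeated calls return the same state. */
        S get(R reader) {
            return states.get(reader);
        }
    }

    public static void main(String[] args) {
        Holder<String, Integer> holder = new Holder<>();
        holder.put("seg0", 42);

        System.out.println(holder.get("seg0"));  // prints 42
        System.out.println(holder.get("seg0"));  // prints 42 again
        System.out.println(holder.take("seg0")); // prints 42
        System.out.println(holder.take("seg0")); // prints null: state is gone
    }
}
```

With take(), asking for a scorer twice on the same reader would see the
state the first time and nothing the second; with get() the lookup does
not mutate the holder, so repeated calls behave the same.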

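The array-instead-of-HashMap idea for PerReaderTermState could look
roughly like the sketch below.  It assumes the sub reader index is
already known at registration time, which is exactly the part noted
above as tricky; ArrayKeyedStatesSketch, State, register and docFreq
are hypothetical names, not code from the patch.  It also shows how a
total docFreq could be derived from the states rather than passed in
separately:

```java
/**
 * Sketch only: per-term states held in an array indexed by sub-reader
 * (segment) ordinal. All names here are hypothetical, not Lucene's API.
 */
final class ArrayKeyedStatesSketch {

    /** Hypothetical minimal term state: just the per-segment docFreq. */
    static final class State {
        final int docFreq;

        State(int docFreq) {
            this.docFreq = docFreq;
        }
    }

    // One slot per sub reader; null means the term is absent from that segment.
    private final State[] perSegment;

    ArrayKeyedStatesSketch(int numSubReaders) {
        this.perSegment = new State[numSubReaders];
    }

    /** Assumes the caller already knows the sub reader's index. */
    void register(int subIndex, State state) {
        perSegment[subIndex] = state;
    }

    State get(int subIndex) {
        return perSegment[subIndex];
    }

    /** Total docFreq summed over the segments that contain the term. */
    int docFreq() {
        int sum = 0;
        for (State s : perSegment) {
            if (s != null) {
                sum += s.docFreq;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ArrayKeyedStatesSketch states = new ArrayKeyedStatesSketch(3);
        states.register(0, new State(3));
        states.register(2, new State(5));
        System.out.println(states.docFreq());      // prints 8
        System.out.println(states.get(1) == null); // prints true
    }
}
```

The trade-off in this sketch is one array slot per sub reader for every
term, in exchange for avoiding a HashMap allocation per term.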

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
>                 Key: LUCENE-2694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2694
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
> LUCENE-2694.patch
>
>
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
> Once we fix MTQ rewrite to be per-segment, we should take it further and make 
> weight/scorer init also run in the same single pass as rewrite.
