[ 
https://issues.apache.org/jira/browse/LUCENE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555201#comment-13555201
 ] 

Michael McCandless commented on LUCENE-4688:
--------------------------------------------

I think it's interesting/powerful to enable across-segment reuse: none
of our other reuse APIs (DocsEnum, D&PEnum) can do that.

But I'm not sure we should do it: taking full advantage of it
requires API changes (like the MTQ.getTermsEnum change) ... we'd have
to do something similar to Weight/Scorer to share the D&PEnum across
segments.

The patch itself is spooky: this BlockTree code is hairy, and I'm not
sure that the reset() isn't going to cause subtle corner-case bugs.
(Separately: we need to simplify this code: it's unapproachable now).

The benchmark gain is impressive, but we are talking about 10 seconds
over 2M docs, right? So 5 microseconds (0.005 msec) per document? In a
more realistic scenario (indexing more "normal" docs) surely this is a
minor part of the time ...
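
As a quick check of that arithmetic, here is a minimal sketch. The class
and method names are made up for illustration; the only inputs are the
10 seconds and 2M docs quoted above.

```java
public class PerDocCost {

    /** Time saved per document, in microseconds, for a total saving in seconds. */
    static double microsPerDoc(double secondsSaved, long docCount) {
        return secondsSaved * 1_000_000.0 / docCount;
    }

    public static void main(String[] args) {
        // Figures from the benchmark discussion: 10 seconds over 2M docs.
        double micros = microsPerDoc(10.0, 2_000_000L);
        System.out.println(micros + " microseconds per doc");
        System.out.println(micros / 1000.0 + " msec per doc");
    }
}
```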

The app can always do the reuse itself, per-segment, today ... I think
reuse is rather expert, so it's OK to offer that as the way to reuse?
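
A rough sketch of what app-side, per-segment reuse could look like. To
keep it runnable without Lucene on the classpath, SegmentTerms and
SegmentEnum are hypothetical stand-ins, not Lucene's Terms/TermsEnum;
only the shape of the iterator(reuse) and seekExact calls mirrors the
API named in the issue. PrimaryKeyLookup is likewise an assumed name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerSegmentReuse {

    /** Hypothetical stand-in for one segment's terms dictionary (not Lucene's Terms). */
    static final class SegmentTerms {
        private final Set<String> terms;
        int enumsCreated = 0; // counts allocations so reuse is observable

        SegmentTerms(String... terms) {
            this.terms = new HashSet<>(Arrays.asList(terms));
        }

        /** Returns the passed-in enum if it belongs to this segment, else allocates a new one. */
        SegmentEnum iterator(SegmentEnum reuse) {
            if (reuse != null && reuse.owner == this) {
                return reuse;
            }
            enumsCreated++;
            return new SegmentEnum(this);
        }
    }

    /** Hypothetical stand-in for a terms enum bound to a single segment. */
    static final class SegmentEnum {
        final SegmentTerms owner;

        SegmentEnum(SegmentTerms owner) {
            this.owner = owner;
        }

        boolean seekExact(String term) {
            return owner.terms.contains(term);
        }
    }

    /** App-side cache: one reusable enum per segment, handed back on every lookup. */
    static final class PrimaryKeyLookup {
        private final Map<SegmentTerms, SegmentEnum> cache = new HashMap<>();

        boolean exists(List<SegmentTerms> segments, String id) {
            for (SegmentTerms segment : segments) {
                SegmentEnum te = segment.iterator(cache.get(segment));
                cache.put(segment, te); // remember it for the next lookup
                if (te.seekExact(id)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        SegmentTerms s1 = new SegmentTerms("id1", "id2");
        SegmentTerms s2 = new SegmentTerms("id3");
        List<SegmentTerms> segments = Arrays.asList(s1, s2);
        PrimaryKeyLookup lookup = new PrimaryKeyLookup();
        for (String id : new String[] {"id3", "id1", "missing"}) {
            System.out.println(id + " -> " + lookup.exists(segments, id));
        }
        System.out.println("enums created: " + (s1.enumsCreated + s2.enumsCreated));
    }
}
```

The cache is keyed by segment, so each enum is only ever handed back to
the segment that created it; no cross-segment sharing is needed.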

                
> Reuse TermsEnum in BlockTreeTermsReader
> ---------------------------------------
>
>                 Key: LUCENE-4688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4688
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 4.0, 4.1
>            Reporter: Simon Willnauer
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4688.patch
>
>
> Opening a TermsEnum comes with a significant cost at this point if done 
> frequently, as in primary key lookups, or if many segments are present. 
> Currently we don't reuse it at all and create a lot of objects even if the 
> enum is just used for a single seekExact (i.e. TermQuery). Stressing the 
> Terms#iterator(reuse) call shows significant gains with reuse...
