On Mon, 2011-02-21 at 16:00 +0100, Simon Willnauer wrote:
> For all real codecs seek(BR, TermState) should be as fast as it gets.
> There are some codecs which simply forward to seek(BR) so if you have
> the TermState already you won't lose anything. This might also answer
> your other question, if you pass an empty BytesRef to a codec that did
> not override the seek(BR, TermState) method it will seek to the empty
> term and your code might not work anymore.
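To make the fallback pitfall concrete, here is a toy model in Java. Everything in it (ToyTermsEnum, StateAwareEnum, FallbackEnum, and an int position standing in for a TermState) is hypothetical and is not Lucene's actual TermsEnum API. It only mimics the behaviour described in the quote: a codec that does not override seek(term, state) ignores the state and seeks by term value, so passing an empty term lands on the first term of the dictionary instead of the saved position.

```java
import java.util.Arrays;

// Hypothetical toy model, NOT Lucene's TermsEnum: a sorted term dictionary.
// String order stands in for BytesRef order; an int position stands in for TermState.
abstract class ToyTermsEnum {
    protected final String[] terms; // sorted, unique
    protected int pos = -1;

    ToyTermsEnum(String[] sortedTerms) { this.terms = sortedTerms; }

    // Seek by term value: binary search, positioned on the first term >= target.
    // Returns true only if the exact term exists.
    boolean seek(String target) {
        int i = Arrays.binarySearch(terms, target);
        pos = i >= 0 ? i : -i - 1;
        return i >= 0;
    }

    // Default behaviour: ignore the saved state and fall back to seek(term).
    boolean seek(String target, int state) { return seek(target); }

    // Current term, or null when positioned past the end.
    String term() { return pos >= 0 && pos < terms.length ? terms[pos] : null; }

    String next() {
        pos++;
        return term();
    }
}

// A codec that honours the state and jumps straight to the saved position.
class StateAwareEnum extends ToyTermsEnum {
    StateAwareEnum(String[] t) { super(t); }

    @Override
    boolean seek(String target, int state) {
        pos = state;
        return true;
    }
}

// A codec that inherits the fallback: an empty target lands on the first term.
class FallbackEnum extends ToyTermsEnum {
    FallbackEnum(String[] t) { super(t); }
}
```

With terms {"apple", "banana", "cherry"} and a saved state of 1, seek("", 1) leaves the state-aware enum on "banana" but the fallback enum on "apple", which is why code that passes an empty term only works for codecs that actually use the state.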
Thanks, that makes sense. It seems to me that I'll have to use the strategy pattern and make a TermsEnum-implementation-aware wrapper (or rather codec-aware?) if I want the "best" ordinal-seeker.

Toke:
> > I tried calling with an empty BytesRef term. This gave me an empty
> > result back for the call itself, but the correct terms for subsequent
> > calls to next. This works perfectly for my scenario. However, that was
> > just an experiment using the default variable gap codec, so I am unsure
> > if I can count on this behavior for any given codec?
>
> what do you mean by an empty result for the call itself?

Sorry, I mixed things up. I meant that I tried calling with an empty term and then fetching the term with the term() method, which returned an empty BytesRef after the initial call.

Anyway, since codecs are free to fall back to BytesRef-seek, my options are reduced to seek(BytesRef, TermState) with real values or seek(BytesRef), which I expect is normally log(n) or better.

> can't you use a codec that supports ord for your facet / sort fields?

That was also Mike McCandless' suggestion in https://issues.apache.org/jira/browse/LUCENE-2843

I think this might be counter-productive. If a non-ordinal-supporting codec has a significantly lower memory footprint, the extra bookkeeping for a BytesRef/TermState seek cache might be small enough that the total overhead is still less than that of an ordinal-supporting codec.

I did try a quick experiment with the variable gap vs. fixed gap codec, where I kept every 32nd BytesRef+TermState for the variable gap codec. With a 50M term field, this increased the overhead from 600MB to 800MB (or about 130 bytes for each BytesRef/TermState pair, ignoring the difference in memory impact between variable and fixed gap). This clearly does not support my theory. I'll have to make a proper test, but a strong recommendation to use an ordinal-supporting codec might very well be the best solution.
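The seek cache from the experiment can be sketched roughly as follows: keep every k-th (term, state) pair while iterating once, then reach ordinal n by seeking to sample n / k with its saved state and calling next() n % k times. This is a minimal, self-contained sketch under stated assumptions: SimpleTermsEnum and SampledOrdinalSeeker are hypothetical names, an int position stands in for a TermState, and the seek(term, state) stand-in mirrors seek(BytesRef, TermState) without using any real Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a TermsEnum without native ordinal support.
class SimpleTermsEnum {
    private final List<String> terms; // sorted, unique
    private int pos = -1;

    SimpleTermsEnum(List<String> sortedTerms) { this.terms = sortedTerms; }

    int state() { return pos; } // opaque "TermState" stand-in

    // Mirrors seek(BytesRef, TermState): trusts the saved state, so no search is needed.
    void seek(String term, int state) { pos = state; }

    String term() { return pos >= 0 && pos < terms.size() ? terms.get(pos) : null; }

    String next() {
        pos++;
        return term();
    }
}

// Keeps every k-th (term, state) pair; seekOrd(n) jumps to sample n / k
// and then steps forward n % k times.
class SampledOrdinalSeeker {
    private final int k;
    private final SimpleTermsEnum te;
    private final List<String> sampledTerms = new ArrayList<>();
    private final List<Integer> sampledStates = new ArrayList<>();

    SampledOrdinalSeeker(SimpleTermsEnum te, int k) {
        this.te = te;
        this.k = k;
        long ord = 0;
        // One full pass over the terms, remembering every k-th term and its state.
        for (String t = te.next(); t != null; t = te.next(), ord++) {
            if (ord % k == 0) {
                sampledTerms.add(t);
                sampledStates.add(te.state());
            }
        }
    }

    int sampleCount() { return sampledStates.size(); }

    // Throws IndexOutOfBoundsException for ordinals past the last sample.
    String seekOrd(long ord) {
        int sample = (int) (ord / k);
        te.seek(sampledTerms.get(sample), sampledStates.get(sample));
        for (long i = 0; i < ord % k; i++) {
            te.next();
        }
        return te.term();
    }
}
```

A seek therefore costs one state-based seek plus at most k - 1 calls to next(). With the numbers above, 50M terms sampled every 32nd term gives 1,562,500 pairs, and 200MB spread over them is roughly 128 bytes per pair (or about 134 if MB means 2^20 bytes), consistent with the "about 130 bytes" estimate. A codec-aware wrapper could choose this seeker only for codecs without ord support and delegate to the codec's own ordinal seek otherwise.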
Thanks for helping,
Toke Eskildsen
