One thing I don't fully understand about actsAsVersion (and I know it was
said that we may want to drop that approach) - for how long does it stay? I
mean, let's take invalidAcronym. It is a back-compat change, yes, but for
how long are we expected to support it? And if we decide to support it for
one minor release, or even one major release, will that ctor be deprecated?
(I think it must be deprecated ...)

Also, Mike - you suggested coming up with new names for methods to reflect
new features (such as a boolean saying whether to score when you sort). This
is strongly related to our ability to add methods to interfaces/abstract
classes. If we add an abstract method to Searcher with the new boolean,
we're breaking back-compat.
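
To make that concrete, here is a rough sketch of why adding an abstract
method breaks users. The class names are made up for the example (this is
not the real Searcher/Searchable), but the mechanism is the same:

import java.io.IOException;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Query;

// Imagine this abstract class shipped in an earlier release.
abstract class ExampleSearcher {
  public abstract void search(Query query, Collector results) throws IOException;
  // If a later release adds:
  //   public abstract void search(Query query, Collector results, boolean doScore)
  //       throws IOException;
  // then UserSearcher below stops compiling, even though the user changed nothing.
}

// A user's subclass, written against the old release.
class UserSearcher extends ExampleSearcher {
  public void search(Query query, Collector results) throws IOException {
    // user's implementation
  }
}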

Those specific problems (scoring when sorting) came into play only since the
introduction of the "fast and easy" search methods (which, if you look at
their signatures, are not so fast and easy anymore). If we had just
search(Collector, Query) (and maybe a couple of other variants which need to
take into account more than just Collector and Query), we wouldn't have that
problem.
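
In other words, compare (the signatures below are approximate, from memory,
and assume the Collector-based search method on trunk - treat this as a
sketch, not a concrete proposal):

import java.io.IOException;
import org.apache.lucene.search.*;

class SearchStyles {
  // The "fast and easy" style: every new feature (score while sorting?
  // track the max score?) wants yet another parameter or overload here.
  TopFieldDocs easyStyle(IndexSearcher searcher, Query query, Filter filter, Sort sort)
      throws IOException {
    return searcher.search(query, filter, 10, sort);
  }

  // The Collector style: this entry point never needs to change; new
  // features go into the Collector implementation instead.
  void collectorStyle(IndexSearcher searcher, Query query, Collector collector)
      throws IOException {
    searcher.search(query, collector);
  }
}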

A reviewer, or anyone else, will be required to first create a Collector.
They read somewhere that TFC is used for sorting and that it has a bunch of
static create() methods. If they don't read it, they at least see a sample
somewhere. So they create a TFC, and maybe they see a couple of create()
variants to choose from (score or not), but at least the choices are local
to TFC. We can add more create() variants to TFC w/o breaking back-compat,
because TFC is not extendable.
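
For example, roughly (the create() parameter list here is from memory and
may not match trunk exactly, so take it as a sketch):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopFieldCollector;

class SortedSearchExample {
  static TopDocs searchSorted(IndexSearcher searcher, Query query) throws IOException {
    Sort sort = new Sort(new SortField("date", SortField.LONG));
    // All the options are arguments to TopFieldCollector.create(); a new
    // option later just means another create() variant, not a new abstract
    // method on Searcher.
    TopFieldCollector tfc = TopFieldCollector.create(
        sort,
        10,      // numHits
        true,    // fillFields
        false,   // trackDocScores - no need to score when sorting by field
        false,   // trackMaxScore
        true);   // docsScoredInOrder
    searcher.search(query, tfc);
    return tfc.topDocs();
  }
}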

Choosing the defaults of each create() comes down to whether we want the
defaults to always reflect the best usage (which I prefer). At least in the
scoring example, I was under the impression we keep scoring for the sake of
back-compat, even though changing it wouldn't cause anything too bad to
happen (we all kind of agree that scoring when sorting is useless, but
because of our back-compat policy we can't change it). I think Grant's
proposal to decide on a case-by-case basis would have eliminated scoring
when sorting by default.

Shai

On Fri, May 22, 2009 at 11:14 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Fri, May 22, 2009 at 3:37 PM, DM Smith <dmsmith...@gmail.com> wrote:
>
> > So, what is it that they use that leads to such unfavorable results?
>
> I think it's simply that they take each search engine, get it to index
> their collection in the most obvious way, perhaps having read a
> tutorial somewhere, and test that.  I'm guessing they don't spend much
> time tuning any of the search engines for what they are testing.  So
> those with the best defaults make the best impression.  First
> impressions count :)
>
> So eg they don't turn off CFS, don't increase IW's RAM buffer, don't
> turn off scoring when sorting by field, fail to omitTFAP when testing
> "pure boolean" searching, etc.
>
> These tunings are well known to all of us, but to 95% of Lucene users,
> including your casual reviewer, they aren't.
>
> I expect non-reviewers do the same when they want to try out different
> search engines.  I think it's the vast minority of people who actually
> come out to java-user to ask for help, and I bet most "potential new
> users" never discover the tuning tips on the wiki.
>
> (And: I fully agree, said reviewer and said new user *should* do
> their homework and tune each engine to the fullest; likewise,
> readers of such reviews *should* scrutinize whether the testing was
> fair; yet typically they don't).
>
> Mike
>
