On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey
<[email protected]> wrote:
>
>> when working on 3.1 if we make some great improvement, I'd like new users in
>> 3.1 to see the improvement by default.
>
> Sounds like an argument for more frequent major releases.
Yeah. Or "rebranding" what we now call minor as major releases, by
changing our policy ;) Or "rebranding" to Lucene 2009.
But: localized improvements (like the sizable performance gain from
turning off scoring when sorting by field) should not have to wait for
a major release to benefit new users. I think they should be on by
default in the next release.
Will Lucy do scoring when sorting by field, by default?
>> On thinking about it more... automagically storing the "actsAsVersion"
>> in the index, and then having IndexWriter (for example) ask the
>> analyzer for a tokenStream matching that version, seems a little too
>> sneaky.
>
> Can you elaborate?
>
> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> have to be passed a Schema, which contains all the Analyzers. Analyzers
> aren't satellite classes under this model -- they are a fixed property of a
> FullTextType field spec. Think of them as baked into an SQL field definition.
>
> You can create a Schema from scratch to pass to the QueryParser, but it's
> easier to just get it from the Searcher. Translating to Java...
>
> Searcher searcher = new Searcher("/path/to/index");
> QueryParser qparser = new QueryParser(searcher.getSchema());
>
> I don't see how that's so different from getting an analyzer actsAsVersion
> number from the index.
I agree that in KS/Lucy it works well, because you must explicitly pass
the Schema to each of the satellite classes.

But in Lucene, if IndexWriter passed in the actsAsVersion it had loaded
from the index every time it asked the analyzer for a tokenStream,
that would be sneaky. I'd rather have it explicit (like KS/Lucy), so
you'd have to call IndexWriter.getActsAsVersion, then pass that into
your analyzer when you create it. It's the automatic under-the-hood
passing that makes me nervous, and I think it would confuse users.
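
To make the explicit approach concrete, here is a minimal sketch. Every
class and method name in it is a hypothetical stand-in, not Lucene's
actual API, and the version numbers (24, 31) and the lowercasing change
are invented purely for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch only; none of these types are real Lucene classes.
final class VersionedAnalyzerSketch {

    /** Stand-in for an analyzer whose tokenization changed between releases. */
    static final class SketchAnalyzer {
        private final int actsAsVersion;

        SketchAnalyzer(int actsAsVersion) {
            this.actsAsVersion = actsAsVersion;
        }

        List<String> tokenize(String text) {
            // Assumed behavior change: from version 31 on, tokens are lowercased.
            String input = actsAsVersion >= 31 ? text.toLowerCase(Locale.ROOT) : text;
            return Arrays.asList(input.trim().split("\\s+"));
        }
    }

    /** Stand-in for a writer that only reports the version stored in the index. */
    static final class SketchIndexWriter {
        private final int storedVersion;

        SketchIndexWriter(int storedVersion) {
            this.storedVersion = storedVersion;
        }

        int getActsAsVersion() {
            return storedVersion;
        }
    }

    public static void main(String[] args) {
        SketchIndexWriter writer = new SketchIndexWriter(24);
        // The caller reads the version and hands it to the analyzer explicitly;
        // the writer never passes it along behind the scenes.
        SketchAnalyzer analyzer = new SketchAnalyzer(writer.getActsAsVersion());
        System.out.println(analyzer.tokenize("Hello World"));
    }
}
```

The point of the sketch is that the version travels through user code,
so anyone reading the application can see which behavior the analyzer
was asked for.
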
(That said, unrelated to this discussion, I would actually like to
record per-segment which version of Lucene wrote the segment; this
would be very helpful when debugging issues like LUCENE-1474 where I
need to know if the segments were written by 2.4.0 or 2.4.1).
> Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> is that where the sneakiness gets overwhelming?
Per-class actsAsVersion would work well here -- PFAW would just
forward the required version when requesting the tokenStream.
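
A rough sketch of that forwarding, assuming the version is passed on
each tokenStream request. The interface and wrapper below are
hypothetical stand-ins (not the real PerFieldAnalyzerWrapper), and the
version numbers are made up:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only; these are not real Lucene types.
final class PerFieldForwardingSketch {

    /** Assumed analyzer contract: the caller supplies the version per request. */
    interface VersionAwareAnalyzer {
        List<String> tokenStream(String field, String text, int actsAsVersion);
    }

    /** Stand-in for a per-field wrapper: picks a delegate, forwards the version unchanged. */
    static final class PerFieldWrapper implements VersionAwareAnalyzer {
        private final VersionAwareAnalyzer defaultAnalyzer;
        private final Map<String, VersionAwareAnalyzer> perField = new HashMap<>();

        PerFieldWrapper(VersionAwareAnalyzer defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        void addAnalyzer(String field, VersionAwareAnalyzer analyzer) {
            perField.put(field, analyzer);
        }

        @Override
        public List<String> tokenStream(String field, String text, int actsAsVersion) {
            VersionAwareAnalyzer delegate = perField.getOrDefault(field, defaultAnalyzer);
            // The wrapper has no version logic of its own; it just passes it through.
            return delegate.tokenStream(field, text, actsAsVersion);
        }
    }

    public static void main(String[] args) {
        // An analyzer that never changed across releases ignores the version.
        VersionAwareAnalyzer whitespace =
            (field, text, version) -> Arrays.asList(text.trim().split("\\s+"));
        // An analyzer with an assumed behavior change at version 31.
        VersionAwareAnalyzer versioned = (field, text, version) -> {
            String input = version >= 31 ? text.toLowerCase(Locale.ROOT) : text;
            return Arrays.asList(input.trim().split("\\s+"));
        };

        PerFieldWrapper wrapper = new PerFieldWrapper(whitespace);
        wrapper.addAnalyzer("title", versioned);
        System.out.println(wrapper.tokenStream("title", "Hello World", 24));
        System.out.println(wrapper.tokenStream("title", "Hello World", 31));
    }
}
```
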
>> I prefer the up-front "you specify actsAsVersion" when you
>> create the analyzer, only for analyzers that have changed across
>> releases. So things like WhitespaceAnalyzer would likely never need
>> an actsAsVersion arg.
>
> Hmm, this is kind of hard. I'd prefer that the argument remain optional, so
> that new users don't have to think about it.
I wouldn't mind optional, but only if it defaults to latest and
greatest. The goal here is to have new users always see the best of
Lucene when they start out.
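
One way the optional argument could look, sketched with constructor
overloading. The class, the LATEST_VERSION constant, and the integer
version numbers are all hypothetical, not an existing Lucene API:

```java
// Hypothetical sketch only; not a real Lucene class.
final class DefaultVersionSketch {

    /** Assumed constant naming the newest behavior this release knows about. */
    static final int LATEST_VERSION = 31;

    static final class SketchAnalyzer {
        private final int actsAsVersion;

        /** New users pass nothing and get the latest and greatest behavior. */
        SketchAnalyzer() {
            this(LATEST_VERSION);
        }

        /** Existing users pin the version their index was built with. */
        SketchAnalyzer(int actsAsVersion) {
            if (actsAsVersion > LATEST_VERSION) {
                throw new IllegalArgumentException(
                    "unknown version: " + actsAsVersion);
            }
            this.actsAsVersion = actsAsVersion;
        }

        int getActsAsVersion() {
            return actsAsVersion;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SketchAnalyzer().getActsAsVersion());
        System.out.println(new SketchAnalyzer(24).getActsAsVersion());
    }
}
```
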
Mike