Re: Lucene versioning logic

Shai Erera Mon, 04 Aug 2014 06:27:32 -0700

Right, we can also use a SetOnce wrapper to restrict that.

Shai



On Mon, Aug 4, 2014 at 4:05 PM, Steve Molloy <[email protected]> wrote:

>  I think this proposal is what makes the most sense since this discussion
> started. As for making an instance not modifiable, the setVersion could
> check if it was already called and error-out if it was. Then you could go
> from default to version 123, but at least you couldn't hop from version to
> version while the analyzer is live. This should mostly be handled by
> factories anyhow, so if factories explicitly calls setVersion all the time
> before returning an instance, the instances wouldn't be modifiable.
>
> My 2 cents. :)
>
> Steve
>  ------------------------------
> *From:* Shai Erera [[email protected]]
> *Sent:* August 3, 2014 8:51 AM
> *To:* [email protected]
> *Subject:* Re: Lucene versioning logic
>
>   OK I see, it's the Tokenizers and Filters that usually change, the
> Analyzer only needs Version to determine which TokenStream chain to return,
> and so we achieve that w/ Ryan's proposal of setVersion().
>
>  I'd still feel better if Version was a final setting on an Analyzer,
> i.e. that a single Analyzer instance always behaves consistently, and
> cannot alternate behavior down-stream if someone called setVersion(). But
> this is a really stupid thing to do. Maybe setVersion() can return an
> Analyzer, so you're sure that instance is not modifiable. But maybe this is
> just over engineering...
>
>  I'm +0.5 to that. I prefer Version to be mandatory somehow (class name,
> ctor argument), but I can live with setVersion as well...
>
> Shai
>
>
> On Sun, Aug 3, 2014 at 3:30 PM, Robert Muir <[email protected]> wrote:
>
>> No, I didn't say any of this, please read it again :)
>>
>> Old back-compat *Tokenizer/Tokenfilter* are named this way.
>> *Tokenizer/Tokenfilter*
>> Only the old ones.
>> Just to be clear: *Tokenizer/Tokenfilter*
>> Their Factories still use Version to produce the right one (as we
>> can't remove version from there, or we will have complaints from solr
>> developers).
>>
>> So users who want the version-style back compat can just use the
>> factories. really.
>> On the other hand new users can do 'new LowerCaseFilter()' without the
>> bullshit.
>>
>> For Analyzers, there is a setter. Users who want to use *OUR
>> ANALYZERS* with back compat, call the setter. But its not
>> mandatory-in-your-face-ctor.
>>
>> I am +1 to Ryan's proposal, so please look for more elaboration there.
>> I am -1 to putting Versions in the name of Analyzers.
>>
>> On Sun, Aug 3, 2014 at 8:21 AM, Shai Erera <[email protected]> wrote:
>> > Oh, I misread this part "I do think its ok to name ..." -- replaced "do"
>> > with "don't" :).
>> >
>> > So you say that if we have a FooAnalyzer in 4.5 and change its behavior
>> in
>> > 4.9, then we add a Foo45Analyzer as a back-compat support, and
>> FooAnalyzer
>> > in 4.9 keeps its name, but with different behavior?
>> >
>> > That means that an app who didn't read CHANGES will be broken upon
>> upgrade,
>> > but if it does read CHANGES, it at least has a way to retain desired
>> > behavior.
>> >
>> > So the thing now is whether FooAnalyzer is always _current_ and an app
>> > should choose a backwards version of it (if it wants to), vs if
>> FooAnalyzer
>> > is _always the same_, and if you want to move forward you have to
>> explicitly
>> > use a NewFooAnalyzer?
>> >
>> > Of course, when FooAnalyzer takes a Version, then an app only needs to
>> > change its Version CONSTANT, to get best behavior ... but as you point
>> out,
>> > seems like we failed to implement that approach in our code already,
>> which
>> > suggests this approach is not intuitive to our committers, so why do we
>> > expect our users to understand it ...
>> >
>> > I am +1 on either of the approaches (both get rid of Version.java). I
>> don't
>> > feel bad with asking users to read CHANGES before they upgrade, and it
>> does
>> > mean that FooAnalyzer always gives you the best behavior, which is
>> important
>> > for new users or if you always re-index. Vs the second approach which
>> always
>> > prefers backwards compatibility, and telling users to read the javadocs
>> (and
>> > CHANGES)  in order to find the best version of FooAnalyzer.
>> >
>> > There is another issue w/ a global Version CONSTANT, which today we
>> > encourage apps to use -- if you use two analyzers, but you want to work
>> with
>> > a different Version of each (because of all sorts of reasons), having a
>> > global constant is bad. The explicit Foo45Analyzer (or Foo49Analyzer,
>> > whichever) lets you mix whichever versions that you want.
>> >
>> > Shai
>> >
>> > On Sun, Aug 3, 2014 at 3:02 PM, Robert Muir <[email protected]> wrote:
>> >>
>> >> You don't read what i wrote.
>> >>
>> >> Read it again.
>> >>
>> >>
>> >> On Sun, Aug 3, 2014 at 7:49 AM, Shai Erera <[email protected]> wrote:
>> >> > Yes, I agree that Foo49Analyzer is an odd name. Better if it was
>> named
>> >> > FooAnalyzerWithNoApostrophe, and I'm fine if that Analyzer chose to
>> name
>> >> > its
>> >> > different versions like that. But in the absence of better naming
>> ideas,
>> >> > I
>> >> > proposed the Foo49Analyzer.
>> >> >
>> >> > If we already have such Analyzers, then we are in fact implementing
>> that
>> >> > approach, only didn't make that decision globally. So whether it's
>> odd
>> >> > or
>> >> > not, let's first agree if we are willing to have these analyzers in
>> our
>> >> > code
>> >> > base (i.e. w/ the back-compat support). If we do, we can let each
>> >> > Analyzer
>> >> > decide on its naming.
>> >> >
>> >> > Analyzers aren't Codecs, I agree, and sticking the Lucene version in
>> >> > their
>> >> > name is probably not the best thing to do, as the Lucene version is
>> more
>> >> > associated with the index format. But if a fixed Analyzer cannot
>> come up
>> >> > w/
>> >> > a better name, I think the Lucene version there is not that horrible.
>> >> >
>> >> > And, it lets us easily remove Version.java.
>> >> >
>> >> > Shai
>> >> >
>> >> >
>> >> > On Sun, Aug 3, 2014 at 2:43 PM, Robert Muir <[email protected]>
>> wrote:
>> >> >>
>> >> >> On Sat, Aug 2, 2014 at 12:41 PM, Shai Erera <[email protected]>
>> wrote:
>> >> >> > Another proposal that I made on LUCENE-5859 is to get rid of
>> Version
>> >> >> > (for
>> >> >> > Analyzers) and follow the solution we have with Codecs. If an
>> >> >> > Analyzer
>> >> >> > changes its runtime behavior, and e.g not marked @experimental, it
>> >> >> > can
>> >> >> > create a Foo49Analyzer with the new behavior. That way, apps are
>> >> >> > still
>> >> >> > safe
>> >> >> > when they upgrade, since their Foo45Analyzer still exists (but
>> >> >> > deprecated).
>> >> >> > And they can always copy a Foo45Analyzer when they upgrade to
>> Lucene
>> >> >> > 6.0
>> >> >> > where it no longer exists... with this approach, there's no single
>> >> >> > version
>> >> >> > across the app - it just uses the specific Analyzer impls.
>> >> >>
>> >> >> But the usability here would be really bad.
>> >> >>
>> >> >> For codecs there isn't much a better thing to name it anyway, and
>> >> >> codecs are super-expert to change.
>> >> >>
>> >> >> For analyzers usability is paramount.
>> >> >>
>> >> >> I do think its ok to name _backwards_ compat tokenizer/tokenfilter
>> >> >> classes this way. In fact its already this way in trunk for any back
>> >> >> compat *actually doing something*: Lucene43NgramTokenizer,
>> >> >> Lucene47WordDelimiterFilter. The Version parameters are just for
>> show,
>> >> >> not doing anything!
>> >> >>
>> >> >>
>> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: [email protected]
>> >> >> For additional commands, e-mail: [email protected]
>> >> >>
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>    ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected] For additional
> commands, e-mail: [email protected]
>

Re: Lucene versioning logic

Reply via email to