Re: Lucene versioning logic

Shai Erera Sun, 03 Aug 2014 05:52:18 -0700

OK I see, it's the Tokenizers and Filters that usually change, the Analyzer
only needs Version to determine which TokenStream chain to return, and so
we achieve that w/ Ryan's proposal of setVersion().


I'd still feel better if Version was a final setting on an Analyzer, i.e.
that a single Analyzer instance always behaves consistently, and cannot
alternate behavior down-stream if someone called setVersion(). But this is
a really stupid thing to do. Maybe setVersion() can return an Analyzer, so
you're sure that instance is not modifiable. But maybe this is just over
engineering...

I'm +0.5 to that. I prefer Version to be mandatory somehow (class name,
ctor argument), but I can live with setVersion as well...

Shai


On Sun, Aug 3, 2014 at 3:30 PM, Robert Muir <rcm...@gmail.com> wrote:

> No, I didn't say any of this, please read it again :)
>
> Old back-compat *Tokenizer/Tokenfilter* are named this way.
> *Tokenizer/Tokenfilter*
> Only the old ones.
> Just to be clear: *Tokenizer/Tokenfilter*
> Their Factories still use Version to produce the right one (as we
> can't remove version from there, or we will have complaints from solr
> developers).
>
> So users who want the version-style back compat can just use the
> factories. really.
> On the other hand new users can do 'new LowerCaseFilter()' without the
> bullshit.
>
> For Analyzers, there is a setter. Users who want to use *OUR
> ANALYZERS* with back compat, call the setter. But its not
> mandatory-in-your-face-ctor.
>
> I am +1 to Ryan's proposal, so please look for more elaboration there.
> I am -1 to putting Versions in the name of Analyzers.
>
> On Sun, Aug 3, 2014 at 8:21 AM, Shai Erera <ser...@gmail.com> wrote:
> > Oh, I misread this part "I do think its ok to name ..." -- replaced "do"
> > with "don't" :).
> >
> > So you say that if we have a FooAnalyzer in 4.5 and change its behavior
> in
> > 4.9, then we add a Foo45Analyzer as a back-compat support, and
> FooAnalyzer
> > in 4.9 keeps its name, but with different behavior?
> >
> > That means that an app who didn't read CHANGES will be broken upon
> upgrade,
> > but if it does read CHANGES, it at least has a way to retain desired
> > behavior.
> >
> > So the thing now is whether FooAnalyzer is always _current_ and an app
> > should choose a backwards version of it (if it wants to), vs if
> FooAnalyzer
> > is _always the same_, and if you want to move forward you have to
> explicitly
> > use a NewFooAnalyzer?
> >
> > Of course, when FooAnalyzer takes a Version, then an app only needs to
> > change its Version CONSTANT, to get best behavior ... but as you point
> out,
> > seems like we failed to implement that approach in our code already,
> which
> > suggests this approach is not intuitive to our committers, so why do we
> > expect our users to understand it ...
> >
> > I am +1 on either of the approaches (both get rid of Version.java). I
> don't
> > feel bad with asking users to read CHANGES before they upgrade, and it
> does
> > mean that FooAnalyzer always gives you the best behavior, which is
> important
> > for new users or if you always re-index. Vs the second approach which
> always
> > prefers backwards compatibility, and telling users to read the javadocs
> (and
> > CHANGES)  in order to find the best version of FooAnalyzer.
> >
> > There is another issue w/ a global Version CONSTANT, which today we
> > encourage apps to use -- if you use two analyzers, but you want to work
> with
> > a different Version of each (because of all sorts of reasons), having a
> > global constant is bad. The explicit Foo45Analyzer (or Foo49Analyzer,
> > whichever) lets you mix whichever versions that you want.
> >
> > Shai
> >
> > On Sun, Aug 3, 2014 at 3:02 PM, Robert Muir <rcm...@gmail.com> wrote:
> >>
> >> You don't read what i wrote.
> >>
> >> Read it again.
> >>
> >>
> >> On Sun, Aug 3, 2014 at 7:49 AM, Shai Erera <ser...@gmail.com> wrote:
> >> > Yes, I agree that Foo49Analyzer is an odd name. Better if it was named
> >> > FooAnalyzerWithNoApostrophe, and I'm fine if that Analyzer chose to
> name
> >> > its
> >> > different versions like that. But in the absence of better naming
> ideas,
> >> > I
> >> > proposed the Foo49Analyzer.
> >> >
> >> > If we already have such Analyzers, then we are in fact implementing
> that
> >> > approach, only didn't make that decision globally. So whether it's odd
> >> > or
> >> > not, let's first agree if we are willing to have these analyzers in
> our
> >> > code
> >> > base (i.e. w/ the back-compat support). If we do, we can let each
> >> > Analyzer
> >> > decide on its naming.
> >> >
> >> > Analyzers aren't Codecs, I agree, and sticking the Lucene version in
> >> > their
> >> > name is probably not the best thing to do, as the Lucene version is
> more
> >> > associated with the index format. But if a fixed Analyzer cannot come
> up
> >> > w/
> >> > a better name, I think the Lucene version there is not that horrible.
> >> >
> >> > And, it lets us easily remove Version.java.
> >> >
> >> > Shai
> >> >
> >> >
> >> > On Sun, Aug 3, 2014 at 2:43 PM, Robert Muir <rcm...@gmail.com> wrote:
> >> >>
> >> >> On Sat, Aug 2, 2014 at 12:41 PM, Shai Erera <ser...@gmail.com>
> wrote:
> >> >> > Another proposal that I made on LUCENE-5859 is to get rid of
> Version
> >> >> > (for
> >> >> > Analyzers) and follow the solution we have with Codecs. If an
> >> >> > Analyzer
> >> >> > changes its runtime behavior, and e.g not marked @experimental, it
> >> >> > can
> >> >> > create a Foo49Analyzer with the new behavior. That way, apps are
> >> >> > still
> >> >> > safe
> >> >> > when they upgrade, since their Foo45Analyzer still exists (but
> >> >> > deprecated).
> >> >> > And they can always copy a Foo45Analyzer when they upgrade to
> Lucene
> >> >> > 6.0
> >> >> > where it no longer exists... with this approach, there's no single
> >> >> > version
> >> >> > across the app - it just uses the specific Analyzer impls.
> >> >>
> >> >> But the usability here would be really bad.
> >> >>
> >> >> For codecs there isn't much a better thing to name it anyway, and
> >> >> codecs are super-expert to change.
> >> >>
> >> >> For analyzers usability is paramount.
> >> >>
> >> >> I do think its ok to name _backwards_ compat tokenizer/tokenfilter
> >> >> classes this way. In fact its already this way in trunk for any back
> >> >> compat *actually doing something*: Lucene43NgramTokenizer,
> >> >> Lucene47WordDelimiterFilter. The Version parameters are just for
> show,
> >> >> not doing anything!
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Re: Lucene versioning logic

Reply via email to