OK I see, it's the Tokenizers and Filters that usually change, the Analyzer only needs Version to determine which TokenStream chain to return, and so we achieve that w/ Ryan's proposal of setVersion().
I'd still feel better if Version was a final setting on an Analyzer, i.e. that a single Analyzer instance always behaves consistently, and cannot alternate behavior down-stream if someone called setVersion(). But this is a really stupid thing to do. Maybe setVersion() can return an Analyzer, so you're sure that instance is not modifiable. But maybe this is just over engineering... I'm +0.5 to that. I prefer Version to be mandatory somehow (class name, ctor argument), but I can live with setVersion as well... Shai On Sun, Aug 3, 2014 at 3:30 PM, Robert Muir <[email protected]> wrote: > No, I didn't say any of this, please read it again :) > > Old back-compat *Tokenizer/Tokenfilter* are named this way. > *Tokenizer/Tokenfilter* > Only the old ones. > Just to be clear: *Tokenizer/Tokenfilter* > Their Factories still use Version to produce the right one (as we > can't remove version from there, or we will have complaints from solr > developers). > > So users who want the version-style back compat can just use the > factories. really. > On the other hand new users can do 'new LowerCaseFilter()' without the > bullshit. > > For Analyzers, there is a setter. Users who want to use *OUR > ANALYZERS* with back compat, call the setter. But its not > mandatory-in-your-face-ctor. > > I am +1 to Ryan's proposal, so please look for more elaboration there. > I am -1 to putting Versions in the name of Analyzers. > > On Sun, Aug 3, 2014 at 8:21 AM, Shai Erera <[email protected]> wrote: > > Oh, I misread this part "I do think its ok to name ..." -- replaced "do" > > with "don't" :). > > > > So you say that if we have a FooAnalyzer in 4.5 and change its behavior > in > > 4.9, then we add a Foo45Analyzer as a back-compat support, and > FooAnalyzer > > in 4.9 keeps its name, but with different behavior? > > > > That means that an app who didn't read CHANGES will be broken upon > upgrade, > > but if it does read CHANGES, it at least has a way to retain desired > > behavior. > > > > So the thing now is whether FooAnalyzer is always _current_ and an app > > should choose a backwards version of it (if it wants to), vs if > FooAnalyzer > > is _always the same_, and if you want to move forward you have to > explicitly > > use a NewFooAnalyzer? > > > > Of course, when FooAnalyzer takes a Version, then an app only needs to > > change its Version CONSTANT, to get best behavior ... but as you point > out, > > seems like we failed to implement that approach in our code already, > which > > suggests this approach is not intuitive to our committers, so why do we > > expect our users to understand it ... > > > > I am +1 on either of the approaches (both get rid of Version.java). I > don't > > feel bad with asking users to read CHANGES before they upgrade, and it > does > > mean that FooAnalyzer always gives you the best behavior, which is > important > > for new users or if you always re-index. Vs the second approach which > always > > prefers backwards compatibility, and telling users to read the javadocs > (and > > CHANGES) in order to find the best version of FooAnalyzer. > > > > There is another issue w/ a global Version CONSTANT, which today we > > encourage apps to use -- if you use two analyzers, but you want to work > with > > a different Version of each (because of all sorts of reasons), having a > > global constant is bad. The explicit Foo45Analyzer (or Foo49Analyzer, > > whichever) lets you mix whichever versions that you want. > > > > Shai > > > > On Sun, Aug 3, 2014 at 3:02 PM, Robert Muir <[email protected]> wrote: > >> > >> You don't read what i wrote. > >> > >> Read it again. > >> > >> > >> On Sun, Aug 3, 2014 at 7:49 AM, Shai Erera <[email protected]> wrote: > >> > Yes, I agree that Foo49Analyzer is an odd name. Better if it was named > >> > FooAnalyzerWithNoApostrophe, and I'm fine if that Analyzer chose to > name > >> > its > >> > different versions like that. But in the absence of better naming > ideas, > >> > I > >> > proposed the Foo49Analyzer. > >> > > >> > If we already have such Analyzers, then we are in fact implementing > that > >> > approach, only didn't make that decision globally. So whether it's odd > >> > or > >> > not, let's first agree if we are willing to have these analyzers in > our > >> > code > >> > base (i.e. w/ the back-compat support). If we do, we can let each > >> > Analyzer > >> > decide on its naming. > >> > > >> > Analyzers aren't Codecs, I agree, and sticking the Lucene version in > >> > their > >> > name is probably not the best thing to do, as the Lucene version is > more > >> > associated with the index format. But if a fixed Analyzer cannot come > up > >> > w/ > >> > a better name, I think the Lucene version there is not that horrible. > >> > > >> > And, it lets us easily remove Version.java. > >> > > >> > Shai > >> > > >> > > >> > On Sun, Aug 3, 2014 at 2:43 PM, Robert Muir <[email protected]> wrote: > >> >> > >> >> On Sat, Aug 2, 2014 at 12:41 PM, Shai Erera <[email protected]> > wrote: > >> >> > Another proposal that I made on LUCENE-5859 is to get rid of > Version > >> >> > (for > >> >> > Analyzers) and follow the solution we have with Codecs. If an > >> >> > Analyzer > >> >> > changes its runtime behavior, and e.g not marked @experimental, it > >> >> > can > >> >> > create a Foo49Analyzer with the new behavior. That way, apps are > >> >> > still > >> >> > safe > >> >> > when they upgrade, since their Foo45Analyzer still exists (but > >> >> > deprecated). > >> >> > And they can always copy a Foo45Analyzer when they upgrade to > Lucene > >> >> > 6.0 > >> >> > where it no longer exists... with this approach, there's no single > >> >> > version > >> >> > across the app - it just uses the specific Analyzer impls. > >> >> > >> >> But the usability here would be really bad. > >> >> > >> >> For codecs there isn't much a better thing to name it anyway, and > >> >> codecs are super-expert to change. > >> >> > >> >> For analyzers usability is paramount. > >> >> > >> >> I do think its ok to name _backwards_ compat tokenizer/tokenfilter > >> >> classes this way. In fact its already this way in trunk for any back > >> >> compat *actually doing something*: Lucene43NgramTokenizer, > >> >> Lucene47WordDelimiterFilter. The Version parameters are just for > show, > >> >> not doing anything! > >> >> > >> >> --------------------------------------------------------------------- > >> >> To unsubscribe, e-mail: [email protected] > >> >> For additional commands, e-mail: [email protected] > >> >> > >> > > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [email protected] > >> For additional commands, e-mail: [email protected] > >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
