Another proposal that I made on LUCENE-5859 is to get rid of Version (for
Analyzers) and follow the solution we have with Codecs. If an Analyzer
changes its runtime behavior and is not marked @experimental, we can
create a Foo49Analyzer with the new behavior. That way, apps are still safe
when they upgrade, since their Foo45Analyzer still exists (but is deprecated).
And they can always copy Foo45Analyzer into their own code when they upgrade
to Lucene 6.0, where it no longer exists. With this approach, there's no
single version across the app; it just uses the specific Analyzer impls.
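
A minimal sketch of the Codec-style approach described above. Foo45Analyzer
and Foo49Analyzer are the placeholder names from this mail; SimpleAnalyzer is
a hypothetical stand-in interface rather than Lucene's Analyzer API, and the
lowercasing change in 4.9 is an assumed behavior change chosen only for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class VersionedAnalyzerSketch {

    /** Hypothetical stand-in for an analyzer; this is not Lucene's Analyzer API. */
    interface SimpleAnalyzer {
        List<String> analyze(String text);
    }

    /** Frozen 4.5 behavior: split on whitespace. Deprecated, but still shipped. */
    @Deprecated
    static final class Foo45Analyzer implements SimpleAnalyzer {
        @Override
        public List<String> analyze(String text) {
            List<String> tokens = new ArrayList<>();
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            return tokens;
        }
    }

    /** New 4.9 behavior (assumed for illustration): also lowercase each token. */
    static final class Foo49Analyzer implements SimpleAnalyzer {
        @Override
        public List<String> analyze(String text) {
            List<String> tokens = new ArrayList<>();
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token.toLowerCase(Locale.ROOT));
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        // An app whose index was built with 4.5 keeps naming the 4.5 class after upgrading.
        SimpleAnalyzer existingIndex = new Foo45Analyzer();
        // A new index opts into the new behavior by naming the 4.9 class.
        SimpleAnalyzer newIndex = new Foo49Analyzer();

        System.out.println(existingIndex.analyze("Quick Brown Fox"));
        System.out.println(newIndex.analyze("Quick Brown Fox"));
    }
}
```

The point of the design is that the class name carries the compatibility
contract, so no Version argument is needed anywhere.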

Anyway, +1 to making bugfix releases first-class citizens. That's also the
only way to make sure we support back compat if a bug is fixed in an
Analyzer in e.g. 4.8.2.
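
To illustrate why the bugfix component matters, here is a rough sketch.
ReleaseVersion, FIX_VERSION, and normalize are hypothetical names invented for
this example and are not Lucene's Version API; the "trim the token" fix is
likewise an assumed bug fix. It only shows that a fix shipped in 4.8.2 can be
gated on the version when the version carries all three components.

```java
final class BugfixVersionSketch {

    /** Hypothetical major.minor.bugfix version; not Lucene's Version class. */
    static final class ReleaseVersion implements Comparable<ReleaseVersion> {
        final int major;
        final int minor;
        final int bugfix;

        ReleaseVersion(int major, int minor, int bugfix) {
            this.major = major;
            this.minor = minor;
            this.bugfix = bugfix;
        }

        /** Parses "4.8" or "4.8.2"; a missing bugfix component defaults to 0. */
        static ReleaseVersion parse(String s) {
            String[] parts = s.trim().split("\\.");
            if (parts.length < 2 || parts.length > 3) {
                throw new IllegalArgumentException("Expected major.minor[.bugfix]: " + s);
            }
            int bugfix = parts.length == 3 ? Integer.parseInt(parts[2]) : 0;
            return new ReleaseVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), bugfix);
        }

        @Override
        public int compareTo(ReleaseVersion other) {
            if (major != other.major) {
                return Integer.compare(major, other.major);
            }
            if (minor != other.minor) {
                return Integer.compare(minor, other.minor);
            }
            return Integer.compare(bugfix, other.bugfix);
        }

        boolean onOrAfter(ReleaseVersion other) {
            return compareTo(other) >= 0;
        }

        @Override
        public String toString() {
            return major + "." + minor + "." + bugfix;
        }
    }

    /** The release that (hypothetically) shipped the analyzer fix. */
    static final ReleaseVersion FIX_VERSION = new ReleaseVersion(4, 8, 2);

    /** Applies the assumed fix (trimming) only for versions on or after 4.8.2. */
    static String normalize(String token, ReleaseVersion version) {
        return version.onOrAfter(FIX_VERSION) ? token.trim() : token;
    }

    public static void main(String[] args) {
        // With only major.minor, 4.8.1 and 4.8.2 would be indistinguishable.
        System.out.println("[" + normalize(" foo ", ReleaseVersion.parse("4.8.1")) + "]");
        System.out.println("[" + normalize(" foo ", ReleaseVersion.parse("4.8.2")) + "]");
    }
}
```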

Shai


On Sat, Aug 2, 2014 at 11:54 AM, Michael McCandless <
[email protected]> wrote:

> +1, this sounds like a great solution.  It simplifies the APIs (no
> more required Version to Analyzer), consolidates the version logic
> into a "single source", and makes dot releases first class.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Aug 1, 2014 at 7:47 PM, Ryan Ernst <[email protected]> wrote:
> > There has been a lot of heated discussion recently about version
> > tracking in Lucene [1] [2].  I wanted to have a fresh discussion
> > outside of jira to give a full description of the current state of
> > things, the problems I have heard, and a proposed solution.
> >
> > CURRENT
> >
> > We have 2 pieces of code that handle “versioning.”  The first is
> > Constants.LUCENE_MAIN_VERSION, which is written to the SegmentsInfo
> > for each segment.  This is a string version which is used to detect
> > when the current version of lucene is newer than the version that
> > wrote the segment (and how/if an upgrade to a newer codec should be
> > done). There is some complication with the “display” version and
> > non-display version, which are distinguished by whether the version of
> > lucene was an official release, or an alpha/beta version (which was
> > added specifically for the 4.0.0 release ramp up).  This string
> > version also has its own parsing and comparison methods.
> >
> > The second piece of versioning code is in Version.java, which is an
> > enum used by analyzers to maintain backwards compatible behavior given
> > a specific version of lucene.  The enum only contains values for dot
> > releases of lucene, not bug fixes (which was what spurred the recent
> > discussions over version). Analyzers’ constructors take a required
> > Version parameter, which is only actually used by the few analyzers
> > that have changed behavior recently.  Version.java contains its own
> > separate version parsing and comparison methods.
> >
> >
> > CONCERNS
> >
> > * Having 2 different pieces of code that do very similar things is
> > confusing for development.  Very few developers appear to really
> > understand the current system (especially when trying to understand
> > the alpha/beta setup).
> >
> > * Users are generally confused by the Version passed to analyzers: I
> > know I was when I first started working with Lucene, and
> > Version.CURRENT_VERSION was deprecated because users used that without
> > understanding the implications.
> >
> > * Most analyzers currently have dead code constructors, since they
> > never make use of Version.  There are also a lot of classes used by
> > analyzers which contain similar dead code.
> >
> > * Backwards compatibility needs to be handled in some fashion, to
> > ensure users have a path to upgrade from one version of lucene to
> > another, without requiring immediate re-indexing.
> >
> >
> > PROPOSAL
> >
> > I propose the following:
> >
> > * Consolidate all version related enumeration, including reading and
> > writing string versions, into Version.java.  Have a static method that
> > returns the current lucene version (replacing
> > Constants.LUCENE_MAIN_VERSION).
> >
> > * Make bug fix releases first class in the enumeration, so that they
> > can be distinguished for any compatibility issues that come up.
> >
> > * Remove all snapshot/alpha/beta versioning logic.  Alpha/beta was
> > really only necessary for 4.0 because of the extreme changes that were
> > being made.  The system is much more stable now, and 5.0 should not
> > require preview releases, IMO.  I don’t think snapshots should be a
> > concern because any user building an index from an unreleased build
> > (which they built themselves) is just asking for trouble.  They do so
> > at their own risk (of figuring out how to upgrade their indexes if
> > they are not trash-able).  Backwards compatibility can be handled by
> > adding the alpha/beta/final versions of 4.0 to the enum (and special
> > parsing logic for this).  If lucene changes so much that we need
> > alpha/beta type discrimination in the future, we can revisit the
> > system if simply having extra versions in the enum won't work.
> >
> > * Analyzer constructors should have Version removed, and a setter
> > should be added which allows production users to set the version used.
> > This way any analyzers can still use version if it is set to something
> > other than current (which would be the default), but users simply
> > prototyping do not need to worry about it.
> >
> > * Classes that analyzers use, which take Version, should have Version
> > removed, and the analyzers should choose which settings/variants of
> > those classes to use based on the version they have set. In other
> > words, all version variant logic should be contained within the
> > analyzers.  For example, there could be a Lucene47WordDelimiterFilter,
> > or StandardAnalyzer could take the unicode version.
> > Factories could still take Version (e.g. TokenizerFactory,
> > TokenFilterFactory, etc) to produce the correct component (so nothing
> > will change for solr in this regard).
> >
> > I’m sure not everyone will be happy with what I have proposed, but I’m
> > hoping we can work out a solution together, and then implement in a
> > team-like fashion, the way I have seen the community work in the past,
> > and I hope to see again in the future.
> >
> > Thanks
> > Ryan
> >
> > [1] https://issues.apache.org/jira/browse/LUCENE-5850
> > [2] https://issues.apache.org/jira/browse/LUCENE-5859
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
