Re: 6.0 Release

Jack Krupansky Thu, 31 Mar 2016 09:17:30 -0700

Reindexing for the proposed changes in numeric fields...

We in Solr land have this split personality about reindexing - sometimes
blithely telling users "oh, if you want to make that schema change, then
you will have to reindex (all of your data)" and then insisting on index
compatibility and that installing a new release will "not require
reindexing." When I was at Lucid with their packaged version of Solr,
automatically reindexing was a single-click operation, so it required nary
a second thought. But in raw Solr land reindexing is a cause of great
concern, anxiety, pain, and in many cases outright impossibility. The latter
typically because using stored="true" for all of your fields is considered a
very bad thing.


But what's the story in Elasticsearch land? First, they have this concept
of a "_source" field which can keep a full copy of the entire original
input document, so that any document can always be fully updated and...
reindexed. They also have a scrolling feature to make it easy to bulk copy
from one index into another. Again, making it easy to reindex or migrate
from an old index to a fresh new one.
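To make the scroll-and-copy idea concrete, here is a minimal sketch in Python. It assumes nothing about the real Elasticsearch client: plain dicts stand in for indices, and the names scroll, bulk_index, and reindex are hypothetical helpers invented for illustration. The only point is that once the original document is retained, every indexed field can be re-derived under a new mapping.

```python
# Illustrative only: plain dicts stand in for indices.
# Nothing here calls a real Elasticsearch API; all names are hypothetical.

def scroll(index, batch_size):
    """Yield (id, original source) pairs in fixed-size batches, like a scroll."""
    ids = sorted(index)
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        yield [(doc_id, index[doc_id]["_source"]) for doc_id in chunk]


def bulk_index(dest, batch, transform):
    """Write a batch into dest, re-deriving indexed fields from the source."""
    for doc_id, source in batch:
        dest[doc_id] = {"_source": source, "indexed": transform(source)}


def reindex(src, dest, transform, batch_size=2):
    """Copy every document from src to dest and return how many were copied."""
    copied = 0
    for batch in scroll(src, batch_size):
        bulk_index(dest, batch, transform)
        copied += len(batch)
    return copied


# Old index: price was indexed as a string.
old_index = {
    "1": {"_source": {"title": "a", "price": "10"}, "indexed": {"price": "10"}},
    "2": {"_source": {"title": "b", "price": "25"}, "indexed": {"price": "25"}},
    "3": {"_source": {"title": "c", "price": "7"}, "indexed": {"price": "7"}},
}

# New index: same documents, but price is re-derived as an integer.
new_index = {}
count = reindex(old_index, new_index, lambda s: {"price": int(s["price"])})
```

Without a retained copy of the original input, the transform step has nothing to work from, which is the situation many raw Solr installations are in.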

And now, we read in a recent blog post that "Reindex is coming!", making
reindexing even easier than ever in ES:
https://www.elastic.co/blog/reindex-is-coming

In short, reindexing is much less of a huge deal in ES land than it is in
Solr. IOW, telling users "you must reindex" is not the end of the world for
ES users.

So, the point of all of this is to ask a question of the Elasticsearch guys
(also known as Lucene guys): How does Elasticsearch plan on dealing with
this transition from current numerics to dimensional points? The recent
blog post is here:
https://www.elastic.co/blog/lucene-points-6.0

It merely says "As of this writing, Elasticsearch has not yet exposed
points, but I expect that will change soon." The question I have is will ES
simply tell users "you must reindex" to use a future release of ES that is
based on Lucene 6.0, or... will ES offer some index migration tool, or...
will ES automatically and transparently upgrade existing ES numeric fields
to dimensional points, or... will ES support both numeric formats, or...
will ES have some JSON syntax for selecting between the two numeric formats?
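For anyone not steeped in the two numeric formats, here is a rough Python sketch of how they differ in shape. It is a simplified model under my own assumptions, not Lucene's actual on-disk layout, and the function names are hypothetical. The current numerics index each value as several prefix terms, one per precisionStep level, while a point is a single fixed-width, byte-sortable value per dimension.

```python
# Simplified model of the two encodings; not Lucene's actual byte layout.
# All function names are hypothetical and exist only for illustration.

def sortable_bits(value):
    """Map a signed 32-bit int to an unsigned int with the same sort order."""
    return (value & 0xFFFFFFFF) ^ 0x80000000


def legacy_terms(value, precision_step):
    """Current numerics: one (shift, prefix) term per precision level."""
    bits = sortable_bits(value)
    return [(shift, bits >> shift) for shift in range(0, 32, precision_step)]


def point_bytes(value):
    """Dimensional points: a single 4-byte value that sorts bytewise."""
    return sortable_bits(value).to_bytes(4, "big")


terms = legacy_terms(5, 8)   # four terms, at shifts 0, 8, 16 and 24
packed = point_bytes(5)      # one 4-byte value
```

The two live in different index structures, which is why switching a field from one to the other is not a mere schema tweak.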

Not that I actually expect ES to fully disclose future product plans here,
but... do they actually have some kind of secret plan to make users fully
happy with this transition of numeric formats, or do they simply plan to
say "you must (manually) reindex", or... have they in fact not yet thought
through these migration issues?

The real point here is that it will be senseless for the Solr guys to work
through and propose a sensible migration plan (including the decision as to
when and how the current numerics will be deprecated) if ES staff have some
hidden plan in play. If it really is simply a matter that ES hasn't thought
through the migration process or that easy reindexing is the ES answer,
then fine, but it would be helpful to state that explicitly. I'm not making
any presumption about which of these scenarios is the truth, just trying to
figure out what's really going on, and whether easy reindexing in ES is at
the root of this mad push to deprecate current numerics before dimensional
points are fully baked.


-- Jack Krupansky

On Thu, Mar 31, 2016 at 11:39 AM, Jack Krupansky <[email protected]>
wrote:

> Robert's detailing of the remaining work to get the rest of Lucene off of
> current (current release, soon to be legacy) numerics is enlightening.
>
> Personally, I had thought that it was Solr that was holding up an imminent
> Lucene/Solr 6.0 release, but now I'm thinking:
>
> 1. The new "point stuff" (did I mention that I didn't like or approve of
> the current name?) seems more like a work in progress...
> 2. I'd label the "point stuff" as experimental for 6.0.
> 3. I wouldn't hold up 6.0 for any further baking of the "point stuff" or
> migration of other Lucene features off current numeric types.
> 4. The rest of Lucene can be weaned off the current numerics at a more
> leisurely pace, like for 6.1 or 6.2.
> 5. Once the new "point stuff" is finally fully baked, and the rest of
> Lucene is migrated off current numerics, and... Solr has made "point stuff"
> its default numeric type (6.1 or 6.2?), AND Lucene or Solr comes up with a
> sound migration plan and/or index migration tool for current numerics, only
> THEN should the current numerics become deprecated.
> 6. I'm not absolutely certain, but I think the 6.0 changes to Solr to use
> the Lucene LegacyXXX numeric field types should be fine for an initial 6.0
> release, meaning backcompat is assured for 6.x.
> 7. I'm imagining that with a manually-invoked index upgrade tool a current
> (5.x) numeric field can be migrated to a "point stuff" field type. A Lucene
> heavy will have to confirm that feasibility.
> 8. I'm imagining that a typical Solr site would be okay with the
> requirement that they have to explicitly, manually run such an index
> upgrade tool to migrate from current (5.x) numerics to "point stuff". And
> that they could either do that once Solr adds support for "point stuff"
> fields or when they migrate from 6.x to 7.x. Bonus points if Solr can have
> a variation of the index upgrade tool that discovers and upgrades all
> current numeric fields.
>
> What else? (I'll ask some questions about Elasticsearch plans in a
> separate message.)
>
>
>
>
> -- Jack Krupansky
>
> On Thu, Mar 31, 2016 at 12:31 AM, David Smiley <[email protected]>
> wrote:
>
>> That was an excellent summary Rob; thanks.
>> Minor nit: BBoxSpatialStrategy isn't/wasn't deprecated.  It was enhanced
>> to use PointValues.
>>
>> I too would like to see the legacy numerics stay in "backwards-codecs" as
>> you describe with precisionStep specified on the Analyzer.
>>
>> I disagree with Shawn about #5, that a user with a Solr 6.0 index must be
>> able to upgrade straight to 7.0.  Perhaps this has been the case for every
>> major release in the past, and it would be nice if it continues if for no
>> other reason than consistency.  But, IMO, that's kind of cosmetic -- it
>> isn't important.  What matters is that an eventual 6.x release occurs that
>> allows someone to upgrade to 7.0 -- that there's a path forward.  And that
>> one can always upgrade from one 6.x release to any greater 6.x release.
>>
>> Quoting Adrien:
>> bq. Detour: In the future I wonder whether we should consider having
>> separate release cycles again. In addition to giving Solr more time to use
>> new Lucene features like here, it would also remove the issue that we had
>> when releasing 5.3.2 after 5.4.0, which makes perfect sense from a Solr
>> perspective but not from Lucene since it introduces blind spots in the
>> testing of index backward compatibility.
>>
>> +1 to that!  I've had that thought.  It would be awesome for Solr to
>> release when it feels it's right, independently of Lucene.  If that's too
>> difficult/problematic then perhaps keep synchronizing releases but allow
>> Lucene & Solr's release version to vary.    Then we'd be having a Solr 5.6
>> release here.
>>
>> ~ David
>>
>> On Wed, Mar 30, 2016 at 9:39 PM Robert Muir <[email protected]> wrote:
>>
>>> On Wed, Mar 30, 2016 at 12:43 PM, Adrien Grand <[email protected]>
>>> wrote:
>>> > Hi Shawn,
>>> >
>>> > I think marking the legacy fields/queries as deprecated in Lucene in
>>> 6.0 is
>>> > the right thing to do in order to encourage users to migrate to the new
>>> > points API. If Solr needs to keep them around for 7.x, it would be
>>> fine to
>>> > move them to solr/ instead of lucene/ instead of a hard removal. Given
>>> that
>>> > it works on top of the postings API, it would not break.
>>>
>>> Also see my issue (https://issues.apache.org/jira/browse/LUCENE-7075)
>>> where I proposed to at least get things headed to the backwards/ jar.
>>> And the uninverting issue is still being discussed. If you look at
>>> linked issues you will see the deprecated encoding is involved with
>>> the following modules:
>>>
>>> * core (not just field/query/utils classes, but stuff like
>>> precisionStep in the .document api!)
>>> * spatial (Deprecated GeoPoint encoding etc)
>>> * spatial-extras (Deprecated Bbox encoding etc)
>>> * misc (UninvertingReader)
>>> * queryparser (flexible and xml)
>>> * join
>>>
>>> The purpose of that issue is to make sure people have the stuff they
>>> need to move their code off the old encoding. I personally thought this
>>> would make the transition easier, and it was finding bugs/problems in
>>> points and improving the apis. I imagined it would just be me, but I
>>> created a ton of linked issues all up front just in case. I did not
>>> think anyone else would really be excited to work on these, because
>>> it's not particularly exciting stuff, but thanks Nick, David, Martijn,
>>> etc who did. I didn't try to plan any grandiose schemes of *actually
>>> pulling the old encoding out* because this was plenty on its own. I
>>> tried to work on the fieldcache only because I was talking to Tomas
>>> and he mentioned it as a difficulty in cutting over solr. But I bailed
>>> after encountering complexity, and don't think it is the way to go,
>>> read the issue for my explanation.
>>>
>>> To me, this is why we have a backwards compatibility policy for N-1,
>>> it has to be a volunteer thing for some of this stuff: can't all be on
>>> Mike.
>>>
>>> I do personally think it is enough to release, "removing" or "moving"
>>> deprecations is something to worry about for master branch.
>>>
>>> I did mention in the issue an idea for a first step would be to get
>>> the core/ stuff pulled out somewhere better.  Maybe the core/ stuff
>>> should go to the backwards-codec jar if we can detangle the
>>> deprecations from the .document api (e.g. maybe precisionStep can be a
>>> parameter on a tokenizer or analyzer or something, so it's a little bit
>>> harder to use, but still works and not holding back core/'s .document
>>> api). But what to do about the other stuff?
>>>
>>> If i wanted to start removing deprecations now, I would be trying to
>>> just factor out the core/ NumericRangeQuery/NumericField stuff out to
>>> the backwards-codec jar. I hate modules depending on other ones, I
>>> really do, but just to iterate, I'd temporarily make all those other
>>> modules depend on backwards-codec/ jar and then remove deprecations
>>> from each one-by-one. It's too much to do all at once. I think we can
>>> do it this way iteratively without breaking solr.
>>>
>>> If solr wants to hang on to e.g. some spatial field with old numerics
>>> for an additional time (since it was still using it for 6.0), then the
>>> deprecated spatial field can be moved to solr. If not, let's nuke it.
>>>
>>> To me this seems the least controversial path, and it's something that
>>> can be done iteratively. It has the downside of keeping "core"
>>> deprecated legacy numerics around for an extra major release in the
>>> backwards-codec jar. I think this "extra" back compat is ok in this
>>> case. Uwe made clean code :)
>>>
>>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
>
