So what's the latest plan for the release? I just backported SOLR-8888 to branch_6x. If the release is imminent, I'll hold off and let it ship in 6.1. But if this release continues to be delayed, I'd like to backport it to branch_6_0 as well.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 31, 2016 at 12:16 PM, Jack Krupansky <[email protected]> wrote:

> Reindexing for the proposed changes in numeric fields...
>
> We in Solr land have this split personality about reindexing - sometimes blithely telling users "oh, if you want to make that schema change, then you will have to reindex (all of your data)" and then insisting on index compatibility and that installing a new release will "not require reindexing." When I was at Lucid with their packaged version of Solr, automatically reindexing was a single-click operation, so it required nary a second thought. But in raw Solr land reindexing is a cause of great concern, anxiety, pain, and in many cases outright impossibility. The latter is typically because using stored="true" for all of your fields is considered a very bad thing.
>
> But what's the story in Elasticsearch land? First, they have this concept of a "_source" field which can keep a full copy of the entire original input document, so that any document can always be fully updated and... reindexed. They also have a scrolling feature to make it easy to bulk copy from one index into another. Again, making it easy to reindex or migrate from an old index to a fresh new one.
>
> And now, we read in a recent blog post that "Reindex is coming!", making reindexing even easier than ever in ES:
> https://www.elastic.co/blog/reindex-is-coming
>
> In short, reindexing is much less of a huge deal in ES land than it is in Solr. IOW, telling users "you must reindex" is not the end of the world for ES users.
>
> So, the point of all of this is to ask a question of the Elasticsearch guys (also known as Lucene guys): How does Elasticsearch plan on dealing with this transition from current numerics to dimensional points?
> The recent blog post is here:
> https://www.elastic.co/blog/lucene-points-6.0
>
> It merely says "As of this writing, Elasticsearch has not yet exposed points, but I expect that will change soon." The question I have is will ES simply tell users "you must reindex" to use a future release of ES that is based on Lucene 6.0, or... will ES offer some index migration tool, or... will ES automatically and transparently upgrade existing ES numeric fields to dimensional points, or... will ES support both numeric formats, or... will ES have some JSON syntax for selecting between the two numeric formats?
>
> Not that I actually expect ES to fully disclose future product plans here, but... do they actually have some kind of secret plan to make users fully happy with this transition of numeric formats, or do they simply plan to say "you must (manually) reindex", or... have they in fact not yet thought through these migration issues?
>
> The real point here is that it will be senseless for the Solr guys to work through and propose a sensible migration plan (including the decision as to when and how the current numerics will be deprecated) if ES staff have some hidden plan in play. If it really is simply a matter that ES hasn't thought through the migration process or that easy reindexing is the ES answer, then fine, but it would be helpful to state that explicitly. I'm not making any presumption about which of these scenarios is the truth, just trying to figure out what's really going on, and whether easy reindexing in ES is at the root of this mad push to deprecate current numerics before dimensional points is fully baked.
>
> -- Jack Krupansky
>
> On Thu, Mar 31, 2016 at 11:39 AM, Jack Krupansky <[email protected]> wrote:
>
>> Robert's detailing of the remaining work to get the rest of Lucene off of current (current release, soon to be legacy) numerics is enlightening.
>>
>> Personally, I had thought that it was Solr that was holding up an imminent Lucene/Solr 6.0 release, but now I'm thinking:
>>
>> 1. The new "point stuff" (did I mention that I didn't like or approve of the current name?) seems more like a work in progress...
>> 2. I'd label the "point stuff" as experimental for 6.0.
>> 3. I wouldn't hold up 6.0 for any further baking of the "point stuff" or migration of other Lucene features off current numeric types.
>> 4. The rest of Lucene can be weaned off the current numerics at a more leisurely pace, like for 6.1 or 6.2.
>> 5. Once the new "point stuff" is finally fully baked, and the rest of Lucene is migrated off current numerics, and... Solr has made "point stuff" its default numeric type (6.1 or 6.2?), AND Lucene or Solr comes up with a sound migration plan and/or index migration tool for current numerics, only THEN should the current numerics become deprecated.
>> 6. I'm not absolutely certain, but I think the 6.0 changes to Solr to use the Lucene LegacyXXX numeric field types should be fine for an initial 6.0 release, meaning backcompat is assured for 6.x.
>> 7. I'm imagining that with a manually-invoked index upgrade tool a current (5.x) numeric field can be migrated to a "point stuff" field type. A Lucene heavy will have to confirm that feasibility.
>> 8. I'm imagining that a typical Solr site would be okay with the requirement that they have to explicitly, manually run such an index upgrade tool to migrate from current (5.x) numerics to "point stuff". And that they could either do that once Solr adds support for "point stuff" fields or when they migrate from 6.x to 7.x. Bonus points if Solr can have a variation of the index upgrade tool that discovers and upgrades all current numeric fields.
>>
>> What else? (I'll ask some questions about Elasticsearch plans in a separate message.)
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 31, 2016 at 12:31 AM, David Smiley <[email protected]> wrote:
>>
>>> That was an excellent summary, Rob; thanks.
>>> Minor nit: BBoxSpatialStrategy isn't/wasn't deprecated. It was enhanced to use PointValues.
>>>
>>> I too would like to see the legacy numerics stay in "backwards-codecs" as you describe with precisionStep specified on the Analyzer.
>>>
>>> I disagree with Shawn about #5, that a user with a Solr 6.0 index must be able to upgrade straight to 7.0. Perhaps this has been the case for every major release in the past, and it would be nice if it continues if for no other reason than consistency. But, IMO, that's kind of cosmetic -- it isn't important. What matters is that an eventual 6.x release occurs that allows someone to upgrade to 7.0 -- that there's a path forward. And that one can always upgrade from one 6.x release to any greater 6.x release.
>>>
>>> Quoting Adrien:
>>> bq. Detour: In the future I wonder whether we should consider having separate release cycles again. In addition to giving Solr more time to use new Lucene features like here, it would also remove the issue that we had when releasing 5.3.2 after 5.4.0, which makes perfect sense from a Solr perspective but not from Lucene since it introduces blind spots in the testing of index backward compatibility.
>>>
>>> +1 to that! I've had that thought. It would be awesome for Solr to release when it feels it's right, independently of Lucene. If that's too difficult/problematic then perhaps keep synchronizing releases but allow Lucene & Solr's release version to vary. Then we'd be having a Solr 5.6 release here.
>>>
>>> ~ David
>>>
>>> On Wed, Mar 30, 2016 at 9:39 PM Robert Muir <[email protected]> wrote:
>>>
>>>> On Wed, Mar 30, 2016 at 12:43 PM, Adrien Grand <[email protected]> wrote:
>>>> > Hi Shawn,
>>>> >
>>>> > I think marking the legacy fields/queries as deprecated in Lucene in 6.0 is the right thing to do in order to encourage users to migrate to the new points API. If Solr needs to keep them around for 7.x, it would be fine to move them to solr/ instead of lucene/ rather than doing a hard removal. Given that it works on top of the postings API, it would not break.
>>>>
>>>> Also see my issue (https://issues.apache.org/jira/browse/LUCENE-7075) where I proposed to at least get things headed to the backwards/ jar. And the uninverting issue is still being discussed. If you look at linked issues you will see the deprecated encoding is involved with the following modules:
>>>>
>>>> * core (not just field/query/utils classes, but stuff like precisionStep in the .document api!)
>>>> * spatial (Deprecated GeoPoint encoding etc)
>>>> * spatial-extras (Deprecated Bbox encoding etc)
>>>> * misc (UninvertingReader)
>>>> * queryparser (flexible and xml)
>>>> * join
>>>>
>>>> The purpose of that issue is to make sure people have the stuff they need to move their code off the old encoding. I personally thought this would make the transition easier, and it was finding bugs/problems in points and improving the apis. I imagined it would just be me, but I created a ton of linked issues all up front just in case. I did not think anyone else would really be excited to work on these, because it's not particularly exciting stuff, but thanks to Nick, David, Martijn, etc. who did. I didn't try to plan any grandiose schemes of *actually pulling the old encoding out* because this was plenty on its own.
>>>> I tried to work on the fieldcache only because I was talking to Tomas and he mentioned it as a difficulty in cutting over solr. But I bailed after encountering complexity, and don't think it is the way to go; read the issue for my explanation.
>>>>
>>>> To me, this is why we have a backwards compatibility policy for N-1, it has to be a volunteer thing for some of this stuff: can't all be on Mike.
>>>>
>>>> I do personally think it is enough to release; "removing" or "moving" deprecations is something to worry about for the master branch.
>>>>
>>>> I did mention in the issue an idea for a first step would be to get the core/ stuff pulled out somewhere better. Maybe the core/ stuff should go to the backwards-codec jar if we can detangle the deprecations from the .document api (e.g. maybe precisionStep can be a parameter on a tokenizer or analyzer or something, so it's a little bit harder to use, but still works and not holding back core/'s .document api). But what to do about the other stuff?
>>>>
>>>> If I wanted to start removing deprecations now, I would be trying to just factor out the core/ NumericRangeQuery/NumericField stuff out to the backwards-codec jar. I hate modules depending on other ones, I really do, but just to iterate, I'd temporarily make all those other modules depend on backwards-codec/ jar and then remove deprecations from each one-by-one. It's too much to do all at once. I think we can do it this way iteratively without breaking solr.
>>>>
>>>> If solr wants to hang on to e.g. some spatial field with old numerics for an additional time (since it was still using it for 6.0), then the deprecated spatial field can be moved to solr. If not, let's nuke it.
>>>>
>>>> To me this seems the least controversial path, and it's something that can be done iteratively.
>>>> It has the downside of keeping "core" deprecated legacy numerics around for an extra major release in the backwards-codec jar. I think this "extra" back compat is ok in this case. Uwe made clean code :)
>>>>
>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
