Jack, I think you misunderstood Robert's summary: Lucene and its modules have already switched from legacy numerics to points, thanks to the hard work of multiple devs (thank you all!). This was very healthy: it uncovered exciting bugs. What Rob is describing is how we now go about iteratively removing legacy numerics in master, but that doesn't block the 6.0 release.
Most recent points-related work (geo improvements, low-level BKD optimizations) has been going to 6.1, not 6.0. The 6.0 changes have generally tapered off, so it makes sense to release now (it made sense even a week ago). The timing here shouldn't be surprising: it was nearly 7 weeks ago that I suggested we get the ball rolling for 6.0.0.

For Solr I think there are multiple options, already enumerated in this thread. As Adrien pointed out, since legacy numerics build on postings, Solr can easily keep them "alive" even after Lucene has removed them in master, if that's really necessary.

At the end of the day, whether you use Elasticsearch, Solr, or Lucene directly, you'll need to both update your code and reindex previous documents to take advantage of points. There is no magic bullet ... it comes down to physics really ;)

That said, a magical migration tool would be neat; maybe you can work up a patch? I haven't heard anyone else suggest this idea ...

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 31, 2016 at 11:39 AM, Jack Krupansky <[email protected]> wrote:
> Robert's detailing of the remaining work to get the rest of Lucene off of
> current (current release, soon to be legacy) numerics is enlightening.
>
> Personally, I had thought that it was Solr that was holding up an imminent
> Lucene/Solr 6.0 release, but now I'm thinking:
>
> 1. The new "point stuff" (did I mention that I didn't like or approve of the
> current name?) seems more like a work in progress...
> 2. I'd label the "point stuff" as experimental for 6.0.
> 3. I wouldn't hold up 6.0 for any further baking of the "point stuff" or
> migration of other Lucene features off current numeric types.
> 4. The rest of Lucene can be weaned off the current numerics at a more
> leisurely pace, like for 6.1 or 6.2.
> 5. Once the new "point stuff" is finally fully baked, and the rest of Lucene
> is migrated off current numerics, and...
> Solr has made "point stuff" its default numeric type (6.1 or 6.2?), AND
> Lucene or Solr comes up with a sound migration plan and/or index migration
> tool for current numerics, only THEN should the current numerics become
> deprecated.
> 6. I'm not absolutely certain, but I think the 6.0 changes to Solr to use
> the Lucene LegacyXXX numeric field types should be fine for an initial 6.0
> release, meaning backcompat is assured for 6.x.
> 7. I'm imagining that with a manually-invoked index upgrade tool a current
> (5.x) numeric field can be migrated to a "point stuff" field type. A Lucene
> heavy will have to confirm that feasibility.
> 8. I'm imagining that a typical Solr site would be okay with the requirement
> that they have to explicitly, manually run such an index upgrade tool to
> migrate from current (5.x) numerics to "point stuff". And that they could
> either do that once Solr adds support for "point stuff" fields or when they
> migrate from 6.x to 7.x. Bonus points if Solr can have a variation of the
> index upgrade tool that discovers and upgrades all current numeric fields.
>
> What else? (I'll ask some questions about Elasticsearch plans in a separate
> message.)
>
> -- Jack Krupansky
>
> On Thu, Mar 31, 2016 at 12:31 AM, David Smiley <[email protected]>
> wrote:
>>
>> That was an excellent summary Rob; thanks.
>> Minor nit: BBoxSpatialStrategy isn't/wasn't deprecated. It was enhanced
>> to use PointValues.
>>
>> I too would like to see the legacy numerics stay in "backwards-codecs" as
>> you describe with precisionStep specified on the Analyzer.
>>
>> I disagree with Shawn about #5, that a user with a Solr 6.0 index must be
>> able to upgrade straight to 7.0. Perhaps this has been the case for every
>> major release in the past, and it would be nice if it continues if for no
>> other reason than consistency. But, IMO, that's kind of cosmetic -- it
>> isn't important.
>> What matters is that an eventual 6.x release occurs that
>> allows someone to upgrade to 7.0 -- that there's a path forward. And that
>> one can always upgrade from one 6.x release to any greater 6.x release.
>>
>> Quoting Adrien:
>> bq. Detour: In the future I wonder whether we should consider having
>> separate release cycles again. In addition to giving Solr more time to use
>> new Lucene features like here, it would also remove the issue that we had
>> when releasing 5.3.2 after 5.4.0, which makes perfect sense from a Solr
>> perspective but not from Lucene since it introduces blind spots in the
>> testing of index backward compatibility.
>>
>> +1 to that! I've had that thought. It would be awesome for Solr to
>> release when it feels it's right, independently of Lucene. If that's too
>> difficult/problematic then perhaps keep synchronizing releases but allow
>> Lucene & Solr's release version to vary. Then we'd be having a Solr 5.6
>> release here.
>>
>> ~ David
>>
>> On Wed, Mar 30, 2016 at 9:39 PM Robert Muir <[email protected]> wrote:
>>>
>>> On Wed, Mar 30, 2016 at 12:43 PM, Adrien Grand <[email protected]> wrote:
>>> > Hi Shawn,
>>> >
>>> > I think marking the legacy fields/queries as deprecated in Lucene in 6.0
>>> > is the right thing to do in order to encourage users to migrate to the
>>> > new points API. If Solr needs to keep them around for 7.x, it would be
>>> > fine to move them from lucene/ to solr/ instead of a hard removal. Given
>>> > that it works on top of the postings API, it would not break.
>>>
>>> Also see my issue (https://issues.apache.org/jira/browse/LUCENE-7075)
>>> where I proposed to at least get things headed to the backwards/ jar.
>>> And the uninverting issue is still being discussed. If you look at
>>> linked issues you will see the deprecated encoding is involved with
>>> the following modules:
>>>
>>> * core (not just field/query/utils classes, but stuff like
>>> precisionStep in the .document api!)
>>> * spatial (Deprecated GeoPoint encoding etc)
>>> * spatial-extras (Deprecated Bbox encoding etc)
>>> * misc (UninvertingReader)
>>> * queryparser (flexible and xml)
>>> * join
>>>
>>> The purpose of that issue is to make sure people have the stuff they
>>> need to move their code off the old encoding. I personally thought this
>>> would make the transition easier, and it was finding bugs/problems in
>>> points and improving the apis. I imagined it would just be me, but I
>>> created a ton of linked issues all up front just in case. I did not
>>> think anyone else would really be excited to work on these, because
>>> it's not particularly exciting stuff, but thanks Nick, David, Martijn,
>>> etc who did. I didn't try to plan any grandiose schemes of *actually
>>> pulling the old encoding out* because this was plenty on its own. I
>>> tried to work on the fieldcache only because I was talking to Tomas
>>> and he mentioned it as a difficulty in cutting over solr. But I bailed
>>> after encountering complexity, and don't think it is the way to go;
>>> read the issue for my explanation.
>>>
>>> To me, this is why we have a backwards compatibility policy for N-1;
>>> it has to be a volunteer thing for some of this stuff: can't all be on
>>> Mike.
>>>
>>> I do personally think it is enough to release; "removing" or "moving"
>>> deprecations is something to worry about for master branch.
>>>
>>> I did mention in the issue an idea for a first step would be to get
>>> the core/ stuff pulled out somewhere better. Maybe the core/ stuff
>>> should go to the backwards-codec jar if we can detangle the
>>> deprecations from the .document api (e.g. maybe precisionStep can be a
>>> parameter on a tokenizer or analyzer or something, so it's a little bit
>>> harder to use, but still works and not holding back core/'s .document
>>> api). But what to do about the other stuff?
>>>
>>> If I wanted to start removing deprecations now, I would be trying to
>>> just factor the core/ NumericRangeQuery/NumericField stuff out to
>>> the backwards-codec jar. I hate modules depending on other ones, I
>>> really do, but just to iterate, I'd temporarily make all those other
>>> modules depend on backwards-codec/ jar and then remove deprecations
>>> from each one-by-one. It's too much to do all at once. I think we can
>>> do it this way iteratively without breaking solr.
>>>
>>> If solr wants to hang on to e.g. some spatial field with old numerics
>>> for an additional time (since it was still using it for 6.0), then the
>>> deprecated spatial field can be moved to solr. If not, let's nuke it.
>>>
>>> To me this seems the least controversial path, and it's something that
>>> can be done iteratively. It has the downside of keeping "core"
>>> deprecated legacy numerics around for an extra major release in the
>>> backwards-codec jar. I think this "extra" back compat is ok in this
>>> case. Uwe made clean code :)
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
