I think it makes sense to allow deletions against 3.x segments? This has always been part of our index back-compat promise. I.e., as long as all segments were written with version N-1, in version N of Lucene you can open up the index and do anything (searching, adding/updating/deleting docs, etc.) with it.
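To make that concrete, here is a toy sketch in plain Java. It is not Lucene code: LegacySegment and its methods are made-up names, and real live docs go through the codec rather than a bare BitSet. It only illustrates the idea that deleting against an old segment touches nothing but the live-docs bits, while the segment data itself stays frozen:

```java
import java.util.BitSet;

/** Toy model: segment data is frozen, but the live-docs bits stay writable. */
public class ReadOnlySegmentSketch {

    /** Hypothetical stand-in for an old-format (e.g. 3.x) segment. */
    static final class LegacySegment {
        private final String[] docs;   // frozen once the segment is written
        private final BitSet liveDocs; // the one part that may change later

        LegacySegment(String... docs) {
            this.docs = docs.clone();
            this.liveDocs = new BitSet(docs.length);
            this.liveDocs.set(0, docs.length); // every doc starts out live
        }

        /** Writing new documents through the old format is refused. */
        void addDocument(String doc) {
            throw new UnsupportedOperationException("old-format segments are read-only");
        }

        /** Deleting only clears a bit; returns false if the doc was already deleted. */
        boolean delete(int docId) {
            if (docId < 0 || docId >= docs.length) {
                throw new IndexOutOfBoundsException("no such doc: " + docId);
            }
            boolean wasLive = liveDocs.get(docId);
            liveDocs.clear(docId);
            return wasLive;
        }

        boolean isLive(int docId) {
            return liveDocs.get(docId);
        }

        int numLiveDocs() {
            return liveDocs.cardinality();
        }
    }

    public static void main(String[] args) {
        LegacySegment seg = new LegacySegment("doc-a", "doc-b", "doc-c");
        seg.delete(1); // allowed: only the live-docs bits change
        System.out.println("live docs: " + seg.numLiveDocs());
        try {
            seg.addDocument("doc-d"); // refused: new docs belong in a new-format segment
        } catch (UnsupportedOperationException e) {
            System.out.println("add refused: " + e.getMessage());
        }
    }
}
```

In this model an update is just a delete here plus an add somewhere else (a new-format segment), which is why refusing the delete half breaks updates against old segments.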
But I agree this puts a back-compat burden on Codec: it must have a writable live docs implementation even if the rest of it is read-only. (Fortunately we can't change norms anymore!)

Mike McCandless

http://blog.mikemccandless.com

On Wed, Aug 29, 2012 at 5:13 AM, Uwe Schindler <[email protected]> wrote:
> Hi Robert,
>
> In my opinion, we *must* allow changing LiveDocs in any codec that exists for
> migration (also in 5.x) and is otherwise forced read-only. Of course 5.x no
> longer needs to support 3.x indexes, but that's another story. The reason for
> this *requirement* (and the reason the issue is a blocker) is the special
> case of LiveDocs:
> - Every part of an index except LiveDocs is unmodifiable after the segment
>   was flushed/committed/whatever. This is what allows us to make a read-only
>   codec like we currently have!
> - BUT: The LiveDocs file is special, as it can and must be modified after the
>   segment was created (e.g. for document updates). To allow correct migration
>   of older indexes, we must support this in read-only codecs as well. We must
>   also add this to the "backwards guideline".
>
> The important thing is: Lucene3xCodec (or any later backwards-read-only
> codec) should prevent creating new index segments using this codec, but the
> above special case for deleting documents must be allowed. Otherwise the
> whole backwards strategy is useless, because you have no chance to migrate
> old, non-read-only indexes live. As noted in the issue, allowing only
> addDocument() [because it writes to a new segment] but not updateDocument()
> [because it also modifies the old index if the document already exists] is
> crazy. One problem is that the exception may not be visible on the first
> updateDocument() call, because the update was in fact an add.
> updateDocument() could also pass if the update deletes another document
> that was already merged into a new 4.0 segment...
> (Hoss explained that very well)
>
> There is an alternative approach I would also favour: if IndexWriter
> detects that the segment it has to delete documents from has no writeable
> LiveDocs, it could trigger a merge and apply the deletion after the merge.
> In that case the segment is migrated on the fly. This is a heavier but
> cleaner approach than the patch on the issue. But it is trickier to do,
> because of concurrent merges.
>
> I agree we need a better test, but I don't see any problems with the current
> patch. Please discuss this on the issue
> https://issues.apache.org/jira/browse/LUCENE-4339.
>
> I agree that you cannot use an index that contains 2.x segments (for that,
> the migration tool was provided in Lucene 3.x, so you can download the 3.6.1
> core.jar and run IndexUpgrader). In that case it will throw an exception on
> open, and that's fine.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
>> -----Original Message-----
>> From: Robert Muir [mailto:[email protected]]
>> Sent: Wednesday, August 29, 2012 10:29 AM
>> To: [email protected]
>> Subject: Re: Lucene3xCodec doesn't allow deletions in 4x?
>>
>> On Wed, Aug 29, 2012 at 3:13 AM, Uwe Schindler <[email protected]> wrote:
>> > Hi,
>> >
>> > In the early days (I mean the time when it was already read-only, until
>> > we refactored the IndexReader.delete()/Codec stuff), this was working,
>> > because the LiveDocs were always handled in a special way. Making it 100%
>> > read-only now is in my opinion very bad, as it no longer allows updating
>> > documents in a 3.x index, so you have no choice: you must run
>> > IndexUpgrader.
>> >
>>
>> It didn't really go down like that. Instead, at one point, this was working.
>> Then later, as the APIs changed, it was not really feasible anymore. I added
>> the UOE for that reason. I knew exactly what the tradeoffs were when I did
>> this.
>>
>> It just happens that now it's (seemingly) easy and feasible to have it
>> working again (due to changes in LUCENE-4050/LUCENE-4055), which is why I
>> suggested the patch. But we should think it through, be careful, and make
>> sure I'm not missing or forgetting anything: my test is very trivial.
>>
>> I don't think we should add this back compat requirement/test to
>> TestBackwardsCompatibility in trunk. If it goes in, it's 4.0 only, and only
>> because we are agreeing to do this on a case-by-case basis.
>>
>> In general you cannot 'seamlessly' upgrade from one version to the next. If
>> you have a 3.x index, for example, it might contain some 2.x segments and be
>> working fine in 3.x; that doesn't mean 4.x will read it, etc, etc. This is
>> nothing new. So you always must take some measures.
>>
>> At one point we had decided an upgrade-tool approach to 4.x was fine.
>> I don't think we should forget that either. We only have online back compat
>> because Mike spent a ton of time to do the work. I don't want us to require
>> this "feature" in the future. We might want to refactor codec APIs or
>> something like that in 5.0 in a way where it's no longer feasible again.
>>
>> --
>> lucidworks.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
