Hi Robert,

In my opinion, we *must* allow changing LiveDocs in any codec that exists for migration (also in 5.x) but is otherwise forced read-only. Of course 5.x no longer needs to support 3.x indexes, but that's another story.

The reason for this *requirement* (and the reason the issue is a blocker) is the special case for LiveDocs:

- Every part of an index except LiveDocs is unmodifiable after the segment was flushed/committed/whatever. This is what allows us to have a read-only codec like the current one!
- BUT: The LiveDocs file is special, as it can and must be modified after the segment was created (e.g. for document updates).

To allow correct migration of older indexes, we must support this in read-only codecs as well. We must also add this to the "backwards guideline".
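To make the distinction concrete, here is a minimal sketch in plain Java. It is a toy model with hypothetical names (Segment, Codec, LegacyReadOnlyCodec); it is not the real Lucene Codec API and not the patch on the issue. It only shows the behaviour I am arguing for: writing a new segment is rejected, clearing a LiveDocs bit is still allowed.

```java
import java.util.BitSet;

// Hypothetical toy model, NOT the real Lucene Codec API.
public class ReadOnlyCodecSketch {

    /** A flushed segment: its documents are immutable, only the live-docs bits may change. */
    static final class Segment {
        final String codecName;
        final int maxDoc;
        final BitSet liveDocs; // bit set = document is still live

        Segment(String codecName, int maxDoc) {
            this.codecName = codecName;
            this.maxDoc = maxDoc;
            this.liveDocs = new BitSet(maxDoc);
            this.liveDocs.set(0, maxDoc); // all documents start out live
        }
    }

    /** Hypothetical codec contract, reduced to the two operations discussed here. */
    interface Codec {
        Segment writeNewSegment(int maxDoc);
        void deleteDocument(Segment segment, int docId);
    }

    /** Migration-only codec: refuses new segments, still allows deletions. */
    static final class LegacyReadOnlyCodec implements Codec {
        @Override
        public Segment writeNewSegment(int maxDoc) {
            throw new UnsupportedOperationException("this codec can only be used for reading");
        }

        @Override
        public void deleteDocument(Segment segment, int docId) {
            segment.liveDocs.clear(docId); // the one mutation that must stay legal
        }
    }

    public static void main(String[] args) {
        Codec legacy = new LegacyReadOnlyCodec();
        Segment old = new Segment("legacy", 3); // stands in for an existing old segment

        legacy.deleteDocument(old, 1);
        System.out.println("live docs after delete: " + old.liveDocs.cardinality());

        try {
            legacy.writeNewSegment(10);
        } catch (UnsupportedOperationException e) {
            System.out.println("new segment rejected: " + e.getMessage());
        }
    }
}
```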
The important thing is: Lucene3xCodec (or any later backwards-read-only codec) should prevent creating new index segments using this codec, but the above special case for deleting documents must be allowed. Otherwise the whole backwards strategy is useless, because you have no chance to migrate old, non-read-only indexes live.

As noted in the issue, only allowing addDocument() [because it writes to a new segment], but not allowing updateDocument() [because it also modifies the old index if the document already exists] is crazy. One problem is that the exception may not be visible on the first updateDocument() call, because the update was in fact an add. updateDocument() could also pass if the update deletes a document that was already merged into a new 4.0 index... (Hoss explained that very well)

An alternative approach I would also favour: if IndexWriter detects that the segment it needs to delete documents from has no writeable LiveDocs, it could trigger a merge and apply the deletion after the merge. In that case the segment is migrated on the fly. This is a heavier but cleaner approach than the patch on the issue. But it is trickier to do, because of concurrent merges.

I agree we need a better test, but I don't see any problems with the current patch. Please discuss this on the issue: https://issues.apache.org/jira/browse/LUCENE-4339

I agree that you cannot use an index that contains 2.x segments (for that, the migration tool was provided in Lucene 3.x, so you can download the 3.6.1 core.jar and run IndexUpgrader). In that case it will throw an exception on open, and that's fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Robert Muir [mailto:[email protected]]
> Sent: Wednesday, August 29, 2012 10:29 AM
> To: [email protected]
> Subject: Re: Lucene3xCodec doesn't allow deletions in 4x?
>
> On Wed, Aug 29, 2012 at 3:13 AM, Uwe Schindler <[email protected]> wrote:
> > Hi,
> >
> > In the early days (I mean in the time when it was already read only until we refactored the IndexReader.delete()/Codec stuff), this was working, because the LiveDocs were always handled in a special way. Making it now 100% read-only is in my opinion very bad, as it does not allow updating documents in a 3.x index anymore, so you have no chance, you must run IndexUpgrader.
> >
>
> It didn't really go down like that. Instead, at one point, this was working. Then later, as the APIs changed, it was not really feasible anymore. I added the UOE for that reason. I knew exactly what the tradeoffs were when I did this.
>
> It just happens that now it's (seemingly) easy and feasible to have it working again (due to changes in LUCENE-4050/LUCENE-4055), which is why I suggested the patch. But we should think it through, be careful, and make sure I'm not missing or forgetting anything: my test is very trivial.
>
> I don't think we should add this back compat requirement/test to TestBackwardsCompatibility in trunk. If it goes in, it's 4.0 only and only because we are agreeing to do this on a case-by-case basis.
>
> In general you cannot 'seamlessly' upgrade from one version to the next. If you have a 3.x index, for example, it might contain some 2.x segments and be working fine in 3.x; that doesn't mean 4.x will read it, etc, etc. This is nothing new. So you always must take some measures.
>
> At one point we had decided an upgrade-tool approach to 4.x was fine. I don't think we should forget that either. We just have online back compat because Mike spent a ton of time to do the work. I don't want us to require this "feature" in the future. We might want to refactor codec APIs or something like that in 5.0 in a way where it's no longer feasible again.
>
> --
> lucidworks.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
