I think it makes sense to allow deletions against 3.x segments?

This has always been part of our index back-compat promise.  I.e., as
long as all segments were written with version N-1, in version N of
Lucene you can open up the index and do anything (searching,
adding/updating/deleting docs, etc.) with it.

But I agree this puts a back-compat burden on Codec: it must have a
writable live docs implementation even if the rest of it is read-only.
(Fortunately, we can't change norms anymore!)
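
To make that burden concrete, here is a rough sketch in plain Java.  It
is not Lucene code: LiveDocsSketch, ReadOnlySegment and their methods are
made-up names for illustration only.  The idea is just that everything in
an old segment stays frozen except the live docs bits, which have to
remain writable so deletes can still be applied.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, NOT the Lucene API.
final class LiveDocsSketch {

    /** Stand-in for a segment written by an older, now read-only codec. */
    static final class ReadOnlySegment {
        private final List<String> docs;  // frozen once the segment is written
        private final BitSet liveDocs;    // the only part that may still change

        ReadOnlySegment(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
            this.liveDocs = new BitSet(this.docs.size());
            this.liveDocs.set(0, this.docs.size()); // every doc starts out live
        }

        /** Writing new documents with the old format is refused. */
        void addDocument(String doc) {
            throw new UnsupportedOperationException(
                "read-only codec: new documents must go to a new segment");
        }

        /** Deleting only flips a live docs bit; returns true if the doc was live. */
        boolean delete(int docId) {
            if (docId < 0 || docId >= docs.size()) {
                throw new IndexOutOfBoundsException("docId=" + docId);
            }
            boolean wasLive = liveDocs.get(docId);
            liveDocs.clear(docId);
            return wasLive;
        }

        boolean isLive(int docId) {
            return liveDocs.get(docId);
        }

        int numLiveDocs() {
            return liveDocs.cardinality();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("a");
        docs.add("b");
        docs.add("c");
        ReadOnlySegment seg = new ReadOnlySegment(docs);
        seg.delete(1); // allowed even though the segment is otherwise read-only
        System.out.println("live docs: " + seg.numLiveDocs());
    }
}
```

In this sketch new documents would go to a new segment written by the
current codec; only the delete path ever touches the old one.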

Mike McCandless

http://blog.mikemccandless.com

On Wed, Aug 29, 2012 at 5:13 AM, Uwe Schindler <[email protected]> wrote:
> Hi Robert,
>
> In my opinion, we *must* allow changing LiveDocs in any codec that exists
> for migration (also in 5.x), even though it is otherwise forced read-only.
> Of course 5.x no longer needs to support 3.x indexes, but that’s another
> story. The reason for this *requirement* (and therefore the issue is a
> blocker) is the special case of LiveDocs:
> - Every part of an index except LiveDocs is unmodifiable after the segment
> was flushed/committed/whatever. This is what makes a read-only codec like
> the current one possible!
> - BUT: The LiveDocs file is special, as it can and must be modified after the
> segment was created (e.g. for document updates). To allow correct migration
> of older indexes, we must support this in read-only codecs as well. We must
> also add this to the "backwards guideline".
>
> The important thing is: Lucene3xCodec (or any later backwards-read-only
> codec) should prevent creating new index segments using this codec, but the
> above special case for deleting documents must be allowed. Otherwise the
> whole backwards strategy is useless, because you have no chance to migrate
> old indexes that are still being modified while they stay online. As noted
> in the issue, allowing only addDocument() [because it writes to a new
> segment] but not updateDocument() [because it also modifies the old index
> if the document already exists] is crazy. One problem is that the exception
> may not show up on the first updateDocument() call, because that update was
> in fact an add. updateDocument() could also succeed if the document it
> deletes was already merged into a new 4.0 segment... (Hoss explained that
> very well.)
>
> An alternative approach I would also favour: if IndexWriter detects that
> the segment it needs to delete documents from has no writeable LiveDocs, it
> could trigger a merge and apply the deletion after the merge. In that case
> the segment is migrated on the fly. This is a heavier but cleaner approach
> than the patch on the issue. But it is trickier to do, because of
> concurrent merges.
>
> I agree we need a better test, but I don't see any problems with the current 
> patch. Please discuss this on the issue 
> https://issues.apache.org/jira/browse/LUCENE-4339.
>
> I agree that you cannot use an index that contains 2.x segments (and for that
> the migration tool was provided in Lucene 3.x, so you can download the 3.6.1
> core.jar and run IndexUpgrader). In that case it will throw an exception on
> open, and that’s fine.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
>> -----Original Message-----
>> From: Robert Muir [mailto:[email protected]]
>> Sent: Wednesday, August 29, 2012 10:29 AM
>> To: [email protected]
>> Subject: Re: Lucene3xCodec doesn't allow deletions in 4x?
>>
>> On Wed, Aug 29, 2012 at 3:13 AM, Uwe Schindler <[email protected]> wrote:
>> > Hi,
>> >
>> > In the early days (I mean the time when it was already read-only, until
>> > we refactored the IndexReader.delete()/Codec stuff), this was working,
>> > because the LiveDocs were always handled in a special way. Making it now
>> > 100% read-only is in my opinion very bad, as it no longer allows updating
>> > documents in a 3.x index, so you have no choice: you must run
>> > IndexUpgrader.
>> >
>>
>> It didn't really go down like that. Instead, at one point, this was
>> working. Then later, as the APIs changed, it was not really feasible
>> anymore. I added the UOE for that reason. I knew exactly what the
>> tradeoffs were when I did this.
>>
>> It just happens that now it's (seemingly) easy and feasible to have it
>> working again (due to changes in LUCENE-4050/LUCENE-4055), which is why I
>> suggested the patch. But we should think it through, be careful, and make
>> sure I'm not missing or forgetting anything: my test is very trivial.
>>
>> I don't think we should add this back compat requirement/test to
>> TestBackwardsCompatibility in trunk. If it goes in, it's 4.0 only, and only
>> because we are agreeing to do this on a case-by-case basis.
>>
>> In general you cannot 'seamlessly' upgrade from one version to the next. If
>> you have a 3.x index, for example, it might contain some 2.x segments and be
>> working fine in 3.x; that doesn't mean 4.x will read it, etc. This is
>> nothing new. So you always must take some measures.
>>
>> At one point we had decided an upgrade-tool approach to 4.x was fine.
>> I don't think we should forget that either. We only have online back compat
>> because Mike spent a ton of time doing the work. I don't want us to require
>> this "feature" in the future. We might want to refactor codec APIs or
>> something like that in 5.0 in a way where it's no longer feasible again.
>>
>> --
>> lucidworks.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected] For additional
>> commands, e-mail: [email protected]
