Re: deprecating Versions

DM Smith Mon, 29 Nov 2010 11:15:26 -0800

On 11/29/2010 01:03 PM, Robert Muir wrote:

On Mon, Nov 29, 2010 at 12:51 PM, DM Smith<[email protected]>  wrote:

I'd have to look to be sure: IIRC, Turkish was one. The treatment of 'i' was
buggy. Russian had its own encoding that was replaced with UTF-8. The
QueryParser had bug fixes. There is some effort to migrate away from stemmer
to snowball, but at least the Dutch one is not "identical".

but none of these broke backwards compatibility, they all respect the
Version constant!
The SnowballAnalyzer respects the version constant for the buggy
Turkish lowercasing! If you use Version.LUCENE_30 (or less) it wrongly
lowercases so you get your old buggy behavior.

Even the old buggy Dutch stemmer is still there, and if you use
DutchAnalyzer(Version.LUCENE_30) (or less) it stems incorrectly so you
get your old buggy behavior!

The Russian was the same way, same with the QueryParser.

So I'm sorry, I am left confused about where the backwards breaks are?

Strictly speaking there are none, in the present. The user of Lucene can choose to break compatibility and retain old (and in these cases, buggy) behavior. This maintains Lucene's bw-compat policy.

This thread talked about removing the Version constants in the future? I went back and re-read the thread. Perhaps I misunderstood. I saw several thoughts:

Deprecate version constants 1 version back and remove those 2 versions back.

Remove all version constants and use versioned jars instead.

If there is no way to select a prior behavior except to select a single jar that had lots of analyzers (or analyzer parts) in it, then I'm stuck with older code that is perhaps buggy. I can't pick a later analyzer for English and an earlier, buggy analyzer for Turkish. I have to get all of them from one jar (unless we get into renaming packages and/or classes). So I can't get some improvements while ignoring others.

I think there is a problem with deprecating and removing constants too. In trunk, which will be 4.0, it needs to be able to read and/or upgrade 2.x indexes. From an analyzer perspective, an index is invalid if the analyzer would produce a different token stream for the same input. If the 2.x version constants are gone, then the index built with 2.x version constants is no longer valid. (It might be valid, but how can one have any confidence of that?) Upgrading the index to the new internal format cannot change this. A buggy lowercase Turkish word will still be buggy after upgrade. (This is a 3.0 version constant that in 5.0 will still need to be around.)
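To make the "different token stream" point concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class name and the two lowercasing helpers are hypothetical stand-ins for an "old" and a "new" analyzer, and the assumption is that the old behavior ignores Turkish casing rules while the new one applies them.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; these helpers are not Lucene classes.
final class AnalyzerDriftSketch {

    // Assumed "old" behavior: locale-insensitive lowercasing, 'I' becomes 'i'.
    static String oldLowercase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Assumed "new" behavior: Turkish rules, 'I' becomes dotless 'ı' (U+0131).
    static String newLowercase(String token) {
        return token.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // Index a term with the old behavior, then look it up with either behavior.
    static boolean foundAfterUpgrade(String word, boolean keepOldBehavior) {
        Set<String> index = new HashSet<>();
        index.add(oldLowercase(word));              // written at index time
        String queryTerm = keepOldBehavior
                ? oldLowercase(word)                // old behavior still selectable
                : newLowercase(word);               // old behavior removed
        return index.contains(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterUpgrade("ISPARTA", true));
        System.out.println(foundAfterUpgrade("ISPARTA", false));
    }
}
```

The stored term does not change when the index format is upgraded, so the lookup only succeeds while the query side can still reproduce the old token. That is the role the old Version constant plays here.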

We either need more frequent releases (forcing the issue earlier and eliminating stale code earlier) or something's gotta give.

That said, as a user, I don't care any more. I'll give. The benefit of a better index outweighs backward compatibility for me.


-- DM


