[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

Michael McCandless (JIRA) Mon, 16 Nov 2009 14:04:04 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778583#action_12778583
 ]


Michael McCandless commented on LUCENE-2074:
--------------------------------------------

bq. I feel bad about this whole Version Enum

I think this is simply a sign of 1) Lucene's maturity, and 2) that we
take back compat seriously.  I actually think we don't yet use it
enough...

EG, LUCENE-1255 was one nasty bug, that we at first fixed, but then
rolled back, because of the back-compat break.  Then it was
rediscovered and opened again, as LUCENE-1542, when we decided it was
nasty enough to just fix it and put an entry in CHANGES that you
hopefully will read.

But it really is a back-compat break, in that apps could quite easily
be relying on the buggy behavior.  I think that bug would have been a
good reason to add Version to IW.

Fixing invalid acronyms in StandardAnalyzer, but then leaving it
broken by default, was the original "inspiration" for Version.  We
shouldn't ever fix a bug, but then be forced to leave the bug in
place due to back compat.

Version lets us fix bugs, change defaults for the better, etc., w/o
compromising on our back compat policy.  It's an important
tool...

bq. The problem is, these are the hard backwards compat situations that it was 
created for - the whole analyzer package was/is bound to have lots of Version 
stuff.

Right, I think Version will especially find its way into changes that
alter what's indexed (analyzers, bugs like LUCENE-1255, etc.).
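
To make the idea concrete, here is a minimal sketch of version-gated
behavior, not the actual Lucene code.  MatchVersion, its onOrAfter
helper, selectTokenizerImpl and the two implementation labels are all
hypothetical names invented for illustration; only the rule itself
(new behavior when 3.0 or later is requested, old behavior otherwise)
comes from the issue description.

```java
// Illustrative sketch only: none of these names are Lucene's real API.
final class VersionGateSketch {

    // Hypothetical stand-in for a Version enum; declaration order is release order.
    enum MatchVersion {
        LUCENE_24, LUCENE_29, LUCENE_30;

        // True when this version is the same as, or newer than, the other one.
        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    // Hypothetical labels for the old and the regenerated tokenizer implementations.
    static final String OLD_IMPL = "java1.4-generated";
    static final String NEW_IMPL = "java5-generated";

    // Callers that ask for 3.0 or later get the fixed behavior; everyone else
    // keeps the old behavior, so what they index does not silently change.
    static String selectTokenizerImpl(MatchVersion matchVersion) {
        return matchVersion.onOrAfter(MatchVersion.LUCENE_30) ? NEW_IMPL : OLD_IMPL;
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + selectTokenizerImpl(v));
        }
    }
}
```

The point of the gate is that the fix ships on by default for new
apps, while an existing app that passes its old version keeps indexing
exactly as before until it opts in.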

> Use a separate JFlex generated Unicode 4 by Java 5 compatible 
> StandardTokenizer
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-2074
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2074
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: jflexwarning.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
> (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
> regenerate the file.
> After regeneration the Tokenizer behaves differently for some characters. 
> Because of that we should only use the new TokenizerImpl when 
> Version.LUCENE_30 is used as matchVersion.

