[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

Uwe Schindler (JIRA) Wed, 02 Dec 2009 14:48:44 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-2074:
----------------------------------

    Attachment: LUCENE-2074.patch

This patch now implements my latest proposal for the file names. To make it 
easy to see what changed in the TokenizerImpls, the patch cannot be applied 
until you have done some copying and renaming.

Do the following:
- svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.*
- svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.*

You now have two copies of the original TokenizerImpl files. Then apply the 
patch. The diff clearly shows that, even after regeneration with Java 1.5, the 
original version targeting Java 1.4 (Unicode 3) is identical to before (esp. 
the DFA matrix). The 31 version differs (it has a different matrix).

If we later create new versions, we can call them 32 etc.
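
To illustrate how the suffixed names could map to matchVersion, here is a 
minimal sketch. It is not the code in the patch: the class, the enum and the 
method are hypothetical stand-ins (the real constants live in Lucene's Version 
class), and the LUCENE_31 cutoff is an assumption inferred from the "31" 
suffix.

```java
// Hypothetical sketch, not part of LUCENE-2074.patch.
final class TokenizerImplSelection {

    // Stand-in for Lucene's Version constants; declaration order defines "older/newer".
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31 }

    // Returns the name of the generated scanner class to use for a matchVersion.
    // Assumption: everything before LUCENE_31 keeps the original (Unicode 3) DFA.
    static String implFor(MatchVersion matchVersion) {
        if (matchVersion.compareTo(MatchVersion.LUCENE_31) >= 0) {
            return "StandardTokenizerImpl31";
        }
        return "StandardTokenizerImplOrig";
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + implFor(v));
        }
    }
}
```

A later "32" version would add one more branch above the 31 check, leaving 
the older mappings untouched.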

This patch solves the JFlex 1.4 problem of needing an explicit Java version. 
It currently requires the trunk version of JFlex, which should be no problem 
for these parsers (it has been verified that they produce the same DFA and 
code for 1.4). Others, please speak up. Steven Rowe, what do you think? Only 
developers need the trunk version at the moment, as the generated files are in 
the checkout.

Hopefully JFlex 1.5 will be out before we release 3.1; that would make me 
happy. In later issues we can optimize the newly added 31 version with more 
Unicode features, while the Orig version stays as it is. We could also remove 
special cases such as replaceInvalidAcronym from the latest version, as they 
only apply to Version.LUCENE_2x.

> Use a separate JFlex generated Unicode 4 by Java 5 compatible 
> StandardTokenizer
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-2074
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2074
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
> LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
> LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated with Java 
> 1.4 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we 
> should regenerate the file.
> After regeneration the Tokenizer behaves differently for some characters. 
> Because of that we should only use the new TokenizerImpl when 
> Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


