[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

Uwe Schindler (JIRA) Wed, 02 Dec 2009 08:58:45 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-2074:
----------------------------------

    Attachment: LUCENE-2074.patch

Attached is a patch.

To regenerate the parsers, run "ant jflex". The jflex.home system property must 
point to a JFlex trunk checkout in which "mvn install" has already been run (I 
changed build.xml and common-build.xml so that this works correctly).

I added a test that checks tokenization in Java 1.4 mode (Version.LUCENE_30) and 
Java 1.5 mode (Version.LUCENE_CURRENT). There are two JFlex files: one that is 
Unicode 3.0 (Java 1.4.1) compatible (even when run on JDK 5, it now produces a 
Java 1.4 compatible parser!) and one for Unicode 4.0 (Java 5).
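
For readers who have not opened the patch, the idea of choosing between the two 
generated scanners by matchVersion can be sketched roughly as follows. This is 
not the code from the patch: the class, the enums and the select() method are 
hypothetical names made up for illustration, and the cut-over at 3.1 is an 
assumption read off the test description above (Version.LUCENE_30 for the 
Java 1.4 mode); the patch may draw the line elsewhere.

```java
public class VersionedScannerSketch {

    /** Hypothetical stand-in for Lucene's Version constants; only the ordering matters. */
    public enum MatchVersion { LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_CURRENT }

    /** Hypothetical labels for the two JFlex-generated scanners. */
    public enum ScannerKind { UNICODE_3_0, UNICODE_4_0 }

    /**
     * Assumed rule: match versions from 3.1 onward get the Unicode 4.0 (Java 5)
     * scanner, older ones keep the Unicode 3.0 (Java 1.4) scanner so that
     * existing indexes are tokenized the same way as before.
     */
    public static ScannerKind select(MatchVersion matchVersion) {
        return matchVersion.compareTo(MatchVersion.LUCENE_31) >= 0
                ? ScannerKind.UNICODE_4_0
                : ScannerKind.UNICODE_3_0;
    }

    public static void main(String[] args) {
        // Print which scanner each match version would get under the assumed rule.
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + select(v));
        }
    }
}
```

The point of keying the choice on matchVersion is that callers who pin an older 
version keep the old token boundaries, while new code picks up the regenerated 
scanner.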

> Use a separate JFlex generated Unicode 4 by Java 5 compatible 
> StandardTokenizer
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-2074
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2074
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
> LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
> LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
> (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should 
> regenerate the file.
> After regeneration the Tokenizer behaves differently for some characters. 
> Because of that we should only use the new TokenizerImpl when 
> Version.LUCENE_30 or LUCENE_31 is used as matchVersion.


