[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-2074:
----------------------------------

    Attachment: LUCENE-2074.patch

This patch now implements my latest proposal about the filenames. To make it easy to see what changed in the TokenizerImpls, the patch cannot be applied without doing some copy/rename steps first. Do the following:

- svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.*
- svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.*

After that you have two copies of the original Tokenizer Impls. Then apply the patch.

The patch clearly shows that even after regeneration with Java 1.5, the original version using Java 1.4 (Unicode 3) is identical to before (esp. the DFA matrix). The 31 version is different (other matrix). If we later create new versions, we can call them 32 etc.

This patch solves the JFlex 1.4 problem of needing the explicit Java version. It currently requires the trunk version of JFlex, which should be no problem for these parsers (as verified, they produce the same DFA and code for 1.4). Others, please speak up. Steven Rowe, what do you think? Only developers need the trunk version at the moment, as the generated files are in the checkout. Hopefully JFlex 1.5 comes out before we release 3.1; I would be happy about that.

In later issues we can optimize the newly added 31 version with more Unicode features; the Orig version stays as it is. We could also remove the special cases in the latest version, like replaceInvalidAcronym and so on, as these only apply to Version.LUCENE_2x.
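
To illustrate how the two generated implementations could sit side by side, here is a minimal sketch of selecting one by matchVersion. It is not taken from the patch. MatchVersion, ScannerImpl, ScannerImplOrig, ScannerImpl31 and ScannerSelector are hypothetical stand-ins for the Lucene classes, and the cut-over at LUCENE_31 is an assumption based on the "31" file name (the issue description mentions LUCENE_30 or LUCENE_31).

```java
// Hypothetical stand-in for Lucene's Version enum; only the constants needed here.
enum MatchVersion {
    LUCENE_29, LUCENE_30, LUCENE_31;

    // True if this version is the same as or newer than 'other' (declaration order).
    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical common interface for the JFlex-generated scanners.
interface ScannerImpl {
    String name();
}

// Stands in for the regenerated copy of the old grammar (Unicode 3 behavior).
final class ScannerImplOrig implements ScannerImpl {
    public String name() { return "StandardTokenizerImplOrig"; }
}

// Stands in for the new grammar with the different DFA matrix.
final class ScannerImpl31 implements ScannerImpl {
    public String name() { return "StandardTokenizerImpl31"; }
}

final class ScannerSelector {
    private ScannerSelector() {}

    // Assumption: the new scanner is used from LUCENE_31 on, older versions keep Orig.
    static ScannerImpl select(MatchVersion matchVersion) {
        if (matchVersion.onOrAfter(MatchVersion.LUCENE_31)) {
            return new ScannerImpl31();
        }
        return new ScannerImplOrig();
    }
}
```

With this layout, a later "32" variant would only need one more class and one more branch in select(), while indexes built with an older matchVersion keep their tokenization.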
> Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-2074
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2074
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch,
> LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch,
> LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4
> (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should
> regenerate the file.
> After regeneration the Tokenizer behaves differently for some characters.
> Because of that we should only use the new TokenizerImpl when
> Version.LUCENE_30 or LUCENE_31 is used as matchVersion.