[ https://issues.apache.org/jira/browse/LUCENE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3747:
--------------------------------
    Attachment: LUCENE-3747.patch

* I ran {{perl generateJavaUnicodeWordBreakTest.pl}} and deleted the previously generated {{WordBreakTestUnicode_6_0_0.java}} in favor of the new {{WordBreakTestUnicode_6_1_0.java}}. The new full svn script is:
{noformat}
svn rm lucene/test-framework/src/java/org/apache/lucene/util/TestRuleIcuHack.java
svn rm lucene/analysis/icu/lib/icu4j-4.8.1.1.jar.sha1
svn rm solr/contrib/extraction/lib/icu4j-4.8.1.1.jar.sha1
svn rm solr/contrib/analysis-extras/lib/icu4j-4.8.1.1.jar.sha1
svn rm lucene/analysis/common/src/test/org/apache/lucene/analysis/core/WordBreakTestUnicode_6_0_0.java
{noformat}
* Updated to automate the following via a new ant target {{gen-utr30-data-files}}, which {{gennorm2}} now depends on:
- Download nfc.txt, nfkc.txt and nfkc_cf.txt from Unicode.org
- Convert round-trip mappings in nfc.txt to one-way mappings if the right-hand side contains [:Diacritic:]
- Expand UnicodeSet rules in the other norm2 files. Where I couldn't figure out a rule, I put in an annotation ("# Rule: verbatim") to leave the following mappings as-is.

Robert, I couldn't discern any logic to the exceptions you made to the "[:Diacritic:]>" mappings, so I left it at the full [:Diacritic:] set; feel free to amend the rule.

After these changes, I ran {{ant gennorm2}}. All tests pass.

I think this is ready to go. (More work remains on branch_4x, where the current Unicode 6.0 JFlex-based implementations need to be accessible via LUCENE_36.)

> Support Unicode 6.1.0
> ---------------------
>
>                 Key: LUCENE-3747
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3747
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.5, 4.0-ALPHA
>            Reporter: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3747.patch, LUCENE-3747.patch
>
>
> Now that Unicode 6.1.0 has been released, Lucene/Solr should support it.
> JFlex trunk now supports Unicode 6.1.0.
> Tasks include:
> * Upgrade ICU4J to v49 (after it's released, on 2012-03-21, according to http://icu-project.org).
> * Use {{icu}} module tools to regenerate the supplementary character additions to JFlex grammars.
> * Version the JFlex grammars: copy the current implementations to {{*Impl3<X>}}; cause the versioning tokenizer wrappers to instantiate this version when the {{Version}} c-tor param is in the range 3.1 to the version in which these changes are released (excluding the range endpoints); then change the specified Unicode version in the non-versioned JFlex grammars from 6.0 to 6.1.
> * Regenerate JFlex scanners, including {{StandardTokenizerImpl}}, {{UAX29URLEmailTokenizerImpl}}, and {{HTMLStripCharFilter}}.
> * Using {{generateJavaUnicodeWordBreakTest.pl}}, generate and then run {{WordBreakTestUnicode_6_1_0.java}} under {{modules/analysis/common/src/test/org/apache/lucene/analysis/core/}}
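
For readers unfamiliar with the nfc.txt conversion step mentioned in the comment (round-trip mappings become one-way mappings when the right-hand side contains a [:Diacritic:] code point), here is a minimal Python sketch of the idea. It is not the code in the attached patch. It assumes the gennorm2 text format, where {{=}} marks a round-trip mapping and {{>}} a one-way mapping. The function name {{to_one_way}}, the {{is_diacritic}} predicate, and the tiny {{SAMPLE_DIACRITICS}} set are hypothetical stand-ins; a real run would test against the full [:Diacritic:] set, for example via ICU.

```python
import re

# A gennorm2 round-trip mapping line looks like "00C0=0041 0300".
# Lines with inline comments are not handled in this sketch.
ROUND_TRIP = re.compile(r"^([0-9A-Fa-f]{4,6})=([0-9A-Fa-f ]+)$")

# Hypothetical stand-in for the full [:Diacritic:] set.
SAMPLE_DIACRITICS = {0x0300, 0x0301}


def to_one_way(line, is_diacritic):
    """Rewrite 'X=Y Z' as 'X>Y Z' if any right-hand-side code point is a diacritic.

    Anything that is not a round-trip mapping (comments, ccc assignments,
    existing one-way mappings) is returned unchanged.
    """
    match = ROUND_TRIP.match(line.strip())
    if match is None:
        return line
    source, rhs = match.groups()
    code_points = [int(cp, 16) for cp in rhs.split()]
    if any(is_diacritic(cp) for cp in code_points):
        return f"{source}>{rhs}"
    return line


if __name__ == "__main__":
    sample_lines = [
        "# illustrative sample lines only",
        "0300:230",
        "00C0=0041 0300",
        "09CB=09C7 09BE",
        "0344>0308 0301",
    ]
    for sample in sample_lines:
        print(to_one_way(sample, lambda cp: cp in SAMPLE_DIACRITICS))
```

Passing the diacritic test as a predicate keeps the sketch independent of any particular Unicode library, and makes it easy to swap in a narrower set if the rule is amended.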