[ https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991186#comment-12991186 ]
Robert Muir commented on LUCENE-2906:
-------------------------------------

{quote}
How will this differ from the SmartChineseAnalyzer?
{quote}

The SmartChineseAnalyzer is for Simplified Chinese only. This issue is about a language-independent technique, similar to what CJKAnalyzer does today.

{quote}
I doubt it but can this be in 3.1?
{quote}

Well, I hate the way CJKAnalyzer treats things like supplementary characters (wrongly). This is definitely a bug, and it is fixed here. Part of me wants to fix this as quickly as possible.

At the same time, though, I would prefer 3.2; otherwise I would feel like I am rushing things. I don't think 3.2 needs to come a year after 3.1. In fact, since we have a stable branch, I think it's stupid to make bugfix releases like 3.1.1 when we could just push out a new minor version (3.2) with bugfixes instead. The whole branch is intended for stable changes, so I think this is a better use of our time.

But this is just my opinion; we can discuss it later on the list as one idea to promote more rapid releases.

> Filter to process output of ICUTokenizer and create overlapping bigrams for CJK
> -------------------------------------------------------------------------------
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Reporter: Tom Burton-West
> Priority: Minor
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the
> ICUTokenizer but have overlapping bigrams created for CJK, as in the
> CJKAnalyzer. This filter would take the output of the ICUTokenizer, read the
> ScriptAttribute and, for selected scripts (Han, Kana), produce
> overlapping bigrams.
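For readers unfamiliar with the idea, here is a rough sketch of overlapping bigrams in plain Java. It is not the attached patch and does not use any Lucene or ICU classes. The class name CjkBigramSketch and its methods are made up for illustration, the script check uses the JDK's Character.UnicodeScript in place of the ScriptAttribute, and consecutive list entries are assumed to be adjacent in the text (a real filter would have to check offsets). Tokens are measured in code points rather than chars, so a supplementary character counts as one unigram.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the filter from the attached patch. */
final class CjkBigramSketch {

    /** True for the scripts named in the issue: Han and Kana (Hiragana, Katakana). */
    static boolean isBigramScript(int codePoint) {
        UnicodeScript s = UnicodeScript.of(codePoint);
        return s == UnicodeScript.HAN
            || s == UnicodeScript.HIRAGANA
            || s == UnicodeScript.KATAKANA;
    }

    /** A CJK unigram is exactly one code point (possibly two chars) in a selected script. */
    static boolean isCjkUnigram(String token) {
        return !token.isEmpty()
            && token.codePointCount(0, token.length()) == 1
            && isBigramScript(token.codePointAt(0));
    }

    /**
     * Turns runs of adjacent CJK unigrams into overlapping bigrams.
     * Other tokens pass through unchanged; a CJK unigram with no CJK
     * neighbor is emitted as-is.
     * Assumption: consecutive entries in the list are adjacent in the text.
     */
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String prev = null;         // last CJK unigram of the current run, if any
        boolean prevPaired = false; // whether prev already appeared in a bigram
        for (String tok : tokens) {
            if (isCjkUnigram(tok)) {
                if (prev != null) {
                    out.add(prev + tok); // overlap: prev is reused from the last step
                    prevPaired = true;
                } else {
                    prevPaired = false;
                }
                prev = tok;
            } else {
                if (prev != null && !prevPaired) {
                    out.add(prev);       // isolated CJK unigram
                }
                prev = null;
                out.add(tok);
            }
        }
        if (prev != null && !prevPaired) {
            out.add(prev);
        }
        return out;
    }
}
```

With this sketch, five consecutive unigrams c1 c2 c3 c4 c5 in Han or Kana come out as the four bigrams c1c2, c2c3, c3c4, c4c5, while a Latin token such as "hello" passes through untouched.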