[ https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094902#comment-13094902 ]
Tom Burton-West commented on LUCENE-2906:
-----------------------------------------

Any chance this might get implemented for 3.4?

> Filter to process output of ICUTokenizer and create overlapping bigrams for CJK
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-2906
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2906
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Tom Burton-West
>            Priority: Minor
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the
> ICUTokenizer but have overlapping bigrams created for CJK, as in the CJK
> Analyzer. This filter would take the output of the ICUTokenizer, read the
> ScriptAttribute, and for selected scripts (Han, Kana) produce
> overlapping bigrams.
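
To make the requested behaviour concrete, here is a minimal standalone sketch of the bigramming step. It is not the attached patch and does not use the Lucene API: `Token`, `CjkBigramSketch`, and `bigrams` are hypothetical names, the script is carried as a plain string standing in for ScriptAttribute, and consecutive tokens in the list are assumed to be adjacent in the original text. It also assumes that a run mixing Han and Kana is bigrammed as one run and that a lone CJK unigram is passed through unchanged; the issue does not specify either point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a Lucene-free model of "unigrams in, overlapping bigrams out".
class CjkBigramSketch {

    /** Hypothetical stand-in for a token plus its script tag (not a Lucene class). */
    static final class Token {
        final String text;
        final String script;

        Token(String text, String script) {
            this.text = text;
            this.script = script;
        }
    }

    /** Scripts selected for bigramming, per the issue description (Han, Kana). */
    static boolean isBigramScript(String script) {
        return "Han".equals(script) || "Kana".equals(script);
    }

    /**
     * Replaces each run of two or more selected-script unigrams with its
     * overlapping bigrams; every other token passes through unchanged.
     */
    static List<String> bigrams(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token current = tokens.get(i);
            if (!isBigramScript(current.script)) {
                out.add(current.text);
                i++;
                continue;
            }
            // Find the end (exclusive) of the run of selected-script unigrams.
            int end = i;
            while (end < tokens.size() && isBigramScript(tokens.get(end).script)) {
                end++;
            }
            if (end - i == 1) {
                // Assumption: a lone unigram is emitted as-is.
                out.add(current.text);
            } else {
                for (int k = i; k + 1 < end; k++) {
                    out.add(tokens.get(k).text + tokens.get(k + 1).text);
                }
            }
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("hello", "Latin"),
                new Token("東", "Han"),
                new Token("京", "Han"),
                new Token("都", "Han"));
        System.out.println(bigrams(tokens)); // prints [hello, 東京, 京都]
    }
}
```

A real TokenFilter would additionally have to check token offsets to confirm adjacency and maintain position increments, which this sketch leaves out.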