[
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991186#comment-12991186
]
Robert Muir commented on LUCENE-2906:
-------------------------------------
{quote}
How will this differ from the SmartChineseAnalyzer?
{quote}
The SmartChineseAnalyzer is for Simplified Chinese only... this is about the
language-independent technique similar to what CJKAnalyzer does today.
{quote}
I doubt it but can this be in 3.1?
{quote}
Well, I hate the way CJKAnalyzer treats things like supplementary characters
(wrongly).
This is definitely a bug, and it is fixed here. Part of me wants to fix this as
quickly as possible.
At the same time though, I would prefer 3.2... otherwise I would feel like I am
rushing things.
I don't think 3.2 needs to come a year after 3.1... in fact, since we have a
stable branch, I think it's stupid to make bugfix releases like 3.1.1 when we
could just push out a new minor version (3.2) with bugfixes instead. The whole
branch is intended for stable changes, so I think this is a better use of our
time. But this is just my opinion; we can discuss it later on the list as one
idea to promote more rapid releases.
> Filter to process output of ICUTokenizer and create overlapping bigrams for
> CJK
> --------------------------------------------------------------------------------
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Reporter: Tom Burton-West
> Priority: Minor
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the
> ICUTokenizer but have overlapping bigrams created for CJK, as in the
> CJKAnalyzer. This filter would take the output of the ICUTokenizer, read the
> ScriptAttribute, and for selected scripts (Han, Kana) produce
> overlapping bigrams.
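
As a rough illustration of the overlapping bigrams described in the issue, the
Java sketch below turns one run of Han/Kana text into bigrams. It is not the
attached patch and does not use Lucene's TokenFilter API or the ScriptAttribute;
the class name CjkBigramSketch and the method overlappingBigrams are
hypothetical. It walks the text by code point rather than by char, so a
supplementary character (a surrogate pair in UTF-16) is never split across two
bigrams. Emitting a lone character as a unigram is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: overlapping bigrams over one CJK run.
public class CjkBigramSketch {

    // Returns overlapping bigrams of the given run, counted in code points.
    // Assumption: a run of exactly one code point is emitted as a unigram.
    public static List<String> overlappingBigrams(String run) {
        // Decode UTF-16 into code points so surrogate pairs stay together.
        int[] cps = run.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            out.add(run);
            return out;
        }
        // Slide a window of two code points, advancing by one each step.
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters (日本語) yield two bigrams: 日本 and 本語.
        System.out.println(overlappingBigrams("\u65E5\u672C\u8A9E"));
        // U+20BB7 is a supplementary character encoded as a surrogate pair;
        // it stays whole inside the first bigram.
        System.out.println(overlappingBigrams("\uD842\uDFB7\u91CE\u5BB6"));
    }
}
```

A char-based window over the second input would cut the surrogate pair in half
and produce invalid tokens, which is the kind of supplementary-character
problem mentioned in the comment above.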