[ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991186#comment-12991186
 ] 

Robert Muir commented on LUCENE-2906:
-------------------------------------

{quote}
How will this differ from the SmartChineseAnalyzer?
{quote}

The SmartChineseAnalyzer is for Simplified Chinese only. This issue is about the
language-independent technique, similar to what CJKAnalyzer does today.

{quote}
I doubt it but can this be in 3.1?
{quote}

Well, I hate the way CJKAnalyzer treats things like supplementary characters
(wrongly). This is definitely a bug, and it is fixed here. Part of me wants to
fix this as quickly as possible.
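
To illustrate the kind of supplementary-character problem meant here, below is a
minimal sketch in plain Java. It does not use any Lucene code; the class and
method names (CodePointSplit, byChar, byCodePoint) are hypothetical, and it
assumes the usual UTF-16 pitfall: splitting text per char cuts a character
outside the BMP into two unpaired surrogates, while stepping by code point
keeps it whole.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: contrasts char-based and
// code-point-based splitting of a string into single-character units.
final class CodePointSplit {

    // Naive approach: one unit per UTF-16 char. A supplementary character
    // is stored as a surrogate pair, so it ends up as two broken units.
    static List<String> byChar(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(String.valueOf(s.charAt(i)));
        }
        return out;
    }

    // Code-point-aware approach: advance by Character.charCount so that a
    // surrogate pair stays together as one unit.
    static List<String> byCodePoint(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return out;
    }
}
```

For the two-character string "\uD842\uDFB7\u91CE" (U+20BB7 followed by U+91CE),
byChar yields three units and byCodePoint yields two.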

At the same time though, I would prefer 3.2... otherwise I would feel like I am 
rushing things.

I don't think 3.2 needs to come a year after 3.1... in fact, since we have a
stable branch, I think it's stupid to make bugfix releases like 3.1.1 when we
could just push out a new minor version (3.2) with bugfixes instead. The whole
branch is intended for stable changes, so I think this is a better use of our
time. But this is just my opinion; we can discuss it later on the list as one
idea to promote more rapid releases.


> Filter to process output of ICUTokenizer and create overlapping bigrams for CJK
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-2906
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2906
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the
> ICUTokenizer but have overlapping bigrams created for CJK, as in the
> CJKAnalyzer. This filter would take the output of the ICUTokenizer, read the
> ScriptAttribute, and, for selected scripts (Han, Kana), produce
> overlapping bigrams.
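
The idea in the description can be sketched roughly as follows. This is not the
attached patch and uses no Lucene API: the class BigramSketch and its methods
are hypothetical, the input is a plain list of unigram strings, and
java.lang.Character.UnicodeScript stands in for reading the ScriptAttribute.
"Kana" is assumed to mean Hiragana and Katakana. The sketch also assumes that
consecutive CJK tokens are adjacent in the text, which a real filter would have
to check (for example via offsets).

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the LUCENE-2906 patch: turns runs of CJK unigrams
// into overlapping bigrams and passes all other tokens through unchanged.
final class BigramSketch {

    // Stand-in for the script check: looks at the first code point of the
    // token. Han, Hiragana and Katakana are the scripts selected here.
    static boolean isBigramScript(String token) {
        if (token.isEmpty()) {
            return false;
        }
        UnicodeScript script = UnicodeScript.of(token.codePointAt(0));
        return script == UnicodeScript.HAN
            || script == UnicodeScript.HIRAGANA
            || script == UnicodeScript.KATAKANA;
    }

    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<String> run = new ArrayList<>(); // current run of CJK unigrams
        for (String t : tokens) {
            if (isBigramScript(t)) {
                run.add(t);
            } else {
                flush(run, out);
                out.add(t); // non-CJK token: emitted as-is
            }
        }
        flush(run, out);
        return out;
    }

    // Emits the overlapping bigrams for one run. A run of length one has no
    // neighbour to pair with, so the lone unigram is kept.
    private static void flush(List<String> run, List<String> out) {
        if (run.size() == 1) {
            out.add(run.get(0));
        }
        for (int i = 0; i + 1 < run.size(); i++) {
            out.add(run.get(i) + run.get(i + 1));
        }
        run.clear();
    }
}
```

With the input tokens "lucene", "東", "京", "都", "x", "猫" the sketch returns
"lucene", "東京", "京都", "x", "猫". Because the unigrams are whole strings,
a supplementary character stays intact inside a bigram. One simplification to
be aware of: a run may mix Han and Kana tokens, and the sketch pairs them
across that boundary.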

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
