[ 
https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419330#comment-13419330
 ] 

Robert Muir commented on SOLR-3653:
-----------------------------------

{quote}
Because parts of it are also words, which should be searchable.
{quote}

Says who? There are no real word boundaries in this language.

If you want to start indexing individual characters, just use StandardTokenizer.
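
To make "indexing individual characters" concrete, here is a rough sketch in plain Java with no Lucene dependency. It only approximates what StandardTokenizer does with Han text (one token per ideograph). The class and method names are made up for illustration, and the real tokenizer also handles Latin words, numbers, and other scripts, which this sketch simply skips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
public class HanUnigrams {

    /** Returns each Han ideograph in the text as its own token; everything else is skipped. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        // Iterate by code point so supplementary-plane ideographs stay intact.
        text.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                tokens.add(new String(Character.toChars(cp)));
            }
        });
        return tokens;
    }

    public static void main(String[] args) {
        // prints [我, 是, 中, 国, 人]
        System.out.println(unigrams("我是中国人"));
    }
}
```

With tokens like these, any single character in the text is searchable, at the cost of losing the dictionary-based word segmentation that the smartcn tokenizer provides.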

None of your examples are "failures" of this tokenizer. This is what it has in 
its dictionary!
                
> Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-3653
>                 URL: https://issues.apache.org/jira/browse/SOLR-3653
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Lance Norskog
>         Attachments: SOLR-3653.patch, SmartChineseType.pdf
>
>
> The "Smart" Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr 
> factories. Also, since it is a statistical algorithm, it is not perfect.
> This patch supplies factories and a schema.xml type for the existing Lucene 
> Smart Chinese implementation, and includes a "fixup" class to handle the 
> occasional mistake made by the Smart Chinese implementation.
