deyinchen created LUCENE-6111:
---------------------------------

             Summary: Add Chinese Word Segmentation Analyzer with Ansj implementation
                 Key: LUCENE-6111
                 URL: https://issues.apache.org/jira/browse/LUCENE-6111
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 4.6
            Reporter: deyinchen
            Priority: Minor
             Fix For: 4.6


When I run the k-means clustering algorithm in mahout-0.9, which depends on 
lucene-4.6, I find that the default analyzer class, 
'org.apache.lucene.analysis.standard.StandardAnalyzer', handles Chinese text 
poorly: it can only split the text into single characters rather than words. 
The Ansj Chinese word segmentation tool is widely used for tokenizing Chinese 
documents, and I would like to contribute an analyzer based on it to Lucene.
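
To illustrate the difference, here is a minimal, self-contained sketch that 
contrasts per-character splitting (the behavior reported above for Chinese 
text) with dictionary-based word segmentation. It does not use Lucene or Ansj: 
the class name SegmentationSketch, the toy dictionary, and the forward 
maximum matching strategy are all assumptions chosen for illustration and are 
not Ansj's actual algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene, Mahout, or Ansj.
public class SegmentationSketch {

    // Per-character splitting: every non-whitespace code point becomes a token.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            if (!Character.isWhitespace(cp)) {
                tokens.add(text.substring(i, i + len));
            }
            i += len;
        }
        return tokens;
    }

    // Forward maximum matching: at each position take the longest dictionary
    // word of at most maxLen chars, falling back to a single char.
    // Assumes the text contains only BMP characters (no surrogate pairs).
    static List<String> maxMatch(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy dictionary (assumption); a real segmenter ships a large lexicon.
        Set<String> dict = new HashSet<>(Arrays.asList("中文", "分词", "工具"));
        String text = "中文分词工具";
        System.out.println(unigrams(text));          // [中, 文, 分, 词, 工, 具]
        System.out.println(maxMatch(text, dict, 4)); // [中文, 分词, 工具]
    }
}
```

For clustering, word-level tokens such as "分词" make more meaningful features 
than isolated characters, which is the motivation for this request.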



