> I'm actually working on Chinese texts in TEI.
> I would like to know if stemming Chinese text is possible in BaseX, as 
> we can do with other languages (like English or German)?
> Or maybe there is a way to add this functionality with Lucene?
> 
> Best regards,
> Philippe Pons
> 

Dear Philippe, 

If by stemming you mean the removal of prefixes and suffixes to arrive at 
normalized word stems, the concept simply doesn't apply to Chinese, so no, 
it can't be done. 

What you are most likely looking for is the ability to tokenize strings into 
n-grams, which Lucene can do: 
https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
 
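To illustrate the idea (this is just a conceptual sketch in plain Python, not BaseX or Lucene code), character n-grams split a Chinese string into overlapping fixed-length chunks, which can then be indexed and matched like ordinary tokens:

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return all overlapping character n-grams of `text`.

    This mimics what an n-gram token filter does conceptually:
    "我爱北京" with n=2 yields ["我爱", "爱北", "北京"].
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("我爱北京"))        # → ['我爱', '爱北', '北京']
print(char_ngrams("我爱北京", n=3))   # → ['我爱北', '爱北京']
```

Bigrams (n=2) are a common default for CJK text, since most Chinese words are one or two characters long.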

Greetings
Duncan
