> I'm actually working on Chinese texts in TEI.
> I would like to know if stemming Chinese text is possible in BaseX, as 
> we can do with other languages (like English or German)?
> Or maybe there is a way to add this functionality with Lucene?
> 
> Best regards,
> Philippe Pons
> 

Dear Philippe, 

If by stemming you mean the removal of prefixes and suffixes to arrive at 
normalized word stems, the concept simply doesn't apply to Chinese, so no, 
it can't be done. 

What you are most likely looking for is the ability to tokenize strings into 
n-grams, which Lucene can do: 
https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
 
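To illustrate the idea (this is just a conceptual sketch in plain Python, not BaseX or Lucene code), character n-grams split a Chinese string into overlapping fixed-length chunks, which can then be indexed and matched like ordinary tokens:

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return all overlapping character n-grams of `text`.

    This mimics what an n-gram token filter does conceptually:
    "我爱北京" with n=2 yields ["我爱", "爱北", "北京"].
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("我爱北京"))        # → ['我爱', '爱北', '北京']
print(char_ngrams("我爱北京", n=3))   # → ['我爱北', '爱北京']
```

Bigrams (n=2) are a common default for CJK text, since most Chinese words are one or two characters long.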

Greetings
Duncan
