krinsang commented on PR #645:
URL: https://github.com/apache/lucenenet/pull/645#issuecomment-1265829393

   > Wow! Very interesting contribution. It does not look like Java Lucene 
4.8.0 or 4.8.1 contain the `KoreanAnalyzer` however they do contain a 
`CJKAnalyzer` which is intended to cover Chinese, Japanese, and Korean.
   > 
   > Which Java Lucene version is this contribution a port of?
   
   Nice to meet you. This is a port of Lucene 8.11.0. The problem I ran into 
with the `CJKAnalyzer` is that its `TokenStreamComponents` split text using a 
bigram strategy rather than removing non-root words. In the Java 
implementation of the `KoreanAnalyzer`, I noticed that the 
`TokenStreamComponents` exhibit a stemming behavior instead. I use the Java 
library for offline jobs via Scala, and C# for online analysis of keywords.
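
To make the bigram point concrete, here is a minimal, self-contained sketch of overlapping-bigram splitting. It does not use Lucene and is not the `CJKAnalyzer` implementation; the class name `BigramSketch`, the method `bigrams`, and the choice to emit a lone character as a single token are assumptions made for illustration. It also assumes the input is one unbroken run of Hangul with no whitespace or punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates overlapping bigrams only.
final class BigramSketch {

    // Returns every pair of adjacent code points in the input, in order.
    // A single-character input is returned as one token (an assumption
    // of this sketch); an empty input yields an empty list.
    static List<String> bigrams(String text) {
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (codePoints.length == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            // Build a two-code-point string starting at position i.
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }
}
```

Calling `BigramSketch.bigrams("한국어를")` produces the tokens `한국`, `국어`, and `어를`. The last token fuses the end of the noun 한국어 with the object particle 를, which is the kind of non-root material a morphological analyzer would be expected to separate out rather than index.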


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.