krinsang commented on PR #645:
URL: https://github.com/apache/lucenenet/pull/645#issuecomment-1265829393

> Wow! Very interesting contribution. It does not look like Java Lucene 4.8.0 or 4.8.1 contain the `KoreanAnalyzer`; however, they do contain a `CJKAnalyzer`, which is intended to cover Chinese, Japanese, and Korean.
>
> Which Java Lucene version is this contribution a port of?

Nice to meet you. This is a port of Lucene 8.11.0.

The problem I ran into with the `CJKAnalyzer` is that the `TokenStreamComponents` it creates split text using a bigram strategy instead of removing non-root words. In the Java implementation of the `KoreanAnalyzer`, I noticed that the `TokenStreamComponents` exhibit a stemming behavior.

I am using the Java library to perform offline jobs via Scala, and C# for online analysis of keywords.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
