FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer)

Singh, Divya Tue, 08 Jul 2025 11:48:55 -0700


From: Singh, Divya <divyasi...@siemens.com.INVALID>
Sent: 04 July 2025 14:40
To: d...@lucene.apache.org
Cc: Birajdar, Sharad (DI SW PLM LCS APPS ALM R&D7) <sharad.biraj...@siemens.com>
Subject: FW: Challenges with Chinese Query Matching and Wildcard Search in 
Lucene (StandardAnalyzer / CJKAnalyzer)

From: Thakare, Monika (ext) (DI SW PLM LCS APPS ALM R&D7) 
<monika.thakare....@siemens.com>
Sent: 04 July 2025 09:56
To: java-user@lucene.apache.org
Cc: Singh, Divya (DI SW PLM LCS APPS ALM R&D7) 
<divyasi...@siemens.com>
Subject: Challenges with Chinese Query Matching and Wildcard Search in Lucene 
(StandardAnalyzer / CJKAnalyzer)

Dear Team,

We're currently working with Lucene version 9.4.2, using the 
lucene-analysis-common-9.4.2.jar package, and have encountered inconsistent 
results when searching Chinese text, particularly full names such as 
"黄朝辉".
We used Luke 9 to inspect the index and observed the following behavior:

Queries that return results:

  *   "黄", "朝", "辉"
  *   "黄*", "朝*", "辉*"

Queries that do not return results:

  *   "黄朝", "朝辉"
  *   "黄朝*", "朝辉*"
  *   Full match: "黄朝辉", "黄朝辉*"

It seems that multi-character (compound) and full-name queries are not 
matching as expected, even with wildcards, although the text was indexed 
successfully.
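
To make the pattern above concrete, here is a minimal plain-Java sketch of one possible explanation. It does not use any Lucene classes. It assumes that each Han character ends up as its own index term (which is consistent with the single-character hits we see) and that a wildcard query is compared against whole index terms. The class and method names (UnigramSketch, unigrams, prefixMatches) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; no Lucene classes are involved.
class UnigramSketch {

    // Assumption: every Han character becomes a separate index term.
    static List<String> unigrams(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                terms.add(new String(Character.toChars(cp)));
            }
        });
        return terms;
    }

    // Assumption: a prefix query such as "黄朝*" is tested against
    // each whole index term, not against the original text.
    static boolean prefixMatches(List<String> terms, String prefix) {
        return terms.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> terms = unigrams("黄朝辉");
        System.out.println(terms);
        System.out.println(prefixMatches(terms, "黄"));
        System.out.println(prefixMatches(terms, "黄朝"));
    }
}
```

Under these assumptions the prefix "黄" finds a term while "黄朝" does not, because no single term is longer than one character. This would account for the wildcard results, but not necessarily for the non-wildcard queries such as "黄朝".
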
To explore alternatives, we attempted to use CJKAnalyzer from 
org.apache.lucene.analysis.cjk, but encountered an Eclipse restriction:
Access restriction: The type 'CJKAnalyzer' is not API

We'd greatly appreciate your insight on:

  1.  Whether wildcard queries are supported for Chinese text using 
StandardAnalyzer or another analyzer.
  2.  How compound or full-name queries in Chinese are expected to behave, and 
whether specific tokenization issues might be involved.
  3.  The proper way to use CJKAnalyzer without encountering access 
restrictions, or alternative analyzers better suited for this use case.
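
For reference on point 3, this is our understanding of what a bigram-based analyzer such as CJKAnalyzer would index for the same name, again sketched in plain Java without any Lucene classes. It assumes the input is a run of Han characters only and that adjacent characters are paired into overlapping two-character terms. BigramSketch and bigrams are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; no Lucene classes are involved.
class BigramSketch {

    // Assumption: adjacent characters are paired into overlapping bigrams,
    // and a single isolated character is kept as a one-character term.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> terms = new ArrayList<>();
        if (cps.length == 1) {
            terms.add(text);
            return terms;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            // Build a term from the two code points starting at position i.
            terms.add(new String(cps, i, 2));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("黄朝辉"));
    }
}
```

If this understanding is right, the name would be indexed as the terms "黄朝" and "朝辉", so a prefix such as "黄朝*" could match a term, while "黄朝辉*" still could not match any single term.
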

Thank you for your time and any guidance you can provide. We're open to 
suggestions regarding analyzer choice, configuration, or best practices for 
Chinese query handling.

Thanks and Regards
Monika Thakare
