Hi,

I had a question about the Japanese user dictionary. We have a user
dictionary that used to work but after attempting to upgrade Lucene, it
fails with the following error:

Caused by: java.lang.RuntimeException: Illegal user dictionary entry レコーダー - the concatenated segmentation (レコーダー) does not match the surface form (レコーダー)
    at org.apache.lucene.analysis.ja.dict.UserDictionary.<init>(UserDictionary.java:123)
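
For reference, here is a minimal Java sketch of what the check appears to do. It assumes, based only on the error message, that the validation joins the space-separated segmentation column of an entry and compares it with the surface form as a plain string. The class and method names are hypothetical and are not Lucene's code. The sketch also shows that NFKC normalization via java.text.Normalizer folds the half-width form into the full-width one. If the rest of the analysis chain already folds character widths, normalizing both columns that way might be a workaround.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch, not Lucene code: mimics the check the error message
 * describes by joining the space-separated segmentation column of a user
 * dictionary entry and comparing the result with the surface form column.
 */
public class UserDictCheckSketch {

    // Returns true if the segments, concatenated, equal the surface form exactly.
    static boolean segmentationMatches(String surface, String segmentation) {
        String joined = String.join("", segmentation.trim().split(" +"));
        return surface.equals(joined);
    }

    // Applies Unicode NFKC, which folds half-width katakana to full-width
    // (e.g. U+FF80 U+FF9E is composed into U+30C0).
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String halfWidth = "\uFF9A\uFF7A\uFF70\uFF80\uFF9E\uFF70"; // half-width katakana
        String fullWidth = "\u30EC\u30B3\u30FC\u30C0\u30FC";       // full-width katakana

        // Mixed widths: the strings look alike but the code points differ.
        System.out.println(segmentationMatches(halfWidth, fullWidth));             // false
        // After NFKC on both columns, the concatenated segmentation matches.
        System.out.println(segmentationMatches(nfkc(halfWidth), nfkc(fullWidth))); // true
    }
}
```

Whether normalizing the dictionary entries is actually safe depends on how the input text is normalized before tokenization; that part is an assumption on my side.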

The error was introduced by this commit:
<https://github.com/apache/lucene/commit/73ba88a50dec64f367caa88d277c26dfd1d8883b#diff-75fd48fadfd3d011e9c34c4310ef66e9009edfbc738fd82deb5661a8edb5c5d9>.
As far as I can tell, the only difference between the two strings is that
the characters are full-width versus half-width, so I was wondering whether
this is intended behavior or a bug (i.e., the check is too restrictive). Any
suggestions for fixing this would be greatly appreciated!
Thanks!
