Hello fellows of Lucene,

I just discovered that the _ character is treated as a word separator by the StandardAnalyzer.
Is that really so?
It broke our usage of a field that stores a comma-separated list of "uri-fragments" which, of course, contain _: the StandardAnalyzer splits each fragment into separate terms, so a query matches on pieces of a fragment and the search becomes thoroughly fuzzy.
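
To make the problem concrete, here is a minimal plain-Java sketch. It makes no Lucene calls and is not the analyzer's actual code; it only mimics the splitting I seem to observe next to the splitting I expected. The class name, method names and sample fragments are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FragmentSplitDemo {

    // What I expected: one term per comma-separated fragment.
    static List<String> splitOnCommaOnly(String field) {
        return splitAndClean(field, ",");
    }

    // What I seem to get: the underscore also ends a term.
    static List<String> splitOnCommaAndUnderscore(String field) {
        return splitAndClean(field, "[,_]");
    }

    // Split on the given regex, trim whitespace, drop empty pieces.
    private static List<String> splitAndClean(String field, String regex) {
        List<String> terms = new ArrayList<>();
        for (String piece : field.split(regex)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String field = "intro_page,chapter_1"; // hypothetical field value
        System.out.println("expected: " + splitOnCommaOnly(field));
        System.out.println("observed: " + splitOnCommaAndUnderscore(field));
    }
}
```

With the second behaviour, a query for one fragment also hits every other fragment that shares one of its pieces, which is the fuzziness I mean.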

Is there a rationale for this? Has it been debated before?
My naive expectation, which I would guess is rather common, is that an underscore makes a new single word out of existing words, whereas a dash makes a compound of separate words.

I do know that I can adapt the StandardAnalyzer; I just wanted to know the reasons.

paul