Hello,
I'm surprised, and I doubt that can happen. Would you mind uploading a short test that reproduces it?
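To make the discussion concrete, here is a minimal plain-Java sketch of the matching problem described in the thread quoted below. It does not use Lucene. The three split rules and the names `PercentTokenDemo`, `tokenize` and `matchingDocs` are assumptions made up for illustration, and they are only rough stand-ins for what a punctuation-stripping tokenizer and a whitespace tokenizer do, not the actual StandardTokenizer or WhitespaceTokenizer logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PercentTokenDemo {

    // The three sample documents from the original question.
    static final String[] DOCS = {
        "Number of boys 50",
        "My score was 50%",
        "40-50% for pass score"
    };

    // Hypothetical split rules (assumptions, not Lucene's real rules):
    // drop everything that is not a letter or digit, so "50%" becomes "50"
    static final String STRIP_PUNCTUATION = "[^\\p{L}\\p{N}]+";
    // split on whitespace only, so "40-50%" stays a single token
    static final String WHITESPACE_ONLY = "\\s+";
    // like STRIP_PUNCTUATION, but '%' is kept as part of the token
    static final String KEEP_PERCENT = "[^\\p{L}\\p{N}%]+";

    // Lowercase the text and split it on the given delimiter regex.
    static List<String> tokenize(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split(delimiterRegex)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Return the 1-based numbers of the documents that contain every query token,
    // using the same split rule for the documents and for the query.
    static List<Integer> matchingDocs(String query, String delimiterRegex) {
        List<String> queryTokens = tokenize(query, delimiterRegex);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.length; i++) {
            if (tokenize(DOCS[i], delimiterRegex).containsAll(queryTokens)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matchingDocs("50%", STRIP_PUNCTUATION)); // [1, 2, 3]
        System.out.println(matchingDocs("50%", WHITESPACE_ONLY));   // [2]
        System.out.println(matchingDocs("50%", KEEP_PERCENT));      // [2, 3]
    }
}
```

In this sketch, splitting on whitespace alone keeps the % sign but leaves "40-50%" as one token, so only Doc-2 matches. The expected Doc-2 and Doc-3 result needs a rule that still splits on "-" while keeping "%". A real reproduction would of course have to run against the actual analyzer chain.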

On Wed, Sep 20, 2023 at 11:44 PM Amitesh Kumar <amiteshk...@gmail.com> wrote:

> Thanks Mikhail!
>
> I have tried all the other tokenizers from Lucene 4.4. In the case of
> WhitespaceTokenizer, it loses romanizing of special chars like - etc.
>
> On Wed, Sep 20, 2023 at 16:39 Mikhail Khludnev <m...@apache.org> wrote:
>
> > Hello,
> > Check the whitespace tokenizer.
> >
> > On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar <amiteshk...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am facing a requirement change to get the % sign retained in
> > > searches, e.g.
> > >
> > > Sample search docs:
> > > 1. Number of boys 50
> > > 2. My score was 50%
> > > 3. 40-50% for pass score
> > >
> > > Search query: 50%
> > > Expected results: Doc-2, Doc-3, i.e.
> > > 1. My score was 50%
> > > 2. 40-50% for pass score
> > >
> > > Actual result: All 3 documents (because the tokenizer strips off the %
> > > both during indexing and searching, and hence matches all docs
> > > with 50 in them).
> > >
> > > On the implementation front, I am using a set of filters like
> > > lowerCaseFilter, EnglishPossessiveFilter etc. in addition to the base
> > > tokenizer StandardTokenizer.
> > >
> > > Per my analysis, StandardTokenizer strips off the % sign and hence the
> > > behavior. Has someone faced a similar requirement? Any help/guidance is
> > > highly appreciated.
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>

--
Sincerely yours
Mikhail Khludnev