Re: Indexing accented characters, then searching by any form

Dora Tue, 25 Nov 2008 07:22:03 -0800

Karl Wettin wrote:
> 
> Try this (dry coded) snippet instead:
> 
> StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
>    public TokenStream tokenStream(String fieldName, Reader reader) {
>      return new ISOLatin1AccentFilter(super.tokenStream(fieldName,  
> reader));
>    }
> }
> 

I tried this, but it does not work as expected.

I am using an utility class with a static method that gives me an analyzer:

public static Analyzer getAnalyzer() 
        {  
                StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
                           public TokenStream tokenStream(String fieldName, 
Reader reader) {
                             return new 
ISOLatin1AccentFilter(super.tokenStream(fieldName,
reader));
                           }
                        };
                        return objAnalyzer;
                }
        }

So when I need the analyzer (for indexing or searching) I perform an
UtilityClass.getAnalyzer() call.

It works for my query parser: The accent are correctly removed when
performing the search.
If my index contains "cafe" searching for "café" will find the documents
containing "cafe"

But when explore my index with Luke I can see that the indexer does not use
the ISOLatin1AccentFilter  (I tested with a breakpoint in the overriden
tokenStream method) and if the document contains "café", the index will
contain "café".

As a consequence, search on word having accent is not possible: the index
contains the accent, while it is removed by the search process.

So my index contains "café", but when I search for "café" the filter changes
it in "cafe" and it gives no hit...

Any clue on why my filter is not used at time of indexation ?




-- 
View this message in context: 
http://www.nabble.com/Indexing-accented-characters%2C-then-searching-by-any-form-tp15412778p20682548.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Indexing accented characters, then searching by any form

Reply via email to