Karl Wettin wrote:
>
> Try this (dry coded) snippet instead:
>
> StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
> public TokenStream tokenStream(String fieldName, Reader reader) {
> return new ISOLatin1AccentFilter(super.tokenStream(fieldName,
> reader));
> }
> }
>
I tried this, but it does not work as expected.
I am using a utility class with a static method that returns the analyzer:
public class UtilityClass {
    public static Analyzer getAnalyzer() {
        // Wrap the standard token stream so accents are stripped.
        StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
            public TokenStream tokenStream(String fieldName, Reader reader) {
                return new ISOLatin1AccentFilter(
                        super.tokenStream(fieldName, reader));
            }
        };
        return objAnalyzer;
    }
}
So when I need the analyzer (for indexing or searching), I call
UtilityClass.getAnalyzer().
It works for my query parser: accents are correctly removed when
performing the search.
If my index contains "cafe", searching for "café" finds the documents
containing "cafe".
But when I explore my index with Luke, I can see that the indexer does not
use the ISOLatin1AccentFilter (I tested with a breakpoint in the overridden
tokenStream method): if the document contains "café", the index will
contain "café".
As a consequence, searching for an accented word is not possible: the index
keeps the accent, while the search process removes it.
So my index contains "café", but when I search for "café" the filter changes
it to "cafe" and I get no hits.
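To make the mismatch concrete, here is a minimal sketch that does not use
Lucene at all. It assumes java.text.Normalizer as a stand-in for
ISOLatin1AccentFilter and a plain set of terms as a stand-in for the index;
the class and method names (AccentMismatchDemo, foldAccents, index, search)
are hypothetical and exist only for this illustration.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

public class AccentMismatchDemo {

    /** Stand-in for the accent filter (assumption: not the Lucene class). */
    public static String foldAccents(String term) {
        // Decompose accented letters, then drop the combining marks.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Toy index: a set of terms, folded at index time only if requested. */
    public static Set<String> index(boolean foldAtIndexTime, String... terms) {
        Set<String> indexed = new HashSet<>();
        for (String term : terms) {
            indexed.add(foldAtIndexTime ? foldAccents(term) : term);
        }
        return indexed;
    }

    /** The query side always folds, as my query parser does. */
    public static boolean search(Set<String> index, String query) {
        return index.contains(foldAccents(query));
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "cafe" with an acute accent on the e
        Set<String> unfolded = index(false, cafe); // what I see in Luke
        Set<String> folded = index(true, cafe);    // what I expected

        System.out.println(search(unfolded, cafe)); // false: no hit
        System.out.println(search(folded, cafe));   // true
        System.out.println(search(folded, "cafe")); // true
    }
}
```

The first case is the situation I am in: the term is stored with its accent
but queried without it, so the lookup can never match.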
Any clue why my filter is not used at indexing time?
--
View this message in context:
http://www.nabble.com/Indexing-accented-characters%2C-then-searching-by-any-form-tp15412778p20682548.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.