You can extend the DefaultAnalyzer.
The only thing you have to do is override the tokenStream method, like
this:
/** Constructs a {@link StandardTokenizer} filtered by a {@link
    StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new ISOLatin1AccentFilter(result);
    return result;
}
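To get a feel for what the extra filter step buys you, here is a rough stand-alone sketch of accent folding. It does not use Lucene at all: it swaps in the JDK's java.text.Normalizer in place of ISOLatin1AccentFilter, and the class name AccentFoldingSketch and the method fold are made up for illustration. The result is similar but not guaranteed to be identical to what the Lucene filter produces (ligatures such as "æ", for instance, are left alone here).

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: approximates accent folding
// with the JDK only, so the effect can be tried without an index.
public class AccentFoldingSketch {

    /** Lower-cases the text and strips combining diacritical marks. */
    static String fold(String text) {
        // NFD splits "è" into "e" plus a combining grave accent (U+0300).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file encoding.
        System.out.println(fold("Am\u00e8rique"));
        System.out.println(fold("Amerique"));
    }
}
```

Both calls yield the same folded term, which is why the accented and unaccented spellings end up matching. For that to work in a real index, the same analyzer has to be used both when indexing and when parsing the query.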
anorman wrote:
This looks like exactly what I want. Would I use this alongside another
analyzer such as the standard one, or stand-alone? Does anyone have any
code examples of implementing such a thing?
Thanks,
Albert
karl wettin-3 wrote:
27 aug 2007 kl. 16.03 skrev anorman:
I have a searchable index of documents which contain French and Spanish
diacritics (è, é, À) etc. I would like to make the content searchable so
that when a user searches for a word such as "Amèrique" or "Amerique"
(without diacritic) then it returns the same results.
Has anyone set up something similar?
ISOLatin1AccentFilter
--
karl