You can extend the DefaultAnalyzer.
The only thing you have to do is override the tokenStream method like this:

 /** Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter},
  * a {@link LowerCaseFilter} and a {@link StopFilter}. */
 public TokenStream tokenStream(String fieldName, Reader reader) {
   TokenStream result = new StandardTokenizer(reader);
   result = new StandardFilter(result);
   result = new LowerCaseFilter(result);
   result = new StopFilter(result, stopSet);
   // The added step: replace accented ISO Latin 1 characters with unaccented ones.
   result = new ISOLatin1AccentFilter(result);
   return result;
 }
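
To see what the last step of that chain does to a term, here is a minimal stand-alone sketch. It does not use Lucene at all: the class AccentFolding and its methods fold and normalizeTerm are hypothetical names, and the folding is done with java.text.Normalizer (Java 6 or later) rather than with the mapping table inside ISOLatin1AccentFilter, so results may differ for some characters. It only illustrates why "Amèrique" and "Amerique" end up as the same term once both the indexed text and the query go through the same analyzer.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: approximates accent folding
// with the JDK so the effect can be tried without an index.
final class AccentFolding {

    // Decompose accented characters (NFD), then drop the combining marks.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Same order as the filter chain above: lower-case first, then fold accents.
    static String normalizeTerm(String term) {
        return fold(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "Am\u00e8rique" is "Amèrique" written with a Unicode escape.
        System.out.println(normalizeTerm("Am\u00e8rique")); // prints "amerique"
        System.out.println(normalizeTerm("Amerique"));      // prints "amerique"
    }
}
```

Both calls print the same string, which is the property the search needs. In the real setup the same analyzer should be used for indexing and for parsing queries; otherwise the accented and unaccented forms will not meet.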


anorman wrote:
This looks like exactly what I want. Would I use this together with another
analyzer, such as the standard one, or on its own? Does anyone have any
code examples of implementing such a thing?

Thanks,
Albert




karl wettin-3 wrote:
On 27 Aug 2007, at 16:03, anorman wrote:

I have a searchable index of documents which contain French and Spanish
diacritics (è, é, À, etc.). I would like to make the content searchable so
that when a user searches for a word such as "Amèrique" or "Amerique"
(without the diacritic), the same results are returned.

Has anyone set up something similar?

ISOLatin1AccentFilter

--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





