Re: Searching Diacritics

thomas arni Mon, 27 Aug 2007 07:30:07 -0700

You can extend StandardAnalyzer.

The only thing you have to do is override the tokenStream method like this:


 /** Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter},
  * a {@link LowerCaseFilter} and a {@link StopFilter}. */
 public TokenStream tokenStream(String fieldName, Reader reader) {
   TokenStream result = new StandardTokenizer(reader);
   result = new StandardFilter(result);
   result = new LowerCaseFilter(result);
   result = new StopFilter(result, stopSet);
   result = new ISOLatin1AccentFilter(result);
   return result;
 }
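
To see what the accent-folding step buys you, here is a small stand-alone
sketch that does not need Lucene at all. It is not the ISOLatin1AccentFilter
implementation: it swaps in java.text.Normalizer from the JDK to strip
combining marks, so it only approximates the filter and does not reproduce
its exact character mapping. The class name AccentFoldDemo and the method
fold are made up for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo class; approximates accent folding with the JDK only.
final class AccentFoldDemo {

    /**
     * Decomposes accented characters (NFD), drops the combining marks,
     * and lower-cases the result, roughly what a LowerCaseFilter plus an
     * accent filter would do to a single token.
     */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Am\u00e8rique" is the accented spelling from the original question.
        String indexed = fold("Am\u00e8rique");
        String query = fold("Amerique");
        System.out.println(indexed + " / " + query
                + " -> match: " + indexed.equals(query));
    }
}
```

Both spellings fold to the same term, which is why the search then returns
the same results. For that to work in Lucene, the same analyzer has to be
used when indexing and when parsing the query.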


anorman wrote:

This looks like exactly what I want.  Would I use this together with
another analyzer such as the standard one, or stand-alone?  Does anyone have
any code examples of implementing such a thing?

Thanks,
Albert




karl wettin-3 wrote:

On 27 Aug 2007 at 16:03, anorman wrote:

I have a searchable index of documents which contain French and Spanish
diacritics (è, é, À, etc.). I would like to make the content searchable so
that when a user searches for a word such as "Amèrique" or "Amerique"
(without the diacritic), the same results are returned.

Has anyone set up something similar?

ISOLatin1AccentFilter

--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



