On Feb 28, 2005, at 10:01 AM, [EMAIL PROTECTED] wrote:
Hello,
I would like to build a search engine that handles several different languages,
e.g. Spanish names, French names, ...

Will your text be a mix of languages within a single field? Or would each document (or field) be a single language?


- Using a different analyzer for each language would be one solution.

You will most likely have to use a different analyzer for each language, though that depends on the answers to the above.
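
To make the per-language idea concrete, here is a minimal sketch in plain Python. It is not the API of any search library: the stop-word lists, `make_analyzer`, and `analyze_document` are hypothetical names invented for illustration, and it assumes each document carries a known language code.

```python
import re

# Hypothetical, deliberately tiny stop-word lists; real lists are much longer.
STOP_WORDS = {
    "es": {"el", "la", "de", "y"},
    "fr": {"le", "la", "de", "et"},
}

def make_analyzer(lang):
    """Return a function that lowercases, tokenizes, and drops stop words for one language."""
    stop = STOP_WORDS[lang]

    def analyze(text):
        # \w+ matches runs of Unicode letters, digits, and underscores in Python 3.
        tokens = re.findall(r"\w+", text.lower())
        return [t for t in tokens if t not in stop]

    return analyze

# One analyzer per language, looked up by the language code stored with each document.
ANALYZERS = {lang: make_analyzer(lang) for lang in STOP_WORDS}

def analyze_document(lang, text):
    """Analyze text with the analyzer registered for its language."""
    return ANALYZERS[lang](text)
```

The same lookup would be used on the query side, so a query is analyzed with the analyzer of the language it is searching. If a single field mixes languages, this dispatch no longer applies cleanly, which is why the question above matters.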


- But what about replacing each special character (umlauts such as ä, ö, ...)
with its HTML entity before indexing, and doing the same with
each search query before searching?

An HTML entity is more than one character. The simplest approach is to leave the characters as-is, in Unicode.
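
A small Python sketch of the difference, using only the standard library. The helper names `to_entity` and `normalize` are made up for this example, and normalizing to NFC on both the indexing and query side is an assumption about one reasonable setup, not a requirement of any particular engine.

```python
import unicodedata
from html.entities import codepoint2name

def to_entity(ch):
    """Replace one character with its named HTML entity, if it has one."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else ch

def normalize(text):
    """Normalization to apply identically at index time and at query time."""
    return unicodedata.normalize("NFC", text).lower()

word = "M\u00fcller"  # "Müller" with a precomposed ü
as_entities = "".join(to_entity(c) for c in word)

# The single character ü becomes the six characters "&uuml;".
print(len(word), len(as_entities))  # prints: 6 11

# Keeping Unicode and normalizing both sides makes the composed and
# decomposed spellings of the same name compare equal.
print(normalize("Mu\u0308ller") == normalize(word))  # prints: True
```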


This seems to me the simplest approach to handling these issues?

What are the drawbacks? No Stem search? Other considerations?

Stemming is language-specific, which factors into your analyzer(s) choices.
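
As a toy illustration of why stemming is tied to the language, here is a deliberately naive suffix stripper in Python. The rule tables and the `stem` function are hypothetical and far cruder than any real stemmer; they only show that the same word reduces to different terms under different languages' rules.

```python
# Hypothetical (suffix, replacement) rules, tried in order; real stemmers are far richer.
RULES = {
    "es": [("es", ""), ("s", "")],
    "fr": [("aux", "al"), ("s", "")],
}

def stem(lang, word):
    """Apply the first matching suffix rule for the given language."""
    for suffix, replacement in RULES[lang]:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

print(stem("es", "canciones"))  # prints: cancion
print(stem("fr", "canciones"))  # prints: cancione
print(stem("fr", "chevaux"))    # prints: cheval
print(stem("es", "chevaux"))    # prints: chevaux
```

A term stemmed with the wrong language's rules ends up as a different index term, so a query analyzed with the right rules would not match it.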


        Erik

