Tansley, Robert wrote:
What if we're trying to index multiple languages in the same site?  Is
it best to have:

1/ one index for all languages
2/ one index for all languages, with an extra language field so searches
can be constrained to a particular language
3/ separate indices for each language?

I'd use 2/. In particular, use the same fields for the content, title, etc., even when they are produced by different analyzers. Add a "lang" field that names the language of each document.
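
To make option 2/ concrete, here is a minimal toy sketch in plain Python, not Lucene code. The analyzers, the Index class, and the sample documents are all hypothetical stand-ins. The point is only that every language writes into the same "title" and "content" fields, and each document also records its language in "lang".

```python
from collections import defaultdict

# Hypothetical toy analyzers standing in for real per-language analyzers.
def analyze_en(text):
    stop = {"a", "the", "of", "for"}
    return [t for t in text.lower().split() if t not in stop]

def analyze_de(text):
    stop = {"der", "die", "das", "ist"}
    return [t for t in text.lower().split() if t not in stop]

ANALYZERS = {"en": analyze_en, "de": analyze_de}

class Index:
    """One index for all languages, with a per-document 'lang' field."""

    def __init__(self):
        # (field, term) -> set of doc ids, shared by every language
        self.postings = defaultdict(set)
        # doc id -> language code (the "lang" field)
        self.lang = {}

    def add(self, doc_id, lang, title, content):
        # The analyzer depends on the document's language,
        # but the field names do not.
        analyze = ANALYZERS[lang]
        for field, text in (("title", title), ("content", content)):
            for term in analyze(text):
                self.postings[(field, term)].add(doc_id)
        self.lang[doc_id] = lang

    def lookup(self, field, term):
        # Return matching doc ids together with their "lang" value.
        ids = self.postings.get((field, term), set())
        return {doc_id: self.lang[doc_id] for doc_id in ids}

index = Index()
index.add(1, "en", "Gift ideas", "a gift for the holidays")
index.add(2, "de", "Gift im Haushalt", "das Gift ist gefaehrlich")
print(index.lookup("content", "gift"))
```

Both documents match "gift" in the shared content field, and each hit carries its language, so a later step can narrow the results by "lang" if needed.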

At query time, use an analyzer selected by the user's environment (e.g., the HTTP Accept-Language header). If folks are getting false positives, where a term from another language that means something different matches their query, they can use a "lang" pulldown to exclude documents in other languages, implemented as a Lucene Filter.
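
A rough sketch of the query-time side, again in plain Python rather than Lucene. The helper names pick_language and filter_by_lang are made up for illustration, and the header parsing is simplified (it looks only at the primary language subtag and the q-value). In Lucene the second step would be a Filter over the "lang" field, not a list comprehension.

```python
def pick_language(accept_language, supported, default="en"):
    """Pick the supported language with the highest q-value from an
    Accept-Language header value; fall back to `default`."""
    choices = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        # Simplification: "de-CH" is treated as plain "de".
        primary = tag.strip().lower().split("-")[0]
        choices.append((q, primary))
    # sorted() is stable, so header order breaks ties between equal q-values.
    for q, primary in sorted(choices, key=lambda c: -c[0]):
        if q > 0 and primary in supported:
            return primary
    return default

def filter_by_lang(hits, lang=None):
    """hits is a list of (doc_id, lang) pairs.
    lang=None models the pulldown left at 'all languages'."""
    if lang is None:
        return list(hits)
    return [hit for hit in hits if hit[1] == lang]

supported = {"en", "de"}
query_lang = pick_language("de-CH,de;q=0.9,en;q=0.8", supported)
hits = [(1, "en"), (2, "de")]
print(query_lang, filter_by_lang(hits, query_lang))
```

The language picked from the header selects the query analyzer by default. The filter applies only when the user explicitly asks to restrict results, so cross-language matches remain available otherwise.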

Doug
