Re: Indexing multiple languages

Paul Libbrecht Fri, 03 Jun 2005 05:54:36 -0700

Robert,

On 2 June 05, at 21:42, Tansley, Robert wrote:

It seems that there are even more options --
4/ One index, with a separate Lucene document for each (item, language) combination, with one field that specifies the language

5/ One index, one Lucene document per item, with field names that include the language (e.g. title_en, title_cn)

I quite like 4, because you can search with no language constraint, or with one as Paul suggests below.

You can in both cases. In the second (option 5), you need to expand the query (i.e. searching for carrot would search "text_en:carrot OR text_cn:carrot"), which, I think, is fair as long as you don't have a two-kilometer-long list of languages.
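
The expansion described above can be sketched as follows. This is only an illustration in Python that builds a query string in Lucene's classic query-parser syntax; it does not call Lucene itself. The function name expand_query and the text_<lang> field naming are assumptions made for the example, and the term is not escaped.

```python
def expand_query(term, languages, field_prefix="text"):
    """Build one clause per language-specific field and OR them together.

    Hypothetical helper: field names follow the text_<lang> pattern
    used in the message (text_en, text_cn, ...).
    """
    if not languages:
        raise ValueError("at least one language is required")
    clauses = [f"{field_prefix}_{lang}:{term}" for lang in languages]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(expand_query("carrot", ["en", "cn"]))
    # prints: text_en:carrot OR text_cn:carrot
```

The query grows by one clause per language, which is why a very long language list would become unwieldy.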

However, some "non language-specific" data might need to be repeated (e.g. dates), unless we had an extra Lucene document for all that. I wonder what the various pros and cons in terms of index size and performance would be in each case? I really don't have enough knowledge of Lucene to have any idea...
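
To make the repetition concrete, here is a small sketch of the two layouts (options 4 and 5) using plain Python dictionaries in place of Lucene documents. The sample item, the field names id, date and language, and both function names are hypothetical; the sketch only shows where shared fields get repeated and says nothing about actual index size or performance.

```python
# Hypothetical item with one language-independent field (date)
# and one title per language.
ITEM = {
    "id": "item-1",
    "date": "20050603",
    "title": {"en": "title in English", "cn": "title in Chinese"},
}


def docs_option_4(item):
    """Option 4: one document per (item, language) combination.

    The language goes in its own field; id and date are repeated
    in every document.
    """
    return [
        {"id": item["id"], "date": item["date"], "language": lang, "title": text}
        for lang, text in item["title"].items()
    ]


def doc_option_5(item):
    """Option 5: one document per item, language encoded in the field name."""
    doc = {"id": item["id"], "date": item["date"]}
    for lang, text in item["title"].items():
        doc[f"title_{lang}"] = text
    return doc


if __name__ == "__main__":
    for d in docs_option_4(ITEM):
        print(d)
    print(doc_option_5(ITEM))
```

In this sketch option 4 stores the date once per language, while option 5 stores it once per item.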

If you separate the indices you won't, as far as I know, be able to query both at once (e.g. for some text which is also new enough...).
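
For contrast, when everything lives in one index, a text clause and a date clause can go into a single query. The sketch below again only builds a classic query-parser string in Python. It assumes, hypothetically, a field named date indexed as sortable yyyymmdd strings and text_<lang> fields as above; the function name is made up for the example.

```python
def text_and_date_query(term, languages, date_from, date_to):
    """Require a match in any language field AND a date inside the range.

    '+' marks a required clause; [a TO b] is an inclusive range.
    """
    if not languages:
        raise ValueError("at least one language is required")
    text_clause = " OR ".join(f"text_{lang}:{term}" for lang in languages)
    return f"+({text_clause}) +date:[{date_from} TO {date_to}]"


if __name__ == "__main__":
    print(text_and_date_query("carrot", ["en", "cn"], "20050101", "20050603"))
    # prints: +(text_en:carrot OR text_cn:carrot) +date:[20050101 TO 20050603]
```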


paul


