Hi

You could use the ParallelReader for this if you have all documents in all 
languages. Then, the metadata fields can be stored in one of the field data 
files, while each language gets its own field data file...

max

-----Original Message-----
From: Paul Libbrecht [mailto:[EMAIL PROTECTED]
Sent: Friday, June 03, 2005 14:23
To: java-user@lucene.apache.org
Subject: Re: Indexing multiple languages


Robert,

On 2 June 05, at 21:42, Tansley, Robert wrote:
> It seems that there are even more options --
> 4/ One index, with a separate Lucene document for each (item,language) 
> combination, with one field that specifies the language
> 5/ One index, one Lucene document per item, with field names that 
> include the language (e.g. title_en, title_cn)
> I quite like 4, because you can search with no language constraint, or 
> with one as Paul suggests below.

You can in both cases. In the second, you need to expand the query (i.e. 
searching for carrot would search "text_en:carrot or text_cn:carrot"), 
which, I think, is fair as long as you don't have a two-kilometer-long 
list of languages.
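
The expansion described above could be sketched roughly as follows. This 
is only an illustration in plain Java, not part of Lucene: the class 
LanguageQueryExpander and its expand method are hypothetical names, and 
it assumes the term is a single word that needs no query-syntax escaping. 
It just builds the query string that would then be handed to a query 
parser.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not a Lucene class): expands one search term
// into a query string covering one field per language.
public class LanguageQueryExpander {

    // Produces e.g. "text_en:carrot OR text_cn:carrot" for prefix "text".
    // Assumption: the term contains no characters that need escaping.
    static String expand(String fieldPrefix, String term, List<String> languages) {
        return languages.stream()
                .map(lang -> fieldPrefix + "_" + lang + ":" + term)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        System.out.println(expand("text", "carrot", List.of("en", "cn")));
    }
}
```

The query grows by one clause per language, so it stays manageable as 
long as the list of languages is short.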

> However, some "non language-specific" data might need to be repeated 
> (e.g. dates), unless we had an extra Lucene document for all that.  I 
> wonder what the various pros and cons in terms of index size and 
> performance would be in each case?  I really don't have enough 
> knowledge of Lucene to have any idea...

If you separate the indices you won't, as far as I know, be able to 
combine both kinds of constraint in a single query (e.g. some text 
which, as well, is new enough).

paul


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

