Hi,

I worked some time ago on a similar system (using Solr) and went the multiple-indices route (the multicore feature in Solr). In our case, the "same" document could exist in several languages, i.e. different localized versions of the same information (with the same Solr unique id for each l10n version).
This allowed us to have the same index structure across locales but different settings for each (synonyms, stemmers, etc.).

Maintenance was easier this way: when refining or updating the settings (adding synonyms or stemmers, for instance), you may need to reindex, and smaller indices allow faster deployments. It is also "dead-easy" to add a new language (especially compared to the single-index solution), and replication or partitioning is easier too. Overall, IMO, this is a more scalable architecture than the single-index one.

Users were able to choose the languages they were "fluent" in (defaulting to the browser locale), so queries were only run against those indices and results were "clustered" per locale (no need to return results that cannot be understood...). Besides, IMO, scoring/ordering documents in different languages is a bit like comparing apples and oranges, since each language has its own vocabulary and term statistics.

Finally, query expansion can also be used in the multiple-indices case and might even rely on automated/guided translation.

In my experience, multiple indices had many advantages over the single-index solution, be they functional or operational. YMMV.

Hope this helps,
Henrib

--
View this message in context: http://n3.nabble.com/Designing-a-multilingual-index-tp688766p690625.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
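To make the per-locale layout above a bit more concrete, here is a minimal in-memory Java sketch. It is only an illustration under stated assumptions: `LocaleIndex` and `MultilingualSearcher` are hypothetical names (not Lucene or Solr classes), a plain term match plus a synonym map stands in for real per-locale analysis, and nothing here reflects the actual Solr multicore configuration we used. It shows each locale owning its own settings, the same document id reused across localized versions, queries routed only to the user's "fluent" locales (falling back to the browser locale), and results kept grouped per locale instead of merged into one ranking.

```java
import java.util.*;

/** Hypothetical stand-in for one per-locale index (NOT a Lucene/Solr API). */
class LocaleIndex {
    final Locale locale;
    // Per-locale settings: here only a synonym map; a real setup would also pick stemmers etc.
    private final Map<String, Set<String>> synonyms = new HashMap<>();
    // Docs keyed by a unique id shared by all localized versions of the same information.
    private final Map<String, String> docs = new LinkedHashMap<>();

    LocaleIndex(Locale locale) { this.locale = locale; }

    void addSynonym(String term, String synonym) {
        synonyms.computeIfAbsent(term.toLowerCase(locale), k -> new HashSet<>())
                .add(synonym.toLowerCase(locale));
    }

    void add(String id, String text) { docs.put(id, text.toLowerCase(locale)); }

    /** Ids of docs containing the term or one of this locale's synonyms for it. */
    List<String> search(String term) {
        String t = term.toLowerCase(locale);
        Set<String> terms = new HashSet<>(synonyms.getOrDefault(t, Set.of()));
        terms.add(t);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            // Split on anything that is not a letter or digit (keeps accented letters intact).
            List<String> words = Arrays.asList(e.getValue().split("[^\\p{L}\\p{N}]+"));
            if (!Collections.disjoint(words, terms)) hits.add(e.getKey());
        }
        return hits;
    }
}

/** Hypothetical router over one index per locale. */
class MultilingualSearcher {
    private final Map<Locale, LocaleIndex> indices = new LinkedHashMap<>();

    // Adding a language is just registering one more index.
    LocaleIndex addLanguage(Locale locale) {
        return indices.computeIfAbsent(locale, LocaleIndex::new);
    }

    /** Queries only the fluent locales (browser locale if none set); results stay grouped per locale. */
    Map<Locale, List<String>> search(String term, List<Locale> fluent, Locale browserLocale) {
        List<Locale> targets = (fluent == null || fluent.isEmpty()) ? List.of(browserLocale) : fluent;
        Map<Locale, List<String>> results = new LinkedHashMap<>();
        for (Locale l : targets) {
            LocaleIndex idx = indices.get(l);
            if (idx != null) results.put(l, idx.search(term));
        }
        return results;
    }
}
```

For example, after registering an English index with the synonym `car -> automobile` and a French index with `voiture -> automobile`, both holding a localized `doc1`, a query for "car" by a user fluent in English and French returns `doc1` under English and an empty list under French. Keeping the groups separate avoids comparing scores computed against different vocabularies.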