Hi,

 

I'm indexing documents, and some of them are provided in several languages. 
Thanks to this mailing list participants, I know that I have two choices to 
index these multiple instances of documents. Either, I create languages 
specific field, either I index the translations in different documents, adding 
the language field.

I choose the second solution, because first, the translated documents will not 
be the majority of documents that I need to index, second is that at search 
time, if I don't want to restrict the search to one language, with solution 
one, I have a query with potentially lot of fields to cover all languages. 
Also, the second option makes it faster to filter the results by language, if 
specified. 

 

However, with this solution, when the query is not filtered by a language and 
that the user search for fields common to any language, such as author for 
instance, I will have as much results as I have translations. I'm wondering if 
there is a way to have a "distinct filter". For instance, I have a common field 
"docId" for the translations of one document, and I don't want to have two 
documents with the same "docId" in my results.

Also, even if the user didn't put restrictions on language, I want to give back 
the results in its default language if it's available, but I don't want to do a 
filter query, because I don't want to restrict the search to only this language.

So basically, if the default language of the user is English, and that I have 
translations of the matching documents in English, it will be the only one 
send, otherwise, it should take the first translation available for this 
document.

Any hint of how I could do this?

 

Thanks,

 

Mélanie

 

 

Reply via email to