Re: Designing a multilingual index

henrib Fri, 02 Apr 2010 01:33:00 -0700

I agree that if you don't know the "source" language - or can't determine it -
there is a lot of uncertainty in trying to transmogrify the query from one
language to another! Tika and Nutch do have language identification tools
though (n-gram profiles, if I'm not mistaken). You can also interact with
the end user before issuing the query to confirm the language if
necessary (a "did you mean" kind of feature).
Assuming you can determine the query language and you do have "dictionaries"
of important terms per field, I tend to think precision improves.
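
To make the n-gram profile idea concrete, here is a minimal sketch in plain
Python. It is not the Tika or Nutch API; the sample texts, the function names
and the 0.2 margin are all assumptions made up for illustration. It scores a
query against per-language trigram profiles and flags the guess for user
confirmation when the two best scores are close.

```python
from collections import Counter

def trigrams(text):
    """Character trigrams of the lowercased text, padded with one space."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# Toy training text (hypothetical); real profiles need far more data.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and "
          "the search engine finds the document",
    "fr": "le renard brun saute par dessus le chien paresseux et "
          "le moteur de recherche trouve le document",
}

PROFILES = {lang: Counter(trigrams(text)) for lang, text in SAMPLES.items()}

def score(query, profile):
    """Fraction of the query's trigrams that occur in the profile."""
    grams = trigrams(query)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in profile) / len(grams)

def guess_language(query, margin=0.2):
    """Return (language, needs_confirmation).

    needs_confirmation is True when the best and second-best scores are
    closer than `margin`, i.e. when asking the user would be sensible.
    """
    ranked = sorted(
        ((score(query, profile), lang) for lang, profile in PROFILES.items()),
        reverse=True,
    )
    (best, lang), (second, _) = ranked[0], ranked[1]
    return lang, (best - second) < margin
```

A query such as "the lazy dog" scores clearly for one profile, whereas a word
like "document" matches both profiles equally and would trigger the
confirmation step.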


The simple route is to ignore the language, use n-grams, forget stemmers et
al., and just fire; recall will likely be good, precision not so much.
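
A rough sketch of why that trade-off shows up, again in plain Python rather
than any Lucene analyzer (the function names and the trigram size are
assumptions for illustration): matching on character n-grams finds related
word forms without a stemmer, but it also matches unrelated words that happen
to share a fragment.

```python
def char_ngrams(text, n=3):
    """Set of lowercase character n-grams taken from each word of the text."""
    grams = set()
    for word in text.lower().split():
        if len(word) < n:
            grams.add(word)  # keep words shorter than n as a single token
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams

def overlap(query, doc, n=3):
    """Fraction of the query's n-grams that also occur in the document."""
    query_grams = char_ngrams(query, n)
    if not query_grams:
        return 0.0
    return len(query_grams & char_ngrams(doc, n)) / len(query_grams)

# "runs" shares the gram "run" with "running" (a wanted match, no stemmer
# needed) but shares the same gram with "prune" (an unwanted match).
for doc in ("running", "prune", "walking"):
    print(doc, overlap("runs", doc))
```

Both "running" and "prune" get the same score for the query "runs", which is
the recall-up, precision-down effect in miniature.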

Cheers

Henrib
-- 
View this message in context: 
http://n3.nabble.com/Designing-a-multilingual-index-tp688766p692481.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
