Re: AW: Best practices for multiple languages?

Bill Janssen Wed, 19 Jan 2011 15:30:18 -0800

Paul Libbrecht <p...@hoplahup.net> wrote:

> I made several changes of this sort, and the precision and recall
> measures improved, particularly in the presence of language-indication
> failures, which happened to be very common in our authoring environment.


There are two kinds of failures:  no language, or wrong language.

For no language, I fall back to StandardAnalyzer, so I should have
results similar to yours.  For wrong language, well, I'm using OTS
trigram-based language guessers, and they're pretty good these days.
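
A rough sketch of the "no language" fallback described above, not the
actual code from this setup. The class name, the language codes, and the
analyzer names are all made up for illustration; a real version would
presumably hold Lucene Analyzer instances instead of strings.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a detected language code to an analyzer name.
final class AnalyzerChooser {
    // Used when the language is missing or not recognized.
    static final String FALLBACK = "standard";

    // Illustrative mapping only; the codes and names are assumptions.
    private static final Map<String, String> BY_LANGUAGE = Map.of(
        "en", "english",
        "de", "german",
        "fr", "french");

    static String choose(String languageCode) {
        // No language indication at all: fall back.
        if (languageCode == null || languageCode.trim().isEmpty()) {
            return FALLBACK;
        }
        String key = languageCode.trim().toLowerCase(Locale.ROOT);
        // Unknown language: also fall back rather than fail.
        return BY_LANGUAGE.getOrDefault(key, FALLBACK);
    }
}
```

Note that this only covers the "no language" case. A wrong guess still
selects the wrong analyzer, which is why the quality of the language
guesser matters.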

> >> Wouldn't it be better to prefer precise matches (a field that is
> >> analyzed with StandardAnalyzer, for example) but also allow matches
> >> that are stemmed?

Yes, I think it might improve things, but again, by how much?  Stemming is
better than no stemming, in terms of recall.  But this approach would also
improve precision.
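
To make the idea concrete, here is a toy sketch of "prefer exact matches,
but still accept stemmed ones". It does not use Lucene at all: the
suffix-stripping toyStem is a crude stand-in for a real stemmer, and the
boost of 2.0 is an arbitrary assumption, not a recommended value.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration: score a document on an exact "field" and a
// stemmed "field", weighting the exact match higher.
final class ExactPlusStemmed {
    static final double EXACT_BOOST = 2.0; // assumed weight, for illustration

    // Crude stand-in for a stemmer: strips a few English suffixes.
    static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.length() > suffix.length() + 2 && t.endsWith(suffix)) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    static double score(String queryTerm, String document) {
        String q = queryTerm.toLowerCase(Locale.ROOT);
        String qStem = toyStem(q);
        String[] tokens = document.toLowerCase(Locale.ROOT).split("\\W+");
        // Exact match: the unstemmed query term occurs in the document.
        boolean exact = Arrays.asList(tokens).contains(q);
        // Stemmed match: the stems of query term and some token agree.
        boolean stemmed = Arrays.stream(tokens)
            .map(ExactPlusStemmed::toyStem)
            .anyMatch(qStem::equals);
        return (exact ? EXACT_BOOST : 0.0) + (stemmed ? 1.0 : 0.0);
    }
}
```

With these assumptions, a query for "walking" ranks "She is walking home"
above "He walks home", yet the second document still matches through the
stemmed comparison, so the recall gained from stemming is kept.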

Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
