MoreLikeThisQuery term frequency caching

Richard Marr Mon, 06 Apr 2009 23:28:58 -0700

Hi all,

I've been exploring MoreLikeThisQuery as part of a recent project, and
one result of that work might be useful to others here.


I found that MoreLikeThisQuery could be quite slow for my use
case, and that most of the time was spent looking up term
frequencies to calculate weightings. Since those frequencies
usually don't need to be anywhere near real-time, I found that caching
them in a HashMap had a very good cost/benefit ratio for my
application, speeding up MLT queries by an order of magnitude.
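
To make the idea concrete, here is a minimal sketch of that kind of cache.
It is not the actual modification: TermFrequencyCache, its method names, and
the pluggable lookup function are all hypothetical, and the lookup stands in
for whatever index call supplies the frequency (in Lucene this would probably
wrap IndexReader.docFreq, which MLT uses for its idf weighting).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical helper (not part of Lucene): remembers per-term frequency
 * lookups so that each distinct term hits the index at most once.
 */
class TermFrequencyCache {
    private final Map<String, Integer> cache = new HashMap<>();
    private final ToIntFunction<String> lookup; // the expensive index lookup
    private int misses = 0;

    TermFrequencyCache(ToIntFunction<String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached frequency, asking the index only on a miss. */
    int frequency(String term) {
        Integer cached = cache.get(term);
        if (cached == null) {
            cached = lookup.applyAsInt(term);
            cache.put(term, cached);
            misses++;
        }
        return cached;
    }

    /** Number of lookups that actually reached the index. */
    int misses() {
        return misses;
    }

    /** Drops all entries, e.g. after the index has changed significantly. */
    void clear() {
        cache.clear();
    }
}
```

The sketch is not thread-safe and never evicts entries, which is only
reasonable for a limited vocabulary; cached values go stale until clear() is
called. A lookup backed by IndexReader.docFreq would also need to handle the
checked IOException that call can throw.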

My use case was possibly unusual in that I was looking at a limited
vocabulary rather than full English, but in theory other applications
that make use of the MLT class could benefit.

So at this point I have some questions: (1) Have others experienced
similar performance characteristics for MLT code? (2) Am I missing
some fatal flaw in this approach? (3) Are the modifications worth
sharing?

Cheers,

Rich
