Is there a simple, efficient way to compute similarity of documents indexed with Lucene?

My first, naive idea is to use the entire contents of one document as a query against the other, and use the resulting score as the similarity measure. But I suspect I'm way off base with that.
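
To make the idea concrete, here is a minimal sketch of "score one document against another", assuming plain term-frequency vectors and cosine similarity as a stand-in for a real scoring function. It does not use Lucene's API or reproduce Lucene's actual scoring formula; the class and method names (NaiveSimilarity, termFrequencies, cosine, norm) are hypothetical and only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class NaiveSimilarity {

    // Count how often each lowercased token occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two term-frequency vectors.
    // Returns 0.0 when either document has no tokens.
    static double cosine(String docA, String docB) {
        Map<String, Integer> a = termFrequencies(docA);
        Map<String, Integer> b = termFrequencies(docB);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        System.out.println(cosine("the quick brown fox", "the quick red fox"));
        System.out.println(cosine("lucene index", "sunshine"));
    }
}
```

Unlike a one-directional query score, this measure is symmetric, so comparing A to B gives the same value as comparing B to A. Whether that matters for documents in a Lucene index is part of what I'm asking.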

Can any IR pros set me straight? Thanks very much.

Matt


--
Matt Chaput
Word Monkey
Side Effects Software Inc.

"A goddamned ray of sunshine all the goddamned time"
-- Sparkle Hayter

