Re: TermVector usage

Marvin Humphrey Tue, 21 Feb 2006 05:54:27 -0800


On Feb 20, 2006, at 9:47 PM, Otis Gospodnetic wrote:

As far as I can tell, most people use TermVectors for "more like this" queries (see MoreLikeThis class in contrib/ somewhere)


On Feb 21, 2006, at 5:39 AM, Erik Hatcher wrote:

I use term vectors for "more like this" queries, such as the links you'll see here:
<http://www.rossettiarchive.org/rose/?query=%2B%28%2Bblessed+%2Bdamozel%29+%2B%28archivetype%3Arad%29>

Thanks, Otis and Erik. (MoreLikeThis is under contrib/similarity.) Looking at the way MoreLikeThis is implemented, my impression is that it wouldn't hurt and might help a smidge to store the term vector with the stored document.
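For readers who haven't looked at MoreLikeThis, here is a rough Python sketch of the kind of heuristic involved: take one document's term vector, score each term by tf*idf, and keep the best few as query terms. This is not the contrib code. The function name, the dict-based term vector, and the sample counts are all made up for illustration, and the idf formula is only assumed to resemble Lucene's default.

```python
import math

def interesting_terms(term_vector, doc_freq, num_docs, max_terms=3):
    """Pick the highest-scoring terms from ONE document's term vector.

    term_vector: {term: frequency within this document}   (hypothetical layout)
    doc_freq:    {term: number of documents containing it} (hypothetical layout)
    """
    scored = []
    for term, tf in term_vector.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term unknown to the index; nothing to match against
        # Assumed idf, in the style of 1 + ln(numDocs / (docFreq + 1)).
        idf = 1.0 + math.log(num_docs / (df + 1))
        scored.append((tf * idf, term))
    # Highest score first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]

# Invented counts for a single document in a 1000-document index.
term_vector = {"blessed": 3, "damozel": 2, "the": 10, "heaven": 1}
doc_freq = {"blessed": 40, "damozel": 5, "the": 1000, "heaven": 120}
top = interesting_terms(term_vector, doc_freq, num_docs=1000, max_terms=2)
print(top)
```

The point of the sketch is that the only per-document input is that one document's term vector; everything else comes from index-wide statistics.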

What I don't yet see is a benefit to having all TermVectors reside side-by-side in the same file. A full vector-space search which compares complete document vectors and thus needs to scan through all TermVectors for each query is the only application I've thought of so far. Of course such a beast is impractical for a search engine of any reasonable size, so you need some method of data reduction. LSI's decomposition is one way of hacking at that problem, but you don't do that on the fly at search-time. :) Another is the heuristic process applied by the MoreLikeThis class, but MoreLikeThis only needs a single document's TermVectors.
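To make the cost of the full vector-space scan concrete, here is a minimal Python sketch, assuming term vectors are plain {term: weight} dicts keyed by a hypothetical doc id. Cosine similarity is used as the comparison, which is a common choice but an assumption here. Every query touches every stored vector, which is why the approach stops scaling.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

def full_scan(query_vec, all_vectors):
    """Compare the query against EVERY document vector: O(number of docs)."""
    scores = [(cosine(query_vec, vec), doc_id)
              for doc_id, vec in all_vectors.items()]
    # Best match first; break ties by doc id for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return scores

# Three toy documents (invented data).
all_vectors = {
    0: {"a": 1.0, "b": 1.0},
    1: {"c": 1.0},
    2: {"a": 1.0},
}
ranking = full_scan({"a": 1.0}, all_vectors)
print([doc_id for _, doc_id in ranking])
```

This is the one access pattern where reading all vectors sequentially from a single file would pay off; the MoreLikeThis-style heuristic never needs it.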


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

