Greets,

KinoSearch 0.05, which for now I'm calling a "loose port" of Lucene, was published to CPAN a few weeks ago. It's nice and fast, but missing some features, most notably multiple segment support and incremental indexing. Before I get to those, though, I'm adding excerpting and highlighting.

The version of KinoSearch which preceded the Lucene-based rewrite also had a highlighter, which depended on what were effectively TermVectors with stored offsets. Unlike in Lucene, however, that data was stored along with the stored fields. As I've been preparing to port all the support apparatus for TermVectors, I've been wondering whether I should go back to that approach. It sure would be less work to code up. In theory there ought to be less disk activity, too, since the highlighting data would be read together with the document's stored fields.
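To make the idea concrete, here is a rough sketch of keeping per-term offsets in the same record as the stored field text and feeding them to a highlighter. It's in Python purely for illustration. It is not the old KinoSearch code, and the names `build_doc_record` and `highlight`, the record layout, and the `\w+` tokenizer are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def build_doc_record(text):
    """Bundle the stored text with a term -> [(start, end), ...] offset map.

    Hypothetical layout: a real index would serialize this to disk, but the
    point is that the text and the offsets travel together in one record.
    """
    offsets = defaultdict(list)
    for match in re.finditer(r"\w+", text):  # naive tokenizer (assumption)
        offsets[match.group().lower()].append((match.start(), match.end()))
    return {"text": text, "offsets": dict(offsets)}


def highlight(record, query_terms, open_tag="<b>", close_tag="</b>"):
    """Wrap each occurrence of a query term using only the stored offsets."""
    text = record["text"]
    wanted = {term.lower() for term in query_terms}
    spans = sorted(
        span for term in wanted for span in record["offsets"].get(term, [])
    )
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(text[cursor:start])
        pieces.append(open_tag + text[start:end] + close_tag)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


record = build_doc_record("Lucene stores term vectors apart from stored fields.")
print(highlight(record, ["stored", "vectors"]))
```

Since the offsets arrive with the text, the highlighter never has to re-tokenize the field or consult a second data structure, which is where the hoped-for savings in code and disk activity would come from.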

From following the Lucene lists off and on, I've gotten the impression that lots of people use TermVectors to feed the highlighter, but I haven't seen many applications for them besides that. LSI-type ideas percolate every once in a while. Besides highlighting, how many people are using TermVectors and how are they using them?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

