Greets,

KinoSearch 0.05, which for now I'm calling a "loose port" of Lucene,
was published to CPAN a few weeks ago. It's nice and fast, but it's
missing some features, most notably multiple segment support and
incremental indexing. Before I get to those, though, I'm adding
excerpting and highlighting.

The version of KinoSearch that preceded the Lucene-based rewrite
also had a highlighter, which depended on what were effectively
TermVectors with stored offsets. However, unlike in Lucene, these were
stored along with the stored fields rather than in a separate set of
files. As I've been preparing to port all the support apparatus for
TermVectors, I've been wondering whether I should go back to that
approach. It sure would be less work to code up. Theoretically there
ought to be less disk activity, too: the highlighter needs the stored
field text anyway, so pulling the offsets from the same record should
save a second read.
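To make the "offsets stored with the stored fields" layout concrete, here is a minimal sketch, not KinoSearch's actual code. It assumes a hypothetical per-document record (`StoredDoc`) that keeps the field text and a term-to-offsets map side by side, a naive word-character tokenizer, and hypothetical helpers `index_doc` and `highlight`. The point it illustrates is that a single record fetch supplies everything the highlighter needs.

```python
import re
from dataclasses import dataclass, field


# Hypothetical record layout: the stored field text and its term offsets
# live in the same record, so one read gives the highlighter everything.
@dataclass
class StoredDoc:
    text: str
    offsets: dict = field(default_factory=dict)  # term -> [(start, end), ...]


def index_doc(text):
    """Tokenize naively on word characters and record each term's offsets."""
    doc = StoredDoc(text)
    for m in re.finditer(r"\w+", text):
        doc.offsets.setdefault(m.group().lower(), []).append((m.start(), m.end()))
    return doc


def highlight(doc, query_terms, pre="<b>", post="</b>"):
    """Wrap every occurrence of a query term, using only the stored offsets."""
    terms = {t.lower() for t in query_terms}  # dedupe so spans never repeat
    spans = sorted(span for term in terms for span in doc.offsets.get(term, []))
    out, pos = [], 0
    for start, end in spans:
        out.append(doc.text[pos:start])  # untouched text before the hit
        out.append(pre + doc.text[start:end] + post)
        pos = end
    out.append(doc.text[pos:])  # trailing text after the last hit
    return "".join(out)


doc = index_doc("Lucene stores term vectors; KinoSearch stores offsets.")
print(highlight(doc, ["stores"]))
```

Because the offsets come straight from the stored record, the highlighter never re-analyzes the text; the cost is a larger stored-field record for every document, whether or not it is ever highlighted.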

From following the Lucene lists off and on, I've gotten the
impression that lots of people use TermVectors to feed the
highlighter, but I haven't seen many other applications for them.
Ideas along the lines of LSI (latent semantic indexing) percolate up
every once in a while. Besides highlighting, how many people are
using TermVectors, and how are they using them?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/