As part of my work on XTF for the California Digital Library, I've written such
a highlighter. You can see it in action here:
http://texts.cdlib.org/escholarship/
It supports multi-field highlighting, and ranks the matches within a document
field. It highlights the extent of the actual hits, as well as the terms within
a hit (click on a text hit to see this highlighting). I think that's what Doug
means by "phrasal" matching.
Unfortunately, it involves significant additions to the Lucene core. In essence
it relies on an amped-up span system that is capable of scoring the spans, as
well as recording which spans matched for each document field.
This is the second rev of the code, and it was designed to be contributed back
into Lucene. It's already Apache-licensed, and pretty well documented. I also
tried to ensure zero speed impact for queries that don't need span recording.
Here's the project page: http://sourceforge.net/projects/xtf
A few weeks ago I joined the Lucene dev mailing list, and I've been trying to
get the lay of the land before I suggest changes to the Lucene core. Okay,
that's only partly true. Actually, I've never contributed to a project like
this before, and have been trying to work up the courage.
The code is based on Lucene 1.4.3; if people are interested, I'll work on a
patch against the current svn trunk. I'll also have to port our test suite over
to JUnit.
--Martin
On Fri, 06 May 2005 12:04:25 -0700, Doug Cutting wrote:
> There's a post over at SearchEngineWatch theorizing about how
> Google produces summaries.
>
> http://forums.searchenginewatch.com/showthread.php?threadid=5448
>
> Lucene's current highlighter doesn't easily support multi-fields,
> nor does it take phrasal matching into account. It might be useful
> to have a highlighter API that takes a Document and summarizes all
> of its fields, incorporating their boosts in fragment scores.
> Thoughts?
>
> Doug