markharw00d wrote:
>>Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)?
I haven't conducted a survey but it's the typical web search engine
scenario - select only a small subset of the matching document content
for display in SERPs. I would expect that to be a pretty commonplace
requirement for which we should retain a solution.
No doubt. I certainly am not suggesting you ditch fragments and I have
no evidence more people just want to highlight a doc...it's just the
impression that I get from the mailing list is that most people just
want to highlight the returned doc...I am sure plenty of people need
google style results too, but my experience with Lucene has not often
been in the area of web search engines. I bet a lot of users would
benefit from a highlighter that highlights actual hits and doesn't
summarize though (both would be great). I wouldn't claim to be an
authority on any of this though...take my opinion for what it's worth --
very little.
Maybe a new highlighter with no attempt at summarising could more
easily address phrase support for small pieces of content. It will
always be hard to faithfully represent all possible query match logic
- especially if there are NOTs, ANDs and ORs mixed in with all the
term proximity logic e.g. NotNear. Some compromise is required. I did
suggest that spans may be a better basis for highlighting than terms
and pointed at some existing code to get you along this path - see
here http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
I have some code that you wrote that seems to turn almost any query into
a series of spans. Perhaps it is not as robust as my limited testing
made it seem.
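To make the "highlight the whole doc from spans, no summarising" idea concrete, here is a rough sketch of what I have in mind. It is plain Java and not Lucene code: the class name, the tokenOffsets array and the {startPos, endPos} span pairs are all hypothetical stand-ins for whatever the analyzer and the span extractor would actually hand back, and it assumes the spans arrive sorted and non-overlapping.

```java
import java.util.List;

/** Hypothetical sketch, not Lucene code: highlight a whole text from span position ranges. */
final class SpanHighlightSketch {

    /**
     * tokenOffsets[i] = {startChar, endChar} of the token at position i (end exclusive).
     * Each span is {startPos, endPos} in token positions with an exclusive end.
     * Assumption: spans are sorted by start position and do not overlap.
     */
    static String highlight(String text, int[][] tokenOffsets, List<int[]> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0; // first character not yet copied to the output
        for (int[] span : spans) {
            int startChar = tokenOffsets[span[0]][0];
            int endChar = tokenOffsets[span[1] - 1][1];
            // Copy the untouched text before the hit, then the hit wrapped in tags.
            out.append(text, cursor, startChar);
            out.append("<b>").append(text, startChar, endChar).append("</b>");
            cursor = endChar;
        }
        // Copy whatever follows the last hit; nothing is dropped or summarised.
        out.append(text, cursor, text.length());
        return out.toString();
    }
}
```

The whole matched span is wrapped as one unit, so a phrase hit is highlighted as a phrase rather than as individual terms, and the full text comes back with nothing cut into fragments.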
There are also a couple of other Highlighter packages contributed
recently which I listed in my previous mail but I simply haven't had
the time to look at in detail so they may be useful. Anyone had any
experience of those?
None of them seems to do full span highlighting...again, based on my
limited investigation.
>> every new highlight has to be compared against every previous
highlight for overlap
Yes, Analyzers that produce overlapping tokens are an added
complication when implementing highlighting logic. I think we have a
reasonable JUnit test containing several of the more exotic analyzer
scenarios which you could/should use for testing any other highlighter
implementation.
Thanks for the tip.
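On the overlap comparison itself, one way to avoid checking every new highlight against every previous one is to sort the highlights by start offset first, so each one only has to be compared with the last merged range. A minimal sketch, assuming highlights are plain {startOffset, endOffset} character ranges with an exclusive end; HighlightMerger is a made-up name and nothing here is Lucene API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Lucene: merges overlapping highlight ranges. */
final class HighlightMerger {

    /**
     * Each range is {startOffset, endOffset} with an exclusive end.
     * After sorting by start offset, a range can only overlap the most
     * recently merged range, so one pass over the list is enough.
     */
    static List<int[]> merge(List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] < last[1]) {
                // Overlaps the previous highlight: extend it instead of adding a new one.
                last[1] = Math.max(last[1], r[1]);
            } else {
                // No overlap: start a new highlight (copied so the input is not modified).
                merged.add(new int[] {r[0], r[1]});
            }
        }
        return merged;
    }
}
```

Ranges that merely touch (one ends exactly where the next starts) are kept separate here; whether they should be joined is a judgment call that depends on how the tags are rendered.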
I appreciate your response Mark. I will continue to look at your span
extractor...I thought that it alone was enough to do what I wanted, but
your comments seem to suggest maybe I'll need more. I hope not <g> If I
do manage something I will be sure to post my results.
- Mark