Re: Multiword Highlighting

Mark Miller Sat, 27 Jan 2007 15:40:55 -0800


markharw00d wrote:

>> Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)?
> I haven't conducted a survey but it's the typical web search engine scenario - select only a small subset of the matching document content for display in SERPs. I would expect that to be a pretty commonplace requirement for which we should retain a solution.

No doubt. I certainly am not suggesting you ditch fragments and I have no evidence more people just want to highlight a doc...it's just the impression that I get from the mailing list is that most people just want to highlight the returned doc...I am sure plenty of people need google style results too, but my experience with Lucene has not often been in the area of web search engines. I bet a lot of users would benefit from a highlighter that highlights actual hits and doesn't summarize though (both would be great). I wouldn't claim to be an authority on any of this though...take my opinion for what it's worth -- very little.

> Maybe a new highlighter with no attempt at summarising could more easily address phrase support for small pieces of content. It will always be hard to faithfully represent all possible query match logic - especially if there are NOTs, ANDs and ORs mixed in with all the term proximity logic e.g. NotNear. Some compromise is required. I did suggest that spans may be a better basis for highlighting than terms and pointed at some existing code to get you along this path - see here http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2

I have some code that you wrote that seems to turn almost any query into a series of spans. Perhaps it is not as robust as my limited testing made it seem.

> There are also a couple of other Highlighter packages contributed recently which I listed in my previous mail but I simply haven't had the time to look at in detail so they may be useful. Anyone had any experience of those?

None of them seem to do full span highlighting...again based on my limited investigation.

>> every new highlight has to be compared against every previous highlight for overlap
> Yes, Analyzers that produce overlapping tokens are an added complication when implementing highlighting logic. I think we have a reasonable JUnit test containing several of the more exotic analyzer scenarios which you could/should use for testing any other highlighter implementation.

thanks for the tip.
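
For anyone following the overlap point above, here is a minimal sketch of one way to handle it without comparing every highlight against every previous one: sort the hit ranges by start offset and merge in a single pass, then wrap each merged range in tags over the whole text (no fragments). This is only an illustration under stated assumptions, not code from the Lucene Highlighter. The names HighlightSketch, Span, mergeOverlaps and highlight are hypothetical, the offsets are assumed to be character offsets into the original text, and the <b> tags are an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: highlights a whole text from hit offsets.
class HighlightSketch {

    // Half-open character range [start, end) of one hit (assumed to come from the analyzer).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Sorts spans by start offset and merges any that overlap or touch.
    static List<Span> mergeOverlaps(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && s.start <= merged.get(last).end) {
                // Overlaps (or touches) the previous span: extend it instead of adding a new one.
                Span prev = merged.get(last);
                merged.set(last, new Span(prev.start, Math.max(prev.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    // Wraps each merged span of the full text in <b>...</b>; no summarising is attempted.
    static String highlight(String text, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Span s : mergeOverlaps(spans)) {
            out.append(text, pos, s.start);
            out.append("<b>").append(text, s.start, s.end).append("</b>");
            pos = s.end;
        }
        out.append(text, pos, text.length());
        return out.toString();
    }
}
```

Sorting first means each span only has to be checked against the most recently merged one, which is the design choice that avoids the all-pairs comparison. It does not address how the spans are derived from the query in the first place.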

I appreciate your response Mark. I will continue to look at your span extractor...I thought that it alone was enough to do what I wanted, but your comments seem to suggest maybe I'll need more. I hope not <g> If I do manage something I will be sure to post my results.



- Mark
