markharw00d wrote:
>>Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)? I

I haven't conducted a survey, but it's the typical web search engine scenario: select only a small subset of the matching document content for display in SERPs. I would expect that to be a pretty commonplace requirement for which we should retain a solution.
No doubt. I am certainly not suggesting you ditch fragments, and I have no evidence that more people just want to highlight a whole doc. It's just the impression I get from the mailing list that most people want to highlight the returned doc. I am sure plenty of people need Google-style results too, but my experience with Lucene has not often been in the area of web search engines. I bet a lot of users would benefit from a highlighter that highlights actual hits and doesn't summarize, though (both would be great). I wouldn't claim to be an authority on any of this, so take my opinion for what it's worth: very little.

Maybe a new highlighter with no attempt at summarising could more easily address phrase support for small pieces of content. It will always be hard to faithfully represent all possible query match logic, especially if there are NOTs, ANDs and ORs mixed in with all the term proximity logic, e.g. NotNear. Some compromise is required. I did suggest that spans may be a better basis for highlighting than terms, and pointed at some existing code to get you along this path. See here: http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
I have some code that you wrote that seems to turn almost any query into a series of spans. Perhaps it is not as robust as my limited testing made it seem.

There are also a couple of other Highlighter packages contributed recently, which I listed in my previous mail. I simply haven't had the time to look at them in detail, but they may be useful. Has anyone had any experience with those?
None of them seem to do full span highlighting, again based on my limited investigation.

>> every new highlight has to be compared against every previous highlight for overlap
Yes, Analyzers that produce overlapping tokens are an added complication when implementing highlighting logic. I think we have a reasonable JUnit test containing several of the more exotic analyzer scenarios, which you could/should use for testing any other highlighter implementation.
Thanks for the tip.
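
For what it's worth, the pairwise overlap comparison can probably be avoided by sorting the highlights by start offset and merging in a single pass. Below is a minimal sketch in plain Java, assuming each highlight is a [start, end) pair of character offsets. The names HighlightMerge, merge and highlight are hypothetical and are not part of the Lucene Highlighter; touching ranges are merged along with overlapping ones, which is a design choice rather than a requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: merges overlapping highlight ranges.
public class HighlightMerge {

    /** Merges overlapping or touching [start, end) ranges into disjoint ranges. */
    public static List<int[]> merge(List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] <= last[1]) {
                // Overlaps (or touches) the previous range: extend it.
                last[1] = Math.max(last[1], r[1]);
            } else {
                // Copy so the caller's arrays are never modified.
                merged.add(new int[] {r[0], r[1]});
            }
        }
        return merged;
    }

    /** Wraps each merged range of the text in <b>...</b> tags. */
    public static String highlight(String text, List<int[]> ranges) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] r : merge(ranges)) {
            out.append(text, pos, r[0]);
            out.append("<b>").append(text, r[0], r[1]).append("</b>");
            pos = r[1];
        }
        return out.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        // Assumed scenario: an analyzer emits "wi-fi", "wi" and "fi" for the same text.
        List<int[]> hits = Arrays.asList(
            new int[] {0, 5}, new int[] {0, 2}, new int[] {3, 5}, new int[] {6, 13});
        System.out.println(highlight("wi-fi network", hits));
        // prints: <b>wi-fi</b> <b>network</b>
    }
}
```

Sorting first means each highlight is compared only against the most recently merged range, so the cost is dominated by the sort rather than by comparing every pair.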

I appreciate your response, Mark. I will continue to look at your span extractor. I thought that it alone was enough to do what I wanted, but your comments seem to suggest I may need more. I hope not <g>. If I do manage something I will be sure to post my results.


- Mark
