>> the answer has never been satisfactory

Is this the original question?
http://www.nabble.com/Which-field-matched---tf4141549.html#a11780757

What actually formed the basis of a document match is hidden in a tree of heterogeneous Query objects. To be efficient, their match output is limited to document ids and scores, not a detailed analysis of which sections of a document matched. It is therefore hard, if not impossible, to have any highlighting solution that provides answers for all query types. The existing Highlighter relies on a rough heuristic: QueryTermExtractor output is used to find the list of query terms, and the field TokenStreams are analyzed for content containing those terms.

You mention Highlighter performance would be bad for wildcard queries. Have you tried it? If it does turn out to be bad (many wildcard variants produced), might I suggest the following:

1) Dissect the unrewritten Query and find all WildcardQuery objects.
2) Create a custom analyzer that re-implements the wildcard logic and produces a highlighter-friendly token stream, i.e. given a query of
       Fred W*
   and data of
       Fred West was arrested
   the analyzer would produce:
       Fred [W*|West] was arrested
   where the tokens "W*" and "West" appear at the same position.
3) Add a special wildcard term (W*) to the list of query terms given to the Highlighter. This would then match the W* injected into the content in step 2.

This would avoid the overhead of picking through all the wildcard variants produced by the WildcardQuery, but at the cost of extra coding on your part and the runtime cost of re-executing wildcard logic on all terms in the selected documents' TokenStreams. The difference in runtime cost may prove minimal.

Cheers
Mark
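
The three steps above can be sketched in plain Java to show the idea of stacking the wildcard pattern at the same position as the token it matches. This is only an illustration under stated assumptions, not Lucene code: the class WildcardHighlightSketch, its Token type and its methods are all hypothetical names, tokenizing is a plain whitespace split, matching is case-sensitive, and the <B> tags are an arbitrary choice of markup. A real implementation would do the same work inside a custom Analyzer/TokenFilter that emits the injected term with a position increment of zero.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, Lucene-free sketch of the suggestion: all names here are made up.
final class WildcardHighlightSketch {

    /** A token and its position; tokens sharing a position are "stacked". */
    static final class Token {
        final String text;
        final int position;
        final boolean injected; // true for the wildcard pattern added in step 2

        Token(String text, int position, boolean injected) {
            this.text = text;
            this.position = position;
            this.injected = injected;
        }
    }

    /** Minimal wildcard matcher: '*' is any run of characters, '?' is any single one. */
    static boolean matchesWildcard(String pattern, String word) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), word);
    }

    /** Step 2: re-run the wildcard logic on each token and stack the pattern on matches. */
    static List<Token> analyze(String content, List<String> wildcardPatterns) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : content.trim().split("\\s+")) {
            for (String pattern : wildcardPatterns) {
                if (matchesWildcard(pattern, word)) {
                    tokens.add(new Token(pattern, position, true)); // same position as the word
                }
            }
            tokens.add(new Token(word, position, false));
            position++;
        }
        return tokens;
    }

    /** Shows the stream in the notation used above, e.g. [W*|West] for stacked tokens. */
    static String render(List<Token> tokens) {
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        for (Token t : tokens) {
            byPosition.computeIfAbsent(t.position, p -> new ArrayList<>()).add(t.text);
        }
        List<String> parts = new ArrayList<>();
        for (List<String> stack : byPosition.values()) {
            parts.add(stack.size() == 1 ? stack.get(0) : "[" + String.join("|", stack) + "]");
        }
        return String.join(" ", parts);
    }

    /** Step 3: mark the original word wherever any token at its position is a query term. */
    static String highlight(List<Token> tokens, Set<String> queryTerms) {
        Set<Integer> hitPositions = new HashSet<>();
        for (Token t : tokens) {
            if (queryTerms.contains(t.text)) {
                hitPositions.add(t.position);
            }
        }
        List<String> parts = new ArrayList<>();
        for (Token t : tokens) {
            if (t.injected) {
                continue; // injected patterns exist only to be matched, not displayed
            }
            parts.add(hitPositions.contains(t.position) ? "<B>" + t.text + "</B>" : t.text);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Step 1 (finding the WildcardQuery objects) is assumed to have produced "W*".
        List<Token> tokens = analyze("Fred West was arrested", List.of("W*"));
        System.out.println(render(tokens));                           // Fred [W*|West] was arrested
        System.out.println(highlight(tokens, Set.of("Fred", "W*")));  // <B>Fred</B> <B>West</B> was arrested
    }
}
```

Note that the highlighting pass only ever compares against the two query terms "Fred" and "W*", however many variants the wildcard would expand to. The price is the per-token call to matchesWildcard during analysis, which is the runtime cost mentioned above.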