Will,

That highlighting works as designed: cts:highlight breaks overlapping areas into separate matches rather than returning nested highlighted structures (nested structures can be tricky to deal with).
One solution may be to rewrite the tree and combine siblings if they are adjacent. You could write a recursive pair of functions to do this (use "Removing all Attributes" and other examples from http://en.wikibooks.org/wiki/XQuery/Filtering_Nodes as a sample). Or you could use XSLT, or actually use cts:highlight in a second pass.

Note that if you use a cts:or-query() for highlighting, you can access the particular disjuncts that matched via $cts:queries. This can be used to set a particular ID or attribute on the <match> element, which can clear up ambiguity when multiple queries overlap like this.

Yours,
Damon

________________________________________
From: [email protected] [[email protected]] On Behalf Of Will Thompson [[email protected]]
Sent: Friday, May 13, 2011 6:08 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] cts:highlight problem with overlapping word queries

If I am searching for matches in a doc using OR'd word-queries, and one query (A) happens to contain the text of another query (B), cts:highlight doesn't behave ideally when replacing the text of query A. Here's a simplified example:

    let $p := <p>From the desk of a Top Secret Government Agency.</p>
    return
      cts:highlight($p,
        cts:or-query((
          cts:word-query("Top Secret Government Agency"),
          cts:word-query("Secret Government Agency"))),
        <m>{$cts:text}</m>)

The output is:

    <p>From the desk of a <m>Top </m><m>Secret Government Agency</m>.</p>

Ideally, I would expect:

    <p>From the desk of a <m>Top <m>Secret Government Agency</m></m>.</p>

but if I could even get this it would be fine:

    <p>From the desk of a <m>Top Secret Government Agency</m>.</p>

I tried to intervene inside the replacement param of cts:highlight, using xdmp:set to keep track of matches and bailing out if the current match was found in the set of already-matched.
But the output is not an artifact of one replacement being followed by another and mangling the output: cts:highlight really does treat "Top " and "Secret Government Agency" as the matches. Any suggestions on how to prevent this from happening?

Thanks,
Will

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
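
A minimal sketch of the "combine adjacent siblings" idea from the reply, assuming matches are wrapped in <m> elements as in the example. The function names local:merge and local:merge-children are hypothetical (they come from neither the thread nor any MarkLogic library), and the code uses only standard XQuery 1.0. It is one possible way to do the rewrite, not necessarily what the reply had in mind, and it flattens any markup inside a merged run to plain text:

```xquery
xquery version "1.0";

(: Hypothetical helper: rebuild an element, merging runs of adjacent <m> children.
   Non-element nodes (text, comments, etc.) are returned unchanged. :)
declare function local:merge($node as node()) as node()
{
  typeswitch ($node)
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        local:merge-children($node/node())
      }
    default return $node
};

(: Hypothetical helper: walk a sibling sequence left to right. When the first
   node is an <m>, collect the run of consecutive <m> siblings and emit one <m>
   holding their concatenated string values. :)
declare function local:merge-children($nodes as node()*) as node()*
{
  if (fn:empty($nodes)) then ()
  else
    let $first := $nodes[1]
    return
      if ($first instance of element(m)) then
        (: position of the first sibling that is NOT an <m>, if any :)
        let $stop :=
          (for $i in 2 to fn:count($nodes)
           where fn:not($nodes[$i] instance of element(m))
           return $i)[1]
        let $end := if (fn:exists($stop)) then $stop - 1 else fn:count($nodes)
        let $run := fn:subsequence($nodes, 1, $end)
        return (
          <m>{ fn:string-join(for $m in $run return fn:string($m), "") }</m>,
          local:merge-children(fn:subsequence($nodes, $end + 1))
        )
      else (
        local:merge($first),
        local:merge-children(fn:subsequence($nodes, 2))
      )
};

(: The output from the example in the thread, used here as input. :)
local:merge(
  <p>From the desk of a <m>Top </m><m>Secret Government Agency</m>.</p>
)
```

For the example this should produce <p>From the desk of a <m>Top Secret Government Agency</m>.</p>. Note that the sketch also merges two unrelated matches that happen to sit directly next to each other with no text between them, so tagging each <m> with the matching query (via $cts:queries, as mentioned in the reply) may be needed if that distinction matters.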
