Damon,
Great point about $cts:queries -- I was able to use it to do this in one
pass. It doesn't produce nested highlights, but it will always keep the
highest-level (outermost) match, which is what I need.
Here is the updated example:
let $p := <p>From the desk of a Top Secret Government Agency.</p>
let $q := cts:or-query((
  cts:word-query("Top Secret Government Agency"),
  cts:word-query("Secret Government Agency")))
return
  cts:highlight($p, $q,
    (
      (: more than one disjunct matched this span, so it is the overlap:
         emit nothing for it and move on to the next match :)
      if (count($cts:queries) gt 1)
      then xdmp:set($cts:action, "continue")
      else
      (
        (: exactly one disjunct matched: emit that query's full text :)
        let $matched-text := <x>{$cts:queries}</x>/cts:word-query/cts:text/data(.)
        return
          <m>{$matched-text}</m>
      )
    ))
Outputs:
<p>From the desk of a <m>Top Secret Government Agency</m>.</p>
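This solution relies on assumptions about what's inside the or-query (here,
that every disjunct is a word query, and that a span matched by more than one
disjunct is the overlap to drop), but the example could be modified to handle
overlapping in other scenarios.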
Best,
Will
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Sunday, May 15, 2011 9:55 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] cts:highlight problem with overlapping
word queries
Will,
That highlighting works as designed - highlighting breaks up overlap areas into
separate matches rather than returning nested highlighted structures (nested
structures can be tricky to deal with).
One solution may be to rewrite the tree and combine siblings if they are
adjacent. You could write a recursive pair of functions to do this (use
"Removing all Attributes" and other examples from
http://en.wikibooks.org/wiki/XQuery/Filtering_Nodes as a sample). Or you
could use XSLT, or run cts:highlight again in a second pass.
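As a rough illustration of the sibling-merging idea, here is a minimal sketch
in plain XQuery 1.0. The function names local:merge and local:merge-children
are hypothetical, and the sketch assumes the highlight element is named m (as
in Will's example) and carries no attributes worth keeping. Note that it merges
any run of directly adjacent m elements, so two unrelated matches with no text
between them would also be combined.

```xquery
xquery version "1.0";

(: Hypothetical helper: copy a node, merging adjacent <m> siblings below it. :)
declare function local:merge($node as node()) as node()*
{
  typeswitch ($node)
    case element() return
      element { node-name($node) } {
        $node/@*,
        local:merge-children($node/node())
      }
    default return $node
};

(: Hypothetical helper: walk a sibling list, collapsing each run of <m>. :)
declare function local:merge-children($nodes as node()*) as node()*
{
  if (empty($nodes)) then ()
  else
    let $first := $nodes[1]
    return
      if ($first instance of element(m)) then
        (: position of the first sibling that is not an <m>, if any :)
        let $non-m :=
          (for $n at $i in $nodes
           where not($n instance of element(m))
           return $i)[1]
        let $run-end :=
          if (empty($non-m)) then count($nodes) else $non-m - 1
        return (
          (: one <m> holding the contents of every <m> in the run :)
          <m>{
            for $m in $nodes[position() le $run-end]
            return local:merge-children($m/node())
          }</m>,
          local:merge-children($nodes[position() gt $run-end])
        )
      else (
        local:merge($first),
        local:merge-children(subsequence($nodes, 2))
      )
};

local:merge(
  <p>From the desk of a <m>Top </m><m>Secret Government Agency</m>.</p>)
```

Applied to the split output from the original question, this should yield a
single <m>Top Secret Government Agency</m> inside the paragraph.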
Note that if you use a cts:or-query() for highlighting, you can access the
particular disjuncts that matched via $cts:queries. This can be used to set a
particular ID or attribute into the <match> which can clear up ambiguity when
multiple queries overlap like this.
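A minimal sketch of that tagging idea follows, assuming every disjunct is a
cts:word-query so that cts:word-query-text can be applied to each item in
$cts:queries. The attribute name "queries" and the "|" separator are
illustrative choices, not anything cts:highlight requires.

```xquery
xquery version "1.0-ml";

let $p := <p>From the desk of a Top Secret Government Agency.</p>
let $q := cts:or-query((
  cts:word-query("Top Secret Government Agency"),
  cts:word-query("Secret Government Agency")))
return
  cts:highlight($p, $q,
    (: record which disjunct(s) produced this match; assumes word queries only :)
    <m queries="{
      string-join(
        for $matched in $cts:queries
        return cts:word-query-text($matched),
        '|')
    }">{$cts:text}</m>)
```

Each highlighted span then names the query or queries that produced it, which
a later pass can use to decide which adjacent spans belong together.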
Yours,
Damon
________________________________________
From: [email protected]
[[email protected]] On Behalf Of Will Thompson
[[email protected]]
Sent: Friday, May 13, 2011 6:08 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] cts:highlight problem with overlapping word
queries
If I am searching for matches in a doc using OR'd word-queries, and one query
(A) happens to contain the text of another query (B), cts:highlight doesn't
behave ideally when replacing the text of query A.
Here's a simplified example:
let $p := <p>From the desk of a Top Secret Government Agency.</p>
return
cts:highlight($p,
cts:or-query(
(cts:word-query("Top Secret Government Agency"),
cts:word-query("Secret Government Agency"))),
<m>{$cts:text}</m>)
The output is:
<p>From the desk of a <m>Top </m><m>Secret Government Agency</m>.</p>
Ideally, I would expect:
<p>From the desk of a <m>Top <m>Secret Government Agency</m></m>.</p>
but if I could even get this it would be fine:
<p>From the desk of a <m>Top Secret Government Agency</m>.</p>
I tried to intervene inside the replacement param of cts:highlight, using
xdmp:set to keep track of matches and bailing out if the current match was
found in the set of already-matched text. But the output is not an artifact of
one replacement being followed by another, mangling the output - cts:highlight
really does think that "Top " & "Secret Government Agency" are the matches.
Any suggestions on how to prevent this from happening?
Thanks,
Will
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general