Suppose you have a database that stores information about the books in
a bookstore. There is one document per subject, and each document
contains the books categorized under that subject. For example,
here is sports.xml:
<books subject="sports">
  <book id='1'>
    <author>
      <firstname>James</firstname>
      <lastname>Johnson</lastname>
    </author>
    <title>Running by Moonlight</title>
    <pages>220</pages>
  </book>
  <book id='2'>
    <author>
      <firstname>Marie</firstname>
      <lastname>Franklin</lastname>
    </author>
    <title>Optimum Nutrition for Peak Bowling Performance</title>
    <pages>2</pages>
  </book>
</books>
If the following options node is used:
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="author-firstname">
    <word>
      <element ns="" name="firstname" />
    </word>
  </constraint>
  <constraint name="title">
    <word>
      <element ns="" name="title" />
    </word>
  </constraint>
</options>
...then these queries will both return sports.xml, as desired:
author-firstname:james title:running
author-firstname:marie title:bowling
However, with that approach the following query also returns
sports.xml, which is not what I want: both constraints are satisfied,
but not within the same book (author-firstname matches book #1 and
title matches book #2):
author-firstname:james title:bowling
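
To make the difference concrete, here is a small Python sketch (standard library only, not MarkLogic code) that checks the two word constraints against sports.xml in two ways: once against the whole document, which mimics the behavior I am seeing, and once per <book>, which is the behavior I want. The helper names (word_in, matches_document, matches_same_book) are made up for illustration, and the word test is a crude lowercase token comparison, not MarkLogic's word-query semantics.

```python
import xml.etree.ElementTree as ET

SPORTS_XML = """
<books subject="sports">
  <book id="1">
    <author><firstname>James</firstname><lastname>Johnson</lastname></author>
    <title>Running by Moonlight</title>
    <pages>220</pages>
  </book>
  <book id="2">
    <author><firstname>Marie</firstname><lastname>Franklin</lastname></author>
    <title>Optimum Nutrition for Peak Bowling Performance</title>
    <pages>2</pages>
  </book>
</books>
"""


def word_in(scope, element_name, word):
    """True if any <element_name> at or below scope contains word as a token."""
    for el in scope.iter(element_name):
        if word.lower() in (el.text or "").lower().split():
            return True
    return False


def matches_document(root, firstname, title_word):
    # Each constraint is checked against the whole document independently.
    return word_in(root, "firstname", firstname) and word_in(root, "title", title_word)


def matches_same_book(root, firstname, title_word):
    # Both constraints must hold inside one and the same <book>.
    return any(
        word_in(book, "firstname", firstname) and word_in(book, "title", title_word)
        for book in root.iter("book")
    )


root = ET.fromstring(SPORTS_XML)
print(matches_document(root, "james", "bowling"))   # True  (the unwanted hit)
print(matches_same_book(root, "james", "bowling"))  # False (what I want)
print(matches_same_book(root, "james", "running"))  # True
```

The per-book check is the semantics I am after; the question is how to get the Search API to apply it without fragmenting on <book>.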
Now, I know that setting fragment roots at <book> could work in this
particular case, but I do not want to use fragmentation.
One (annoyingly complicated) possibility: define a custom constraint
for the book search that would be called like this:
book-query:"firstname:james title:bowling"
(book-query is a constraint with multiple related sub-queries in a "phrase").
This would require writing a constraint module that parses the $right
part ("firstname:james title:bowling") itself and generates a
cts:query. I would have to write such a library module for every set
of constraints that needs to match within a common subtree (in the
example above, <book> is the only common subtree, but real data may
have several similar situations per document). The constraint modules
might themselves use the Search API to find matches for each sub-query
separately, then post-process the results to filter out documents in
which the matches for all criteria do not occur within a common
ancestor. For example, the custom constraint module implementing
book-query would retrieve result sequences A and B for firstname and
title, respectively. Post-processing would find that A and B both
contain sports.xml. However, because the search:match nodes for the
two sub-queries do not share a common <book> ancestor
(/books/book[1]/author/firstname vs. /books/book[2]/title), sports.xml
does NOT match.
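
The parsing and post-processing steps described above can be sketched roughly as follows. This is Python rather than an actual XQuery constraint module, and it only shows the logic: parse_subqueries, book_ancestor and share_common_book are hypothetical helper names, and the match paths are assumed to arrive as plain strings of the form shown above.

```python
import re


def parse_subqueries(right):
    """Split 'firstname:james title:bowling' into (name, word) pairs."""
    pairs = []
    for token in right.split():
        name, _, word = token.partition(":")
        if not name or not word:
            raise ValueError(f"expected name:word, got {token!r}")
        pairs.append((name, word))
    return pairs


def book_ancestor(path):
    """Return the /books/book[n] prefix of a match path, or None if absent."""
    m = re.match(r"(/books/book\[\d+\])(/|$)", path)
    return m.group(1) if m else None


def share_common_book(match_paths_by_subquery):
    """True if at least one <book> contains a match for every sub-query.

    match_paths_by_subquery holds one list of match paths per sub-query.
    """
    ancestor_sets = [
        {a for a in map(book_ancestor, paths) if a is not None}
        for paths in match_paths_by_subquery
    ]
    if not ancestor_sets:
        return False
    # A book qualifies only if it appears in every sub-query's ancestor set.
    return bool(set.intersection(*ancestor_sets))


print(parse_subqueries("firstname:james title:bowling"))
# [('firstname', 'james'), ('title', 'bowling')]

a = ["/books/book[1]/author/firstname"]  # matches for firstname:james
b = ["/books/book[2]/title"]             # matches for title:bowling
print(share_common_book([a, b]))         # False, so sports.xml is rejected
```

Even in this form, every match path has to be collected and reduced to its <book> ancestor before the sets can be intersected.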
I certainly hope there's a better approach. This one seems overly
complicated (to put it mildly). It is also computationally expensive:
max-matches would have to be set to a large number to get all matches,
and in the worst case (a large number of matches with no common
ancestor) every match would have to be examined while searching for a
common ancestor.
Am I missing some key way to use the Search API that would be just the
ticket for this?
Thanks,
Karl