[MarkLogic Dev General] Require separate search constraints to be satisfied within same doc subtree

Karl Erisman Thu, 01 Apr 2010 16:57:16 -0700

Suppose you have a database that stores information about books in a
book store.  There is one document per subject.  Each document
contains the books categorized under that subject area.  For example,
here is sports.xml:


    <books subject="sports">
        <book id='1'>
            <author>
                <firstname>James</firstname>
                <lastname>Johnson</lastname>
            </author>
            <title>Running by Moonlight</title>
            <pages>220</pages>
        </book>
        <book id='2'>
            <author>
                <firstname>Marie</firstname>
                <lastname>Franklin</lastname>
            </author>
            <title>Optimum Nutrition for Peak Bowling Performance</title>
            <pages>2</pages>
        </book>
    </books>

If the following options node is used:

    <options xmlns="http://marklogic.com/appservices/search">
        <constraint name="author-firstname">
            <word>
                <element ns="" name="firstname" />
            </word>
        </constraint>
        <constraint name="title">
            <word>
                <element ns="" name="title" />
            </word>
        </constraint>
    </options>

...then these queries will both return sports.xml, as desired:

    author-firstname:james title:running
    author-firstname:marie title:bowling

However, using that approach, the following query would erroneously
return sports.xml because, though both constraints are satisfied, they
are not satisfied within the same book (author-firstname from book #1
and title from book #2):

    author-firstname:james title:bowling
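
To make the mismatch concrete, here is a minimal sketch in plain Python (it does not use MarkLogic or the Search API; the helper names are hypothetical and the word matching is a crude stand-in for a word query). It contrasts a per-document AND, which is roughly what the options above give, with the per-book AND that is actually wanted:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of sports.xml from the post (the <pages> elements are omitted).
SPORTS_XML = """<books subject="sports">
  <book id="1">
    <author><firstname>James</firstname><lastname>Johnson</lastname></author>
    <title>Running by Moonlight</title>
  </book>
  <book id="2">
    <author><firstname>Marie</firstname><lastname>Franklin</lastname></author>
    <title>Optimum Nutrition for Peak Bowling Performance</title>
  </book>
</books>"""


def has_word(scope, element_name, word):
    """True if any <element_name> under scope contains word (case-insensitive)."""
    return any(
        word.lower() in (el.text or "").lower().split()
        for el in scope.iter(element_name)
    )


def matches_per_document(doc, firstname, title):
    """Both constraints satisfied anywhere in the document."""
    return has_word(doc, "firstname", firstname) and has_word(doc, "title", title)


def matches_per_book(doc, firstname, title):
    """Both constraints satisfied inside one and the same <book>."""
    return any(
        has_word(book, "firstname", firstname) and has_word(book, "title", title)
        for book in doc.iter("book")
    )


doc = ET.fromstring(SPORTS_XML)
print(matches_per_document(doc, "james", "bowling"))  # True  (the unwanted hit)
print(matches_per_book(doc, "james", "bowling"))      # False (the desired answer)
print(matches_per_book(doc, "marie", "bowling"))      # True
```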

Now, I know that setting fragment roots at <book> could work in this
particular case, but I do not want to use fragmentation.

One (annoyingly complicated) possibility: define a custom constraint
for the book search that would be called like this:

    book-query:"firstname:james title:bowling"

(book-query is a constraint with multiple related sub-queries in a "phrase").

This would require writing a constraint module that would have to
parse the $right part ("firstname:james title:bowling") itself and
generate a cts:query.  I would have to write such a library module for
every set of constraints that needs to match in a common subtree (in
the example above, <book> is the only common subtree, but real data
may have several similar situations per document).  The constraint
modules might use the Search API themselves to separately find matches
for each sub-query, then post-process the results to filter out
documents without matches to all criteria occurring within a common
ancestor.  For example, the custom constraint module implementing the
book-query would retrieve result sequences A and B for firstname and
title, respectively.  Then post-processing would find that A and B
both contain sports.xml.  However, because the search:match nodes for
the sub-query constraints do not share a common <book> ancestor
(/books/book[1]/author/firstname vs. /books/book[2]/title), sports.xml
does NOT match.
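
The post-processing step could look roughly like the following sketch. It is plain Python rather than XQuery, the function names are hypothetical, and it assumes the match paths are available as strings shaped like the two shown above; it only illustrates the common-ancestor check, not how the matches would be obtained from the Search API.

```python
import re

# Captures the /books/book[n] prefix of a match path (path shape assumed
# from the two example paths in the post).
BOOK_STEP = re.compile(r"^(/books/book\[\d+\])(?:/|$)")


def book_ancestor(path):
    """Return the /books/book[n] prefix of a match path, or None if absent."""
    m = BOOK_STEP.match(path)
    return m.group(1) if m else None


def share_book(paths_a, paths_b):
    """True if some path in A and some path in B sit under the same <book>."""
    books_a = {book_ancestor(p) for p in paths_a} - {None}
    books_b = {book_ancestor(p) for p in paths_b} - {None}
    return bool(books_a & books_b)


# Result sequences A and B, reduced to the paths of their matches.
firstname_james = ["/books/book[1]/author/firstname"]
title_bowling = ["/books/book[2]/title"]
title_running = ["/books/book[1]/title"]

print(share_book(firstname_james, title_bowling))  # False: sports.xml is rejected
print(share_book(firstname_james, title_running))  # True: sports.xml is kept
```

Collecting the ancestors into sets keeps the comparison to one pass over each result sequence, but every match still has to be visited once.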

I certainly hope there's a better approach.  This one seems overly
complicated (to put it mildly).  It is also computationally expensive:
max-matches would have to be set to a large number to get all matches,
and in the worst case (a large number of matches with no common
ancestor), every match would have to be examined while searching for a
common ancestor.

Am I missing some key way to use the Search API that would be just the
ticket for this?

Thanks,
Karl