> By “which query” I mean which of the 125,000 separate query docs actually
> matched for a given cts:reverse-query() call.
cts:search(
  doc(),
  cts:reverse-query(doc("newdoc.xml"))
)
This will return all the docs containing any serialized queries which would
match newdoc.xml.
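
For anyone following along, here's a rough sketch of what that setup might look like. The URI /queries/q1.xml and the stored-query wrapper element are made up for illustration; only the cts: and xdmp: calls are real. A cts:query placed inside an element constructor gets serialized to XML, and that serialized form is what the reverse query later finds.

```xquery
xquery version "1.0-ml";

(: Store one serialized query. The URI and the wrapper element name
   are hypothetical; use whatever your application already has. :)
xdmp:document-insert(
  "/queries/q1.xml",
  <stored-query>{ cts:word-query("marklogic") }</stored-query>
);

xquery version "1.0-ml";

(: Separate statement, so the insert above is visible.
   Return the URI of every stored query doc that would match newdoc.xml. :)
for $qdoc in cts:search(
  doc(),
  cts:reverse-query(doc("newdoc.xml"))
)
return xdmp:node-uri($qdoc)
```

Each URI that comes back identifies one stored query doc that matched, so that's your "which query" answer. Whether q1.xml shows up depends, of course, on whether newdoc.xml contains the word.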
> I guess my question is: in the case where the reverse query is applied to an
> element that is not a full document, does the “brute force” have to be
> applied for every candidate query or only for those that match the containing
> document of the input element?
In general I avoid putting any XPath in the first arg of cts:search(), because
it gives a false sense of optimization. In the JavaScript API it's not even
possible.
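
One way to follow that advice, sketched under an assumption of mine: if the stored queries live in a collection (the name "stored-queries" here is hypothetical), you can scope the search with a query term instead of a path in the first arg.

```xquery
xquery version "1.0-ml";

(: Assumption: stored query docs are in a collection named
   "stored-queries" (hypothetical name). The scoping is expressed as
   part of the query rather than as XPath in the first argument. :)
cts:search(
  doc(),
  cts:and-query((
    cts:collection-query("stored-queries"),
    cts:reverse-query(doc("newdoc.xml"))
  ))
)
```

A cts:directory-query would do the same job if the queries are organized by URI prefix rather than by collection.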
> If the brute force cost is applied to each query then doing a two-phase
> search would be faster: determine which reverse queries apply to the input
> document and then use those to find the elements within the input document
> that actually matched. But if the brute force cost only applies to those
> queries that match the containing doc then ML internally must produce the
> faster result than doing it in my own code.
>
> But as you say, that calls into question the use of reverse queries at
> all: why not simply run the 125,000 forward queries and update each element
> matched as appropriate?
Yep. If it's a one-time batch job and you're trying to minimize the time then
this would be faster, I bet.
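
A minimal sketch of the forward approach, with the same caveats: the collection name and the shape of the stored query docs (a wrapper element holding exactly one serialized cts query) are assumptions, not something from your setup.

```xquery
xquery version "1.0-ml";

(: Assumptions (hypothetical): stored queries are in the collection
   "stored-queries", and each doc's root element wraps exactly one
   serialized cts query element. :)
for $qdoc in collection("stored-queries")
(: Turn the serialized XML back into a runnable cts:query. :)
let $query := cts:query($qdoc/*/cts:*)
for $hit in cts:search(doc(), $query)
return
  <match query="{xdmp:node-uri($qdoc)}" doc="{xdmp:node-uri($hit)}"/>
```

This only reports which documents each query matched. Finding the specific elements inside a hit would need a further step, for example cts:contains() against each candidate element. With 125,000 queries you'd probably also want to split the loop into batches rather than run it as one request.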
> Or it may simply be that we need to do some horizontal scaling and invest in
> additional D-nodes.
You're going to do this often?
-jh-