All,

We know that MarkLogic will use a range index to optimize an "order by"
clause when it can.  I've done this with element, attribute, and path range
indexes on many occasions.  However, I'm faced with a scenario for which I
can't seem to find a straightforward optimization.  Your help would be
appreciated!

Consider a database full of documents.  The documents all contain an
element with local name "created", but the namespaces vary.  While
normalization may be an option for the future, I'm trying to see if there
is a way to avoid that for now.

So, consider the following three documents:
<a:root xmlns:a="a"> bob
<a:created>2015-10-28T00:00:00Z</a:created></a:root>
<b:root xmlns:b="b"> bob
<b:created>2015-10-29T00:00:00Z</b:created></b:root>
<c:root xmlns:c="c"> bob
<c:created>2015-10-30T00:00:00Z</c:created></c:root>

I can define a path range index using the wildcarded path:
//*:created

The path range index clearly works as expected, because I can retrieve all
three values above via this cts:values statement:
cts:values(cts:path-reference('//*:created'))

However, if I run a cts:search to find the three docs above and order the
results by this path, query-trace shows that the sort is not optimized:
for $item in
cts:search(doc(), 'bob')
order by $item//*:created
return $item

The query-trace output does not show that a range index was used to
optimize the order by:
2015-10-30 15:26:01.909 Info: App-Services: at 4:11: Analyzing path for search: fn:doc()
2015-10-30 15:26:01.909 Info: App-Services: at 4:11: Step 1 is searchable: fn:doc()
2015-10-30 15:26:01.909 Info: App-Services: at 4:11: Path is fully searchable.
2015-10-30 15:26:01.909 Info: App-Services: at 4:11: Gathering constraints.
2015-10-30 15:26:01.910 Info: App-Services: at 4:11: Search query contributed 1 constraint: cts:word-query("bobh", ("lang=en"), 1)
2015-10-30 15:26:01.910 Info: App-Services: at 4:11: Executing search.
2015-10-30 15:26:01.910 Info: App-Services: at 4:11: Selected 3 fragments to filter.
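
For anyone who wants to reproduce the trace, tracing can be switched on
inside the request itself.  This is only a minimal sketch: it assumes the
three sample documents are loaded, the //*:created path range index exists,
and the query is run from Query Console against that database.  The trace
lines are written to the server error log.

```xquery
xquery version "1.0-ml";

(: Turn on query tracing for the rest of this request. :)
xdmp:query-trace(fn:true()),

(: Same search and sort as above; the question is whether the trace
   reports a range index being used for the "order by". :)
for $item in cts:search(fn:doc(), "bob")
order by $item//*:created
return $item
```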

I've tried a number of other approaches to the path range index, including
paths that did not contain wildcards but instead included the specific
namespace possibilities:
//(a:created|b:created|c:created)
/(a:root|b:root|c:root)/(a:created|b:created|c:created)

I've considered creating a field encompassing the three elements and a
field range index, but I don't know of a way to construct the order by
clause such that the field range index would be used.  Is that even
possible?
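
To make the question concrete, here is the kind of thing I have in mind.
It is a sketch only, and it sidesteps the "order by" clause by passing a
cts:index-order option to cts:search instead.  The field name "created" is
hypothetical: it assumes a field that includes the a:created, b:created, and
c:created elements, with a dateTime field range index defined on it.  I have
not confirmed that this sorts from the index for this data, or how documents
without a value in the index are handled.

```xquery
xquery version "1.0-ml";

(: Hypothetical field "created": assumed to cover a:created, b:created and
   c:created, with a dateTime field range index configured on it. :)
cts:search(
  fn:doc(),
  cts:word-query("bob"),
  (: Ask the search itself to order results by the range index. :)
  cts:index-order(cts:field-reference("created"), "ascending")
)
```

The same shape should apply to the path range index by swapping in
cts:path-reference("//*:created") for the field reference.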

Any suggestions are welcome!  If this sort of approach is a dead end, we
will need to wait until we have the opportunity to normalize the data to
optimize this.

Thank you!

-Bob