I think that would be an RFE. It's complicated a bit by the desire for 
fragment-ids: those aren't exposed at all, and might be considered an 
implementation detail.

Maybe the RFE would be for something like cts:path, a sort of combination of 
cts:frequency and xdmp:path? I think that would still require a fair amount of 
new plumbing in the co-occurrence code, but the use-case is interesting.

Getting back to your problem, it sounds like you want to be able to query for 
all documents where CONTROLSTRING eq AUDITSTRING, without filtering?

Ideally you would pre-calculate this when ingesting or updating documents, and 
keep it in a new element. This could be done directly, or using a CPF pipeline. 
Then you could use an element-value-query to check that new element - or a 
range query if you want to build an extra range index on it. This assumes you 
can trust the ingestion and update processes, but since you are using something 
called AUDITSTRING that may not be true.
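
As a rough sketch of the pre-calculation (not a tested implementation): the element name STRINGS-MATCH, the helper local:flag-match, and the example URI are all invented for illustration, and the code assumes CONTROLSTRING and AUDITSTRING are in no namespace and occur at most once per document.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: STRINGS-MATCH is an invented element name,
   not something that exists in your schema. :)
declare function local:flag-match($uri as xs:string) as empty-sequence()
{
  let $root := fn:doc($uri)/*
  let $control := ($root//CONTROLSTRING)[1]
  let $audit := ($root//AUDITSTRING)[1]
  (: The flag is "true" only when both elements exist and hold equal strings. :)
  let $flag :=
    <STRINGS-MATCH>{
      fn:exists($control) and fn:exists($audit)
        and fn:string($control) eq fn:string($audit)
    }</STRINGS-MATCH>
  let $old := $root/STRINGS-MATCH
  return
    if (fn:exists($old))
    then xdmp:node-replace($old[1], $flag)      (: refresh the flag on update :)
    else xdmp:node-insert-child($root, $flag)   (: first-time ingestion :)
};

(: Assumed example URI :)
local:flag-match("/example/doc-1.xml")
```

With the flag in place, something like cts:uris((), (), cts:element-value-query(xs:QName("STRINGS-MATCH"), "true")) should return the matching URIs straight from the indexes, provided the URI lexicon is enabled.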

Less efficient but probably still tractable is to use the co-occurrence data
in a two-stage query. First, get the co-occurrences of CONTROLSTRING and
AUDITSTRING with the 'map' option (this requires range indexes on both
elements). Next, use an XQuery expression to extract only the map keys where
key eq value. Finally, use cts:uris() with an and-query of
element-value-queries on those CONTROLSTRING and AUDITSTRING values to get the
matching document URIs. This assumes that URIs are enough to get to the right
fragment: if that isn't true, you'll need a range index on some element that is
unique for the fragments you care about.
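
The two stages might look roughly like this. It is an untested sketch that assumes string range indexes on both elements (default collation), elements in no namespace, and the URI lexicon enabled.

```xquery
xquery version "1.0-ml";

let $control := xs:QName("CONTROLSTRING")
let $audit := xs:QName("AUDITSTRING")

(: Stage 1: all co-occurring value pairs, as a map from each
   CONTROLSTRING value to the AUDITSTRING values seen with it. :)
let $pairs := cts:element-value-co-occurrences($control, $audit, "map")

(: Keep only the keys that co-occur with an identical value. A key can
   map to several values, so use a general comparison here. :)
let $equal :=
  for $key in map:keys($pairs)
  where map:get($pairs, $key) = $key
  return $key

(: Stage 2: look up the URIs of documents holding those values. Skip the
   lookup when nothing matched, rather than running a query built from
   an empty value list. :)
return
  if (fn:empty($equal))
  then ()
  else
    cts:uris((), (),
      cts:and-query((
        cts:element-value-query($control, $equal),
        cts:element-value-query($audit, $equal)
      )))
```

One caveat with a single and-query over the whole value list: a document whose two elements hold different values that both happen to appear in the list would also match. An or-query of per-value and-queries would be stricter, at the cost of a larger query.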

There are some other interesting possibilities around the edges, but that is
the main idea. With this approach there will be only two database round-trips:
one for the co-occurrence and one for the URI lookup. The query for the URI
lookup might get quite large, but lookups on large queries aren't necessarily a
problem. This sounds like a reporting query, so even taking a minute or two to
run might be fine.

-- Mike

On 7 May 2012, at 13:05 , bek wrote:

> I'm looking for a way to do an equivalence search that is supported by the 
> Search API.  I would also like this to work without having to filter my 
> results (aka only use my indexes).  It seems co-occurrence is set up for this:
> 
> """
> Returns value co-occurrences (that is, pairs of values, both of which appear 
> in the same fragment) from the specified element value lexicon(s). The values 
> are returned as an XML element with two children, each child containing one 
> of the co-occurring values.
> """
> 
> roughly:
> 
>    xs:QName("CONTROLSTRING") == xs:QName("AUDITSTRING")
> 
> the co-occurrence function seems to return the value of the QName and not the 
> fragment.  Would I have to double query to get the fragment ID?
> 
> Here's a series of abbreviated values
> 
> frag:1 controlstring:ABC1234 auditstring:ZYX8989
> frag:2 controlstring:ABC2345 auditstring:ABC2345
> frag:3 controlstring:GHI3423 auditstring:GHI3423
> 
> I'd like to get back frag:2 and frag:3
> 
> Is equivalence searching possible in the Search API?  It doesn't seem so in 
> 4.2 nor in the docs I've seen for 5x.  Is there some other way to do this?
> 
> Thanks
> 
> 
> bek
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> 
