If the code below doesn't perform well enough, try my doc-count UDF, which
does effectively the same thing but runs on the D-nodes (where the data
lives) instead of on the E-node (where your code is executed):

http://github-search.demo.marklogic.com/detail/grtjn/doc-count-udf.json

Note: it requires the URI lexicon, and a range index on the element as well.

Cheers,

Geert

On 11/18/16, 8:58 AM, "[email protected] on behalf
of Justin Makeig" <[email protected] on behalf of
[email protected]> wrote:

>The code below illustrates how you can calculate co-occurrences between
>an element and the URIs of the documents that contain instances of that
>element. Then, for each URI, it counts the total occurrences. Note that
>you'll need to have the URI lexicon enabled and an element range index on
>x.
>
>Justin
>
>(: Insert some dummy data :)
>let $docs := (
>  <a><x>B</x><x>BB</x></a>,
>  <a><x>B</x></a>,
>  <a><c>C</c></a>,
>  <a><x>B</x><x>BBB</x></a>
>)
>return 
>  for $doc at $i in $docs
>  return xdmp:document-insert($i || '.xml', $doc)
>;
>(: Calculate counts of <x/> grouped by document URIs. Requires element
>range index on xs:QName('x') :)
>let $co-occurr := cts:value-co-occurrences(cts:uri-reference(),
>cts:element-reference(xs:QName('x')), 'map')
>for $uri in map:keys($co-occurr)
>return $uri || ': ' || fn:count(map:get($co-occurr, $uri))
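>;
>(: A possible follow-up, sketched under the same assumptions as above,
>namely the URI lexicon enabled and an element range index on x. It keeps
>only the URIs with more than one co-occurring value of <x/>. As far as I
>can tell the map holds distinct values per URI, so a document that
>repeats the same value twice would not show up here. :)
>let $co-occurr := cts:value-co-occurrences(cts:uri-reference(),
>cts:element-reference(xs:QName('x')), 'map')
>for $uri in map:keys($co-occurr)
>let $count := fn:count(map:get($co-occurr, $uri))
>where $count gt 1
>return $uri || ': ' || $count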
>
>
>
>--
>Justin Makeig
>Director, Product Management
>MarkLogic
>[email protected]
>
>
>> On Nov 17, 2016, at 11:19 PM, Raghu <[email protected]>
>>wrote:
>> 
>> Hi All,
>> 
>> I've got around 40 million XML documents, a few of which contain an
>>element, say element x, twice (they are supposed to have only one
>>element x). I need to find the list of documents with multiple
>>occurrences of that element x. What would be the ideal way to query
>>them?
>> 
>> Thanks in advance
>> _______________________________________________
>> General mailing list
>> [email protected]
>> Manage your subscription at:
>> http://developer.marklogic.com/mailman/listinfo/general
