If the code below doesn't perform well enough, try my doc-count UDF, which does effectively the same thing but runs on the D-nodes (where the data lives) instead of on the E-node (where your code is executed):
http://github-search.demo.marklogic.com/detail/grtjn/doc-count-udf.json

Note: it requires URI lexicon, and a range index on the element as well.

Cheers,
Geert

On 11/18/16, 8:58 AM, "[email protected] on behalf of Justin Makeig" <[email protected] on behalf of [email protected]> wrote:

>The code below illustrates how you can calculate co-occurrences between
>an element and the URI of the documents that contain instances of that
>element. Then, for each URI it counts the total occurrences. Note, that
>you'll need to have the URI lexicon enabled and an element range index on
>x.
>
>Justin
>
>(: Insert some dummy data :)
>let $docs := (
>  <a><x>B</x><x>BB</x></a>,
>  <a><x>B</x></a>,
>  <a><c>C</c></a>,
>  <a><x>B</x><x>BBB</x></a>
>)
>return
>  for $doc at $i in $docs
>  return xdmp:document-insert($i || '.xml', $doc)
>;
>(: Calculate counts of <x/> grouped by document URIs. Requires element
>range index on xs:QName('x') :)
>let $co-occurr := cts:value-co-occurrences(cts:uri-reference(),
>cts:element-reference(xs:QName('x')), 'map')
>for $uri in map:keys($co-occurr)
>return $uri || ': ' || fn:count(map:get($co-occurr, $uri))
>
>
>
>--
>Justin Makeig
>Director, Product Management
>MarkLogic
>[email protected]
>
>
>> On Nov 17, 2016, at 11:19 PM, Raghu <[email protected]>
>>wrote:
>>
>> Hi All,
>>
>> I've got around 40 million XML documents out of which few documents are
>>having an element say element x twice (they are supposed to have only
>>one element x), I need to find the list of documents are there with
>>multiple occurrences of that element x. what would be the ideal way to
>>query them?
>>
>> Thanks in advance
>> _______________________________________________
>> General mailing list
>> [email protected]
>> Manage your subscription at:
>> http://developer.marklogic.com/mailman/listinfo/general
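
Since the original question asks only for the documents with more than one x, here is a minimal sketch of how the co-occurrence query quoted above could be narrowed to those URIs. It assumes the same setup as the quoted example (URI lexicon enabled, element range index on x); the `$count` variable and the `where` filter are additions for illustration, not part of the quoted code. One caveat, as far as I can tell: with the 'map' option each URI maps to the distinct x values found in that document, so a document holding the same value twice would probably show a count of 1 and be missed.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes the URI lexicon is enabled and an element
   range index exists on xs:QName('x'), as in the quoted example. :)
let $co-occurr := cts:value-co-occurrences(
  cts:uri-reference(),
  cts:element-reference(xs:QName('x')),
  'map')
for $uri in map:keys($co-occurr)
(: Number of x values recorded for this URI in the map :)
let $count := fn:count(map:get($co-occurr, $uri))
(: Keep only documents with more than one x value :)
where $count gt 1
order by $uri
return $uri || ': ' || $count
```

With the dummy data from the quoted example, this should list only the documents that carry two different x values and skip the ones with a single x or none.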
