Thanks again for looking into this, Geert! I tried a mix of your approach (minus the -$uris part) and mine and got better results. It still does not let me sort the whole database by occurrence, though; it only gives me the document(s) with the maximum number of occurrences. I ran the query in production, where we have 1.4 million documents and roughly 25 million file elements in total, and got the result back in about 3 minutes. So it is definitely an improvement, but it will not scale over time. Thanks for looking down the UDF path. Hopefully it leads to a more general and useful approach.
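A rough, untested sketch of the count-attribute approach Geert recommends could look like the following. The file-count attribute name and the int element-attribute range index on doc/@file-count are assumptions made for illustration, and doc is the root element from Chris's test documents rather than the real schema.

```xquery
xquery version "1.0-ml";

(: Step 1 (sketch): store the number of file elements on each root element.
   "file-count" is a hypothetical attribute name. For millions of documents
   this would have to run in batches or be done at ingest time. :)
for $root in fn:doc()/doc[fn:not(@file-count)]
return xdmp:node-insert-child(
  $root,
  attribute file-count { fn:count($root/file) }
)
;

xquery version "1.0-ml";

(: Step 2 (sketch): runs as a separate transaction so the new attribute is
   visible. Assumes an int element-attribute range index on doc/@file-count.
   Returns the ten documents with the highest stored count. :)
cts:search(
  fn:doc(),
  cts:and-query(()),
  cts:index-order(
    cts:element-attribute-reference(
      xs:QName("doc"), xs:QName("file-count"), "type=int"),
    "descending")
)[1 to 10] ! fn:concat(xdmp:node-uri(.), " -> ", fn:string(./doc/@file-count))
```

Because the ordering comes from a range index on a precomputed value, the run-time query no longer has to add up frequencies per URI, which is the part that gets slower as documents and distinct sizes grow.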
Cheers,
Johan

On Sat, Jun 27, 2015 at 8:06 PM Geert Josten <[email protected]> wrote:

> My approach was similar, but tried to sum all frequencies per uri.
> Unfortunately, that approach gets slower with more documents, and more
> distinct file sizes. Adding a simple count attribute or element in the file
> somewhere would greatly simplify the run-time calculation, and that is what
> I would normally recommend. For the sake of completeness I’ll give it some
> more thought to see if there are ways to improve on the 3 minutes. A UDF
> might be useful, would have to try that.
>
> Cheers,
> Geert
>
> From: Johan Mörén <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Saturday, June 27, 2015 at 1:23 AM
> To: MarkLogic Developer Discussion <[email protected]>
> Subject: Re: [MarkLogic Dev General] Find the document(s) with max
> occurrences of an element-attribute reference
>
> Hi Christopher
>
> I tried your approach but still without success. I think the case might
> be that your example is using a fixed value for size ("yes"). And since
> frequency is based on the value you get the right results.
>
> Regards,
> Johan
>
> On Sat, Jun 27, 2015 at 12:34 AM Christopher Hamlin <[email protected]>
> wrote:
>
>> Hi Johan,
>>
>> Maybe I'm not clear on what you want.
>>
>> I just tried something. I created documents in a database using
>>
>> xquery version "1.0-ml";
>> for $i in 1 to 100
>> let $doc := <doc>{(1 to $i)!<file size='yes'/>}</doc>
>> let $uri := '/'||$i||'.xml'
>> return xdmp:document-insert ($uri, $doc)
>>
>> so for example
>>
>> /1.xml =>
>>
>> <doc>
>>   <file size="yes"/>
>> </doc>
>>
>> and
>>
>> /2.xml =>
>>
>> <doc>
>>   <file size="yes"/>
>>   <file size="yes"/>
>> </doc>
>>
>> and so on.
>>
>> With a file/@size element-attribute range index, the query
>>
>> xquery version '1.0-ml';
>> let $uris := cts:uri-reference()
>> let $ea := cts:element-attribute-reference (xs:QName ('file'),
>>   xs:QName ('size'),
>>   'collation=http://marklogic.com/collation/codepoint')
>> return
>>   for $tuple in cts:value-tuples(($uris, $ea),
>>     ('item-frequency','frequency-order','descending','limit=3'))
>>   return fn:concat ($tuple[1], ' -> ', cts:frequency ($tuple))
>>
>> returns
>>
>> /100.xml -> 100
>> /99.xml -> 99
>> /98.xml -> 98
>> /97.xml -> 97
>> /96.xml -> 96
>> /95.xml -> 95
>> /94.xml -> 94
>> /93.xml -> 93
>> /92.xml -> 92
>> /91.xml -> 91
>>
>> Is this close to what you want?
>>
>> Regards,
>>
>> Chris
>>
>> On Fri, Jun 26, 2015 at 12:41 PM, Johan Mörén <[email protected]>
>> wrote:
>> > Hi Christopher!
>> >
>> > I'm not sure where you want me to use these options. But I tried to add
>> > them to the cts:value-tuples() but that did not return the expected
>> > result.
>> >
>> > like this
>> >
>> > ...
>> > for $tuple in
>> >   cts:value-tuples(
>> >     (
>> >       cts:uri-reference(),
>> >       $sizeRef
>> >     ),
>> >     ("frequency-order","descending","limit=10")
>> >   )
>> > ...
>> >
>> > Regards,
>> > Johan
>> >
>> > On Fri, Jun 26, 2015 at 5:58 PM Christopher Hamlin <[email protected]>
>> > wrote:
>> >>
>> >> If you just want something like top ten, I think it's more direct
>> >> possibly.
>> >>
>> >> Can you try returning frequency-order, descending, limit=10? Are those
>> >> options you can use?
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
