Re: [MarkLogic Dev General] Find the document(s) with max occurrences of an element-attribute reference

Geert Josten Sat, 27 Jun 2015 11:06:06 -0700

My approach was similar, but I tried to sum all frequencies per URI.
Unfortunately, that approach gets slower as the number of documents and
distinct file sizes grows. Adding a simple count attribute or element somewhere
in the file would greatly simplify the run-time calculation, and that is what I
would normally recommend. For the sake of completeness I’ll give it some more
thought to see if there are ways to improve on the 3 minutes. A UDF might be
useful; I would have to try that.


Cheers,
Geert
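Geert's recommendation of storing the count at load time could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not code from the thread: the `file-count` attribute and the `/counted/` URI prefix are hypothetical, and the second statement assumes an `xs:int` element-attribute range index on `doc/@file-count` plus the URI lexicon. Once the count is stored, the maximum can be read straight off the index instead of being derived from per-value frequencies.

```xquery
xquery version "1.0-ml";
(: Sketch only. doc/@file-count and the /counted/ prefix are hypothetical. :)

(: Load time: store the number of file elements on the root element.
   Each file gets a different size, so the count does not depend on values. :)
for $i in 1 to 100
let $files := (1 to $i) ! <file size="{. * 10}"/>
let $doc := <doc file-count="{fn:count($files)}">{$files}</doc>
return xdmp:document-insert("/counted/" || $i || ".xml", $doc)
;

xquery version "1.0-ml";
(: Query time: assumes an xs:int range index on doc/@file-count and the
   URI lexicon. Read the largest stored count, then find the URI(s)
   of the document(s) carrying that count. :)
let $count-ref := cts:element-attribute-reference(
  xs:QName("doc"), xs:QName("file-count"), "type=int")
let $max := cts:values($count-ref, (), ("descending", "limit=1"))
return cts:uris((), (), cts:element-attribute-range-query(
  xs:QName("doc"), xs:QName("file-count"), "=", $max))
```

The trade-off is the one Geert points out: a little extra work at ingest in exchange for a single index lookup at query time, regardless of how many distinct size values exist.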

From: Johan Mörén <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, June 27, 2015 at 1:23 AM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Find the document(s) with max occurrences of an element-attribute reference

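Hi Christopher,

I tried your approach, but still without success. I think the reason is that
your example uses a fixed value for size ("yes"). Since the frequency is
computed per (URI, size) value, you get the right results there; with varying
size values, the frequency of each tuple no longer matches the number of file
elements in the document.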

Regards,
Johan
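One way around the per-value frequencies Johan describes is to total the tuple frequencies for each URI across all of its size values, which is roughly the approach Geert says he tried above. The sketch below is an illustration, not code from the thread; it assumes the same `file/@size` string range index (codepoint collation) and URI lexicon that Chris's query below relies on, and the map-based accumulation is one possible way to do the grouping. As Geert notes, it reads every (URI, size) tuple, so it slows down as documents and distinct sizes grow.

```xquery
xquery version "1.0-ml";
(: Sketch: sum the item frequencies of (uri, size) tuples per URI.
   Assumes a file/@size element-attribute range index (string, codepoint
   collation) and the URI lexicon, as in Chris's example. :)
let $size-ref := cts:element-attribute-reference(
  xs:QName("file"), xs:QName("size"),
  "collation=http://marklogic.com/collation/codepoint")
let $totals := map:map()
let $_ :=
  for $tuple in cts:value-tuples((cts:uri-reference(), $size-ref),
                                 "item-frequency")
  let $uri := json:array-values($tuple)[1]
  (: add this tuple's frequency to the running total for its URI :)
  return map:put($totals, $uri,
                 fn:sum((map:get($totals, $uri), cts:frequency($tuple))))
return (
  (: order URIs by their summed frequency and keep the top ten :)
  for $uri in map:keys($totals)
  let $total := map:get($totals, $uri)
  order by $total descending
  return fn:concat($uri, " -> ", $total)
)[1 to 10]
```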



On Sat, Jun 27, 2015 at 12:34 AM Christopher Hamlin <[email protected]> wrote:
Hi Johan,

Maybe I'm not clear on what you want.

I just tried something.  I created documents in a database using

xquery version "1.0-ml";
for $i in 1 to 100
let $doc := <doc>{(1 to $i)!<file size='yes'/>}</doc>
let $uri := '/'||$i||'.xml'
return xdmp:document-insert ($uri, $doc)

so for example

/1.xml =>

<doc>
<file size="yes"/>
</doc>

and

/2.xml =>

<doc>
<file size="yes"/>
<file size="yes"/>
</doc>

and so on.

With a file/@size element-attribute range index, the query

xquery version '1.0-ml';
let $uris := cts:uri-reference()
let $ea := cts:element-attribute-reference (xs:QName ('file'),
             xs:QName ('size'),
             'collation=http://marklogic.com/collation/codepoint')
return
    (: 'item-frequency' counts each occurrence of a tuple, not each fragment :)
    for $tuple in cts:value-tuples(($uris, $ea),
                    ('item-frequency','frequency-order','descending','limit=10'))
    return fn:concat ($tuple[1], ' -> ', cts:frequency ($tuple))

returns

/100.xml -> 100
/99.xml -> 99
/98.xml -> 98
/97.xml -> 97
/96.xml -> 96
/95.xml -> 95
/94.xml -> 94
/93.xml -> 93
/92.xml -> 92
/91.xml -> 91

Is this close to what you want?

Regards,

Chris

On Fri, Jun 26, 2015 at 12:41 PM, Johan Mörén <[email protected]> wrote:
> Hi Christopher!
>
> I'm not sure where you want me to use these options, but I tried adding
> them to cts:value-tuples() and that did not return the expected result.
>
> Like this:
>
> ...
> for $tuple in
>     cts:value-tuples(
>       (
>         cts:uri-reference(),
>         $sizeRef
>       ),
>       ("frequency-order","descending","limit=10")
>
>     )
> ...
>
> Regards,
> Johan
>
> On Fri, Jun 26, 2015 at 5:58 PM Christopher Hamlin <[email protected]> wrote:
>>
>> If you just want something like the top ten, there may be a more
>> direct way.
>>
>> Can you try the options frequency-order, descending, limit=10? Are those
>> options you can use?
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> Manage your subscription at:
>> http://developer.marklogic.com/mailman/listinfo/general
>