Re: [MarkLogic Dev General] Speeding up xquery returning aggregates

Mark Shanks Thu, 22 Sep 2016 11:32:37 -0700

As a follow-up, we found that the query was super fast with a small dataset 
(e.g., 10,000 records). On the other hand, with a large dataset (40 million, 
and pulling around 1 milllion records), we found that the query would be super 
fast with 1 or 2 facets, e.g.:



let $tuples :=
  cts:value-tuples(
    (
    cts:element-reference(xs:QName("Site"))
    ),
   (),
   cts:and-query((
     cts:element-range-query(xs:QName("Audit_Date"), ">", 
xs:date("2010-01-01")),
     cts:element-range-query(xs:QName("Audit_Date"), "<", xs:date("2011-01-01"))
   ))
  )

or


let $tuples :=
  cts:value-tuples(
    (
    cts:element-reference(xs:QName("Site")),
       cts:element-reference(xs:QName("Department"))
    ),
   (),
   cts:and-query((
     cts:element-range-query(xs:QName("Audit_Date"), ">", 
xs:date("2010-01-01")),
     cts:element-range-query(xs:QName("Audit_Date"), "<", xs:date("2011-01-01"))
   ))
  )

but would take a massive performance hit once the facets are increased to 3, 
and 4 was much slower again. E.g.:


let $tuples :=
  cts:value-tuples(
    (
    cts:element-reference(xs:QName("Site")),
       cts:element-reference(xs:QName("Department")),
    cts:element-reference(xs:QName("LOB"))
    ),
   (),
   cts:and-query((
     cts:element-range-query(xs:QName("Audit_Date"), ">", 
xs:date("2010-01-01")),
     cts:element-range-query(xs:QName("Audit_Date"), "<", xs:date("2011-01-01"))
   ))
  )

By performance hit, I mean the first two queries would take 1 second each. 
Pulling 3 facets would take 250 seconds, and pulling 4 facets would take 350 
seconds. Anyone have any idea of what is going on under the hood to lead to 
such a breaking point between 1-2 facets and more facets? Any better way to do 
the query in such circumstances to avoid the performance hit?


Thanks.

________________________________
From: [email protected] 
<[email protected]> on behalf of Mark Shanks 
<[email protected]>
Sent: Wednesday, 21 September 2016 4:35:32 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Speeding up xquery returning aggregates


Hi Rob,


Your suggestion worked very well! Super fast, at least with the relatively 
small dataset I'm using at present.


Thanks.

________________________________
From: [email protected] 
<[email protected]> on behalf of Rob Szkutak 
<[email protected]>
Sent: Saturday, 17 September 2016 7:28:01 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Speeding up xquery returning aggregates

Hi,

The fastest way to do that I can think of would be to index Data/Site, 
Data/Department, Data/LOB, /Data/Audit_Date.

Next, you could use cts:value-tuples() to build your result set directly out of 
the in-memory indexes without needing to pull document fragments . Finally, you 
would just need to return your concatenation.

It would look something like this (Not tested) :

let $tuples :=
  cts:value-tuples(
    (
    cts:element-reference(xs:QName("Site")),
       cts:element-reference(xs:QName("Department")),
    cts:element-reference(xs:QName("LOB"))
    ),
   (),
   cts:and-query((
     cts:element-range-query(xs:QName("Audit_Date"), ">", 
xs:date("2010-01-01")),
     cts:element-range-query(xs:QName("Audit_Date"), "<", 
xs:date("2011-01-01")),
     cts:or-query((
         cts:element-value-query(xs:QName("Classification"), "Finding"),
         cts:element-value-query(xs:QName("Classification"), "Observation")
     ))
   ))
  )

for $each in $tuples
return
  fn:concat($each[1], |, $each[2], |, $each[3], cts:frequency($each))

Best,
Rob

Rob Szkutak
Senior Consultant
MarkLogic Corporation
[email protected]
www.marklogic.com<http://www.marklogic.com>

________________________________
From: [email protected] 
[[email protected]] on behalf of Mark Shanks 
[[email protected]]
Sent: Friday, September 16, 2016 3:55 PM
To: 'General MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Speeding up xquery returning aggregates


Hi,


I'm trying to find the best way to return the results of what would be the 
following equivalent sql statement:


select count(*) from Data

where Audit_Date > "2010-01-01" and Audit_Date < "2011-01-01" and 
(Classification = "Finding" or Classification = "Observation")

group by Site, Department, LOB


I didn't test this sql statement, but it should give you the idea... Anyway, I 
came up with the following xquery equivalent:


for $s in distinct-values(/Data/Site)
return
for $d in distinct-values(/Data/Department)
return
for $lob in distinct-values(/Data/LOB)
return concat($s,'|',$d,'|',$lob,'|',
count(
for $x in (/Data[Site=$s and Department=$d and LOB=$lob and 
(Classification='Finding' or Classification='Observation')])
let $date as xs:dateTime := $x/Audit_Date
where $date gt xs:dateTime("2010-01-01T00:00:00")
and $date lt xs:dateTime("2011-01-01T00:00:00")
return ($x)
)
)


It works fine and is not super-slow, but isn't particularly fast either. Is 
this the most efficient way to get this type of information out of marklogic? 
Assuming the fields are indexed, would some search command be faster? Or maybe 
subset the data better?


Thanks,


Mark

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general

Re: [MarkLogic Dev General] Speeding up xquery returning aggregates

Reply via email to