Hello,

as a rather unexperienced xquery and basex user, I'm puzzled by the timings of 
running the same xquery on two very similar xml files.

For an academic paper, I'm trying to benchmark finding certain successive 
(windowed) <item> tags inside a <seq> tag. This is the full query:

declare function local:check-sequence($items as element(item)*) as 
element(item)* {
    if ($items[1]/@x = $items[2]/@x and $items[2]/@x = $items[3]/@x) then
        if (sum($items/@y) < 0.5) then $items else ()
    else ()
};

for $seq in doc("data.xml")//seq
return
 <seq>{
 for $i in 1 to (count($seq/item) - 2)
 let $window := subsequence($seq/item, $i, 3)
 return <match>{local:check-sequence($window)}</match>
 }</seq>

Here's the puzzling thing. When I run this on an xml file that has one <seq> 
with 10000 <item>s, i.e. <seqs><seq> <item>[*10000] </seq></seqs>, I get the 
following reasonable timings:

- Parsing: 0.63 ms
- Compiling: 0.95 ms
- Optimizing: 30.59 ms
- Evaluating: 11.48 ms
- Printing: 0.61 ms
- Total Time: 44.27 ms

However, once I run the same query on an xml file with two (!) <seq> of 10000 
items each, the runtime does not double, but is now more than 50 times of that 
of the first query:

- Parsing: 0.6 ms
- Compiling: 1.3 ms
- Optimizing: 48.13 ms
- Evaluating: 2843.67 ms
- Printing: 1.12 ms
- Total Time: 2894.82 ms

I don't see how I induce any exponential runtimes, and esp. check-sequence 
should be constant. Am I missing something fundamental here?

The result size of the second version is about twice the size as the result 
size of the first version.

Thanks in advance for any pointers.

Bernhard Liebl

Reply via email to