Hi Bernhard,

The short answer is that BaseX exploits the fact that your contains a
single seq element, and evaluates it faster than how it would be evaluated
trivially. With multiple sequences, the optimization does not come into
play. I’ll check if we can improve that; thank you [1].

To reduce the runtime complexity by yourself, you can bind the items to an
extra variable…

  let $items := $seq/item
  for $i in 1 to count($items) - 2
  let $window := subsequence($items, $i, 3)

…or use the window clause [2], as suggested by Martin:

  for sliding window $w in 1 to 5
    start at $s
    end   at $e when $e - $s + 1 = 3
  return array { $w }

Best,
Christian

[1] https://github.com/BaseXdb/basex/issues/2316
[2] https://docs.basex.org/main/XQuery_4.0#window_clause




On Mon, Jul 15, 2024 at 8:24 AM Bernhard Liebl <
li...@informatik.uni-leipzig.de> wrote:

> Hello,
>
> as a rather unexperienced xquery and basex user, I'm puzzled by the
> timings of running the same xquery on two very similar xml files.
>
> For an academic paper, I'm trying to benchmark finding certain successive
> (windowed) <item> tags inside a <seq> tag. This is the full query:
>
> declare function local:check-sequence($items as element(item)*) as
> element(item)* {
>     if ($items[1]/@x = $items[2]/@x and $items[2]/@x = $items[3]/@x) then
>         if (sum($items/@y) < 0.5) then $items else ()
>     else ()
> };
>
> for $seq in doc("data.xml")//seq
> return
>  <seq>{
>  for $i in 1 to (count($seq/item) - 2)
>  let $window := subsequence($seq/item, $i, 3)
>  return <match>{local:check-sequence($window)}</match>
>  }</seq>
>
> Here's the puzzling thing. When I run this on an xml file that has one
> <seq> with 10000 <item>s, i.e. <seqs><seq> <item>[*10000] </seq></seqs>, I
> get the following reasonable timings:
>
> - Parsing: 0.63 ms
> - Compiling: 0.95 ms
> - Optimizing: 30.59 ms
> - Evaluating: 11.48 ms
> - Printing: 0.61 ms
> - Total Time: 44.27 ms
>
> However, once I run the same query on an xml file with two (!) <seq> of
> 10000 items each, the runtime does not double, but is now more than 50
> times of that of the first query:
>
> - Parsing: 0.6 ms
> - Compiling: 1.3 ms
> - Optimizing: 48.13 ms
> - Evaluating: 2843.67 ms
> - Printing: 1.12 ms
> - Total Time: 2894.82 ms
>
> I don't see how I induce any exponential runtimes, and esp. check-sequence
> should be constant. Am I missing something fundamental here?
>
> The result size of the second version is about twice the size as the
> result size of the first version.
>
> Thanks in advance for any pointers.
>
> Bernhard Liebl
>
>

Reply via email to