Just to report: with the latest BaseX snapshot, my timing issues have gone away, and querying multiple sequences behaves as expected.
Thank you! Bernhard Liebl Am 15. Juli 2024, 18:39 +0200 schrieb Christian Grün <christian.gr...@gmail.com>: > An new snapshot version of BaseX is available [1]: > > • In a trivial implementation, the repeated evaluation of > subsequence($seq/item, $i, 3) results in a repeated evaluation of $seq/item, > which is expensive. > • In the current BaseX release, the result of $seq/item is already cached > when it’s evaluated multiple times, but caching is disabled if the context > changes. This explains why multiple input sequences slow down everything. > • With the latest snapshot, the cache is updated with each context switch. > • If the context changes too often, caching is disabled (it’s always a > tradeoff to cache data or process it iteratively). > > Hope this helps, > Christian > > [1] https://files.basex.org/releases/latest/ > > > > > On Mon, Jul 15, 2024 at 11:52 AM Christian Grün <christian.gr...@gmail.com> > > wrote: > > > Hi Bernhard, > > > > > > The short answer is that BaseX exploits the fact that your contains a > > > single seq element, and evaluates it faster than how it would be > > > evaluated trivially. With multiple sequences, the optimization does not > > > come into play. I’ll check if we can improve that; thank you [1]. > > > > > > To reduce the runtime complexity by yourself, you can bind the items to > > > an extra variable… > > > > > > let $items := $seq/item > > > for $i in 1 to count($items) - 2 > > > let $window := subsequence($items, $i, 3) > > > > > > …or use the window clause [2], as suggested by Martin: > > > > > > for sliding window $w in 1 to 5 > > > start at $s > > > end at $e when $e - $s + 1 = 3 > > > return array { $w } > > > > > > Best, > > > Christian > > > > > > [1] https://github.com/BaseXdb/basex/issues/2316 > > > [2] https://docs.basex.org/main/XQuery_4.0#window_clause > > > > > > > > > > > > > > > > On Mon, Jul 15, 2024 at 8:24 AM Bernhard Liebl > > > > <li...@informatik.uni-leipzig.de> wrote: > > > > > Hello, > > > > > > > > > > as a rather unexperienced xquery and basex user, I'm puzzled by the > > > > > timings of running the same xquery on two very similar xml files. > > > > > > > > > > For an academic paper, I'm trying to benchmark finding certain > > > > > successive (windowed) <item> tags inside a <seq> tag. This is the > > > > > full query: > > > > > > > > > > declare function local:check-sequence($items as element(item)*) as > > > > > element(item)* { > > > > > if ($items[1]/@x = $items[2]/@x and $items[2]/@x = $items[3]/@x) > > > > > then > > > > > if (sum($items/@y) < 0.5) then $items else () > > > > > else () > > > > > }; > > > > > > > > > > for $seq in doc("data.xml")//seq > > > > > return > > > > > <seq>{ > > > > > for $i in 1 to (count($seq/item) - 2) > > > > > let $window := subsequence($seq/item, $i, 3) > > > > > return <match>{local:check-sequence($window)}</match> > > > > > }</seq> > > > > > > > > > > Here's the puzzling thing. When I run this on an xml file that has > > > > > one <seq> with 10000 <item>s, i.e. <seqs><seq> <item>[*10000] > > > > > </seq></seqs>, I get the following reasonable timings: > > > > > > > > > > - Parsing: 0.63 ms > > > > > - Compiling: 0.95 ms > > > > > - Optimizing: 30.59 ms > > > > > - Evaluating: 11.48 ms > > > > > - Printing: 0.61 ms > > > > > - Total Time: 44.27 ms > > > > > > > > > > However, once I run the same query on an xml file with two (!) <seq> > > > > > of 10000 items each, the runtime does not double, but is now more > > > > > than 50 times of that of the first query: > > > > > > > > > > - Parsing: 0.6 ms > > > > > - Compiling: 1.3 ms > > > > > - Optimizing: 48.13 ms > > > > > - Evaluating: 2843.67 ms > > > > > - Printing: 1.12 ms > > > > > - Total Time: 2894.82 ms > > > > > > > > > > I don't see how I induce any exponential runtimes, and esp. > > > > > check-sequence should be constant. Am I missing something fundamental > > > > > here? > > > > > > > > > > The result size of the second version is about twice the size as the > > > > > result size of the first version. > > > > > > > > > > Thanks in advance for any pointers. > > > > > > > > > > Bernhard Liebl > > > > >