Just to report: with the latest BaseX snapshot, my timing issues have gone 
away, and querying multiple sequences behaves as expected.

Thank you!

Bernhard Liebl
On 15 July 2024 at 18:39 +0200, Christian Grün 
<christian.gr...@gmail.com> wrote:
> A new snapshot version of BaseX is available [1]:
>
> • In a trivial implementation, the repeated evaluation of 
> subsequence($seq/item, $i, 3) results in a repeated evaluation of $seq/item, 
> which is expensive.
> • In the current BaseX release, the result of $seq/item is already cached 
> when it's evaluated multiple times, but caching is disabled if the context 
> changes. This explains why multiple input sequences slow everything down.
> • With the latest snapshot, the cache is updated with each context switch.
> • If the context changes too often, caching is disabled (it’s always a 
> tradeoff to cache data or process it iteratively).
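>
> For reference, the query shape in question, reduced from the original mail 
> below (just a sketch to show where the repeated evaluation happens):
>
>   for $seq in doc("data.xml")//seq      (: context changes with each seq :)
>   for $i in 1 to count($seq/item) - 2
>   return subsequence($seq/item, $i, 3)  (: evaluates $seq/item per window :)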
>
> Hope this helps,
> Christian
>
> [1] https://files.basex.org/releases/latest/
>
>
>
> > On Mon, Jul 15, 2024 at 11:52 AM Christian Grün <christian.gr...@gmail.com> 
> > wrote:
> > > Hi Bernhard,
> > >
> > > The short answer is that BaseX exploits the fact that your document 
> > > contains a single seq element and evaluates the query faster than a 
> > > trivial evaluation would. With multiple sequences, the optimization does 
> > > not come into play. I'll check if we can improve that; thank you [1].
> > >
> > > To reduce the runtime complexity yourself, you can bind the items to an 
> > > extra variable…
> > >
> > >   let $items := $seq/item
> > >   for $i in 1 to count($items) - 2
> > >   let $window := subsequence($items, $i, 3)
> > >
> > > …or use the window clause [2], as suggested by Martin:
> > >
> > >   for sliding window $w in 1 to 5
> > >     start at $s
> > >     end   at $e when $e - $s + 1 = 3
> > >   return array { $w }
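> > >
> > > Applied to your query, that could look roughly as follows (untested 
> > > sketch; the "only end" condition keeps every window at exactly three 
> > > items, matching subsequence($items, $i, 3)):
> > >
> > >   for $seq in doc("data.xml")//seq
> > >   return
> > >     <seq>{
> > >       for sliding window $w in $seq/item
> > >         start at $s when true()
> > >         only end at $e when $e - $s + 1 = 3
> > >       return <match>{local:check-sequence($w)}</match>
> > >     }</seq>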
> > >
> > > Best,
> > > Christian
> > >
> > > [1] https://github.com/BaseXdb/basex/issues/2316
> > > [2] https://docs.basex.org/main/XQuery_4.0#window_clause
> > >
> > >
> > >
> > >
> > > > On Mon, Jul 15, 2024 at 8:24 AM Bernhard Liebl 
> > > > <li...@informatik.uni-leipzig.de> wrote:
> > > > > Hello,
> > > > >
> > > > > As a rather inexperienced XQuery and BaseX user, I'm puzzled by the 
> > > > > timings of running the same query on two very similar XML files.
> > > > >
> > > > > For an academic paper, I'm trying to benchmark finding certain 
> > > > > successive (windowed) <item> tags inside a <seq> tag. This is the 
> > > > > full query:
> > > > >
> > > > > declare function local:check-sequence($items as element(item)*)
> > > > >     as element(item)* {
> > > > >   if ($items[1]/@x = $items[2]/@x and $items[2]/@x = $items[3]/@x) then
> > > > >     if (sum($items/@y) < 0.5) then $items else ()
> > > > >   else ()
> > > > > };
> > > > >
> > > > > for $seq in doc("data.xml")//seq
> > > > > return
> > > > >   <seq>{
> > > > >     for $i in 1 to (count($seq/item) - 2)
> > > > >     let $window := subsequence($seq/item, $i, 3)
> > > > >     return <match>{local:check-sequence($window)}</match>
> > > > >   }</seq>
> > > > >
> > > > > Here's the puzzling thing. When I run this on an XML file that has 
> > > > > one <seq> with 10000 <item>s, i.e. <seqs><seq> <item>[*10000] 
> > > > > </seq></seqs>, I get the following reasonable timings:
> > > > >
> > > > > - Parsing: 0.63 ms
> > > > > - Compiling: 0.95 ms
> > > > > - Optimizing: 30.59 ms
> > > > > - Evaluating: 11.48 ms
> > > > > - Printing: 0.61 ms
> > > > > - Total Time: 44.27 ms
> > > > >
> > > > > However, once I run the same query on an XML file with two (!) <seq> 
> > > > > elements of 10000 items each, the runtime does not double, but is 
> > > > > more than 50 times that of the first query:
> > > > >
> > > > > - Parsing: 0.6 ms
> > > > > - Compiling: 1.3 ms
> > > > > - Optimizing: 48.13 ms
> > > > > - Evaluating: 2843.67 ms
> > > > > - Printing: 1.12 ms
> > > > > - Total Time: 2894.82 ms
> > > > >
> > > > > I don't see how I would induce any exponential runtime here, and 
> > > > > check-sequence in particular should take constant time per window. 
> > > > > Am I missing something fundamental?
> > > > >
> > > > > The result size of the second version is about twice that of the 
> > > > > first version.
> > > > >
> > > > > Thanks in advance for any pointers.
> > > > >
> > > > > Bernhard Liebl
> > > > >
