Re: [basex-talk] BenchX - The curious case of query number 9

Leonard Wörteler Mon, 06 Apr 2015 06:43:42 -0700

Hi Andy,

On 06.04.2015 at 13:13, Andy Bunce wrote:

> I have compared the timings for databases created from sources with and
> without the -s option. In principle these should provide very similar
> results.
>
>   xmlgen /f 0.2 /o folder/auction
>
>   xmlgen /f 0.2 /o folder/auction /s 400
>
> In the first case a single file auction is created. In the second 35
> files (auction00000... auction00034) are created.
> In both cases the file or files are loaded to a single database to
> query. In most cases the query performance is very similar.
> The exception is q09.xq. This appears to be 2 or 3 orders of magnitude
> slower against the database created from the split sources.
>
> The actual query result seems to be the same. I have seen the same
> performance effect for factors f= 0.2, 0.5, 1 but not the somewhat
> trivial f=0.
>
> Any idea what is happening here?

This is a case of heuristics not quite working out every time. The query working on a single file is so fast because it is rewritten to two nested index lookups:

for $p in db:open-pre("auction_full",0)/site/people/person
let $a :=
  for $t in db:attribute("auction_full", $p/@id)
      /self::person/parent::buyer/parent::closed_auction
  return element item {
    db:attribute("auction_full", $t/itemref/@item)/self::id/parent::item[
      parent::europe/parent::regions/parent::site/parent::document-node()
    ]/name/text()
  }
return element person { attribute name { $p/name/text() }, $a }

This rewriting only works if `$ca` as well as `$ei` are inlined into their respective `for` loops (which are then rewritten to XPath expressions and finally index lookups).

Since the expressions bound to those variables are not constants at compile time, inlining them into a loop (in this case `for $p in $auction/...`) will initially duplicate work, so it is not generally safe to do. Because of exactly those index rewritings we are talking about here, we still want to inline *cheap* axis paths. The cheapness is determined in the `Path#cheap()` method [1]. Currently it only allows for a single document node as root node, which is the cause of the differences you are seeing.
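
For orientation, here is a sketch of what the query looks like before any rewriting. It assumes that q09.xq follows the standard XMark Q9 query; the actual BenchX file may differ. The database name is taken from the rewritten query above, and binding `$auction` with `db:open` is an assumption made only for illustration.

```xquery
(: Sketch of XMark Q9 before optimization.
   Assumptions: q09.xq matches the standard XMark query, and the
   documents are addressed via db:open (illustrative only). :)
let $auction := db:open("auction_full")
let $ca := $auction/site/closed_auctions/closed_auction
let $ei := $auction/site/regions/europe/item
for $p in $auction/site/people/person
let $a :=
  for $t in $ca
  where $p/@id = $t/buyer/@person
  return
    let $n :=
      for $t2 in $ei
      where $t/itemref/@item = $t2/@id
      return $t2
    return <item>{ $n/name/text() }</item>
return <person name="{ $p/name/text() }">{ $a }</person>
```

With a single source file, `$auction` is one document node, so the paths bound to `$ca` and `$ei` pass the cheapness check and can be inlined. With the 35 split files, `$auction` is a sequence of document nodes, so the paths presumably fail the check and the two `where` clauses are not turned into index lookups.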

While every heuristic can be tweaked (and I have no opinion on this specific one), there will always be cases like this where small changes lead to unexpectedly big differences in running time. The alternative is to just be consistently slow ;-).


Hope that helps,
  Leo

[1] https://github.com/BaseXdb/basex/blob/20dbe4c/basex-core/src/main/java/org/basex/query/expr/path/Path.java#L280-295
