Thanks for the explanation, Leo. So perhaps the surprise is that no other
queries seem to be affected by this.

/Andy

On 6 April 2015 at 14:43, Leonard Wörteler <leonard.woerte...@uni-konstanz.de> wrote:

> Hi Andy,
>
> On 06.04.2015 at 13:13, Andy Bunce wrote:
>
>> I have compared the timings for databases created from sources with and
>> without the -s option. In principle these should provide very similar
>> results.
>>
>> xmlgen /f 0.2 /o folder/auction
>>
>> xmlgen /f 0.2 /o folder/auction /s 400
>>
>> In the first case, a single file, auction, is created. In the second, 35
>> files (auction00000 ... auction00034) are created.
>> In both cases, the file or files are loaded into a single database for
>> querying. In most cases, the query performance is very similar.
>> The exception is q09.xq. This appears to be 2 or 3 orders of magnitude
>> slower against the database created from the split sources.
>>
>> The actual query result seems to be the same. I have seen the same
>> performance effect for factors f = 0.2, 0.5 and 1, but not for the
>> somewhat trivial f = 0.
>>
>> Any idea what is happening here?
>>
>
> This is a case of heuristics not quite working out every time. The query
> working on a single file is so fast because it is rewritten to two nested
> index lookups:
>
>   for $p in db:open-pre("auction_full", 0)/site/people/person
>   let $a :=
>     (: index lookup: closed auctions whose buyer/@person equals $p/@id :)
>     for $t in db:attribute("auction_full", $p/@id)
>         /self::person/parent::buyer/parent::closed_auction
>     return element item {
>       (: index lookup: Europe items whose @id equals $t/itemref/@item :)
>       db:attribute("auction_full", $t/itemref/@item)/self::id/parent::item[
>         parent::europe/parent::regions/parent::site/parent::document-node()
>       ]/name/text()
>     }
>   return element person { attribute name { $p/name/text() }, $a }
>
> This rewriting only works if `$ca` as well as `$ei` are inlined into their
> respective `for` loops (which are then rewritten to XPath expressions and
> finally index lookups).
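To give a feel for why the rewritten plan is so much faster, here is a toy Python model (not BaseX code) of the join that q09 performs between persons and closed auctions. It counts the steps a naive nested loop takes versus one lookup per person in a prebuilt index. `build_index` is a hypothetical stand-in for the database's attribute index, and the data sizes are made up purely for illustration.

```python
# Toy model, not BaseX code: match persons to the closed auctions they bought,
# counting the work done by a naive nested loop versus an index lookup.
from collections import defaultdict


def nested_loop_join(person_ids, auctions):
    """Compare every auction's buyer with every person: len(persons) * len(auctions) steps."""
    steps = 0
    result = {}
    for pid in person_ids:
        matches = []
        for auction_id, buyer in auctions:
            steps += 1
            if buyer == pid:
                matches.append(auction_id)
        result[pid] = matches
    return result, steps


def build_index(auctions):
    """Hypothetical stand-in for a prebuilt attribute index: buyer id -> auction ids."""
    index = defaultdict(list)
    for auction_id, buyer in auctions:
        index[buyer].append(auction_id)
    return index


def index_join(person_ids, index):
    """One lookup per person once the index exists: len(persons) steps."""
    steps = 0
    result = {}
    for pid in person_ids:
        steps += 1
        result[pid] = list(index.get(pid, []))
    return result, steps


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the difference visible.
    persons = [f"person{i}" for i in range(2000)]
    auctions = [(f"auction{i}", f"person{i % 2000}") for i in range(3000)]
    slow, slow_steps = nested_loop_join(persons, auctions)
    fast, fast_steps = index_join(persons, build_index(auctions))
    assert slow == fast  # same answer, very different amount of work
    print(slow_steps, fast_steps)  # 6000000 2000
```

Both joins return the same mapping. Only the amount of work differs, and that difference grows with the product of the two input sizes. This is in line with the two to three orders of magnitude reported above, although the toy makes no claim about BaseX's actual costs.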
>
> Since the expressions bound to those variables are not constants at
> compile time, inlining them into a loop (in this case `for $p in
> $auction/...`) will initially duplicate work, so it is not generally safe
> to do. Because of exactly those index rewritings we are talking about here,
> we still want to inline *cheap* axis paths. The cheapness is determined in
> the `Path#cheap()` method [1]. Currently, it only accepts a single
> document node as the root of the path. A database built from 35 files has
> 35 document nodes, which is the cause of the differences you are seeing.
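The inlining decision described above can be sketched as a toy Python model. It is not a port of `Path#cheap()`. It only encodes the rule as stated in this thread: a path is cheap if its root is a single document node, and only cheap paths are inlined into the loop, where they can later become index lookups. `PathExpr`, `is_cheap` and `plan_for_loop_use` are hypothetical names used for illustration.

```python
# Toy model, not a port of BaseX's Path#cheap(): it only encodes the rule as
# described in this thread (a path counts as cheap if its root is a single
# document node) and what that means for inlining a let-bound path into a loop.
from dataclasses import dataclass


@dataclass(frozen=True)
class PathExpr:
    """Hypothetical stand-in for a let-bound path such as $auction/site/regions/europe/item."""
    root_documents: int  # number of document nodes the path is evaluated from
    steps: tuple  # child steps, kept for readability only


def is_cheap(path: PathExpr) -> bool:
    # The heuristic as described: only a single document node qualifies as root.
    return path.root_documents == 1


def plan_for_loop_use(path: PathExpr) -> str:
    # Inlining a non-constant expression into a loop re-evaluates it per
    # iteration, so it is only done for cheap paths. The inlined path can then
    # be rewritten into an index lookup; otherwise the variable is kept.
    if is_cheap(path):
        return "inline into loop, rewrite to index lookup"
    return "keep variable, nested loop over its value"


if __name__ == "__main__":
    steps = ("site", "regions", "europe", "item")
    one_file = PathExpr(root_documents=1, steps=steps)  # database built from one file
    split = PathExpr(root_documents=35, steps=steps)    # database built from 35 files
    print(plan_for_loop_use(one_file))  # inline into loop, rewrite to index lookup
    print(plan_for_loop_use(split))     # keep variable, nested loop over its value
```

The same path over the same data is treated differently depending only on how many documents the database holds. This is the kind of small change with a large effect on running time mentioned below.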
>
> While every heuristic can be tweaked (and I have no opinion on this
> specific one), there will always be cases like this where small changes
> lead to unexpectedly big differences in running time. The alternative is to
> just be consistently slow ;-).
>
> Hope that helps,
>   Leo
>
> [1] https://github.com/BaseXdb/basex/blob/20dbe4c/basex-core/src/main/java/org/basex/query/expr/path/Path.java#L280-295
>
