Hi Gioele,

> I wonder if the presence of the namespace somehow confuses the optimizer.

Exactly, that’s the reason. For historical reasons (and not very wise
ones, as is true of most things excused as “historical reasons”), we
decided to index node names without considering the namespace URI. As
a result, the index:element-names function will yield…

  <entry count="2">xml</entry>

…for the following document:

  <xml>
    <xml xmlns='uri'/>
  </xml>

For the same reason, various optimizations that are based on the
database statistics will only take effect if a document contains no
namespace declaration, or at most a single global one. In some cases,
optimizations would still be possible (e.g. if we know that the
element/attribute names with and without namespace URIs are distinct),
but that hasn’t been implemented so far.
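
To make the mismatch concrete, here is a minimal sketch in plain
XQuery. It does not touch the database or the name index; it only
contrasts counting by local name (which is what the statistics
effectively store) with counting by the full QName (which is what a
path step such as //tei:re asks for). The in-memory document is the
sample from above; the variable name $doc is just for illustration.

```xquery
(: Sketch only: the sample document, built in memory.
   No database or index is involved. :)
let $doc := document {
  <xml>
    <xml xmlns='uri'/>
  </xml>
}
return (
  (: by local name only, ignoring the namespace URI: both elements match :)
  count($doc//*[local-name(.) = 'xml']),
  (: by full QName (namespace URI + local name): only the outer element matches :)
  count($doc//*[node-name(.) = QName('', 'xml')])
)
(: result: 2 1 :)
```

Since the two numbers can differ, a count taken from the statistics
cannot simply replace the result of a namespace-aware path.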

Cheers,
Christian


> I was stressing the BaseX 8.6 planner/optimizer when I noticed that
> expressions like `count(//elem)` are not optimized at all, even though they
> are correctly indexed, as demonstrated by `index:element-names()`.
>
> The current database is a 300 MB TEI document. All the elements are in the
> `http://www.tei-c.org/ns/1.0` namespace.
>
> The following test case will report the correct number, but it will take a
> couple of seconds to run, instead of a few milliseconds.
>
> ```
> declare namespace tei="http://www.tei-c.org/ns/1.0";
>
> let $n := index:element-names("monier")[. = 're']/@count
>
> let $c := count(//tei:re)
>
> return <res><in-index>{$n}</in-index><in-doc>{$c}</in-doc></res>
> ```
>
> I wonder if the presence of the namespace somehow confuses the optimizer.
> The same problem can be observed running the same test case with
>
> ```
> declare default element namespace "http://www.tei-c.org/ns/1.0";
> [...]
> let $c := count(//re)
> ```
>
> Regards,
>
> --
> Gioele Barabucci <[email protected]>
>
