[basex-talk] Optimization of a slow query with `//`

Gioele Barabucci Fri, 12 Jun 2015 01:43:03 -0700

Hello,

I am working on an application that retrieves its data from a TEI XML file via BaseX. The following query lies at the core of this application but is too slow to be used in production: on a modern PC it requires about 600 ms to run over a 4 MB file (1/10 of the complete dataset). Any suggestion on how to improve its performance (without changing the underlying TEI files) would be much appreciated.


Here is the query:

    declare namespace tei='http://www.tei-c.org/ns/1.0';

    /tei:TEI/tei:text/tei:body//
      *[self::tei:entry or self::tei:re]
      [./tei:form/tei:orth[. = "arci"]
        [ancestor-or-self::*
          [@xml:lang][1]
          [(starts-with(@xml:lang, "san"))]
        ]
      ]

In human terms, it should return all the `tei:entry` or `tei:re` elements for which

* a `tei:form/tei:orth` child is equal to the word "arci", and
* the nearest `xml:lang` attribute (on that `tei:orth` or on one of its ancestors) starts with 'san'.
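
To make the intended result concrete, here is a small sketch that runs the same path against a made-up miniature document. The `xml:id` values, the `div` wrapper and the language codes are assumptions for illustration only; the real files are much larger and may nest differently.

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: Hypothetical sample data, not taken from the real TEI files. :)
let $doc := document {
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <text>
      <body xml:lang="san-Latn">
        <div>
          <entry xml:id="e1">
            <form><orth>arci</orth></form>
            <re xml:id="r1" xml:lang="eng">
              <form><orth>arci</orth></form>
            </re>
          </entry>
          <entry xml:id="e2">
            <form><orth>agni</orth></form>
          </entry>
        </div>
      </body>
    </text>
  </TEI>
}
return
  (: Same steps and predicates as the query above, rooted at $doc. :)
  $doc/tei:TEI/tei:text/tei:body//
    *[self::tei:entry or self::tei:re]
    [./tei:form/tei:orth[. = "arci"]
      [ancestor-or-self::*
        [@xml:lang][1]
        [(starts-with(@xml:lang, "san"))]
      ]
    ]/@xml:id/string()
```

On this sample only `e1` should come back: `r1` has the right word but its nearest `xml:lang` is "eng", and `e2` has a different word.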

I made some tests and it turned out that the main culprit is the use of `//` in the first line. (_Main_ culprit, not the only one...)

I use the `//` axis because I do not know what the structure of the underlying TEI file is. I expect BaseX to keep track of all the `tei:entry` and `tei:re` elements and their parents, so selecting the correct ones should be quite fast anyway. But the measurements disagree with my assumptions...
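
For reference, `//` is shorthand for `/descendant-or-self::node()/`, so the first step can also be written with an explicit `descendant` axis. The sketch below, on a made-up miniature document, only checks that the three spellings select the same elements. I have not measured whether any of them is faster in BaseX.

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: Hypothetical sample data; the structure is an assumption. :)
let $doc := document {
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <text>
      <body>
        <div><entry><re/></entry></div>
        <entry/>
      </body>
    </text>
  </TEI>
}
let $body := $doc/tei:TEI/tei:text/tei:body

(: 1. Abbreviated form, as used in the query. :)
let $abbreviated := $body//*[self::tei:entry or self::tei:re]

(: 2. What the abbreviation expands to. :)
let $expanded :=
  $body/descendant-or-self::node()/child::*[self::tei:entry or self::tei:re]

(: 3. Explicit descendant axis; equivalent here because the
      predicate is not positional. :)
let $explicit := $body/descendant::*[self::tei:entry or self::tei:re]

return (
  count($abbreviated),
  count($expanded),
  count($explicit),
  (: The union removes duplicate nodes, so it has the same size
     only if all three sequences contain the same nodes. :)
  count($abbreviated | $expanded | $explicit)
)
```

If the four numbers are equal, the three forms selected exactly the same nodes.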


What could I do to improve the performance of this query?


Now, some remarks based on some small tests I have done:

1. Removing the

    [ancestor-or-self::*[....]]

predicate slashes the run time in half, but the query is still way too slow.

2. Changing

    ./tei:form/tei:orth[. = "arci"]

to

    ./tei:form[1]/tei:orth[1][. = "arci"]

makes the query even slower.

3. Changing `starts-with(@xml:lang, "san")` to `@xml:lang = 'san-xxx'` has a negligible effect.


4. Dropping the `[1]` from

    [@xml:lang][1]

makes the whole query twice as fast.
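
One caveat on remark 4: without the `[1]`, the predicate no longer tests the nearest `xml:lang` but any `xml:lang` on the ancestor chain, so the two variants are not equivalent. A minimal sketch on a made-up fragment (elements without a namespace, for brevity):

```xquery
(: Hypothetical fragment: the outer element is Sanskrit, but the
   form that holds the word overrides the language with "eng". :)
let $doc :=
  <entry xml:lang="san">
    <form xml:lang="eng"><orth>arci</orth></form>
  </entry>
let $orth := $doc/form/orth
return (
  (: With [1]: only the nearest xml:lang ("eng") is tested. :)
  exists($orth[ancestor-or-self::*[@xml:lang][1]
                 [starts-with(@xml:lang, "san")]]),

  (: Without [1]: any ancestor-or-self xml:lang may match ("san"). :)
  exists($orth[ancestor-or-self::*[@xml:lang]
                 [starts-with(@xml:lang, "san")]])
)
(: Returns: false, true :)
```

So the faster variant is only safe if the data never overrides `xml:lang` on an inner element, which I cannot guarantee for these files.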

Regards,

--
Gioele Barabucci <[email protected]>
