Hi Gerrit,

Thanks for both the observation and the test case. The bug has been fixed,
and a new snapshot is available [1,2].

All the best,
Christian

[1] https://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/issues/2052


On Thu, Dec 2, 2021 at 9:35 AM Imsieke, Gerrit, le-tex <
gerrit.imsi...@le-tex.de> wrote:

> Hi Christian,
>
> I wrote a query for a customer who wants to analyze their legacy ISO
> 12083 math formulas, in this case for detecting multiple subsequent
> <roman> elements of length >= 3 with only whitespace in between.
>
> This is a synthetic test document:
>
> <doc>
>    <p>
>      <formula><roman>tan</roman> <roman>tan</roman></formula>
>    </p>
>    <p>
>      <formula><roman>sin</roman> <sup>2</sup> <roman>sin</roman></formula>
>    </p>
>    <p>
>      <formula><roman>cos</roman><sup>3</sup> <roman>cos</roman></formula>
>    </p>
> </doc>
>
> And this is the query I wrote:
>
> let $rms := //(formula | dformula)//roman[string-length() gt 2]
>     [following-sibling::node()[1]/self::text()[not(normalize-space())]]
>     [following-sibling::*[1]/self::roman[string-length() gt 2]]/..,
>      $docs := for $rm-context in $rms
>               let $path := db:path($rm-context)
>               group by $path
>               return <doc path="{$path}">{
>                 $rm-context
>               }</doc>
> return
> <result count="{count($rms)}" docs="{count($docs)}">{
>    $docs
> }</result>
>
> BaseX (up to version 9.6.3) erroneously reports all three <formula>
> elements as a result, while only the first should be reported.
>
> This can be remedied by using parentheses, as in
> (following-sibling::node())[1]/self::text() and
> (following-sibling::*)[1]/self::roman. But this is inefficient, and the
> original query should just work™.
>
> In the optimized original query there is
> following-sibling::text()[fn:position() = 1] and
> following-sibling::roman[fn:position() = 1]. These are incorrect
> optimizations of following-sibling::node()[1]/self::text() and
> following-sibling::*[1]/self::roman.
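
To illustrate why this rewrite changes the result, here is a minimal sketch
that compares the two element-axis forms on an inlined copy of the test
document. The attribute names "strict" and "rewritten" are made up for this
illustration and are not part of the original query; the text() case should
behave analogously. Only element siblings are inspected, so the comparison
does not depend on whitespace handling.

```xquery
(: Inlined copy of the synthetic test document from this thread. :)
let $doc :=
  <doc>
    <p><formula><roman>tan</roman> <roman>tan</roman></formula></p>
    <p><formula><roman>sin</roman> <sup>2</sup> <roman>sin</roman></formula></p>
    <p><formula><roman>cos</roman><sup>3</sup> <roman>cos</roman></formula></p>
  </doc>
(: Look only at <roman> elements that have at least one following element. :)
for $r in $doc//formula/roman[following-sibling::*]
return
  <roman text="{ $r }"
    (: strict: the very next sibling element must itself be a <roman> :)
    strict="{ exists($r/following-sibling::*[1]/self::roman) }"
    (: rewritten: the first <roman> among all following siblings, however far :)
    rewritten="{ exists($r/following-sibling::roman[1]) }"/>
```

By XPath semantics, "strict" should be true only for the first formula, while
"rewritten" is true for all three, which matches the erroneous result
described above.
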
>
> Gerrit
>
>
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH
> Weissenfelser Str. 84, 04229 Leipzig, Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsi...@le-tex.de, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
> Geschäftsführer / Managing Directors:
> Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
>
