Hi Gerrit,

Thanks for both the observation and the test case. The bug has been fixed, and a new snapshot is available [1,2].
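
For anyone following the thread, here is a minimal sketch of why the two path forms from the quoted report are not equivalent. It assumes an inline copy of the synthetic test document (without the <p> wrappers); the variable $doc and the <check> output element are illustrative names only, not part of the original query. The boundary-space declaration is needed so that the whitespace-only text nodes between the <roman> elements survive in the inline constructor.

```xquery
(: Sketch only: the inline document mirrors the synthetic test case from the
   quoted report; $doc and <check> are hypothetical names for illustration. :)
declare boundary-space preserve;  (: keep whitespace-only text nodes :)

let $doc :=
  <doc>
    <formula><roman>tan</roman> <roman>tan</roman></formula>
    <formula><roman>sin</roman> <sup>2</sup> <roman>sin</roman></formula>
    <formula><roman>cos</roman><sup>3</sup> <roman>cos</roman></formula>
  </doc>
for $r in $doc/formula/roman[1]
return
  <check roman="{$r}"
    next-node-is-blank-text="{
      (: the immediately following sibling node must be whitespace-only text :)
      exists($r/following-sibling::node()[1]/self::text()[not(normalize-space())])
    }"
    next-element-is-roman="{
      (: the immediately following sibling element must be a roman :)
      exists($r/following-sibling::*[1]/self::roman)
    }"
    first-later-text-is-blank="{
      (: the first text sibling, no matter what precedes it :)
      exists($r/following-sibling::text()[1][not(normalize-space())])
    }"
    some-later-roman="{
      (: the first roman sibling, no matter what precedes it :)
      exists($r/following-sibling::roman[1])
    }"/>
```

On a conforming processor, the first two attributes should both be true only for the "tan" formula, whereas the last two are true for all three formulas. That is the difference between following-sibling::node()[1]/self::text() and following-sibling::text()[1] described in the report. Versions still affected by the bug may rewrite the first form into the second, so the sketch is best run against the new snapshot.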

All the best,
Christian

[1] https://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/issues/2052


On Thu, Dec 2, 2021 at 9:35 AM Imsieke, Gerrit, le-tex <gerrit.imsi...@le-tex.de> wrote:

> Hi Christian,
>
> I wrote a query for a customer who wants to analyze their legacy ISO
> 12083 math formulas, in this case for detecting multiple subsequent
> <roman> elements of length >= 3 with only whitespace in between.
>
> This is a synthetic test document:
>
> <doc>
>   <p>
>     <formula><roman>tan</roman> <roman>tan</roman></formula>
>   </p>
>   <p>
>     <formula><roman>sin</roman> <sup>2</sup> <roman>sin</roman></formula>
>   </p>
>   <p>
>     <formula><roman>cos</roman><sup>3</sup> <roman>cos</roman></formula>
>   </p>
> </doc>
>
> And this is the query I wrote:
>
> let $rms := //(formula | dformula)//roman[string-length() gt 2]
>               [
>                 following-sibling::node()[1]/self::text()[not(normalize-space())]
>               ]
>               [
>                 following-sibling::*[1]/self::roman[string-length() gt 2]
>               ]/..,
>     $docs := for $rm-context in $rms
>              let $path := db:path($rm-context)
>              group by $path
>              return <doc path="{$path}">{
>                $rm-context
>              }</doc>
> return
>   <result count="{count($rms)}" docs="{count($docs)}">{
>     $docs
>   }</result>
>
> BaseX (up to version 9.6.3) erroneously reports all three <formula>
> elements as a result, while only the first should be reported.
>
> This can be remedied by using parentheses, as in
> (following-sibling::node())[1]/self::text() and
> (following-sibling::*)[1]/self::roman. But this is inefficient, and the
> original query should just work™.
>
> In the optimized original query there is
> following-sibling::text()[fn:position() = 1] and
> following-sibling::roman[fn:position() = 1]. These are incorrect
> optimizations of following-sibling::node()[1]/self::text() and
> following-sibling::*[1]/self::roman.
>
> Gerrit
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH
> Weissenfelser Str. 84, 04229 Leipzig, Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsi...@le-tex.de, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
> Geschäftsführer / Managing Directors:
> Gerrit Imsieke, Svea Jelonek, Thomas Schmidt