Yes, I can see the problem: &DUMMY; is interpreted as an unknown entity and
is therefore replaced with a question mark (from today's perspective, the
Unicode replacement character U+FFFD would be a better choice anyway). We'll
keep that in mind and think about alternatives.

If your input is supposed to be interpreted as a single text fragment, one
fallback solution (for now) would be:

data(parse-xml('<x>' || $string || '</x>'))

Zimmel, Daniel <d.zim...@esvmedien.de> wrote on Tue, 21 Nov 2023, 18:34:

> Thanks for the insight!
>
> I can see the benefit in your example. Mine, however, clearly swallows
> text ("DUMMY"). That may be an edge case, but it is a real problem if you
> expect the function to raise an error for input that is not well-formed:
> some text has silently been deleted.
>
> Daniel
>
> *From:* Christian Grün <christian.gr...@gmail.com>
> *Sent:* Tuesday, 21 November 2023 16:59
> *To:* Zimmel, Daniel <d.zim...@esvmedien.de>
> *Cc:* basex-talk@mailman.uni-konstanz.de
> *Subject:* Re: [basex-talk] Bug in parse-xml-fragment() and ampersand
> entity?
>
> Hi Daniel,
>
> Yes, I assume we'll need to call it a bug, although we already know that
> what BaseX currently does deviates from the specification. The function
> fn:parse-xml-fragment is based on our internal XML parser, which is much
> faster than the standard XML parser (in particular for small inputs) and
> tolerates input that is not perfectly well-formed. In addition, it accepts
> HTML entities without a linked DTD:
>
>    parse-xml-fragment(`&auml;`)
>
> We should at least document this behavior or (better) introduce a custom
> BaseX function for lenient parsing.
>
> Hope this helps (for now),
>
> Christian
>
> On Tue, Nov 21, 2023 at 3:17 PM Zimmel, Daniel <d.zim...@esvmedien.de> wrote:
>
> Hi,
>
> Is this a bug?
>
> Query:
>         parse-xml-fragment('Tom &amp; Jerry')
>
> Result:
>         Tom ? Jerry
>
> Same result with:
>         parse-xml-fragment('Tom &amp;DUMMY; Jerry')
>
> BaseX 10.7
>
> Saxon correctly complains that the resulting document node is not
> well-formed. Shouldn't BaseX return an error as well?
>
> Best, Daniel