Chapman Flack <c...@anastigmatix.net> writes:
> On 03/16/19 16:55, Tom Lane wrote:
>> What do you think of the idea I just posted about parsing off the DOCTYPE
>> thing for ourselves, and not letting libxml see it?

> The principled way of doing that would be to pre-parse to find a DOCTYPE,
> and if there is one, leave it there and parse the input as we do for
> 'document'. Per XML, if there is a DOCTYPE, the document must satisfy
> the 'document' syntax requirements, and per SQL/XML:2006-and-later,
> 'content' is a proper superset of 'document', so if we were asked for
> 'content' and can successfully parse it as 'document', we're good,
> and if we see a DOCTYPE and yet it incurs a parse error as 'document',
> well, that's what needed to happen.

Hm, so, maybe just

(1) always try to parse as document.  If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our
current logic for the CONTENT case.

This avoids adding any new assumptions about how libxml acts,
which is what I was hoping to achieve.

One interesting question is which error to report if both (1) and (2)
fail.
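
For illustration only, the two-step fallback in (1) and (2) might be sketched as follows. This is not the PostgreSQL code and it does not use libxml: it is a minimal Python sketch using the standard library's ElementTree parser. The names parse_xml, parse_as_document, parse_as_content and XmlParseError are hypothetical, and wrapping the input in a dummy root element is only a stand-in for "our current logic for the CONTENT case". Since the question of which error to report is still open, the sketch simply keeps both.

```python
import xml.etree.ElementTree as ET


class XmlParseError(Exception):
    """Hypothetical error type that keeps the errors from both attempts."""

    def __init__(self, document_error, content_error=None):
        super().__init__(str(document_error))
        self.document_error = document_error
        self.content_error = content_error


def parse_as_document(text):
    # Step (1): the input must be a single well-formed document,
    # optionally preceded by a DOCTYPE.
    ET.fromstring(text)


def parse_as_content(text):
    # Stand-in for the existing CONTENT logic (an assumption of this
    # sketch): wrap the input in a dummy root so that several top-level
    # elements or bare text are accepted.
    ET.fromstring("<dummy>" + text + "</dummy>")


def parse_xml(text, xmloption="content"):
    """Return which syntax the input satisfied: 'document' or 'content'."""
    try:
        parse_as_document(text)
        return "document"
    except ET.ParseError as err:
        document_error = err

    # Step (2): fall back only if xmloption allows content.
    if xmloption != "content":
        raise XmlParseError(document_error)

    try:
        parse_as_content(text)
        return "content"
    except ET.ParseError as err:
        # Both attempts failed; keep both errors rather than pick one.
        raise XmlParseError(document_error, err)
```

With this sketch, parse_xml("<a/><b/>") falls through to the content path, while an input that starts with a DOCTYPE and is not a well-formed document fails in both steps and raises an error carrying both messages.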

                        regards, tom lane
