On 03/17/19 15:06, Tom Lane wrote:
> The error message issue is indeed a concern, but I don't see why it's
> complicated if you agree that
>
>> If the query asked for CONTENT, any error result should be one you could get
>> when parsing as CONTENT.
>
> That just requires us to save the first error message and be sure to issue
> that one not the DOCUMENT one, no?
I confess I haven't looked hard yet at how to do that. The way errors come
out of libxml is more involved than "here's a message", so there's a choice
of (a) try to copy off that struct in a way that's sure to survive
re-executing the parser, and then use the copy, or (b) generate a message
right away from the structured information and save that, and I guess b
might not be too bad; a might not be too bad, or it might, and slide right
back into the kind of libxml-behavior-assumptions you're wanting to avoid.
Meanwhile, here is a patch on the lines I proposed earlier, with a
pre-check. Any performance hit that it could entail (which I'd really
expect to be de minimis, though I haven't benchmarked) ought to be
compensated by the strlen I changed to strnlen in parse_xml_decl (as
there's really no need to run off and count the whole rest of the input
just to know if 1, 2, 3, or 4 bytes are available to decode a UTF-8 char).
... and, yes, I know that could be an independent patch, and then the
performance effect here should be measured from there. But it was near
what I was doing anyway, so I included it here.
Attaching both still-outstanding patches (this one and docfix) so the
CF app doesn't lose one.
Regards,
-Chap
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b462c06..94b46a6 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4282,16 +4282,21 @@ SET XML OPTION { DOCUMENT | CONTENT };
SET xmloption TO { DOCUMENT | CONTENT };
</synopsis>
The default is <literal>CONTENT</literal>, so all forms of XML
- data are allowed.
+ data are allowed except as noted below.
</para>
<note>
<para>
- With the default XML option setting, you cannot directly cast
- character strings to type <type>xml</type> if they contain a
- document type declaration, because the definition of XML content
- fragment does not accept them. If you need to do that, either
- use <literal>XMLPARSE</literal> or change the XML option.
+ In the SQL:2006 and later standard, the <literal>CONTENT</literal> form
+ is a proper superset of the <literal>DOCUMENT</literal> form, and so the
+ default XML option setting would allow casting to XML from character
+ strings in either form. <productname>PostgreSQL</productname>, however,
+ uses the SQL:2003 definition in which <literal>CONTENT</literal> form
+ cannot contain a document type declaration. Therefore, there is no one
+ setting of the XML option that will allow casting to XML from every valid
+ character string. The default will work almost always, but for a document
+ with a DTD, it will be necessary to change the XML option or
+ use <literal>XMLPARSE</literal> specifying <literal>DOCUMENT</literal>.
</para>
</note>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a7..0017aab 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10140,8 +10140,13 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
<sect1 id="functions-xml">
+
<title>XML Functions</title>
+ <indexterm>
+ <primary>XML Functions</primary>
+ </indexterm>
+
<para>
The functions and function-like expressions described in this
section operate on values of type <type>xml</type>. Check <xref
@@ -10344,8 +10349,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar),
encoding, depending on the setting of the configuration parameter
<xref linkend="guc-xmlbinary"/>. The particular behavior for
individual data types is expected to evolve in order to align the
- SQL and PostgreSQL data types with the XML Schema specification,
- at which point a more precise description will appear.
+ PostgreSQL mappings with those specified in SQL:2006 and later,
+ as discussed in <xref linkend="functions-xml-limits-casts"/>.
</para>
</sect3>
@@ -10587,10 +10592,12 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
</synopsis>
<para>
- The function <function>xmlexists</function> returns true if the
- XPath expression in the first argument returns any nodes, and
- false otherwise. (If either argument is null, the result is
- null.)
+ The function <function>xmlexists</function> evaluates an XPath 1.0
+ expression (the first argument), with the passed value as its context item.
+ The function returns false if the result of that evaluation yields an
+ empty nodeset, true if it yields any other value. (The function returns
+ null if an argument is null.) A nonnull value passed as the context item
+ must be an XML document, not a content fragment or any non-XML value.
</para>
<para>
@@ -10607,24 +10614,12 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY VALUE '<towns><town>T
<para>
The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses
- have no effect in <productname>PostgreSQL</productname>, but are allowed
- for compatibility with other implementations. Per the <acronym>SQL</acronym>
- standard, the one that precedes any argument is required, and indicates
- the default for arguments that follow, and one may follow any argument to
- override the default.
- <productname>PostgreSQL</productname> ignores <literal>BY REF</literal>
- and passes by value always.
- </para>
-
- <para>
- In the <acronym>SQL</acronym> standard, an <function>xmlexists</function>
- construct evaluates an expression in the XQuery language, allows passing
- values for named parameters in the expression as well as for the context
- item, and does not require the passed values to be documents, or even of
- XML type.
- In <productname>PostgreSQL</productname>, this construct currently only
- evaluates an XPath 1.0 expression, and allows passing only one value,
- which must be an XML document, to be the context item.
+ are accepted in <productname>PostgreSQL</productname>, but ignored, as
+ discussed in <xref linkend="functions-xml-limits-postgresql"/>.
+ In the SQL standard, an <function>xmlexists</function> construct first
+ appears in SQL:2006 and evaluates an expression in the XML Query language,
+ but this implementation allows only an XPath 1.0 expression, as discussed
+ in <xref linkend="functions-xml-limits-xpath1"/>.
</para>
</sect3>
@@ -10730,12 +10725,12 @@ SELECT xml_is_well_formed_document('<pg:foo xmlns:pg="http://postgresql.org/stuf
</synopsis>
<para>
- The function <function>xpath</function> evaluates the XPath
+ The function <function>xpath</function> evaluates the XPath 1.0
expression <replaceable>xpath</replaceable> (a <type>text</type> value)
against the XML value
<replaceable>xml</replaceable>. It returns an array of XML values
- corresponding to the node set produced by the XPath expression.
- If the XPath expression returns a scalar value rather than a node set,
+ corresponding to the nodeset produced by the XPath expression.
+ If the XPath expression returns a scalar value rather than a nodeset,
a single-element array is returned.
</para>
@@ -10797,9 +10792,10 @@ SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a
<para>
The function <function>xpath_exists</function> is a specialized form
of the <function>xpath</function> function. Instead of returning the
- individual XML values that satisfy the XPath, this function returns a
- Boolean indicating whether the query was satisfied or not. This
- function is equivalent to the standard <literal>XMLEXISTS</literal> predicate,
+ individual XML values that satisfy the XPath 1.0 expression, this function
+ returns a Boolean indicating whether the query was satisfied or not
+ (specifically, whether it produced any value other than an empty nodeset).
+ This function is equivalent to the <literal>XMLEXISTS</literal> predicate,
except that it also offers support for a namespace mapping argument.
</para>
@@ -10840,8 +10836,8 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
<para>
The <function>xmltable</function> function produces a table based
- on the given XML value, an XPath filter to extract rows, and an
- optional set of column definitions.
+ on the given XML value, an XPath filter to extract rows, and a
+ set of column definitions.
</para>
<para>
@@ -10852,30 +10848,33 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
</para>
<para>
- The required <replaceable>row_expression</replaceable> argument is an XPath
- expression that is evaluated against the supplied XML document to
- obtain an ordered sequence of XML nodes. This sequence is what
- <function>xmltable</function> transforms into output rows.
+ The required <replaceable>row_expression</replaceable> argument is
+ an XPath 1.0 expression that is evaluated, passing the
+ <replaceable>document_expression</replaceable> as its context item, to
+ obtain a set of XML nodes. These nodes are what
+ <function>xmltable</function> transforms into output rows. (No rows
+ will be produced if the <replaceable>document_expression</replaceable>
+ is null, or the <replaceable>row_expression</replaceable> produces an
+ empty nodeset, or any value other than a nodeset.)
</para>
<para>
- <replaceable>document_expression</replaceable> provides the XML document to
- operate on.
- The argument must be a well-formed XML document; fragments/forests
- are not accepted.
+ <replaceable>document_expression</replaceable> provides the context item
+ for the <replaceable>row_expression</replaceable>. It must be a well-formed
+ XML document; fragments/forests are not accepted.
The <literal>BY REF</literal> or <literal>BY VALUE</literal> clauses are
- accepted, as described for the <function>xmlexists</function> predicate,
- but ignored; PostgreSQL currently passes XML by value always.
+ accepted but ignored, as discussed in
+ <xref linkend="functions-xml-limits-postgresql"/>.
+ In the SQL standard, an <function>xmltable</function> construct first
+ appears in SQL:2006 and evaluates expressions in the XML Query language,
+ but this implementation allows only XPath 1.0 expressions, as discussed
+ in <xref linkend="functions-xml-limits-xpath1"/>.
</para>
<para>
The mandatory <literal>COLUMNS</literal> clause specifies the list
of columns in the output table.
- If the <literal>COLUMNS</literal> clause is omitted, the rows in the result
- set contain a single column of type <literal>xml</literal> containing the
- data matched by <replaceable>row_expression</replaceable>.
- If <literal>COLUMNS</literal> is specified, each entry describes a
- single column.
+ Each entry describes a single column.
See the syntax summary above for the format.
The column name and type are required; the path, default and
nullability clauses are optional.
@@ -10889,42 +10888,78 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
</para>
<para>
- The <literal>column_expression</literal> for a column is an XPath expression
- that is evaluated for each row, relative to the result of the
- <replaceable>row_expression</replaceable>, to find the value of the column.
+ The <literal>column_expression</literal> for a column is an XPath 1.0
+ expression
+ that is evaluated for each row, with the current node from the
+ <replaceable>row_expression</replaceable> result as its context item,
+ to find the value of the column.
If no <literal>column_expression</literal> is given, then the column name
is used as an implicit path.
</para>
<para>
- If a column's XPath expression returns multiple elements, an error
- is raised.
- If the expression matches an empty tag, the result is an
- empty string (not <literal>NULL</literal>).
- Any <literal>xsi:nil</literal> attributes are ignored.
+ If a column's XPath expression returns a non-XML value (limited to
+ string, boolean, or double in XPath 1.0) and the column has a
+ PostgreSQL type other than <type>xml</type>, the column will be set
+ as if by assigning the value's string representation to the PostgreSQL
+ type (adjusting the "string representation" of a boolean to
+ <literal>1</literal> or <literal>0</literal> if the target column type
+ category is numeric, otherwise <literal>true</literal> or
+ <literal>false</literal>).
+ </para>
+
+ <para>
+ If the column's expression returns a non-empty set of XML nodes
+ and the target column's type is <type>xml</type>, the column will
+ be assigned the expression result exactly, if it is of document or
+ content form.
+ <footnote>
+ <para>
+ A result containing more than one element node at the top level, or
+ non-whitespace text outside of an element, is an example of content form.
+ An XPath result can be of neither form, for example if it returns an
+ attribute node selected from the element that contains it. Such a result
+ will be put into content form with each such disallowed node replaced by
+ its string value, as defined for the XPath 1.0
+ <function>string</function> function.
+ </para>
+ </footnote>
+ </para>
+
+ <para>
+ A non-XML result assigned to an <type>xml</type> output column produces
+ content, a single text node with the string value of the result.
+ An XML result assigned to a column of any other type may not have more than
+ one node, or an error is raised. If there is exactly one node, the column
+ will be set as if by assigning the node's string
+ value (as defined for the XPath 1.0 <function>string</function> function)
+ to the PostgreSQL type.
</para>
<para>
- The text body of the XML matched by the <replaceable>column_expression</replaceable>
- is used as the column value. Multiple <literal>text()</literal> nodes
- within an element are concatenated in order. Any child elements,
- processing instructions, and comments are ignored, but the text contents
- of child elements are concatenated to the result.
+ The string value of an XML element is the concatenation, in document order,
+ of all text nodes contained in that element and its descendants. The string
+ value of an element with no descendant text nodes is an
+ empty string (not <literal>NULL</literal>).
+ Any <literal>xsi:nil</literal> attributes are ignored.
Note that the whitespace-only <literal>text()</literal> node between two non-text
elements is preserved, and that leading whitespace on a <literal>text()</literal>
node is not flattened.
+ The XPath 1.0 <function>string</function> function may be consulted for the
+ rules defining the string value of other XML node types and non-XML values.
+ </para>
+
+ <para>
+ The conversion rules presented here are not exactly those of the SQL
+ standard, as discussed in <xref linkend="functions-xml-limits-casts"/>.
</para>
<para>
- If the path expression does not match for a given row but
- <replaceable>default_expression</replaceable> is specified, the value resulting
- from evaluating that expression is used.
- If no <literal>DEFAULT</literal> clause is given for the column,
- the field will be set to <literal>NULL</literal>.
- It is possible for a <replaceable>default_expression</replaceable> to reference
- the value of output columns that appear prior to it in the column list,
- so the default of one column may be based on the value of another
- column.
+ If the path expression returns an empty nodeset
+ (typically, when it does not match)
+ for a given row, the column will be set to <literal>NULL</literal>, unless
+ a <replaceable>default_expression</replaceable> is specified; then the
+ value resulting from evaluating that expression is used.
</para>
<para>
@@ -10936,20 +10971,14 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</m
</para>
<para>
- Unlike regular PostgreSQL functions, <replaceable>column_expression</replaceable>
- and <replaceable>default_expression</replaceable> are not evaluated to a simple
- value before calling the function.
- <replaceable>column_expression</replaceable> is normally evaluated
- exactly once per input row, and <replaceable>default_expression</replaceable>
- is evaluated each time a default is needed for a field.
- If the expression qualifies as stable or immutable the repeat
+ A <replaceable>default_expression</replaceable>, rather than being
+ evaluated immediately when <function>xmltable</function> is called,
+ is evaluated each time a default is needed for the column.
+ If the expression qualifies as stable or immutable, the repeat
evaluation may be skipped.
- Effectively <function>xmltable</function> behaves more like a subquery than a
- function call.
This means that you can usefully use volatile functions like
- <function>nextval</function> in <replaceable>default_expression</replaceable>, and
- <replaceable>column_expression</replaceable> may depend on other parts of the
- XML document.
+ <function>nextval</function> in
+ <replaceable>default_expression</replaceable>.
</para>
<para>
@@ -11297,6 +11326,293 @@ table2-mapping
]]></programlisting>
</figure>
</sect2>
+
+ <sect2 id="functions-xml-limits-compatibility">
+ <title>Limits and Compatibility</title>
+
+ <indexterm>
+ <primary>XML Functions</primary>
+ <secondary>limits and compatibility</secondary>
+ </indexterm>
+
+ <sect3 id="functions-xml-limits-xpath1">
+ <title>Queries restricted to XPath 1.0</title>
+
+ <para>
+ The <productname>PostgreSQL</productname>-specific functions
+ <function>xpath</function> and <function>xpath_exists</function> query
+ XML documents using the XPath language, and
+ <productname>PostgreSQL</productname> also provides XPath-only variants of
+ the standard functions <function>XMLEXISTS</function> and
+ <function>XMLTABLE</function>, which are defined in the SQL standard to use
+ the XQuery language. For all of these functions,
+ <productname>PostgreSQL</productname> relies on the
+ <productname>libxml</productname> library, which provides only XPath 1.0.
+ </para>
+
+ <para>
+ There is a strong connection between the XQuery language and XPath versions
+ 2.0 and later: any expression that is syntactically valid and executes
+ successfully in both produces the same result (with a minor exception for
+ expressions containing numeric character references or predefined entity
+ references, which XQuery replaces with the corresponding character while
+ XPath leaves them alone). But there is no such connection between XPath 1.0
+ and XQuery or the later XPath versions; it was an earlier language and
+ differs in many respects.
+ </para>
+
+ <para>
+ There are two categories of limitation to keep in mind: the restriction
+ from XQuery to XPath for the functions specified in the SQL standard, and
+ the restriction of XPath to version 1.0 for both the standard and the
+ <productname>PostgreSQL</productname>-specific functions.
+ </para>
+
+ <sect4>
+ <title>Restriction of XQuery to XPath</title>
+
+ <para>
+ Features of XQuery beyond those of XPath include:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ XQuery expressions can construct and return new XML nodes, in addition
+ to all possible XPath values. XPath can introduce and return values of
+ the atomic types (numbers, strings, and so on) but can only return XML
+ nodes already present in documents supplied as input to the expression.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ XQuery has control constructs for iteration, sorting, and grouping.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ XQuery allows the declaration and use of local functions.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Recent XPath versions begin to offer overlapping capabilities
+ (such as the functional-style <function>for-each</function> and
+ <function>sort</function>, anonymous functions, and
+ <function>parse-xml</function> to create a node from a string),
+ but these were not available before XPath 3.0.
+ </para>
+ </sect4>
+
+ <sect4>
+ <title>Restriction of XPath to 1.0</title>
+
+ <para>
+ For developers familiar with XQuery and XPath 2.0 or later, or porting
+ queries from other systems, XPath 1.0 presents a number of differences to
+ contend with:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ The fundamental type of an XQuery/XPath expression, the
+ <type>sequence</type>, which can contain XML nodes, atomic values,
+ or both, does not exist in XPath 1.0. A 1.0 expression can only produce
+ a nodeset (possibly empty, or with one XML node or more), or a single
+ atomic value.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Unlike an XQuery/XPath sequence, which can contain any desired items
+ in any desired order, an XPath 1.0 nodeset has no guaranteed order and,
+ like any set, can have no member appear more than once. (The
+ <productname>libxml</productname> library does seem to always return
+ nodesets to <productname>PostgreSQL</productname> with their members
+ in the same relative order they had in the input document; it does not
+ commit to this behavior, and an XPath 1.0 expression cannot control
+ it.)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ While XQuery/XPath provides all of the types defined in XML Schema
+ and many operators and functions over those types, XPath 1.0 has only
+ nodesets and three atomic types, <type>boolean</type>,
+ <type>double</type>, and <type>string</type>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ XPath 1.0 has no conditional operator. An XQuery/XPath expression
+ such as <userinput>if ( hat ) then hat/@size else "no hat"</userinput>
+ has no XPath 1.0 equivalent.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ XPath 1.0 has no ordering comparison operator for strings. Both
+ <userinput>"cat" < "dog"</userinput> and
+ <userinput>"cat" > "dog"</userinput> are false, because each is a
+ numeric comparison of two <literal>NaN</literal>s. In contrast,
+ <literal>=</literal> and <literal>!=</literal> do compare the strings
+ as strings.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ XPath 1.0 blurs the distinction between
+ <firstterm>value comparisons</firstterm> and
+ <firstterm>general comparisons</firstterm> as XQuery/XPath define them.
+ Both <userinput>sale/@hatsize = 7</userinput> and
+ <userinput>sale/@customer = "alice"</userinput> are existentially
+ quantified comparisons, true if there is any sale with the given value
+ for the attribute, but <userinput>sale/@taxable = false()</userinput>
+ is a value comparison to the
+ <firstterm>effective boolean value</firstterm> of a whole nodeset,
+ and true only if no sale has a <literal>taxable</literal> attribute
+ at all.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ In the XQuery/XPath data model, a <firstterm>document node</firstterm>
+ can have either document form (exactly one top-level element, only
+ comments and processing instructions outside of it) or content form
+ (with those constraints relaxed). Its equivalent in XPath 1.0, the
+ <firstterm>root node</firstterm>, can only be in document form.
+ This is part of the reason an <type>xml</type> value passed as the
+ context item to any <productname>PostgreSQL</productname> XPath-based
+ function must be in document form.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The differences highlighted here are not all of them. In XQuery and
+ the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility
+ mode, and the W3C lists of function library
+ <ulink url='https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility'>changes</ulink>
+ and language
+ <ulink url='https://www.w3.org/TR/xpath20/#id-backwards-compatibility'>changes</ulink>
+ applied in that mode offer a more complete account of the
+ differences, if still not exhaustive; the compatibility mode cannot make
+ the later languages exactly equivalent to XPath 1.0.
+ </para>
+ </sect4>
+
+ <sect4 id="functions-xml-limits-casts">
+ <title>Mappings between SQL and XML data types and values</title>
+
+ <para>
+ In SQL:2006 and later, both directions of conversion between standard SQL
+ data types and the XML Schema types are specified precisely. However, the
+ rules are expressed using the types and semantics of XQuery/XPath, and
+ have no direct application to the different data model of XPath 1.0.
+ </para>
+
+ <para>
+ When <productname>PostgreSQL</productname> maps SQL data values to XML
+ (as in <function>xmlelement</function>), or XML to SQL (as in the output
+ columns of <function>xmltable</function>), except for the few cases
+ treated specially, <productname>PostgreSQL</productname> simply assumes
+ that the XML data type's XPath 1.0 string form will be valid as the
+ text-input form of the SQL datatype, and conversely. This rule has the
+ virtue of simplicity while producing, for many data types, results similar
+ to the mappings specified in the standard.
+ </para>
+
+ <para>
+ Where interoperability with other systems is a concern, for some data
+ types, it may be necessary to use available data type formatting functions
+ (such as those in <xref linkend="functions-formatting"/>) explicitly in
+ queries to produce the standard mappings.
+ </para>
+ </sect4>
+ </sect3>
+
+ <sect3 id="functions-xml-limits-postgresql">
+ <title>
+ Incidental limits of the implementation
+ </title>
+
+ <para>
+ This section concerns limits that are not inherent in the
+ <productname>libxml</productname> library, but apply to the current
+ implementation in <productname>PostgreSQL</productname>.
+ </para>
+
+ <sect4>
+ <title>Only <literal>BY VALUE</literal> passing mechanism supported</title>
+
+ <para>
+ The SQL standard defines two <firstterm>passing mechanisms</firstterm>
+ that apply when passing an XML argument from SQL to an XML function or
+ receiving a result: <literal>BY REF</literal>, in which a particular XML
+ value retains its node identity, and <literal>BY VALUE</literal>, in which
+ the content of the XML is passed but node identity is not preserved. A
+ mechanism can be specified before a list of parameters, as the default
+ mechanism for all of them, or after any parameter, to override the
+ default.
+ </para>
+
+ <para>
+ To illustrate the difference, if
+ <replaceable>x</replaceable> is an XML value, these two queries in
+ an SQL:2006 environment would produce true and false, respectively:
+
+ <screen><![CDATA[
+SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY);
+SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY);
+]]></screen>
+ </para>
+
+ <para>
+ <productname>PostgreSQL</productname> will accept either
+ <literal>BY VALUE</literal> or <literal>BY REF</literal> in an
+ <function>XMLEXISTS</function> or <function>XMLTABLE</function> construct,
+ but ignores them; the <type>xml</type> data type holds a character-string
+ serialized representation, so there is no node identity to preserve,
+ and passing is always <literal>BY VALUE</literal>.
+ </para>
+ </sect4>
+
+ <sect4>
+ <title>Cannot pass named parameters to queries</title>
+
+ <para>
+ The XPath-based functions support passing one parameter to serve as the
+ XPath expression's context item, but do not support passing additional
+ values to be available to the expression as named parameters.
+ </para>
+ </sect4>
+
+ <sect4>
+ <title>No <type>XML(SEQUENCE)</type> type</title>
+
+ <para>
+ The <productname>PostgreSQL</productname> <type>xml</type> can only hold
+ a value in <literal>DOCUMENT</literal> or <literal>CONTENT</literal> form.
+ An XQuery/XPath expression context item must be a single XML node
+ or atomic value, while XPath 1.0 further restricts it to only an XML node,
+ and has no node type allowing <literal>CONTENT</literal>. The upshot is
+ that a well-formed <literal>DOCUMENT</literal> is the only form of XML
+ value that <productname>PostgreSQL</productname> can supply as an XPath
+ context item.
+ </para>
+ </sect4>
+ </sect3>
+ </sect2>
</sect1>
<sect1 id="functions-json">
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index bade0fe..e947613 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -569,7 +569,7 @@ X056 Advanced table mapping: data mapping YES
X057 Advanced table mapping: metadata mapping YES
X058 Advanced table mapping: base64 encoding of binary strings YES
X059 Advanced table mapping: hex encoding of binary strings YES
-X060 XMLParse: character string input and CONTENT option YES
+X060 XMLParse: character string input and CONTENT option YES uses SQL:2003 definition, so a document with a DTD cannot be parsed as CONTENT
X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: BLOB input and CONTENT option NO
X066 XMLParse: BLOB input and DOCUMENT option NO
@@ -593,7 +593,7 @@ X085 Predefined namespace prefixes NO
X086 XML namespace declarations in XMLTable NO
X090 XML document predicate YES
X091 XML content predicate NO
-X096 XMLExists NO XPath only
+X096 XMLExists NO XPath 1.0 only
X100 Host language support for XML: CONTENT option NO
X101 Host language support for XML: DOCUMENT option NO
X110 Host language support for XML: VARCHAR mapping NO
@@ -661,11 +661,11 @@ X282 XMLValidate with CONTENT option NO
X283 XMLValidate with SEQUENCE option NO
X284 XMLValidate: NAMESPACE without ELEMENT clause NO
X286 XMLValidate: NO NAMESPACE with ELEMENT clause NO
-X300 XMLTable NO XPath only
+X300 XMLTable NO XPath 1.0 only
X301 XMLTable: derived column list option YES
X302 XMLTable: ordinality column option YES
X303 XMLTable: column default option YES
-X304 XMLTable: passing a context item YES
+X304 XMLTable: passing a context item YES must be XML DOCUMENT
X305 XMLTable: initializing an XQuery variable NO
X400 Name and identifier mapping YES
X410 Alter column data type: XML type YES
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 94b46a6..752eadf 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4208,9 +4208,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
<para>
The <type>xml</type> type can store well-formed
<quote>documents</quote>, as defined by the XML standard, as well
- as <quote>content</quote> fragments, which are defined by the
- production <literal>XMLDecl? content</literal> in the XML
- standard. Roughly, this means that content fragments can have
+ as <quote>content</quote> fragments, which are defined by reference to the
+ more permissive
+ <ulink url="https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode"><quote>document node</quote></ulink>
+ of the XQuery and XPath data model.
+ Roughly, this means that content fragments can have
more than one top-level element or character node. The expression
<literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
can be used to evaluate whether a particular <type>xml</type>
@@ -4282,24 +4284,9 @@ SET XML OPTION { DOCUMENT | CONTENT };
SET xmloption TO { DOCUMENT | CONTENT };
</synopsis>
The default is <literal>CONTENT</literal>, so all forms of XML
- data are allowed except as noted below.
+ data are allowed.
</para>
- <note>
- <para>
- In the SQL:2006 and later standard, the <literal>CONTENT</literal> form
- is a proper superset of the <literal>DOCUMENT</literal> form, and so the
- default XML option setting would allow casting to XML from character
- strings in either form. <productname>PostgreSQL</productname>, however,
- uses the SQL:2003 definition in which <literal>CONTENT</literal> form
- cannot contain a document type declaration. Therefore, there is no one
- setting of the XML option that will allow casting to XML from every valid
- character string. The default will work almost always, but for a document
- with a DTD, it will be necessary to change the XML option or
- use <literal>XMLPARSE</literal> specifying <literal>DOCUMENT</literal>.
- </para>
- </note>
-
</sect2>
<sect2>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index e947613..32908c1 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -569,7 +569,7 @@ X056 Advanced table mapping: data mapping YES
X057 Advanced table mapping: metadata mapping YES
X058 Advanced table mapping: base64 encoding of binary strings YES
X059 Advanced table mapping: hex encoding of binary strings YES
-X060 XMLParse: character string input and CONTENT option YES uses SQL:2003 definition, so a document with a DTD cannot be parsed as CONTENT
+X060 XMLParse: character string input and CONTENT option YES
X061 XMLParse: character string input and DOCUMENT option YES
X065 XMLParse: BLOB input and CONTENT option NO
X066 XMLParse: BLOB input and DOCUMENT option NO
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 28b3eaa..d326140 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -141,6 +141,7 @@ static int parse_xml_decl(const xmlChar *str, size_t *lenp,
xmlChar **version, xmlChar **encoding, int *standalone);
static bool print_xml_decl(StringInfo buf, const xmlChar *version,
pg_enc encoding, int standalone);
+static bool xml_doctype_in_content(const xmlChar *str);
static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding);
static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
@@ -1243,8 +1244,19 @@ parse_xml_decl(const xmlChar *str, size_t *lenp,
if (xmlStrncmp(p, (xmlChar *) "<?xml", 5) != 0)
goto finished;
- /* if next char is name char, it's a PI like <?xml-stylesheet ...?> */
- utf8len = strlen((const char *) (p + 5));
+ /*
+ * If next char is name char, it's a PI like <?xml-stylesheet ...?> rather
+ * than an XMLDecl, so we have done what we came to do and found no XMLDecl.
+ *
+ * Use strnlen here because there is no need to go off counting all the rest
+ * of the input just to find out if there are enough bytes to decode one
+ * UTF-8 char. 6 is conservative: UTF-8 finally settled on 4 bytes as the
+ * longest encoding, but could some input come from a source that (a) saw
+ * the original drafts with up to 6 and (b) produces overlong encodings?
+ * Hardly likely, but it costs nearly nothing to accommodate it, while using
+ * 4 could be a change of behavior from when strlen was here.
+ */
+ utf8len = strnlen((const char *) (p + 5), 6);
utf8char = xmlGetUTF8Char(p + 5, &utf8len);
if (PG_XMLISNAMECHAR(utf8char))
goto finished;
@@ -1415,6 +1427,133 @@ print_xml_decl(StringInfo buf, const xmlChar *version,
return false;
}
+/*
+ * Test whether an input that is to be parsed as CONTENT contains a DTD.
+ *
+ * The SQL/XML:2003 definition of CONTENT ("XMLDecl? content") is not satisfied
+ * by a document with a DTD, which is a bit of a wart, as it means the CONTENT
+ * type is not a proper superset of DOCUMENT. SQL/XML:2006 and later fix that,
+ * by redefining content with reference to the "more permissive" Document Node
+ * of the XQuery/XPath Data Model, such that any DOCUMENT value is indeed also a
+ * CONTENT value. That definition is more useful, as CONTENT becomes usable for
+ * parsing input of unknown form (think pg_restore).
+ *
+ * As used below in parse_xml when parsing for CONTENT, libxml does not give us
+ * the 2006+ behavior, but only the 2003; it will choke if the input has a DTD.
+ * But without the full effort of bringing our whole implementation along to
+ * 2006+, we can provide the 2006+ definition of CONTENT easily enough, by
+ * detecting this case first and simply doing the parse as DOCUMENT.
+ *
+ * A DTD can be found arbitrarily far in, but that would be a contrived case; it
+ * will ordinarily start within a few dozen characters. We only need to see the
+ * start. The only things that can precede it are an XMLDecl (here, the caller
+ * will have called parse_xml_decl already), whitespace, comments, and
+ * processing instructions. This function need only return true if it sees a
+ * valid sequence of such things leading to <!DOCTYPE. It can simply return
+ * false in any other cases, including malformed input; that will mean the input
+ * gets parsed as CONTENT as originally planned, with libxml reporting any
+ * errors.
+ *
+ * By taking care to return false for malformed input, this ensures the user
+ * gets whatever error would be reported in a parse for CONTENT, as expected,
+ * rather than being surprised for errors about DOCUMENT form (except in the
+ * specific case we are here to recognize, where the input has a DTD and is
+ * therefore begging for DOCUMENT treatment).
+ *
+ * This is only to be called from xml_parse, when pg_xml_init has already
+ * been called. That the input is UTF-8 is heavily relied on.
+ */
+static bool
+xml_doctype_in_content(const xmlChar *str)
+{
+ const xmlChar *p = str;
+ const xmlChar *e;
+ const xmlChar *sp;
+ int utf8char;
+ int utf8len;
+
+ for ( ;; )
+ {
+ SKIP_XML_SPACE(p);
+
+ if ( '<' != *p )
+ return false;
+
+ ++ p; /* point to char following < */
+
+ if ( '!' == *p )
+ {
+ ++ p; /* point to char following <! */
+ if ( 0 == xmlStrncmp(p, (xmlChar *) "DOCTYPE", 7) )
+ return true;
+ if ( 0 != xmlStrncmp(p, (xmlChar *) "--", 2) )
+ return false;
+ /* consume remainder of a comment: find -- and a > must follow */
+ p = xmlStrstr(p + 2, (xmlChar *) "--");
+ if ( !p || '>' != p[2] )
+ return false; /* didn't find -- or > didn't follow */
+ p += 3; /* point to char following --> */
+ continue;
+ }
+
+ if ( '?' != *p ) /* a PI <?target something?> is the only choice left */
+ return false;
+
+ ++ p; /* point to char following <? */
+
+ /* The string ?> is forbidden within a PI; it can only mean the end. */
+ e = xmlStrstr(p, (xmlChar *) "?>");
+ if ( !e )
+ return false;
+
+ /*
+ * The target is from p to the first blank (or end of PI if no blank).
+ * The characters accepted by xmlIsBlank are all 7-bit; it will not be
+ * confounded by any multibyte UTF-8 characters, though it will end up
+ * seeing each byte individually.
+ */
+ for ( sp = p; sp < e; ++ sp )
+ if ( xmlIsBlank_ch(*sp) )
+ break;
+
+ if ( sp == p )
+ return false; /* there should have been at least one non-blank */
+
+ /* The target must not be, case-insensitively, "xml". */
+ if ( sp - p == 3 && 0 == xmlStrncasecmp(p, (xmlChar *) "xml", 3) )
+ return false;
+
+ utf8len = sp - p;
+ utf8char = xmlGetUTF8Char(p, &utf8len);
+ if ( ! PG_XMLISNAMECHAR(utf8char) )
+ return false;
+ /* certain chars are excluded as the FIRST name char */
+ if ( '-' == utf8char || '.' == utf8char || 0xB7 == utf8char ||
+ ('0' <= utf8char && utf8char <= '9') ||
+ ( 0x300 <= utf8char && utf8char <= 0x36F ) ||
+ 0x203F == utf8char || 0x2040 == utf8char )
+ return false;
+
+ p += utf8len;
+
+ while ( p < sp ) /* the rest can be anything PG_XMLISNAMECHAR accepts */
+ {
+ utf8len = sp - p;
+ utf8char = xmlGetUTF8Char(p, &utf8len);
+ if ( ! PG_XMLISNAMECHAR(utf8char) )
+ return false;
+ p += utf8len;
+ }
+
+ /*
+ * The target checks out as a NAME. There are no constraints on the rest
+ * of the content in a PI (besides that it can't be ?> relied on above),
+ * so it's good. Point at the char following the PI's ending ?>
+ */
+ p = e + 2;
+ }
+}
+
/*
* Convert a C string to XML internal representation
@@ -1433,6 +1572,9 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
xmlChar *string;
xmlChar *utf8string;
PgXmlErrorContext *xmlerrcxt;
+ int errcode_document_parse = ERRCODE_INVALID_XML_DOCUMENT;
+ char const *errmsg_document_parse = "invalid XML document";
+ char const *errmsg_content_parse = "invalid XML content";
volatile xmlParserCtxtPtr ctxt = NULL;
volatile xmlDocPtr doc = NULL;
@@ -1457,6 +1599,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
+try_other_xmloption:
if (xmloption_arg == XMLOPTION_DOCUMENT)
{
/*
@@ -1472,8 +1615,8 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
XML_PARSE_NOENT | XML_PARSE_DTDATTR
| (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
if (doc == NULL || xmlerrcxt->err_occurred)
- xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
- "invalid XML document");
+ xml_ereport(xmlerrcxt, ERROR, errcode_document_parse,
+ errmsg_document_parse);
}
else
{
@@ -1489,6 +1632,21 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
"invalid XML content: invalid XML declaration",
res_code);
+ /*
+ * The way we use libxml below to parse CONTENT will not accept the
+ * case (not allowed in SQL/XML:2003 but allowed in 2006+) where the
+ * content is, in fact, a document with a DTD. By detecting that
+ * case here and parsing as DOCUMENT instead, we provide the 2006+
+ * behavior.
+ */
+ if (xml_doctype_in_content(utf8string + count))
+ {
+ xmloption_arg = XMLOPTION_DOCUMENT;
+ errcode_document_parse = ERRCODE_INVALID_XML_CONTENT;
+ errmsg_document_parse = errmsg_content_parse;
+ goto try_other_xmloption;
+ }
+
doc = xmlNewDoc(version);
Assert(doc->encoding == NULL);
doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
@@ -1501,7 +1659,7 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
utf8string + count, NULL);
if (res_code != 0 || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_CONTENT,
- "invalid XML content");
+ errmsg_content_parse);
}
}
}
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 2085fa0..0aae600 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -532,6 +532,13 @@ LINE 1: EXECUTE foo ('bad');
DETAIL: line 1: Start tag expected, '<' not found
bad
^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR: invalid XML document
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+ ^
+DETAIL: line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+ ^
SET XML OPTION CONTENT;
EXECUTE foo ('<bar/>');
xmlconcat
@@ -545,6 +552,45 @@ EXECUTE foo ('good');
<foo/>good
(1 row)
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+ xml
+--------------------------------------------------------------------
+ <!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+ xml
+------------------------------
+ <!-- hi--> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!DOCTYPE a><a/>';
+ xml
+------------------
+ <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ERROR: invalid XML content
+LINE 1: SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ ^
+DETAIL: line 1: StartTag: invalid element name
+<!-- hi--> oops <!DOCTYPE a><a/>
+ ^
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ERROR: invalid XML content
+LINE 1: SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ ^
+DETAIL: line 1: StartTag: invalid element name
+<!-- hi--> <oops/> <!DOCTYPE a><a/>
+ ^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR: invalid XML content
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+ ^
+DETAIL: line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+ ^
-- Test backwards parsing
CREATE VIEW xmlview1 AS SELECT xmlcomment('test');
CREATE VIEW xmlview2 AS SELECT xmlconcat('hello', 'you');
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 7f86696..d1a03b5 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -429,11 +429,53 @@ EXECUTE foo ('<bar/>');
ERROR: prepared statement "foo" does not exist
EXECUTE foo ('bad');
ERROR: prepared statement "foo" does not exist
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
SET XML OPTION CONTENT;
EXECUTE foo ('<bar/>');
ERROR: prepared statement "foo" does not exist
EXECUTE foo ('good');
ERROR: prepared statement "foo" does not exist
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/...
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!DOCTYPE a><a/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<!DOCTYPE a><a/>';
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR: unsupported XML feature
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+ ^
+DETAIL: This functionality requires the server to be built with libxml support.
+HINT: You need to rebuild PostgreSQL using --with-libxml.
-- Test backwards parsing
CREATE VIEW xmlview1 AS SELECT xmlcomment('test');
CREATE VIEW xmlview2 AS SELECT xmlconcat('hello', 'you');
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 510b09b..9756b6e 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -512,6 +512,13 @@ LINE 1: EXECUTE foo ('bad');
DETAIL: line 1: Start tag expected, '<' not found
bad
^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR: invalid XML document
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+ ^
+DETAIL: line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+ ^
SET XML OPTION CONTENT;
EXECUTE foo ('<bar/>');
xmlconcat
@@ -525,6 +532,45 @@ EXECUTE foo ('good');
<foo/>good
(1 row)
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+ xml
+--------------------------------------------------------------------
+ <!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+ xml
+------------------------------
+ <!-- hi--> <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!DOCTYPE a><a/>';
+ xml
+------------------
+ <!DOCTYPE a><a/>
+(1 row)
+
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ERROR: invalid XML content
+LINE 1: SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+ ^
+DETAIL: line 1: StartTag: invalid element name
+<!-- hi--> oops <!DOCTYPE a><a/>
+ ^
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ERROR: invalid XML content
+LINE 1: SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+ ^
+DETAIL: line 1: StartTag: invalid element name
+<!-- hi--> <oops/> <!DOCTYPE a><a/>
+ ^
+SELECT xml '<!DOCTYPE a><a/><b/>';
+ERROR: invalid XML content
+LINE 1: SELECT xml '<!DOCTYPE a><a/><b/>';
+ ^
+DETAIL: line 1: Extra content at the end of the document
+<!DOCTYPE a><a/><b/>
+ ^
-- Test backwards parsing
CREATE VIEW xmlview1 AS SELECT xmlcomment('test');
CREATE VIEW xmlview2 AS SELECT xmlconcat('hello', 'you');
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 8057a46..71431d8 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -149,10 +149,17 @@ PREPARE foo (xml) AS SELECT xmlconcat('<foo/>', $1);
SET XML OPTION DOCUMENT;
EXECUTE foo ('<bar/>');
EXECUTE foo ('bad');
+SELECT xml '<!DOCTYPE a><a/><b/>';
SET XML OPTION CONTENT;
EXECUTE foo ('<bar/>');
EXECUTE foo ('good');
+SELECT xml '<!-- in SQL:2006+ a doc is content too--> <?y z?> <!DOCTYPE a><a/>';
+SELECT xml '<?xml version="1.0"?> <!-- hi--> <!DOCTYPE a><a/>';
+SELECT xml '<!DOCTYPE a><a/>';
+SELECT xml '<!-- hi--> oops <!DOCTYPE a><a/>';
+SELECT xml '<!-- hi--> <oops/> <!DOCTYPE a><a/>';
+SELECT xml '<!DOCTYPE a><a/><b/>';
-- Test backwards parsing