On 2024-07-07 22:43 +0200, Tom Lane wrote:
> As far as the errcontext changes go: I think we have to just bite
> the bullet and accept them. It looks like 2.13 has a completely
> different mechanism than prior versions for deciding when to issue
> XML_ERR_NOT_WELL_BALANCED. And it's not even clear that it's wrong;
> for example, in our first failing case
>
> DETAIL: line 1: xmlParseEntityRef: no name
> <invalidentity>&</invalidentity>
> ^
> -line 1: chunk is not well balanced
> -<invalidentity>&</invalidentity>
> - ^
>
> it's kind of hard to argue that the chunk isn't well-balanced.
>
> So we can either suppress errdetails from the expected output,
> or set up an additional expected-file. I'm leaning to the
> "\set VERBOSITY terse" solution.

+1 for \set VERBOSITY terse as a last resort.

But it looks to me as if "chunk is not well balanced" is just noise
because libxml2 reports more specific errors before that. For example:

SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
ERROR: invalid XML content
DETAIL: line 1: Entity 'idontexist' not defined
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
<twoerrors>&idontexist;</unbalanced>
^
line 1: chunk is not well balanced
<twoerrors>&idontexist;</unbalanced>
^
Here, "Opening and ending tag mismatch" already covers the unbalanced
closing tag.
So how about just ignoring XML_ERR_NOT_WELL_BALANCED, as in the
attached patch? The patch also adds test cases for an unclosed tag
because I wanted to see whether I could trigger "chunk is not well
balanced" on its own, but without success:
SELECT xmlparse(content '<unclosed>');
ERROR: invalid XML content
DETAIL: line 1: Premature end of data in tag unclosed line 1
<unclosed>
^
line 1: chunk is not well balanced
<unclosed>
^
libxml2 2.13 doesn't report "chunk is not well balanced" here either.
There's also this more explicit test case for unbalanced tags:
<parent><child></parent></child>
But I'm not sure whether that's really necessary when we already have:
<twoerrors>&idontexist;</unbalanced>
The error messages are the same, except for the additional entity error.
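
To illustrate the idea outside of xml.c, here is a minimal standalone
sketch of the filtering. It does not use libxml2 at all: the DemoErrCode
enum, the DemoError struct, and both demo_* functions are hypothetical
stand-ins that I made up for this example, and the enum values are not
libxml2's real error codes. It only shows that dropping the one generic
code leaves the more specific messages in the detail text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for libxml2 error codes (not the real values). */
typedef enum
{
	DEMO_ERR_UNDECLARED_ENTITY,
	DEMO_ERR_TAG_NAME_MISMATCH,
	DEMO_ERR_NOT_WELL_BALANCED
} DemoErrCode;

/* Hypothetical, simplified error record: a code plus its message text. */
typedef struct
{
	DemoErrCode code;
	const char *message;
} DemoError;

/* Returns false for the generic "not well balanced" error, true otherwise. */
static bool
demo_should_report(const DemoError *err)
{
	return err->code != DEMO_ERR_NOT_WELL_BALANCED;
}

/*
 * Appends every reportable message to buf, one per line, and returns the
 * number of messages written.  Stops early if buf would overflow.
 */
static size_t
demo_collect_detail(const DemoError *errs, size_t n, char *buf, size_t buflen)
{
	size_t		reported = 0;
	size_t		used = 0;

	assert(errs != NULL || n == 0);
	if (buflen == 0)
		return 0;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++)
	{
		size_t		len;

		if (!demo_should_report(&errs[i]))
			continue;			/* skip the redundant generic error */

		len = strlen(errs[i].message);
		if (used + len + 2 > buflen)
			break;				/* no room for message, newline and NUL */

		memcpy(buf + used, errs[i].message, len);
		used += len;
		buf[used++] = '\n';
		buf[used] = '\0';
		reported++;
	}
	return reported;
}
```

Feeding it the three errors from the <twoerrors> example above should
leave only the entity error and the tag mismatch in buf. The actual
patch does the equivalent by returning early from xml_errorHandler().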
--
Erik
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 8893be5682..4f45c90f54 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2114,6 +2114,17 @@ xml_errorHandler(void *data, PgXmlErrorPtr error)
switch (domain)
{
case XML_FROM_PARSER:
+ /*
+ * Suppress errors about not well-balanced elements because libxml2
+ * already reports more specific errors in those cases. So, this
+ * error is redundant noise. Also, libxml2 2.13 no longer reports
+ * this error in every case.
+ */
+ if (error->code == XML_ERR_NOT_WELL_BALANCED)
+ return;
+
+ /* fall through */
+
case XML_FROM_NONE:
case XML_FROM_MEMORY:
case XML_FROM_IO:
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 6500cff885..d6a51f9e38 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -254,17 +254,11 @@ ERROR: invalid XML content
DETAIL: line 1: xmlParseEntityRef: no name
<invalidentity>&</invalidentity>
^
-line 1: chunk is not well balanced
-<invalidentity>&</invalidentity>
- ^
SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
ERROR: invalid XML content
DETAIL: line 1: Entity 'idontexist' not defined
<undefinedentity>&idontexist;</undefinedentity>
^
-line 1: chunk is not well balanced
-<undefinedentity>&idontexist;</undefinedentity>
- ^
SELECT xmlparse(content '<invalidns xmlns=''<''/>');
xmlparse
---------------------------
@@ -283,9 +277,6 @@ DETAIL: line 1: Entity 'idontexist' not defined
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
-line 1: chunk is not well balanced
<twoerrors>&idontexist;</unbalanced>
^
SELECT xmlparse(content '<nosuchprefix:tag/>');
@@ -294,6 +285,19 @@ SELECT xmlparse(content '<nosuchprefix:tag/>');
<nosuchprefix:tag/>
(1 row)
+SELECT xmlparse(content '<unclosed>');
+ERROR: invalid XML content
+DETAIL: line 1: Premature end of data in tag unclosed line 1
+<unclosed>
+ ^
+SELECT xmlparse(content '<parent><child></parent></child>');
+ERROR: invalid XML content
+DETAIL: line 1: Opening and ending tag mismatch: child line 1 and parent
+<parent><child></parent></child>
+ ^
+line 1: Opening and ending tag mismatch: parent line 1 and child
+<parent><child></parent></child>
+ ^
SELECT xmlparse(document ' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
@@ -352,6 +356,19 @@ SELECT xmlparse(document '<nosuchprefix:tag/>');
<nosuchprefix:tag/>
(1 row)
+SELECT xmlparse(document '<unclosed>');
+ERROR: invalid XML document
+DETAIL: line 1: Premature end of data in tag unclosed line 1
+<unclosed>
+ ^
+SELECT xmlparse(document '<parent><child></parent></child>');
+ERROR: invalid XML document
+DETAIL: line 1: Opening and ending tag mismatch: child line 1 and parent
+<parent><child></parent></child>
+ ^
+line 1: Opening and ending tag mismatch: parent line 1 and child
+<parent><child></parent></child>
+ ^
SELECT xmlpi(name foo);
xmlpi
---------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 953bac09e4..15ccbe1d35 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -77,6 +77,8 @@ SELECT xmlparse(content '<invalidns xmlns=''<''/>');
SELECT xmlparse(content '<relativens xmlns=''relative''/>');
SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
SELECT xmlparse(content '<nosuchprefix:tag/>');
+SELECT xmlparse(content '<unclosed>');
+SELECT xmlparse(content '<parent><child></parent></child>');
SELECT xmlparse(document ' ');
SELECT xmlparse(document 'abc');
@@ -87,6 +89,8 @@ SELECT xmlparse(document '<invalidns xmlns=''<''/>');
SELECT xmlparse(document '<relativens xmlns=''relative''/>');
SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
SELECT xmlparse(document '<nosuchprefix:tag/>');
+SELECT xmlparse(document '<unclosed>');
+SELECT xmlparse(document '<parent><child></parent></child>');
SELECT xmlpi(name foo);