On 28.07.25 04:47, Michael Paquier wrote:
> I understand that from the point of view of a
> maintainer this is rather bad, but from the customer point of view the
> current situation is also bad to deal with in the scope of a minor
> upgrade, because applications suddenly break.
I totally get it; from the user’s perspective, it’s hard to see this
as a bug fix.
I was wondering whether using XML_PARSE_HUGE in xml_parse's options
could help address this, for example:
options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
          | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);
According to libxml2's parserInternals.h:
/**
* Maximum size allowed for a single text node when building a tree.
* This is not a limitation of the parser but a safety boundary feature,
* use XML_PARSE_HUGE option to override it.
* Introduced in 2.9.0
*/
#define XML_MAX_TEXT_LENGTH 10000000
/**
* Maximum size allowed when XML_PARSE_HUGE is set.
*/
#define XML_MAX_HUGE_LENGTH 1000000000
The 10MB XML_MAX_TEXT_LENGTH limit is what we're hitting now; with
XML_PARSE_HUGE set, the ceiling becomes the far more generous 1GB of
XML_MAX_HUGE_LENGTH. Here's a quick PoC using XML_PARSE_HUGE:
psql (19devel)
Type "help" for help.
postgres=# CREATE TABLE xmldata (message xml);
CREATE TABLE
postgres=# DO $$
DECLARE
    huge_size text := repeat('X', 1000000000);
BEGIN
    INSERT INTO xmldata (message)
    VALUES (('<foo><bar>' || huge_size || '</bar></foo>')::xml);
END $$;
DO
postgres=# SELECT pg_size_pretty(length(message::text)::bigint) FROM xmldata;
pg_size_pretty
----------------
954 MB
(1 row)
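That is exactly what we'd expect: repeat() produces a single
10^9-byte text node (10^9 / 2^20 ≈ 954 MB), i.e. a node sitting right
at the XML_MAX_HUGE_LENGTH cap, and libxml2 still accepts it.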
While XML_MAX_HUGE_LENGTH prevents unlimited memory usage, it still
opens the door to potential resource exhaustion: every oversized text
node can make the backend materialize close to 1GB. Both limits are
compile-time constants, and I couldn't find a way to adjust them
dynamically in libxml2.
One idea would be to guard XML_PARSE_HUGE behind a GUC, say
xml_enable_huge_parsing, defaulting to off. That would at least allow
controlled environments to opt in (rough sketch below). But of course,
that wouldn't help current releases.
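To make the idea concrete, here is a minimal, untested sketch of how
it could be wired up; the GUC name, its grouping, and the exact
placement are only my assumptions:

    /* xml.c: hypothetical GUC variable, off by default */
    bool        xml_enable_huge_parsing = false;

    /* guc_tables.c: corresponding entry in ConfigureNamesBool[] */
    {
        {"xml_enable_huge_parsing", PGC_USERSET, CLIENT_CONN_STATEMENT,
            gettext_noop("Allows libxml2 to parse text nodes larger "
                         "than 10MB (up to XML_MAX_HUGE_LENGTH)."),
            NULL
        },
        &xml_enable_huge_parsing,
        false,
        NULL, NULL, NULL
    },

    /* xml_parse(): only set XML_PARSE_HUGE when the GUC is enabled */
    options = XML_PARSE_NOENT | XML_PARSE_DTDATTR
              | (xml_enable_huge_parsing ? XML_PARSE_HUGE : 0)
              | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);

With PGC_USERSET any session could opt in; if we're worried about
untrusted callers flipping it on, PGC_SUSET would restrict it to
superusers.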
Best regards,
Jim