On 28.07.25 04:47, Michael Paquier wrote:
> I understand that from the point of view of a
> maintainer this is rather bad, but from the customer point of view the
> current situation is also bad to deal with in the scope of a minor
> upgrade, because applications suddenly break.

I totally get it --- from the user’s perspective, it’s hard to see this
as a bugfix.

I was wondering whether using XML_PARSE_HUGE in xml_parse's options
could help address this, for example:

options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
          | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);
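
(For context, these options are what xml_parse() ultimately hands to
libxml2, roughly:

doc = xmlCtxtReadDoc(ctxt, utf8string, NULL, "UTF-8", options);

so the flag would apply everywhere xml_parse() is used.)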


According to libxml2's parserInternals.h:

/**
 * Maximum size allowed for a single text node when building a tree.
 * This is not a limitation of the parser but a safety boundary feature,
 * use XML_PARSE_HUGE option to override it.
 * Introduced in 2.9.0
 */
#define XML_MAX_TEXT_LENGTH 10000000

/**
 * Maximum size allowed when XML_PARSE_HUGE is set.
 */
#define XML_MAX_HUGE_LENGTH 1000000000
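
To double-check that the flag behaves as documented, here is a small
standalone test against libxml2 itself (my own throwaway harness, not
PostgreSQL code): it builds a document with a 20 MB text node and parses
it once without and once with XML_PARSE_HUGE.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

static void
try_parse(const char *buf, int len, int options, const char *label)
{
    xmlDocPtr   doc = xmlReadMemory(buf, len, NULL, NULL, options);

    printf("%-16s %s\n", label, doc ? "parsed OK" : "parse failed");
    if (doc)
        xmlFreeDoc(doc);
}

int
main(void)
{
    /* 20 MB text node, i.e. twice the default XML_MAX_TEXT_LENGTH */
    int     payload = 20 * 1000 * 1000;
    int     len = payload + 64;
    char   *buf = malloc(len);
    int     n;

    if (buf == NULL)
        return 1;

    n = snprintf(buf, len, "<foo><bar>");
    memset(buf + n, 'X', payload);
    n += payload;
    n += snprintf(buf + n, len - n, "</bar></foo>");

    try_parse(buf, n, 0, "default:");
    try_parse(buf, n, XML_PARSE_HUGE, "XML_PARSE_HUGE:");

    free(buf);
    return 0;
}

If the limits above work as documented, the first parse should fail and
the second should succeed (build with: cc test.c $(xml2-config --cflags
--libs)).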

XML_MAX_TEXT_LENGTH is the limit we're hitting now; XML_MAX_HUGE_LENGTH,
which applies once XML_PARSE_HUGE is set, is far more generous (1 GB).
Here's a quick PoC using XML_PARSE_HUGE:

psql (19devel)
Type "help" for help.

postgres=# CREATE TABLE xmldata (message xml);
CREATE TABLE
postgres=# DO $$
DECLARE huge_size text := repeat('X', 1000000000);
BEGIN
  INSERT INTO xmldata (message)
    VALUES (('<foo><bar>' || huge_size || '</bar></foo>')::xml);
END $$;
DO
postgres=# SELECT pg_size_pretty(length(message::text)::bigint) FROM xmldata;
 pg_size_pretty
----------------
 954 MB
(1 row)

While XML_MAX_HUGE_LENGTH prevents unlimited memory usage, it still
opens the door to potential resource exhaustion. I couldn't find a way
to dynamically adjust this limit in libxml2.

One idea would be to guard XML_PARSE_HUGE behind a GUC --- say,
xml_enable_huge_parsing. That would at least allow controlled
environments to opt in. But of course, that wouldn't help current releases.
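
Just to make that more concrete, here's a rough sketch of what it could
look like (GUC name, wording, and placement are only my assumptions, not
an actual patch):

/* e.g. in src/backend/utils/adt/xml.c */
bool        xml_enable_huge_parsing = false;

/* in xml_parse(), when assembling the parser options */
options = XML_PARSE_NOENT | XML_PARSE_DTDATTR
          | (xml_enable_huge_parsing ? XML_PARSE_HUGE : 0)
          | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);

plus a matching struct config_bool entry in guc_tables.c, along the
lines of:

{
    {"xml_enable_huge_parsing", PGC_USERSET, CLIENT_CONN_STATEMENT,
        gettext_noop("Passes XML_PARSE_HUGE to libxml2 when parsing XML."),
        NULL
    },
    &xml_enable_huge_parsing,
    false,
    NULL, NULL, NULL
},

Defaulting to off would keep the current (safe) behavior, while
controlled environments could flip it per session or per role.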

Best regards, Jim

