On 04/27/2011 11:41 PM, Noah Misch wrote:
On Wed, Apr 27, 2011 at 11:22:37PM -0400, Andrew Dunstan wrote:
On 04/27/2011 05:30 PM, Noah Misch wrote:
To make things worse, the dump/reload problems seems to depend on your version
of libxml2, or something.  With git master, a CentOS 5 system with
2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
2.6.31.dfsg-2ubuntu rejects it.  Even with a patch like this, systems with a
lenient libxml2 will be liable to store XML data that won't restore on a system
with a strict libxml2.  Perhaps we should emit a build-time warning if the local
libxml2 is lenient?
No, I think we need to be strict ourselves.
Then I suppose we'd also scan for invalid characters in xml_parse()?  Or, at
least, do so when linked to a libxml2 that neglects to do so itself?

Yep.

Injecting the check here aids "xmlelement" and "xmlforest" , but "xmlcomment"
and "xmlpi" still let the invalid byte through.  You can also still inject the
byte into an attribute value via "xmlelement".  I wonder if it wouldn't make
more sense to just pass any XML that we generate from scratch through libxml2.
There are a lot of holes to plug, otherwise.
Maybe there are, but I'd want lots of convincing that we should do that
at this stage. Maybe for 9.2. I think we can plug the holes fairly
simply for xmlpi and xmlcomment, and catch the attributes by moving this
check up into map_sql_value_to_xml_value().
I don't have much convincing to offer -- hunting down the holes seem fine, too.



I think I've done that. Here's the patch I have now. It looks like we can catch pretty much everything by putting checks in four places, which isn't too bad.

Please review and try to break.

cheers

andrew


diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index ee82d46..12cfd56 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -142,6 +142,20 @@ static void SPI_sql_row_to_xmlelement(int rownum, StringInfo result,
 #define NAMESPACE_XSI "http://www.w3.org/2001/XMLSchema-instance";
 #define NAMESPACE_SQLXML "http://standards.iso.org/iso/9075/2003/sqlxml";
 
+/* forbidden C0 control chars */
+#define FORBIDDEN_C0  \
+	"\x01\x02\x03\x04\x05\x06\x07\x08\x0B\x0C\x0E\x0F\x10\x11" \
+	"\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F"
+
+static inline void
+check_forbidden_c0(char * str)
+{
+	if (strpbrk(str,FORBIDDEN_C0) != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("character out of range"),
+				 errdetail("XML does not support control characters.")));
+}
 
 #ifdef USE_LIBXML
 
@@ -411,6 +425,8 @@ xmlcomment(PG_FUNCTION_ARGS)
 	appendStringInfoText(&buf, arg);
 	appendStringInfo(&buf, "-->");
 
+	check_forbidden_c0(buf.data);
+	
 	PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
 #else
 	NO_XML_SUPPORT();
@@ -718,6 +734,8 @@ xmlpi(char *target, text *arg, bool arg_is_null, bool *result_is_null)
 	}
 	appendStringInfoString(&buf, "?>");
 
+	check_forbidden_c0(buf.data);
+	
 	result = stringinfo_to_xmltype(&buf);
 	pfree(buf.data);
 	return result;
@@ -1184,6 +1202,9 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 										   encoding,
 										   PG_UTF8);
 
+	/* check for illegal XML chars */
+	check_forbidden_c0((char *) utf8string);
+
 	/* Start up libxml and its parser (no-ops if already done) */
 	pg_xml_init();
 	xmlInitParser();
@@ -1804,6 +1825,9 @@ map_sql_value_to_xml_value(Datum value, Oid type, bool xml_escape_strings)
 		getTypeOutputInfo(type, &typeOut, &isvarlena);
 		str = OidOutputFunctionCall(typeOut, value);
 
+		/* check for illegal XML chars */
+		check_forbidden_c0(str);
+
 		/* ... exactly as-is for XML, and when escaping is not wanted */
 		if (type == XMLOID || !xml_escape_strings)
 			return str;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to