On 04/26/2011 05:11 PM, Noah Misch wrote:
> On Mon, Apr 25, 2011 at 07:25:02PM -0400, Andrew Dunstan wrote:
>> I came across this today, while helping a customer. The following will
>> happily create a piece of XML with an embedded ^A:
>>
>>     select xmlelement(name foo, null, E'abc\x01def');
>>
>> Now, a ^A is totally forbidden in XML version 1.0, and allowed in XML
>> version 1.1 only as "&#x01;" or an equivalent character reference, never
>> as a raw 0x01 byte (see <http://en.wikipedia.org/wiki/XML#Valid_characters>).
>>
>> ISTM this is something we should definitely try to fix ASAP, even if we
>> probably can't backpatch the fix.
> +1.  Given that such a datum breaks dump+reload, it seems risky to do nothing at
> all in the back branches.
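For reference, the XML 1.0 (Fifth Edition) `Char` production admits only TAB, LF, CR and the ranges listed below, which is why a bare ^A can never appear in a well-formed 1.0 document. The following is a minimal C sketch of that test, assuming the input has already been decoded to a Unicode code point; the name `xml10_is_char` is hypothetical and not an existing PostgreSQL function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring XML 1.0 (Fifth Edition) production [2]:
 *   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
 *          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * The argument is a Unicode code point, not a raw UTF-8 byte.
 */
static bool
xml10_is_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}
```

Under this rule `xml10_is_char(0x01)` is false while `xml10_is_char(0x09)` is true; surrogates (U+D800 to U+DFFF) and U+FFFE/U+FFFF are also rejected.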



Here's a patch along the lines suggested by Peter.

I'm not sure what to do about the back branches, or about cases where such data is already stored in databases. This is fairly ugly. Suggestions welcome.

cheers

andrew
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***************
*** 1844,1850 **** escape_xml(const char *str)
--- 1844,1865 ----
  			case '\r':
  				appendStringInfoString(&buf, "&#x0d;");
  				break;
+ 			case '\n':
+ 			case '\t':
+ 				appendStringInfoCharMacro(&buf, *p);
+ 				break;
  			default:
+ 				/*
+ 				 * Any control char we haven't already explicitly handled
+ 				 * (i.e. TAB, NL and CR) is an error.
+ 				 * If we ever support XML 1.1 they will be allowed,
+ 				 * but will have to be escaped.
+ 				 */
+ 				if ((unsigned char) *p < 0x20)
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ 							 errmsg("character out of range"),
+ 							 errdetail("XML does not support control characters.")));
  				appendStringInfoCharMacro(&buf, *p);
  				break;
  		}
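The escaping rule in the patch can be tried outside the backend with a small standalone sketch. This is a hypothetical `escape_xml_sketch()`, not PostgreSQL code: it assumes `malloc` in place of `StringInfo` and returns NULL where the patch would `ereport(ERROR, ...)` (so an allocation failure is indistinguishable from a rejected string). It casts each byte to `unsigned char` before comparing, because plain `char` is signed on many platforms and the bytes of multibyte UTF-8 characters (>= 0x80) would otherwise compare below 0x20 and be rejected by mistake.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone version of the patched escape_xml() logic.
 * Returns a malloc'd escaped copy of str, or NULL if str contains a
 * C0 control character other than TAB, LF or CR (or if malloc fails).
 */
static char *
escape_xml_sketch(const char *str)
{
    /* Worst case: every input byte becomes the 6-byte "&#x0d;". */
    char       *out = malloc(strlen(str) * 6 + 1);
    char       *o = out;
    const char *p;

    if (out == NULL)
        return NULL;

    for (p = str; *p; p++)
    {
        const char *rep = NULL;     /* replacement text, if any */

        switch (*p)
        {
            case '&':
                rep = "&amp;";
                break;
            case '<':
                rep = "&lt;";
                break;
            case '>':
                rep = "&gt;";
                break;
            case '\r':
                rep = "&#x0d;";
                break;
            case '\n':
            case '\t':
                break;              /* copied through unchanged */
            default:
                /* Unsigned compare so UTF-8 bytes >= 0x80 pass. */
                if ((unsigned char) *p < 0x20)
                {
                    free(out);
                    return NULL;    /* the patch raises ERROR here */
                }
                break;
        }

        if (rep != NULL)
        {
            size_t      len = strlen(rep);

            memcpy(o, rep, len);
            o += len;
        }
        else
            *o++ = *p;
    }
    *o = '\0';
    return out;
}
```

With this sketch, `escape_xml_sketch("abc\001def")` yields NULL, while `"a<b"` becomes `"a&lt;b"` and a UTF-8 string such as `"caf\xc3\xa9"` is copied through unchanged.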
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)