On 04/27/2011 05:30 PM, Noah Misch wrote:

I'm not sure what to do about the back branches and cases where data is
already in databases. This is fairly ugly. Suggestions welcome.

We could provide a script in (or linked from) the release notes for testing the
data in all your xml columns.

Yeah, we'll have to do something like that. What a blasted mess.

To make things worse, the dump/reload problem seems to depend on your version
of libxml2, or something.  With git master, a CentOS 5 system with
2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
2.6.31.dfsg-2ubuntu rejects it.  Even with a patch like this, systems with a
lenient libxml2 will be liable to store XML data that won't restore on a system
with a strict libxml2.  Perhaps we should emit a build-time warning if the local
libxml2 is lenient?

No, I think we need to be strict ourselves.

+                               if (*p < '\x20')
This needs to be an unsigned comparison.  On my system, "char" is signed, so
"SELECT xmlelement(name foo, null, E'\u0550')" fails incorrectly.

Good point. Perhaps we'd be better off using iscntrl(*p).
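
As a minimal sketch of the signedness issue (the helper names are hypothetical and not taken from the patch): E'\u0550' is encoded in UTF-8 as the bytes 0xD5 0x90, and where plain "char" is signed both bytes compare as negative, hence below '\x20'. Casting to unsigned char first avoids that. The same cast is needed before calling iscntrl(), whose argument must be representable as unsigned char (or be EOF). Note that in the "C" locale iscntrl() also flags tab, newline, carriage return, and DEL, all of which XML 1.0 permits, so it is not a drop-in equivalent of the XML rule.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch: true if the byte is below 0x20.
 * The cast keeps bytes >= 0x80 (UTF-8 lead and continuation bytes) from
 * comparing as negative on platforms where plain char is signed.
 */
static bool
byte_is_below_space(char c)
{
    return (unsigned char) c < 0x20;
}

/*
 * Hypothetical helper: iscntrl() with the cast that the C standard requires
 * for its argument. The result depends on the current locale.
 */
static bool
byte_is_control(char c)
{
    return iscntrl((unsigned char) c) != 0;
}
```

With these, byte_is_below_space('\x01') is true, while the first byte of U+0550, (char) 0xD5, is not reported as being below 0x20.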


The XML character set forbids more than just control characters; see
http://www.w3.org/TR/xml/#charsets.  We also ought to reject, for example,
"SELECT xmlelement(name foo, null, E'\ufffe')".
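
For reference, the Char production in the XML 1.0 specification linked above can be sketched as a code point test. This is only an illustration: the function name is hypothetical, it operates on an already decoded Unicode code point rather than on raw bytes, and it is not how any existing patch performs the check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if cp matches the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
static bool
is_xml10_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
        (cp >= 0x20 && cp <= 0xD7FF) ||
        (cp >= 0xE000 && cp <= 0xFFFD) ||
        (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

Under this test U+0001 (the ^A byte) and U+FFFE are rejected, while U+0550 is accepted.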

Injecting the check here aids "xmlelement" and "xmlforest", but "xmlcomment"
and "xmlpi" still let the invalid byte through.  You can also still inject the
byte into an attribute value via "xmlelement".  I wonder if it wouldn't make
more sense to just pass any XML that we generate from scratch through libxml2.
There are a lot of holes to plug, otherwise.



Maybe there are, but I'd want lots of convincing that we should do that at this stage. Maybe for 9.2. I think we can plug the holes fairly simply for xmlpi and xmlcomment, and catch the attributes by moving this check up into map_sql_value_to_xml_value().

This is a significant data integrity bug, much along the same lines as the invalidly encoded data holes we plugged a release or two back. I'm amazed we haven't hit it till now, but we're sure to see more of it - XML use with Postgres is growing substantially, I believe.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
