Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Tom Lane t...@sss.pgh.pa.us wrote: Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes: Can we take the patch for 9.0? The bug is registered as an open item: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items Given that there are still problems with it, applying the patch for 9.0 would mean changing the behavior of xmlconcat in 9.0 and then again in 9.1. I don't think that's a good idea. Better to leave it alone until we have a full fix. Ok, I added it in ToDo list, and removed it from 9.0 open items. better handling of PIs and DTDs in xmlconcat() Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Andrew Dunstan and...@dunslane.net wrote: Hmm. OK. Well here is a patch that tries to fix the xmlconcat error, anyway. It seems to work, but maybe could stand a little tightening. Can we take the patch for 9.0? The bug is registered as an open item: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items As far as the patch, I found there are still two issues even after it applied: 1. A linebreak is added at the line end DOCTYPE exists. =# SELECT xmlconcat('foo', xmlparse(DOCUMENT '!DOCTYPE htmlhtml/')); xmlconcat foohtml/+ (1 row) 2. DOCUMENT could have ?xml before DOCTYPE. =# SELECT xmlconcat('foo', xmlparse(DOCUMENT '?xml version=1.0? !DOCTYPE html html/')); xmlconcat --- foo (1 row) Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes: Can we take the patch for 9.0? The bug is registered as an open item: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items Given that there are still problems with it, applying the patch for 9.0 would mean changing the behavior of xmlconcat in 9.0 and then again in 9.1. I don't think that's a good idea. Better to leave it alone until we have a full fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Peter Eisentraut wrote: On mån, 2010-03-22 at 19:38 -0400, Andrew Dunstan wrote: But if we are not comfortable about being able to do that safely, I would be OK with just raising an error if a concatenation is attempted where one value contains a DTD. The impact in practice should be low. Right. Can you find a way to do that using the libxml API? I haven't managed to, and I'm pretty sure I can construct XML that fails every simple string search test I can think of, either with a false negative or a false positive. The documentation on that is terse as usual. In any case, you will need to XML parse the input values, and so you might as well resort to parsing the output value to see if it is well-formed, which should catch this mistake and possibly others. Actually, I have come to the conclusion that the biggest problem in this area is that we accept XML documents with a leading DOCTYPE node at all. Our docs state: The xml type can store well-formed documents, as defined by the XML standard, as well as content fragments, which are defined by the production XMLDecl? content in the XML standard. A document with a leading DOCTYPE node matches neither of these rules, and when we strip the XMLDecl from a piece of XML where it's followed by a DOCTYPE node we turn something that is legal XML into something that isn't, even by our own (or possibly the standard's) relaxed definition. A doctypedecl can only follow an XMLDecl, see http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-prolog-dtd. So I think we need to go back to the drawing board a bit, rather than patch a particular reported error case. But these problems are not at all new to 9.0, and coming up to beta as I hope we are is not the time for it. I think it will have to wait to 9.1. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
On ons, 2010-03-24 at 14:51 -0400, Andrew Dunstan wrote: Actually, I have come to the conclusion that the biggest problem in this area is that we accept XML documents with a leading DOCTYPE node at all. Our docs state: The xml type can store well-formed documents, as defined by the XML standard, as well as content fragments, which are defined by the production XMLDecl? content in the XML standard. A document with a leading DOCTYPE node matches neither of these rules, and when we strip the XMLDecl from a piece of XML where it's followed by a DOCTYPE node we turn something that is legal XML into something that isn't, even by our own (or possibly the standard's) relaxed definition. A doctypedecl can only follow an XMLDecl, see http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-prolog-dtd. Our version of SQL/XML support references SQL:2003 which references XML 1.0, where omitting the XMLDecl is legal. You can't omit the XMLDecl in XML 1.1, because you need it to communicate the fact that it's version 1.1. But note that that is correctly supported: =# select xmlconcat('?xml version=1.0?foo/', '?xml version=1.0?bar/'); xmlconcat -- foo/bar/ and =# select xmlconcat('?xml version=1.1?foo/', '?xml version=1.1?bar/'); xmlconcat --- ?xml version=1.1?foo/bar/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Peter Eisentraut wrote: Our version of SQL/XML support references SQL:2003 which references XML 1.0, where omitting the XMLDecl is legal. You can't omit the XMLDecl in XML 1.1, because you need it to communicate the fact that it's version 1.1. Hmm. OK. Well here is a patch that tries to fix the xmlconcat error, anyway. It seems to work, but maybe could stand a little tightening. cheers andrew Index: src/backend/utils/adt/xml.c === RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v retrieving revision 1.97 diff -c -r1.97 xml.c *** src/backend/utils/adt/xml.c 3 Mar 2010 17:29:45 - 1.97 --- src/backend/utils/adt/xml.c 24 Mar 2010 22:05:19 - *** *** 418,424 --- 418,466 #endif } + #ifdef USE_LIBXML + static inline void + strip_dtd(char ** xmlstr) + { + + xmlDocPtr doc; + xmlChar *xmlbuff; + int buffersize; + booldtd_found = false; + xmlNodePtr child; + char * skip_xmldecl; + + if (strstr(*xmlstr,!DOCTYPE) == NULL) + return ; + + doc = xml_parse(cstring_to_text(*xmlstr), XMLOPTION_DOCUMENT, true, GetDatabaseEncoding()); + + for (child = doc-children; child != NULL; child = child-next) + { + if (child-type == XML_DOCUMENT_TYPE_NODE || + child-type == XML_DTD_NODE) + { + xmlUnlinkNode(child); + xmlFreeNode(child); + dtd_found = true; + } + } + if (dtd_found) + { + pfree(*xmlstr); + xmlDocDumpFormatMemory(doc, xmlbuff, buffersize, 0); + if (strncmp((char *)xmlbuff,?xml,5) == 0) + skip_xmldecl = strstr((char *)xmlbuff,?\n) + 3; + else + skip_xmldecl = (char *) xmlbuff; + *xmlstr = pstrdup(skip_xmldecl); + xmlFree(xmlbuff); + + } + xmlFreeDoc(doc); + } + #endif /* * TODO: xmlconcat needs to merge the notations and unparsed entities *** *** 460,465 --- 502,509 else if (xmlStrcmp(version, global_version) != 0) global_version_no_value = true; + strip_dtd(str); + appendStringInfoString(buf, str + len); pfree(str); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Andrew Dunstan and...@dunslane.net writes: Hmm. OK. Well here is a patch that tries to fix the xmlconcat error, anyway. It seems to work, but maybe could stand a little tightening. I liked your previous idea (rethink the whole mess in 9.1) better. As far as the patch itself is concerned, the complete lack of error checks seems scary, and I wonder whether the case sensitivity and lack of whitespace tolerance in the string comparisons is OK. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Hmm. OK. Well here is a patch that tries to fix the xmlconcat error, anyway. It seems to work, but maybe could stand a little tightening. I liked your previous idea (rethink the whole mess in 9.1) better. As far as the patch itself is concerned, the complete lack of error checks seems scary, Yes, this wasn't intended as the final patch. If it's not wanted right now, that's fine too. I just wanted to get it on the record as possibly something useful when we do come to reconsider the whole mess. Getting to grips with the libxml2 API is no fun, and it's better not to have to repeat it if possible ;-) and I wonder whether the case sensitivity and lack of whitespace tolerance in the string comparisons is OK. The tokens were chosen with some care to be such that no whitespace tolerance would be needed (or correct). XML is case sensitive, so that's not an issue either. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
On mån, 2010-03-22 at 19:38 -0400, Andrew Dunstan wrote: But if we are not comfortable about being able to do that safely, I would be OK with just raising an error if a concatenation is attempted where one value contains a DTD. The impact in practice should be low. Right. Can you find a way to do that using the libxml API? I haven't managed to, and I'm pretty sure I can construct XML that fails every simple string search test I can think of, either with a false negative or a false positive. The documentation on that is terse as usual. In any case, you will need to XML parse the input values, and so you might as well resort to parsing the output value to see if it is well-formed, which should catch this mistake and possibly others. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
On sön, 2010-03-21 at 13:07 -0400, Andrew Dunstan wrote: Yeah, maybe. According to http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html the only legal child of an XML Document node that is not also a legal child of a DocumentFragment node is a DocumentType node. So we could probably just look for one of those in each argument node and strip it out. That should be fairly lightweight in the common case where it's not present - we'd just be searching for a fixed string. Removing it if found would be more complex. We'd have to parse the node to remove it, since a legal DocumentType node string could appear legally inside a CDATA node. According to the SQL/XML standard, the document type declaration should apparently be stripped when doing a concatenation. (This makes sense because the result of a concatenation can never be valid according to a DTD.) But if we are not comfortable about being able to do that safely, I would be OK with just raising an error if a concatenation is attempted where one value contains a DTD. The impact in practice should be low. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Peter Eisentraut wrote: On sön, 2010-03-21 at 13:07 -0400, Andrew Dunstan wrote: Yeah, maybe. According to http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html the only legal child of an XML Document node that is not also a legal child of a DocumentFragment node is a DocumentType node. So we could probably just look for one of those in each argument node and strip it out. That should be fairly lightweight in the common case where it's not present - we'd just be searching for a fixed string. Removing it if found would be more complex. We'd have to parse the node to remove it, since a legal DocumentType node string could appear legally inside a CDATA node. According to the SQL/XML standard, the document type declaration should apparently be stripped when doing a concatenation. (This makes sense because the result of a concatenation can never be valid according to a DTD.) But if we are not comfortable about being able to do that safely, I would be OK with just raising an error if a concatenation is attempted where one value contains a DTD. The impact in practice should be low. Right. Can you find a way to do that using the libxml API? I haven't managed to, and I'm pretty sure I can construct XML that fails every simple string search test I can think of, either with a false negative or a false positive. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Andrew Dunstan and...@dunslane.net writes: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items I have just been looking at the xmlconcat bug on that list. I can't think of any better solution than parsing the resulting string to make sure it is well-formed before we return, That might be a reasonable thing to do as a safety check, but I can't escape the feeling that what this fundamentally is is a data typing error, traceable to the lack of differentiation between xml documents and xml fragments. Is there a way to attack it based on saying that the inputs can't be documents, or stripping the document overhead if they are? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: xmlconcat (was [HACKERS] 9.0 release notes done)
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items I have just been looking at the xmlconcat bug on that list. I can't think of any better solution than parsing the resulting string to make sure it is well-formed before we return, That might be a reasonable thing to do as a safety check, but I can't escape the feeling that what this fundamentally is is a data typing error, traceable to the lack of differentiation between xml documents and xml fragments. Is there a way to attack it based on saying that the inputs can't be documents, or stripping the document overhead if they are? Yeah, maybe. According to http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html the only legal child of an XML Document node that is not also a legal child of a DocumentFragment node is a DocumentType node. So we could probably just look for one of those in each argument node and strip it out. That should be fairly lightweight in the common case where it's not present - we'd just be searching for a fixed string. Removing it if found would be more complex. We'd have to parse the node to remove it, since a legal DocumentType node string could appear legally inside a CDATA node. That has the advantage that it would fix the error rather than failing, but I'm slightly nervous about silently mangling user supplied XML. I guess we do that in a few other cases to make other combinations function sanely. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers