 ID:               44648
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
 Status:           Open
 Bug Type:         DOM XML related
 Operating System: Windows Vista
 PHP Version:      5.2.5
 Assigned To:      rrichards
 New Comment:

You should read the specs more closely. Names are most certainly checked, and a DOMException with an INVALID_CHARACTER_ERROR error is thrown. Some of the others I need to look at, because it is perfectly fine to create non-well-formed XML using DOM, though it should error during serialization; for those, the bug would only be in the saveXML routine. For other extensions it is not a bug, because non-well-formed XML support is required: the output is well formed when used in a larger context.


Previous Comments:
------------------------------------------------------------------------

[2008-04-05 23:02:49] [EMAIL PROTECTED]

One more: ]]> is not allowed in CDATA blocks. I also suspect that the other XML extensions have bugs here.

------------------------------------------------------------------------

[2008-04-05 23:02:02] [EMAIL PROTECTED]

IIRC, DOM does not make any demands on names or things like that. libxml2, which is known for its strictness, doesn't either. So, I'm still hoping that you'll let the checks be turned off. :-)

Some things from my investigation:

- Double hyphens (--) are not allowed in comments
- Most of the text inputs don't check for UTF-8 well-formedness. I haven't tested numeric character entities either, but those are suspicious
- Fake namespace declarations in attributes ($d->appendChild($d->createElement('foo:bar')); results in invalid XML, as the foo namespace was never defined)

All of these, and probably some more, result in a $d->saveXML(); that is invalid XML.

------------------------------------------------------------------------

[2008-04-05 22:54:04] [EMAIL PROTECTED]

Assign to self. The strictness is dependent upon the DOM specs, and setAttribute should be throwing an exception in that case. While I am going to go through and check other methods, let me know if you come across any others that are not validating names correctly.

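Until setAttribute validates names itself, a userland pre-check along the following lines could stand in for it. This is only a minimal sketch, not what ext/dom does internally: the helper names isAsciiXmlName and setAttributeChecked are hypothetical, the code assumes PHP 7 or later for the type declarations (the report is against 5.2.5), and the pattern covers just the ASCII subset of the XML 1.0 Name production, so it rejects some legal non-ASCII names.

```php
<?php
// Hypothetical helper: conservative, ASCII-only approximation of the
// XML 1.0 Name production. Legal non-ASCII names are rejected too.
function isAsciiXmlName(string $name): bool
{
    // \A and \z anchor the whole string, so a trailing newline cannot slip through.
    return preg_match('/\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/', $name) === 1;
}

// Hypothetical wrapper: refuse to set an attribute whose name fails the check.
function setAttributeChecked(DOMElement $element, string $name, string $value): void
{
    if (!isAsciiXmlName($name)) {
        throw new InvalidArgumentException("Invalid attribute name: $name");
    }
    $element->setAttribute($name, $value);
}
```

With this wrapper, the '"@' attribute name from the original report is rejected before it reaches the tree, regardless of what the extension itself checks.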
------------------------------------------------------------------------

[2008-04-05 21:55:06] [EMAIL PROTECTED]

Description:
------------
libxml2 is fairly lenient when it comes to what it allows to go into its nodes; you can set attributes and tags with illegal characters in them and it won't complain. The burden is on the userland code to perform an appropriate check with the xmlValidate*() functions.

PHP's DOM implementation is extremely spotty when it comes to these checks, which allows broken XML to be generated easily. For example,

    $d = new DOMDocument();
    $d->appendChild($n = $d->createElement('a'));
    $n->setAttribute('"@', 'foo');
    echo $d->saveXML();

outputs:

    <?xml version="1.0"?>
    <a "@="foo"/>

which is clearly incorrect. However, if I attempt

    $d->createElement('a@');

DOM complains, because xmlValidateName was called on the element name.

Now, I actually don't mind the lack of checking; the DOM tree is useful for things like HTML, where the rules are slightly different from XML's. An HTML tree can contain an "a@" node, although it would not be valid HTML. (You can try it out for yourself in Firefox by putting that in a document and then inspecting the DOM.) However, I want consistency, and I also want the ability to switch on strict checking when I so desire (especially when I'm producing XML).

So I want all-or-nothing production checks in PHP DOM, adding another property in DOMDocument (or maybe even a global libxml configuration option) that specifies whether or not strict production checking should be done.

------------------------------------------------------------------------
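
In the absence of a strict-production switch like the one requested above, one way to approximate the check in userland is to re-parse whatever saveXML() produced. The following is a minimal sketch under stated assumptions: the function name serializesToWellFormedXml is hypothetical, the code assumes PHP 7 or later, and it can only catch problems that libxml2's parser rejects on input, so it is not a substitute for checks inside the extension.

```php
<?php
// Hypothetical helper: serialize the document, then try to parse the
// result again. A failed parse means the serialized output was not
// accepted as well-formed XML by libxml2.
function serializesToWellFormedXml(DOMDocument $doc): bool
{
    $xml = $doc->saveXML();
    if ($xml === false) {
        return false;
    }

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $check = new DOMDocument();
    $ok = $check->loadXML($xml);

    // Restore the caller's error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}
```

A caller could run this after building a tree and before sending the output anywhere, for example on a document containing a comment with a double hyphen. The cost is a second parse of the whole document, so it suits testing and debugging more than hot paths.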