ID:               31613
User updated by:  brian dot sanders at cometsystems dot com
Reported By:      brian dot sanders at cometsystems dot com
-Status:          Bogus
+Status:          Open
Bug Type:         XML related
Operating System: Fedora Core 3
PHP Version:      5.0.2
New Comment:

This is a bug in PHP's DOM implementation, in the underlying libxml2 interface, or in the PHP documentation.

The W3C spec is vague on the use of the property nodeValue:

"The attributes nodeName, nodeValue and attributes are included as a mechanism to get at node information without casting down to the specific derived interface."

However, what is the purpose of nodeValue besides getting and setting the value of a child text node? In these situations, all special characters, including ampersands, should be encoded by the interface. Encoding them manually breaks a basic contract of DOM/XML/XSL, which states that each transport layer does not need to worry about the encoding of other transport layers.

On the other hand, ampersands ARE properly encoded when setting the property textContent. Unfortunately, they are not encoded when the text string is passed as the optional second argument to DOMDocument::createElement(). The example in the manual (http://us4.php.net/manual/en/function.dom-domdocument-createelement.php) explicitly sets the text value of an element using this unsafe method.

To further complicate the issue, setting the textContent property on a node created by DOMDocument::createElement() does NOT work. You must create a text node, set its textContent, then append the text node to the new element.

So in summary:

* Setting the textContent property encodes ampersands as &amp;
* Setting the nodeValue property does not encode ampersands, but instead truncates the string (silently with some versions of libxml2).
* The manual suggests using the optional second argument to DOMDocument::createElement() to set the value of a node.
* The manual also suggests using the nodeValue property to set the value of a node.
* Both of these techniques will result in lost data for strings with ampersands, unless the strings are encoded manually before being used.
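
The two safe routes mentioned above (appending a text node, or encoding the string manually before it reaches createElement()) can be sketched roughly as follows. This is an illustration only, not code from the report: the element names bar and baz and the sample string are assumptions chosen to mirror the reproduce code, and the ENT_XML1 flag needs PHP 5.4 or later, which is newer than the 5.0.2 build discussed here.

```php
<?php
// Minimal sketch: put a string containing "&" into an element without
// losing data. Element names and sample text are illustrative assumptions.

$doc  = new DOMDocument('1.0');
$root = $doc->appendChild($doc->createElement('test'));

$text = 'Here is an & to test.';

// Route 1: let the DOM do the escaping by wrapping the raw string
// in a text node and appending it to the new element.
$bar = $doc->createElement('bar');
$bar->appendChild($doc->createTextNode($text));
$root->appendChild($bar);

// Route 2: escape manually before passing the string as the optional
// second argument of createElement(), which does not escape "&" itself.
$escaped = htmlspecialchars($text, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
$baz = $doc->createElement('baz', $escaped);
$root->appendChild($baz);

// Serialize only the root element to inspect the result.
echo $doc->saveXML($root), "\n";
```

Route 1 keeps the escaping inside the DOM layer, which is the behaviour argued for above; route 2 is the manual escaping that the maintainers' reply asks for.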

Previous Comments:
------------------------------------------------------------------------

[2005-01-20 12:54:45] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php

You have to escape the value when writing to the nodeValue property in this case. This property is made available to elements as a convenience - the specs actually state that this property has no meaning for an element node. So, an & must be escaped so that entities can be used in the "value" of an element. (Note the warning you get when you disable error suppression.)

------------------------------------------------------------------------

[2005-01-19 17:39:47] brian dot sanders at cometsystems dot com

Description:
------------
We have encountered the following bug when using libxml2 (via PHP5):

When we set the value of a text node, ampersands (&) are not being converted to XML entities. These raw characters truncate our text node, resulting in lost data.

For example, after setting nodeValues, less-than signs (<) show up as entities (&lt;) in our XML file. However, ampersands (&) show up as raw characters (&).

Note that in the example code we have silenced the output from setting the nodeValues, as this generates a PHP warning when used with newer versions of libxml2.

We have experienced this bug on at least two machines, with the following configurations:

MACHINE 1:
  LIBXML2: 2.6.16-3 (binary rpm)
  PHP:     5.0.2 (built from source)
  OS:      Fedora Core Linux 3
  KERNEL:  2.6.9-1.681_FC3smp

MACHINE 2:
  LIBXML2: 2.6.7-28.4 (binary)
  PHP:     5.0.2 (built from source)
  OS:      Suse Linux 9.1
  KERNEL:  2.6.4-52-default

Reproduce code:
---------------
<?php

/* load the document */
$dom = DomDocument::loadXML(<<<XML
<?xml version="1.0"?>
<test>
  <foo>Here is an &lt; to test.</foo>
  <bar>Here is an &amp; to test.</bar>
</test>
XML
);

/* confirm we have the document verbatim */
print("HERE IS THE INITIAL DOCUMENT:\n\n");
print($dom->saveXML());
print("\n--------------------------------\n\n");

/* get the LT node */
$nodeList = $dom->getElementsByTagName('foo');
$nodeLT = $nodeList->item(0);

/* get the AMP node */
$nodeList = $dom->getElementsByTagName('bar');
$nodeAMP = $nodeList->item(0);

/* resave the node values */
@$nodeLT->nodeValue = $nodeLT->nodeValue;
@$nodeAMP->nodeValue = $nodeAMP->nodeValue;

/* show the altered document */
print("HERE IS THE DOCUMENT AFTER THE SAVE:\n\n");
print($dom->saveXML());
print("\n--------------------------------\n\n");

?>

Expected result:
----------------
HERE IS THE INITIAL DOCUMENT:

<?xml version="1.0"?>
<test>
  <foo>Here is an &lt; to test.</foo>
  <bar>Here is an &amp; to test.</bar>
</test>

--------------------------------

HERE IS THE DOCUMENT AFTER THE SAVE:

<?xml version="1.0"?>
<test>
  <foo>Here is an &lt; to test.</foo>
  <bar>Here is an &amp; to test.</bar>
</test>

Actual result:
--------------
HERE IS THE INITIAL DOCUMENT:

<?xml version="1.0"?>
<test>
  <foo>Here is an &lt; to test.</foo>
  <bar>Here is an &amp; to test.</bar>
</test>

--------------------------------

HERE IS THE DOCUMENT AFTER THE SAVE:

<?xml version="1.0"?>
<test>
  <foo>Here is an &lt; to test.</foo>
  <bar>Here is an </bar>
</test>

------------------------------------------------------------------------