ID:               31613
 User updated by:  brian dot sanders at cometsystems dot com
 Reported By:      brian dot sanders at cometsystems dot com
-Status:           Bogus
+Status:           Open
 Bug Type:         XML related
 Operating System: Fedora Core 3
 PHP Version:      5.0.2
 New Comment:

This is a bug in one of three places: PHP's DOM implementation, the
underlying libxml2 interface, or the PHP documentation.

The W3C spec is vague on the use of the property nodeValue:

"The attributes nodeName, nodeValue and attributes are included as a
mechanism to get at node information without casting down to the
specific derived interface."

However, what is the purpose of nodeValue besides getting and setting
the value of a child text node?  In these situations, all special
characters, including ampersands, should be encoded by the interface.
Encoding them manually breaks a basic contract of DOM/XML/XSL: each
transport layer should not need to worry about the encoding of other
transport layers.

On the other hand, ampersands ARE properly encoded when setting the
property textContent.  Unfortunately, they are not encoded when the text
string is passed as the optional second argument to
DOMDocument::createElement().  The example in the manual
(http://us4.php.net/manual/en/function.dom-domdocument-createelement.php)
explicitly sets the text value of an element using this unsafe method.
To further complicate the issue, setting the textContent property on
a node created by DOMDocument::createElement() does NOT work.  You must
create a text node, set the textContent, then append the text node to
the new element.
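
The text-node approach described above can be sketched as follows.  This
is a minimal illustration, not code from the manual: the helper name
createElementWithText() is hypothetical, the text is passed to
DOMDocument::createTextNode() rather than assigned through textContent,
and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): build an element whose
// text lives in a child text node instead of createElement()'s 2nd argument.
function createElementWithText(DOMDocument $doc, string $name, string $text): DOMElement
{
    $element = $doc->createElement($name);               // no value argument
    $element->appendChild($doc->createTextNode($text));  // raw string, stored as-is
    return $element;
}

$doc  = new DOMDocument('1.0');
$root = $doc->appendChild($doc->createElement('test'));
$bar  = $root->appendChild(
    createElementWithText($doc, 'bar', 'Here is an & to test.')
);

// The ampersand is only escaped when the node is serialized.
print($doc->saveXML($bar) . "\n");
// prints: <bar>Here is an &amp; to test.</bar>
```

Because the string never passes through the entity parser, no manual
encoding is needed and nothing is truncated.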

So in summary:
* Setting the textContent property encodes ampersands as &amp;
* Setting the nodeValue property does not encode ampersands, but
instead truncates the string (silently with some versions of libxml2.)
* The manual suggests using the optional second argument to
DOMDocument::createElement() to set the value of a node.
* The manual also suggests using the nodeValue property to set the
value of a node.
* Both of these techniques will result in lost data for strings with
ampersands, unless the strings are encoded manually before being used.
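
For reference, the manual encoding mentioned in the last point might
look like the sketch below.  It assumes htmlspecialchars() with the
ENT_XML1 flag, which only exists in PHP 5.4 and later (not in the 5.0.2
build this report is about); the variable names are illustrative only.

```php
<?php
$raw = 'Here is an & to test.';

// Escape the markup-significant characters by hand before the string
// reaches createElement(), which treats its value as already-encoded XML.
$escaped = htmlspecialchars($raw, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');

$doc = new DOMDocument('1.0');
$bar = $doc->appendChild($doc->createElement('bar', $escaped));

print($escaped . "\n");            // the encoded string handed to the DOM
print($bar->textContent . "\n");   // the text the DOM actually stored
```

This is exactly the extra step that callers should not have to remember,
which is why the text-node route is the safer default.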


Previous Comments:
------------------------------------------------------------------------

[2005-01-20 12:54:45] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

You have to escape the value when writing to the nodeValue property in
this case. This property is made available to elements as a
convenience - the specs actually state that this property has no
meaning for an element node. So, escaping must be done for an & so that
entities could be used in the "value" of an element. (Note the warning
you get when you disable error suppression.)

------------------------------------------------------------------------

[2005-01-19 17:39:47] brian dot sanders at cometsystems dot com

Description:
------------
We have encountered the following bug when using libxml2 (via PHP5):

When we set the value of a text node, ampersands (&) are not being
converted to XML entities.  These raw characters truncate our text
node, resulting in lost data.

For example, after setting nodeValues, less-than signs (<) show up as
entities (&lt;) in our XML file.  However, ampersands (&) show up as
raw characters (&).

Note that in the example code we have silenced the output from setting
the nodeValues, as this generates a PHP warning when used with newer
versions of libxml2.

We have experienced this bug on at least two machines, with the
following configurations:

MACHINE 1:
 LIBXML2: 2.6.16-3 (binary rpm)
 PHP: 5.0.2 (built from source)
 OS: Fedora Core Linux 3
 KERNEL: 2.6.9-1.681_FC3smp

MACHINE 2:
 LIBXML2: 2.6.7-28.4 (binary)
 PHP: 5.0.2 (built from source)
 OS: Suse Linux 9.1
 KERNEL: 2.6.4-52-default 

Reproduce code:
---------------
<?php

/* load the document (loadXML is called on an instance; the static
   form is rejected by newer PHP versions) */
$dom = new DOMDocument();
$dom->loadXML(<<<XML
<?xml version="1.0"?>
<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>
XML
);

/* confirm we have the document verbatim */
print("HERE IS THE INITIAL DOCUMENT:\n\n");
print($dom->saveXML());
print("\n--------------------------------\n\n");

/* get the LT node */
$nodeList = $dom->getElementsByTagName('foo');
$nodeLT = $nodeList->item(0);

/* get the AMP node */
$nodeList = $dom->getElementsByTagName('bar');
$nodeAMP = $nodeList->item(0);

/* resave the node values */
@$nodeLT->nodeValue = $nodeLT->nodeValue;
@$nodeAMP->nodeValue = $nodeAMP->nodeValue;

/* show the altered document */
print("HERE IS THE DOCUMENT AFTER THE SAVE:\n\n");
print($dom->saveXML());
print("\n--------------------------------\n\n");

?>


Expected result:
----------------
HERE IS THE INITIAL DOCUMENT:

<?xml version="1.0"?>
<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>

--------------------------------

HERE IS THE DOCUMENT AFTER THE SAVE:

<?xml version="1.0"?>

<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>


Actual result:
--------------
HERE IS THE INITIAL DOCUMENT:

<?xml version="1.0"?>
<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>

--------------------------------

HERE IS THE DOCUMENT AFTER THE SAVE:

<?xml version="1.0"?>

<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an </bar>
</test>



------------------------------------------------------------------------

