From: vesselin at awcreator dot com
Operating system: Linux
PHP version: 5.2.2
PHP Bug Type: DOM XML related
Bug description: The XML DOM loadHTML method incorrectly duplicates text nodes
Description:
------------
DOMDocument->loadHTML() incorrectly loads some text nodes twice when
parsing an HTML document. Note that formatting and whitespace in the loaded
HTML matter: for example, the bug does not appear if the <h1> tag in the
sample code is not followed by spaces/tabs.
Reproduce code:
---------------
<?php
function dump_node ($node)
{
    for (
        $child = $node->firstChild;
        $child !== null;
        $child = $child->nextSibling
    ) {
        printf ("NODE TYPE: %s\n", $child->nodeType);
        switch ($child->nodeType) {
            case XML_ELEMENT_NODE:
                printf ("TYPE: ELEMENT, TAG: \"%s\"\n",
                    $child->tagName);
                dump_node ($child);
                break;
            case XML_TEXT_NODE:
                printf ("TYPE TEXT, TEXT: \"%s\"\n",
                    htmlspecialchars ($child->wholeText));
                break;
        }
    }
}
$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
<h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;
$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>
Expected result:
----------------
A dump of all document nodes, with exactly one text node that has "Some
generic text" as data.
Actual result:
--------------
A dump of all document nodes, with two text nodes that have "Some generic
text" as data.
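
A possible way to narrow this down is sketched below. It counts the text
nodes by reading nodeValue instead of wholeText, since wholeText also
includes the text of logically adjacent text nodes, so the two properties
can differ. This may help tell whether the tree really holds two nodes or
whether one property reports text from its neighbours. The helper name
find_text_nodes and the compact markup are assumptions for illustration
only; because whitespace matters here, substitute the exact $html from the
reproduce code when testing.

```php
<?php
// Hypothetical helper (not part of the original report): collects the
// nodeValue of every text node under $node whose data contains $needle.
function find_text_nodes (DOMNode $node, $needle, array &$found)
{
    for (
        $child = $node->firstChild;
        $child !== null;
        $child = $child->nextSibling
    ) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // nodeValue is the data of this single node only.
            if (strpos ($child->nodeValue, $needle) !== false) {
                $found[] = $child->nodeValue;
            }
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            // Recurse into elements; other node types are skipped.
            find_text_nodes ($child, $needle, $found);
        }
    }
}

// Assumed stand-in markup; replace with the $html from the reproduce code.
$html = "<html><body><table><tr><td>\n"
      . "<h1>Left col</h1>Some generic text\n"
      . "</td></tr></table></body></html>";

$document = new DOMDocument ();
$document->loadHTML ($html);

$found = array ();
find_text_nodes ($document, 'Some generic text', $found);
printf ("Text nodes containing the string: %d\n", count ($found));
```

If the count is one here while the dump above still prints the string
twice, that would point at wholeText rather than at the parsed tree.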
--
Edit bug report at http://bugs.php.net/?id=41374&edit=1
--