ID: 41374
User updated by: vesselin at awcreator dot com
Reported By: vesselin at awcreator dot com
Status: Open
Bug Type: DOM XML related
Operating System: Linux
PHP Version: 5.2.2
New Comment:
Actually this sentence:
"For example the bug does not show if the <h1>
tag in the sample code is not followed by spaces/tabs."
should be read as:
"For example the bug does not show if the <h1>
tag in the sample code is not PRECEDED by spaces/tabs."
Previous Comments:
------------------------------------------------------------------------
[2007-05-12 09:34:57] vesselin at awcreator dot com
Description:
------------
DOMDocument->loadHTML() incorrectly loads some text nodes twice when
parsing HTML documents. Please note that formatting and whitespace in the
loaded HTML are important. For example the bug does not show if the <h1>
tag in the sample code is not followed by spaces/tabs.
Reproduce code:
---------------
<?php
function dump_node ($node)
{
    for (
        $child = $node->firstChild;
        $child !== null;
        $child = $child->nextSibling
    ) {
        printf ("NODE TYPE: %s\n", $child->nodeType);
        switch ($child->nodeType) {
            case XML_ELEMENT_NODE:
                printf ("TYPE: ELEMENT, TAG: \"%s\"\n",
                    $child->tagName);
                dump_node ($child);
                break;
            case XML_TEXT_NODE:
                printf ("TYPE TEXT, TEXT: \"%s\"\n",
                    htmlspecialchars ($child->wholeText));
                break;
        }
    }
}
$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
<h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;
$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>
Expected result:
----------------
A dump of all document nodes and only one text node that has "Some
generic text" as data.
Actual result:
--------------
A dump of all document nodes and two text nodes that have "Some generic
text" as data.
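A possible way to narrow this down: DOMText::wholeText is defined to
return the text of a node together with its logically adjacent text
nodes, whereas the data property holds only the node's own text. The
sketch below prints both side by side, which may help tell whether the
tree really contains a duplicated node or whether the duplication only
shows through wholeText. The helper name collect_text_nodes() and the
hand-built two-node document are illustrative assumptions, not part of
the original reproduce code.

```php
<?php
// Hypothetical helper (not from the original report): walk the tree and
// record, for every text node, its own data and its wholeText.
function collect_text_nodes (DOMNode $node, array &$out = array())
{
    for (
        $child = $node->firstChild;
        $child !== null;
        $child = $child->nextSibling
    ) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $out[] = array(
                'data'      => $child->data,       // this node only
                'wholeText' => $child->wholeText,  // this node plus adjacent text nodes
            );
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            collect_text_nodes ($child, $out);
        }
    }
    return $out;
}

// Illustrative document built by hand (an assumption for the demo):
// one <td> element holding two adjacent text nodes.
$demo = new DOMDocument ();
$td = $demo->createElement ('td');
$demo->appendChild ($td);
$td->appendChild ($demo->createTextNode ('Some generic '));
$td->appendChild ($demo->createTextNode ('text'));

$nodes = collect_text_nodes ($demo);
foreach ($nodes as $i => $info) {
    printf ("#%d data: \"%s\" wholeText: \"%s\"\n",
        $i, $info['data'], $info['wholeText']);
}
```

The same helper can be called on the $document from the reproduce code
above, i.e. collect_text_nodes ($document), to compare data and wholeText
for the text nodes inside the <td>.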
------------------------------------------------------------------------