ID:               41374
 User updated by:  vesselin at awcreator dot com
 Reported By:      vesselin at awcreator dot com
 Status:           Open
 Bug Type:         DOM XML related
 Operating System: Linux
 PHP Version:      5.2.2
 New Comment:

Actually this sentence:
"For example the bug does not show if the <h1>
tag in the sample code is not followed by spaces/tabs."
should read:
"For example the bug does not show if the <h1>
tag in the sample code is not PRECEDED by spaces/tabs."


Previous Comments:
------------------------------------------------------------------------

[2007-05-12 09:34:57] vesselin at awcreator dot com

Description:
------------
HTML documents loaded via DOMDocument->loadHTML() end up with some text
nodes loaded twice. Note that formatting and whitespace in the loaded
HTML matter. For example, the bug does not show if the <h1> tag in the
sample code is not followed by spaces/tabs.

Reproduce code:
---------------
<?php
function dump_node ($node)
{
        for (
                $child = $node->firstChild;
                $child !== null;
                $child = $child->nextSibling
        ) {
                printf ("NODE TYPE: %s\n", $child->nodeType);
                switch ($child->nodeType) {
                case XML_ELEMENT_NODE:
                        printf ("TYPE: ELEMENT, TAG: \"%s\"\n", $child->tagName);
                        dump_node ($child);
                        break;
                case XML_TEXT_NODE:
                        printf ("TYPE TEXT, TEXT: \"%s\"\n", htmlspecialchars ($child->wholeText));
                        break;
                }
        }
}

$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
          <h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;

$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>


Expected result:
----------------
A dump of all document nodes and only one text node that has "Some
generic text" as data.

Actual result:
--------------
A dump of all document nodes and two text nodes that have "Some generic
text" as data.
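
A possible factor worth ruling out (an assumption, not something this
report confirms): the reproduce code prints $child->wholeText, which
covers the text of all logically adjacent text nodes, while $child->data
holds only the node's own text. If the parser happens to split the text
into adjacent nodes, every one of them would print the same wholeText.
The sketch below builds two adjacent text nodes by hand (illustrative
content, not parsed from the sample HTML) to show the difference:

```php
<?php
// Illustrative sketch: create two adjacent text nodes manually, so the
// result does not depend on how the HTML parser splits text.
$doc = new DOMDocument();
$td  = $doc->appendChild($doc->createElement('td'));
$td->appendChild($doc->createTextNode('Some generic '));
$td->appendChild($doc->createTextNode('text'));

$lines = array();
foreach ($td->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        // data: this node only.
        // wholeText: this node plus its adjacent text siblings.
        $lines[] = sprintf(
            'data="%s" wholeText="%s"',
            $child->data,
            $child->wholeText
        );
    }
}
echo implode("\n", $lines), "\n";
```

Both nodes report the same wholeText but different data, so dumping
$child->data in dump_node() would show whether the document really
contains two nodes with identical content.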


------------------------------------------------------------------------

