 ID:               28628
 Updated by:       [EMAIL PROTECTED]
 Reported By:      bart at mediawave dot nl
-Status:           Open
+Status:           Bogus
 Bug Type:         DOM XML related
 Operating System: WinXP
 PHP Version:      5.0.0RC2
 New Comment:

Sorry, but your problem does not imply a bug in PHP itself. For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.

Thank you for your interest in PHP.

HTML4 does not know anything about processing instructions. Therefore
the HTML parser of libxml2 chokes on them (and PHP can't change that).

Convert the document to XHTML and use the XML parser (with loadXML());
then it works.


Previous Comments:
------------------------------------------------------------------------

[2004-06-04 00:46:52] bart at mediawave dot nl

Description:
------------
When loading a W3C-valid HTML 4.01 string with DOMDocument->loadHTML,
DOM has trouble with PHP processing instructions (<?php ... ?>).

The HTML string is valid HTML 4.01 Transitional:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.mediawave.nl%2Fhtmlfile.htm&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=1&verbose=1

Reproduce code:
---------------
<?php

$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
<p><?php echo "hello? world? are you there? Can you see me? :(" ?></p>
</body>
</html>';

$dom = new DomDocument;
$dom->loadHTML($html);

echo '<pre>', htmlspecialchars($dom->saveHTML()), '</pre>';

?>

Expected result:
----------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p><?php echo "hello? world? are you there? Can you see me? :(" ?></p></body>
</html>

Actual result:
--------------
Warning: DOMDocument::loadHTML() [function.loadHTML]: htmlParseStartTag:
invalid element name in Entity, line: 9 in
D:\Inetpub\wwwroot\test2.php on line 24

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p></p></body>
</html>

------------------------------------------------------------------------
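
For reference, the loadXML() route suggested in the new comment could look roughly like the following. This is only a minimal sketch under some assumptions: the XHTML string is a hand-converted, stripped-down version of the reporter's markup (DOCTYPE left out, meta tag self-closed), and the variable names $xhtml and $pi are made up for illustration.

```php
<?php
// Hand-converted XHTML version of the reporter's markup. This string is an
// assumption, not part of the report: the meta tag is self-closed and the
// DOCTYPE is left out so the parser has no external DTD to deal with.
$xhtml = '<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<p><?php echo "hello? world? are you there? Can you see me? :(" ?></p>
</body>
</html>';

$dom = new DOMDocument;
// Use the XML parser: in XML the embedded PHP block is a legal
// processing instruction, so it is kept as a node instead of dropped.
$dom->loadXML($xhtml);

// The processing instruction is the first (and only) child of <p>.
$pi = $dom->getElementsByTagName('p')->item(0)->firstChild;
echo $pi->target, "\n";   // target of the processing instruction, i.e. "php"

// Serialize as XML; saveHTML() would go back through the HTML serializer.
echo '<pre>', htmlspecialchars($dom->saveXML()), '</pre>', "\n";
```

Note that the input has to be well-formed XML for this to work, so every tag must be closed and attribute values quoted; plain HTML 4.01 with an unclosed meta tag would make loadXML() fail.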