From: bart at mediawave dot nl
Operating system: WinXP
PHP version: 5.0.0RC2
PHP Bug Type: DOM XML related
Bug description: PHP PI problem with dom->loadHTML

Description:
------------
When loading a W3C-valid HTML 4.01 string with DOMDocument->loadHTML, DOM has trouble with PHP processing instructions (<?php ... ?>).

The HTML string is valid HTML 4.01 Transitional:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.mediawave.nl%2Fhtmlfile.htm&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=1&verbose=1

Reproduce code:
---------------
<?php
$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<p><?php echo "hello? world? are you there? Can you see me? :(" ?></p>
</body>
</html>';

$dom = new DomDocument;
$dom->loadHTML($html);

echo '<pre>', htmlspecialchars($dom->saveHTML()), '</pre>';
?>

Expected result:
----------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p><?php echo "hello? world? are you there? Can you see me? :(" ?></p></body>
</html>

Actual result:
--------------
Warning: DOMDocument::loadHTML() [function.loadHTML]: htmlParseStartTag: invalid element name in Entity, line: 9 in D:\Inetpub\wwwroot\test2.php on line 24

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p></p></body>
</html>

-- 
Edit bug report at http://bugs.php.net/?id=28628&edit=1
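
Possible workaround (a minimal sketch, not a fix for the parser itself): hide each <?php ... ?> block inside an HTML comment before calling loadHTML, then restore it in the string returned by saveHTML. The helper names protectPhpBlocks and restorePhpBlocks are hypothetical and not part of the DOM extension. The sketch assumes PHP 5.3 or later (for closures), that the parser passes comments through unchanged, and that no "?>" occurs inside a PHP string literal in the embedded code.

```php
<?php
// Hypothetical helpers for illustration only; not part of ext/dom.

// Replace each <?php ... ?> block with an HTML comment that carries the
// base64-encoded source, so the HTML parser only ever sees a comment.
// Base64 output never contains "--", so the comment stays well-formed.
function protectPhpBlocks($html)
{
    return preg_replace_callback(
        '/<\?php.*?\?>/s', // non-greedy: stops at the first "?>"
        function ($m) {
            return '<!--php:' . base64_encode($m[0]) . '-->';
        },
        $html
    );
}

// Reverse the substitution on serialized output.
function restorePhpBlocks($html)
{
    return preg_replace_callback(
        '/<!--php:([A-Za-z0-9+\/=]+)-->/',
        function ($m) {
            return base64_decode($m[1]);
        },
        $html
    );
}
```

With these in place, the reproduce code would call $dom->loadHTML(protectPhpBlocks($html)) and print restorePhpBlocks($dom->saveHTML()). Whether the output then matches the expected result exactly depends on how the parser serializes the surrounding markup.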