From:
Operating system: Windows WAMP + LAMP(?)
PHP version: 5.3.2
Package: DOM XML related
Bug Type: Bug
Bug description: DOMDocument::load() UTF-8 limitation

Description:
------------
The DOMDocument::load() function ONLY loads UTF-8 encoded files.

Ex: 'article.php' contains:

    $xmlDoc = new DOMDocument();
    $page = 'article.xsl';
    $xmlDoc->load($page);
    $xmlDoc->load('cours.xml');

Let's consider that 'article.xsl' contains '... Précédent ...' (not pure ASCII chars). If the content of 'article.xsl' is ISO-8859-1 encoded, the following error appears (same if 'cours.xml' is ISO-8859-1 encoded):

"DOMDocument::load() [domdocument.load]: Input is not proper UTF-8, indicate encoding ! Bytes: 0xE9 0x62 0x75 0x74 in file:///C:/wamp/www/xsl2/article.xsl, line: 71 in C:\wamp\www\xsl2\article.php on line 13"

So it's imperative to UTF-8 encode 'cours.xml' and 'article.xsl'. Of course,

    $page = utf8_encode($page);

is of no use, because utf8_encode() only operates on the string 'article.xsl', not on the file content.

CONCLUSION: It's not really a BUG in the ->load() function, but it would be really important to have a supplementary optional parameter indicating the encoding of the incoming file.

----- Desired improvement ----->
Add an optional parameter describing the actual encoding of $file:

    $xmlDoc->load($page, 'iso-8859-1');

    DOMDocument::load(string $file [, string $encoding])

The optional $encoding parameter would describe the actual encoding of $file (if not UTF-8).
----------- END ----------------------

Test script:
---------------
[test.php]
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("cours.xml");
?>

[cours.xml] (no matter the line encoding... The problem is caused by the 'é' from 'éclair'...)
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <chapitre titre="Titre du chapitre 1">
    <partie titre="Titre de la partie 1">
      Texte éclair
    </partie>
  </chapitre>
</root>

(displays):
Warning: DOMDocument::load() [domdocument.load]: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x63 0x6C 0x61 in file:///C:/wamp/www/xsl2/cours.xml, line: 5 in C:\wamp\www\xsl2\test.php on line 3

-- 
Edit bug report at http://bugs.php.net/bug.php?id=51325&edit=1
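
Possible workaround, as a sketch only: in the test script, cours.xml declares encoding="UTF-8" while the reported byte 0xE9 is the ISO-8859-1 form of 'é', so the declaration and the actual bytes disagree. The parser reads the encoding from the XML declaration, so either declaring the real encoding or converting the bytes before parsing should avoid the warning without a new parameter. The temporary file, the sample content and the variable names below are assumptions made for illustration, and iconv is assumed to be available.

```php
<?php
// Sketch only: file names and variable names here are illustrative.

// "\xE9" is the single ISO-8859-1 byte for 'é', the byte named in the warning.
$body = "<root><partie>Texte \xE9clair</partie></root>";

// Option 1: the file declares the encoding it is really saved in.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n" . $body);
$doc1 = new DOMDocument();
$ok1 = $doc1->load($path);
unlink($path);
$text1 = $doc1->getElementsByTagName('partie')->item(0)->textContent;

// Option 2: the file claims UTF-8 but holds ISO-8859-1 bytes (as in cours.xml).
// Convert the bytes first, then parse the string instead of the file.
$mislabelled = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" . $body;
$utf8 = iconv('ISO-8859-1', 'UTF-8', $mislabelled);
$doc2 = new DOMDocument();
$ok2 = $doc2->loadXML($utf8);
$text2 = $doc2->getElementsByTagName('partie')->item(0)->textContent;

// DOM hands text back as UTF-8, whatever the input encoding was.
echo $text1, "\n", $text2, "\n";
```

Option 1 is the simpler fix when the files can be edited; option 2 may suit files that cannot be changed on disk, at the cost of reading them into memory first.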