ID:               50253
Updated by:       j...@php.net
Reported By:      kanea at free dot fr
-Status:          Open
+Status:          Feedback
Bug Type:         DOM XML related
Operating System: linux
PHP Version:      5.2.6-1+lenny3
New Comment:

Please try using this snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz

For Windows:

  http://windows.php.net/snapshots/


Previous Comments:
------------------------------------------------------------------------

[2009-11-20 23:23:34] kanea at free dot fr

I cannot test on another system.

------------------------------------------------------------------------

[2009-11-20 23:01:27] kanea at free dot fr

Description:
------------
I have the same problem with pages from Wikipedia.
It seems that loadHTML() works with ISO characters internally.
This is the same bug as bug #32547.

Reproduce code:
---------------
This code works:

$url="http://".$lang.".wikipedia.org/wiki/".$article;
$this->dom=new DomDocument('1.0', 'UTF-8');
$str=file_get_contents($url);
$this->dom->loadXML($str);
$this->contenu = $this->dom->saveXml();

This code doesn't work:

$url="http://".$lang.".wikipedia.org/wiki/".$article;
$this->dom=new DomDocument('1.0', 'UTF-8');
$str=file_get_contents($url);
$this->dom->loadHtml($str);
$this->contenu = $this->dom->saveXml();

It seems that loadHTML() works with ISO characters internally.

Expected result:
----------------
Code with UTF-8 encoded characters

Actual result:
--------------
Code with bad characters


------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=50253&edit=1