ID:               39269
 User updated by:  arturm at union dot com dot pl
 Reported By:      arturm at union dot com dot pl
 Status:           Open
 Bug Type:         DOM XML related
 Operating System: Windows
 PHP Version:      5.1.6
 New Comment:

Sorry, the charset on bugs.php.net is not UTF-8, so the non-ASCII
characters in the report were mangled. Please see the original thread on
pl.comp.lang.php for the source code:
http://groups.google.pl/group/pl.comp.lang.php/browse_frm/thread/e0de8a41d687aef3/d2c602e5ac1d40cb?hl=pl#d2c602e5ac1d40cb


Previous Comments:
------------------------------------------------------------------------

[2006-10-26 17:17:56] arturm at union dot com dot pl

Description:
------------
If you load HTML using DOMDocument::loadHTML(), the wrong charset is used
when non-US-ASCII characters appear in the source before the charset
declaration in the meta tag.

Reproduce code:
---------------
<?php
header("Content-type: text/plain; charset=UTF-8");
$doc = new DOMDocument();
// Non-ASCII text in <title> precedes the charset declaration in <meta>.
$doc->loadHTML('<title>ą</title>'
    .'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    .'<p>ąęółść</p>');
echo $doc->encoding;
echo $doc->textContent;
?>

Expected result:
----------------
UTF-8ąęółść

Actual result:
--------------
UTF-8Ä&#133;Ä&#133;Ä&#153;óÅ&#130;Å&#155;Ä&#135;
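
Possible workaround (a sketch, not part of the original report): since the
problem only appears when raw non-US-ASCII bytes precede the charset
declaration, one option is to turn every non-ASCII character into a numeric
character reference before calling loadHTML(), so the parser never has to
guess the input charset. The helper name encodeNonAsciiAsEntities() is
hypothetical, and the sketch assumes the mbstring extension is available and
the input is valid UTF-8.

```php
<?php
/**
 * Hypothetical helper: replace every character above U+007F with a
 * decimal numeric character reference, leaving ASCII markup untouched.
 * Assumes $html is valid UTF-8 and ext/mbstring is loaded.
 */
function encodeNonAsciiAsEntities(string $html): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $map, 'UTF-8');
}

// Intended usage (requires the DOM extension):
//   $doc = new DOMDocument();
//   $doc->loadHTML(encodeNonAsciiAsEntities($html));
```

Character references are not decoded inside script or style elements, so
this approach may not suit documents that carry non-ASCII text there.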


------------------------------------------------------------------------

