ID: 48210
Updated by: [email protected]
Reported By: nilon at kartio dot org
-Status: Open
+Status: Bogus
Bug Type: mbstring related
Operating System: Debian Lenny
PHP Version: 5.2.9
New Comment:
Please note that encoding detection is not always perfect.
In particular, when the string is very short, the encoding may be
detected incorrectly.
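When the question is whether a byte string is well-formed in one particular encoding, rather than which encoding it most likely uses, validation may be a better fit than detection. The following is a minimal sketch using mb_check_encoding(); the helper name is_utf8() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: true only if $bytes is well-formed UTF-8.
// This validates the bytes; it does not guess among candidate encodings.
function is_utf8($bytes)
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$latin1 = "\xe4";      // 'ä' as a single ISO-8859-1 byte
$utf8   = "\xc3\xa4";  // 'ä' as a two-byte UTF-8 sequence

// A lone 0xE4 is a UTF-8 lead byte with no continuation bytes after it.
var_dump(is_utf8($latin1));
var_dump(is_utf8($utf8));
```

On a current PHP release the first call should report false and the second true; behaviour on older releases such as the 5.2 series has not been checked here.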
Previous Comments:
------------------------------------------------------------------------
[2009-05-09 17:58:09] nilon at kartio dot org
With the strict option, the result is:
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(5) "UTF-8"
Still, the last one should return false.
------------------------------------------------------------------------
[2009-05-09 17:53:42] nilon at kartio dot org
Description:
------------
mb_detect_encoding detects the latin1 character 'ä' as UTF-8 when it
clearly isn't a multibyte character.
Reproduce code:
---------------
<?php
var_dump(mb_detect_encoding("\xe4", 'UTF-8, ISO-8859-1'));
var_dump(mb_detect_encoding("\xe4", 'ISO-8859-1, UTF-8'));
var_dump(mb_detect_encoding("\xe4", 'ISO-8859-1'));
var_dump(mb_detect_encoding("\xe4", 'UTF-8'));
?>
Expected result:
----------------
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
bool(false)
Actual result:
--------------
string(5) "UTF-8"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(5) "UTF-8"
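The behaviour expected in this report can be approximated by validating the input against each candidate in order instead of relying on detection. This is only a sketch under that assumption, not how mb_detect_encoding works internally; the function name first_valid_encoding() is hypothetical.

```php
<?php
// Hypothetical helper: return the first candidate encoding in which
// $bytes is well-formed, or false if none of them matches.
function first_valid_encoding($bytes, array $candidates)
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

// The same four cases as in the reproduce code.
var_dump(first_valid_encoding("\xe4", array('UTF-8', 'ISO-8859-1')));
var_dump(first_valid_encoding("\xe4", array('ISO-8859-1', 'UTF-8')));
var_dump(first_valid_encoding("\xe4", array('ISO-8859-1')));
var_dump(first_valid_encoding("\xe4", array('UTF-8')));
```

Because every byte string is well-formed ISO-8859-1, that encoding matches any input and is best listed last, as a fallback after stricter candidates such as UTF-8.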
------------------------------------------------------------------------