 ID:               49958
 Updated by:       j...@php.net
 Reported By:      mjong at magnafacta dot nl
-Status:           Open
+Status:           Feedback
 Bug Type:         mbstring related
 Operating System: Windows Vista
 PHP Version:      5.2.11
 New Comment:

Please read this page and especially the last two examples:

http://www.php.net/manual/en/mbstring.configuration.php

Are you using the proper options?


Previous Comments:
------------------------------------------------------------------------

[2009-10-25 14:41:10] mjong at magnafacta dot nl

[PLEASE ALLOW LONGER ENTRIES]

Using strtoupper(), all encodings create nonsense strings, but in half
of the cases the error can be detected using mb_check_encoding().
Strangely enough the 2-byte definitions including UTF-8 cannot be
checked, while UTF-8 and the 4-byte encodings do OK.

Using mb_strtoupper(), a third of the encodings do a proper
translation. Half of the encodings generate nonsense, and none of those
can be detected using mb_check_encoding(). The rest of the encodings
fail when used with mb_strtoupper().

With mb_strtoupper(), UTF-8 and UTF-16 work OK, but now UTF-32 fails.

"Stranger and stranger", Alice said.

------------------------------------------------------------------------

[2009-10-25 14:40:07] mjong at magnafacta dot nl

Ah... I assumed that mb_check_encoding() would use
mb_internal_encoding() when no encoding is specified.

Anyhow, I used UTF-8 as the internal encoding, but I have just now
tested it with all internal encodings (setting them using
mb_internal_encoding()).

------------------------------------------------------------------------

[2009-10-23 10:21:00] j...@php.net

mb_strtoupper() defaults to mb_internal_encoding(), so what does the
latter give you? If it's not the right encoding, then there is no bug
here. And strtoupper() or ucfirst() or anything without mb_* in front
of it isn't even supposed to work with such strings. :)

------------------------------------------------------------------------

[2009-10-22 16:14:01] mjong at magnafacta dot nl

Description:
------------
Using strtoupper, mb_strtoupper and derived functions like ucfirst
produces incorrectly encoded strings when used on strings containing
Japanese Hiragana or Katakana characters.

As no uppercase versions of these characters exist, they should be
treated like e.g. numbers and left unchanged.

Workaround: use mb_check_encoding to revert to the old string when this
happens.

Reproduce code:
---------------
// $s = strtoupper('まてえす and ジョング');
$s = strtoupper(
    base64_decode('44Gm44GI44GZ') . ' and ' .
    base64_decode('44K444On44Oz44Kw'));

if (mb_check_encoding($s)) {
    echo $s;
} else {
    echo 'Error';
}

Expected result:
----------------
まてえす and ジョング

Actual result:
--------------
Error

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=49958&edit=1
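
For reference, the point made in the thread (mb_strtoupper() falls back
to mb_internal_encoding() unless told otherwise) can be sketched as
below. This is only an illustration, not code from the report: the
helper name upper_checked() is hypothetical, the input is assumed to be
UTF-8, and the mbstring extension is assumed to be loaded. The encoding
is passed explicitly to both mb_strtoupper() and mb_check_encoding(),
and the reporter's "revert to the old string" workaround is kept as a
fallback.

```php
<?php
// Hypothetical helper for illustration; not part of the original report.
// Assumes $text is encoded in $encoding (UTF-8 unless stated otherwise).
function upper_checked($text, $encoding = 'UTF-8')
{
    // Name the encoding explicitly rather than relying on
    // whatever mb_internal_encoding() currently returns.
    $upper = mb_strtoupper($text, $encoding);

    // Validate against the same encoding; if the result is not valid,
    // fall back to the unmodified input (the reporter's workaround).
    return mb_check_encoding($upper, $encoding) ? $upper : $text;
}

// Same input bytes as in the reproduce code above.
$input = base64_decode('44Gm44GI44GZ') . ' and '
       . base64_decode('44K444On44Oz44Kw');

$result = upper_checked($input);
echo $result, "\n";
```

With this approach only the Latin letters should change case, since the
kana have no uppercase forms. Whether plain strtoupper() leaves the
multibyte sequences intact depends on the PHP version and locale, so it
is deliberately not used here.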