 ID:               49958
 User updated by:  mjong at magnafacta dot nl
 Reported By:      mjong at magnafacta dot nl
-Status:           Feedback
+Status:           Open
 Bug Type:         mbstring related
 Operating System: Windows Vista
 PHP Version:      5.2.11
 New Comment:

mbstring.func_overload = 0. Setting it to 2 or 7 just means there is no
difference between calling strtoupper() and mb_strtoupper(), but
ucfirst() still produces incorrectly encoded strings. This is with UTF-8
as the internal encoding. Using UTF-16 or UTF-32 basically crashes the
program.

mbstring.strict_detection = Off. Setting it to On has no effect
whatsoever: when I check with phpinfo() it stays Off, no matter what
value I specify. This is not the case with other php.ini settings.
Besides, there is no documentation on its effect.

I really researched this problem, but I cannot show you the results
because this form then says I am spamming. The crux of the problem is
that UTF-8, UTF-16 and UTF-32 each behave differently when used with
strtoupper() and mb_strtoupper(), and all can result in strings that are
not validly encoded.

I have a working solution for my case, but I think this is a solvable
bug.


Previous Comments:
------------------------------------------------------------------------

[2009-10-26 08:46:12] j...@php.net

Please read this page and especially the last two examples:

http://www.php.net/manual/en/mbstring.configuration.php

Are you using the proper options?

------------------------------------------------------------------------

[2009-10-25 14:41:10] mjong at magnafacta dot nl

[PLEASE ALLOW LONGER ENTRIES]

Using strtoupper(), all encodings create nonsense strings, but in half
the cases the error can be detected using mb_check_encoding(). Strangely
enough the 2-byte definitions including UTF-8 cannot be checked, while
UTF-8 and the 4-byte encodings do OK.

Using mb_strtoupper(), a third of the encodings do a proper translation.
Half of the encodings generate nonsense, and none of those can be
detected using mb_check_encoding(). The rest of the encodings fail when
used with mb_strtoupper(). With mb_strtoupper(), UTF-8 and UTF-16 work
OK, but now UTF-32 fails.

"Stranger and stranger", Alice said.

------------------------------------------------------------------------

[2009-10-25 14:40:07] mjong at magnafacta dot nl

Ah... I assumed that mb_check_encoding() would use
mb_internal_encoding() when no encoding is specified.

Anyhow, I used UTF-8 as the internal encoding, but I have just now
tested it with all internal encodings (setting them using
mb_internal_encoding()).

------------------------------------------------------------------------

[2009-10-23 10:21:00] j...@php.net

mb_strtoupper() defaults to mb_internal_encoding(), so what does the
latter give you? If it's not the right encoding, then there is no bug
here. And strtoupper() or ucfirst() or anything without mb_* in front of
it aren't even supposed to work with such.. :)

------------------------------------------------------------------------

[2009-10-22 16:14:01] mjong at magnafacta dot nl

Description:
------------
Using strtoupper, mb_strtoupper and derived functions like ucfirst
produces incorrectly encoded strings when used on strings containing
Japanese Hiragana or Katakana characters. As no uppercase versions of
these characters exist, they should be treated like e.g. numbers.

Workaround: use mb_check_encoding to revert to the old string when this
happens.

Reproduce code:
---------------
// $s = strtoupper('まてえす and ジョング');
$s = strtoupper(
    base64_decode('44Gm44GI44GZ'). ' and '.
    base64_decode('44K444On44Oz44Kw'));

if (mb_check_encoding($s)) {
    echo $s;
} else {
    echo 'Error';
}

Expected result:
----------------
まてえす and ジョング

Actual result:
--------------
Error

------------------------------------------------------------------------
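
For reference, the workaround named in the original description (revert to the old string when the uppercased result fails mb_check_encoding) might look roughly like the sketch below. It is only an illustration, not the reporter's actual code: safe_upper() is a hypothetical helper name, the type declarations assume PHP 7 or later rather than the 5.2.11 of the report, and the encoding is passed explicitly to both mbstring calls so the result does not depend on the configured internal encoding.

```php
<?php
// Hypothetical helper (not part of PHP or of the original report).
// Assumes PHP 7+ with the mbstring extension loaded.
function safe_upper(string $s, string $encoding = 'UTF-8'): string
{
    // Uppercase with an explicit encoding instead of relying on
    // mb_internal_encoding().
    $upper = mb_strtoupper($s, $encoding);

    // Fall back to the untouched input if the result is not validly
    // encoded, as the workaround in the report suggests.
    return mb_check_encoding($upper, $encoding) ? $upper : $s;
}

// Same input as the reproduce code above, built from base64 so the
// source file's own encoding does not matter.
$input = base64_decode('44Gm44GI44GZ') . ' and ' .
         base64_decode('44K444On44Oz44Kw');

$result = safe_upper($input);
echo $result, "\n";
```

Passing the encoding to both calls keeps the check consistent with the conversion; whether this matches the reporter's own working solution is not stated in the thread.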