#49958 [Opn->Fbk]: mb_strtoupper fails on Japanese characters

jani Mon, 26 Oct 2009 01:46:24 -0700

 ID:               49958
 Updated by:       [email protected]
 Reported By:      mjong at magnafacta dot nl
-Status:           Open
+Status:           Feedback
 Bug Type:         mbstring related
 Operating System: Windows Vista
 PHP Version:      5.2.11
 New Comment:


Please read this page and especially the last two examples:
http://www.php.net/manual/en/mbstring.configuration.php 

Are you using the proper options?


Previous Comments:
------------------------------------------------------------------------

[2009-10-25 14:41:10] mjong at magnafacta dot nl

[PLEASE ALLOW LONGER ENTRIES] 

Using strtoupper(), all encodings produce nonsense strings, but in half 
of the cases the error can be detected using mb_check_encoding(). 
Strangely enough, the 2-byte encodings, including UTF-16, cannot be 
checked, while UTF-8 and the 4-byte encodings do OK.

Using mb_strtoupper(), a third of the encodings convert correctly. 
Half of the encodings produce nonsense, and none of those errors can 
be detected using mb_check_encoding(). The rest of the encodings 
fail outright when used with mb_strtoupper(). 
With mb_strtoupper(), UTF-8 and UTF-16 work OK, but now UTF-32 fails. 
"Stranger and stranger", Alice said.
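The reporter's test script across encodings was not posted. A survey of that kind could look roughly like the sketch below, under these assumptions: the mbstring extension is loaded, and the encoding list and the $results array are illustrative choices, not the reporter's. Each encoding gets its own copy of the sample string, mb_strtoupper() is called with that same encoding, and the result is checked for validity and converted back to UTF-8 for comparison. On current PHP versions all of these are expected to pass; results on PHP 5.2.11 may differ, which is what this report is about.

```php
<?php
// Hypothetical re-creation of the encoding survey (not the reporter's script).
// Assumes mbstring is available; the encoding list is illustrative only.
$hiragana = base64_decode('44Gm44GI44GZ');     // UTF-8 kana from the report
$katakana = base64_decode('44K444On44Oz44Kw'); // UTF-8 kana from the report
$utf8     = $hiragana . ' and ' . $katakana;
$expected = $hiragana . ' AND ' . $katakana;   // kana have no case; only "and" changes

$results = array();
foreach (array('UTF-8', 'UTF-16', 'UTF-32', 'EUC-JP', 'SJIS') as $enc) {
    // Re-encode the sample, then uppercase it using the same encoding.
    $input = mb_convert_encoding($utf8, $enc, 'UTF-8');
    $upper = mb_strtoupper($input, $enc);
    $valid = mb_check_encoding($upper, $enc);
    // Convert back to UTF-8 so every encoding is compared against one string.
    $roundTrip = $valid ? mb_convert_encoding($upper, 'UTF-8', $enc) : null;
    $results[$enc] = ($roundTrip === $expected);
    printf("%-7s valid=%s correct=%s\n", $enc,
        $valid ? 'yes' : 'no', $results[$enc] ? 'yes' : 'no');
}
```

Passing the encoding explicitly on every call keeps the survey independent of whatever mb_internal_encoding() happens to be set to.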

------------------------------------------------------------------------

[2009-10-25 14:40:07] mjong at magnafacta dot nl

Ah... I assumed that mb_check_encoding() would use 
mb_internal_encoding() when no encoding is specified.

Anyhow, I used UTF-8 as the internal encoding, but I have just now tested 
it with all internal encodings (setting each one using 
mb_internal_encoding()).

------------------------------------------------------------------------

[2009-10-23 10:21:00] [email protected]

mb_strtoupper() defaults to mb_internal_encoding(), so what does the latter
return for you? If it's not the right encoding, then there is no bug here.
And strtoupper() or ucfirst() or anything without mb_ in front of it isn't
even supposed to work with such strings. :)
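To illustrate the point about the default encoding, here is a minimal sketch assuming the input strings are UTF-8 and mbstring is loaded. It sets the internal encoding to match the data, so that mb_strtoupper() called without an encoding argument interprets the bytes as UTF-8. The validity check passes 'UTF-8' explicitly so it does not depend on the default either.

```php
<?php
// Sketch: make mb_internal_encoding() match the data before relying on defaults.
// Assumes the source strings are UTF-8.
mb_internal_encoding('UTF-8');

$s = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');
$upper = mb_strtoupper($s);                // encoding taken from mb_internal_encoding()
$ok    = mb_check_encoding($upper, 'UTF-8'); // encoding given explicitly

echo mb_internal_encoding(), "\n";
echo $ok ? $upper : 'Error', "\n";
```

With the internal encoding set this way, only the ASCII "and" is uppercased and the kana pass through unchanged.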

------------------------------------------------------------------------

[2009-10-22 16:14:01] mjong at magnafacta dot nl

Description:
------------
Using strtoupper(), mb_strtoupper() and derived functions like ucfirst() 
produces incorrectly encoded strings when used on strings containing 
Japanese Hiragana or Katakana characters.

As no uppercase versions of these characters exist, they should be 
left unchanged, the same way digits are.

Workaround: use mb_check_encoding() to detect this and revert to the 
original string when it happens.
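The workaround described above could be sketched as follows. upper_or_original() is a hypothetical helper name, not a PHP built-in, and the sketch assumes the text is expected to be UTF-8. It applies the byte-based strtoupper(), as in the report, and returns the untouched input if the result is no longer valid. Note that it reverts the whole string, so any ASCII letters in it stay lowercase in that case.

```php
<?php
// Hypothetical helper sketching the reporter's workaround.
// strtoupper() works byte by byte, so on some PHP versions/locales it can
// corrupt multibyte text; if that happens, fall back to the original string.
function upper_or_original($s, $encoding = 'UTF-8')
{
    $upper = strtoupper($s);
    return mb_check_encoding($upper, $encoding) ? $upper : $s;
}

$s = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');
echo upper_or_original($s), "\n";
```

Either branch returns valid UTF-8 for valid UTF-8 input, which is the property the workaround relies on.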

Reproduce code:
---------------
<?php
// $s = strtoupper('まてえす and ジョング');
// Byte-based strtoupper() on UTF-8 input (hiragana, ASCII, katakana)
$s = strtoupper(
   base64_decode('44Gm44GI44GZ').
   ' and '.
   base64_decode('44K444On44Oz44Kw'));

if (mb_check_encoding($s)) {
  echo $s;
} else {
  echo 'Error';
}


Expected result:
----------------
まてえす and ジョング

Actual result:
--------------
Error


------------------------------------------------------------------------

