#49958 [Opn-Fbk]: mb_strtoupper fails on Japanese characters

2009-10-26 Thread jani
 ID:   49958
 Updated by:   j...@php.net
 Reported By:  mjong at magnafacta dot nl
-Status:   Open
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows Vista
 PHP Version:  5.2.11
 New Comment:

Please read this page and especially the last two examples:
http://www.php.net/manual/en/mbstring.configuration.php 

Are you using the proper options?


Previous Comments:


[2009-10-25 14:41:10] mjong at magnafacta dot nl

[PLEASE ALLOW LONGER ENTRIES] 

With strtoupper(), all encodings produce nonsense strings, but in half 
of the cases the error can be detected using mb_check_encoding(). 
Strangely enough, the 2-byte definitions including UTF-8 cannot be 
checked, while UTF-8 and the 4-byte encodings do OK.

With mb_strtoupper(), a third of the encodings do a proper 
translation. Half of the encodings generate nonsense, and none of them 
can be detected using mb_check_encoding(). The rest of the encodings 
fail outright when used with mb_strtoupper(). 
With mb_strtoupper(), UTF-8 and UTF-16 work OK, but now UTF-32 fails. 
Stranger and stranger, Alice said.
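
A comparison across encodings like the one described above could be
scripted roughly as follows. This is only a sketch: it assumes the test
string starts out as UTF-8, it covers just three encodings (UTF-8,
UTF-16BE, UTF-32BE), and the variable names are made up for
illustration. Results on PHP 5.2.11 may well differ from those on
current versions.

```php
<?php
// Test string from the original report, decoded to UTF-8 bytes.
$utf8 = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');

$results = array();
foreach (array('UTF-8', 'UTF-16BE', 'UTF-32BE') as $enc) {
    // Convert the input to the encoding under test.
    $encoded = mb_convert_encoding($utf8, $enc, 'UTF-8');
    // Uppercase it, telling mbstring explicitly which encoding is in use.
    $upper = mb_strtoupper($encoded, $enc);
    $results[$enc] = array(
        'valid' => mb_check_encoding($upper, $enc),
        'utf8'  => mb_convert_encoding($upper, 'UTF-8', $enc),
    );
}

foreach ($results as $enc => $r) {
    echo $enc, ': ', ($r['valid'] ? 'valid' : 'INVALID'), "\n";
}
```

Passing the encoding explicitly to every call keeps the comparison
independent of whatever mb_internal_encoding() happens to be.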



[2009-10-25 14:40:07] mjong at magnafacta dot nl

Ah... I assumed that mb_check_encoding() would use 
mb_internal_encoding() when none is specified.

Anyhow, I used UTF-8 as internal encoding, but I have just now tested 
it with all internal encodings (setting them using 
mb_internal_encoding()).



[2009-10-23 10:21:00] j...@php.net

mb_strtoupper() defaults to mb_internal_encoding(), so what does the
latter give you? If it's not the right encoding, then there is no bug
here. And strtoupper() or ucfirst() or anything without mb_* in front
of it aren't even supposed to work with such strings.. :)
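
To illustrate the point about the default encoding, here is a minimal
sketch, assuming the input is UTF-8 and the mbstring extension is
loaded. It sets the internal encoding once and then compares the
implicit call with one that names the encoding explicitly. The variable
names are illustrative only.

```php
<?php
// Make UTF-8 the default for mb_* calls that omit the encoding argument.
mb_internal_encoding('UTF-8');

$s = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');

$implicit = mb_strtoupper($s);           // uses mb_internal_encoding()
$explicit = mb_strtoupper($s, 'UTF-8');  // encoding named explicitly

echo mb_internal_encoding(), "\n";
echo ($implicit === $explicit ? 'same' : 'different'), "\n";
echo (mb_check_encoding($implicit, 'UTF-8') ? 'valid' : 'invalid'), "\n";
```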



[2009-10-22 16:14:01] mjong at magnafacta dot nl

Description:

Using strtoupper, mb_strtoupper and derived functions like ucfirst 
produces incorrectly encoded strings when used on strings containing 
Japanese Hiragana or Katakana characters.

As no uppercase versions of these characters exist, they should be 
treated like e.g. numbers and left unchanged.

Workaround: use mb_check_encoding to revert to the old string when this
happens.
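
A minimal sketch of that workaround, assuming UTF-8 input. The helper
name upper_or_original() is hypothetical and not part of PHP or
mbstring.

```php
<?php
// Hypothetical helper: uppercase with the byte-oriented strtoupper(), but
// return the input unchanged if the result is no longer valid in $encoding.
function upper_or_original($s, $encoding = 'UTF-8')
{
    $upper = strtoupper($s);
    return mb_check_encoding($upper, $encoding) ? $upper : $s;
}

$kana = base64_decode('44Gm44GI44GZ');
echo upper_or_original('abc'), "\n";
echo (mb_check_encoding(upper_or_original($kana . ' and more'), 'UTF-8')
    ? 'valid' : 'invalid'), "\n";
```

Note that this only guards against invalid output. It does not
uppercase anything when the check fails, so a mixed string may come
back entirely unchanged.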

Reproduce code:
---
// $s = strtoupper('&#12414;&#12390;&#12360;&#12377; and &#12472;&#12519;&#12531;&#12464;');
$s = strtoupper(
   base64_decode('44Gm44GI44GZ').
   ' and '.
   base64_decode('44K444On44Oz44Kw'));


if (mb_check_encoding($s)) {
  echo $s;
} else {
  echo 'Error';
}


Expected result:

&#12414;&#12390;&#12360;&#12377; and &#12472;&#12519;&#12531;&#12464;

Actual result:
--
Error





-- 
Edit this bug report at http://bugs.php.net/?id=49958&edit=1



#49958 [Opn-Fbk]: mb_strtoupper fails on Japanese characters

2009-10-26 Thread jani
 ID:   49958
 Updated by:   j...@php.net
 Reported By:  mjong at magnafacta dot nl
-Status:   Open
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows Vista
 PHP Version:  5.2.11
 New Comment:

Ever heard of http://www.pastebin.com/ ?


Previous Comments:


[2009-10-26 10:12:51] mjong at magnafacta dot nl

mbstring.func_overload = 0. Setting it to 2 or 7 just means there is 
no difference between calling strtoupper() and mb_strtoupper(), but 
ucfirst() still produces incorrectly encoded strings. This is using 
UTF-8 for internal encoding. Using UTF-16 and UTF-32 basically crashes 
the program. 

mbstring.strict_detection = Off. Setting it to On has no effect 
whatsoever, i.e. if I check with phpinfo() it stays Off, no matter 
what value I specify. This is not the case with other php.ini 
settings. Besides, there is no documentation on its effect.

I really researched this problem, but cannot show you because 
then this form says I am spamming. The crux of the problem is 
that UTF-8, UTF-16 and UTF-32 each behave differently when used with 
strtoupper() and mb_strtoupper(), and all can result in strings that 
are not valid encodings.

I have a working solution for my case, but I think this is a solvable 
bug.
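
Since ucfirst() works on the first byte rather than the first
character, one possible way around it is a small character-aware
helper. The sketch below assumes UTF-8 input. ucfirst_sketch() is a
hypothetical name, not a function from PHP or mbstring.

```php
<?php
// Hypothetical helper, not part of mbstring: uppercase only the first
// character (not the first byte) of a string in the given encoding.
function ucfirst_sketch($s, $encoding = 'UTF-8')
{
    $first = mb_substr($s, 0, 1, $encoding);
    $rest  = mb_substr($s, 1, null, $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

// Katakana has no uppercase form, so the string should come back as-is.
$kana = base64_decode('44K444On44Oz44Kw');
var_dump(ucfirst_sketch($kana) === $kana);

// A Latin letter with a diacritic ("\xC3\xA9" is U+00E9 in UTF-8).
var_dump(mb_check_encoding(ucfirst_sketch("\xC3\xA9" . 'cole'), 'UTF-8'));
```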



[2009-10-26 08:46:12] j...@php.net

Please read this page and especially the last two examples:
http://www.php.net/manual/en/mbstring.configuration.php 

Are you using the proper options?




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/49958




#49958 [Opn-Fbk]: mb_strtoupper fails on Japanese characters

2009-10-23 Thread jani
 ID:   49958
 Updated by:   j...@php.net
 Reported By:  mjong at magnafacta dot nl
-Status:   Open
+Status:   Feedback
 Bug Type: mbstring related
 Operating System: Windows Vista
 PHP Version:  5.2.11
 New Comment:

mb_strtoupper() defaults to mb_internal_encoding(), so what does the
latter give you? If it's not the right encoding, then there is no bug
here. And strtoupper() or ucfirst() or anything without mb_* in front
of it aren't even supposed to work with such strings.. :)

