ID:               31805
 Updated by:       [EMAIL PROTECTED]
 Reported By:      gullevek at gullevek dot org
-Status:           Open
+Status:           Bogus
 Bug Type:         mbstring related
 Operating System: gnu/linux
 PHP Version:      4.3.10
 New Comment:

Bogused by user request.



Previous Comments:
------------------------------------------------------------------------

[2005-02-11 02:39:51] gullevek at gullevek dot org

Okay, I will experiment with the mbstring.language setting. But I
think this bug can be closed as a false alarm on my part. I am sorry.

------------------------------------------------------------------------

[2005-02-03 07:45:33] [EMAIL PROTECTED]

You look somewhat confused. First off, ISO-2022-JP is a 
"stateful" multibyte encoding and quite different from 
other stateless multibyte encodings such as Shift_JIS, 
EUC-JP and UTF-8.

What makes it "stateful" is its escape sequences: each 
one determines how the implementation interprets the 
consecutive octets that follow it.
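To illustrate what "stateful" means, here is a minimal sketch in PHP that counts characters by tracking the current mode. It is not how mbstring implements ISO-2022-JP: the function name count_iso2022jp_chars is hypothetical, and only two escape sequences are handled (ESC $ B for JIS X 0208 and ESC ( B for ASCII), whereas RFC 1468 also defines ESC ( J and ESC $ @.

```php
<?php
// Hypothetical helper: count characters in an ISO-2022-JP byte string.
// Assumption for brevity: only ESC $ B (enter JIS-kanji, 2 bytes per
// character) and ESC ( B (back to ASCII, 1 byte per character) occur.
function count_iso2022jp_chars($bytes)
{
    $mode  = 'ascii';   // ISO-2022-JP text starts in ASCII mode
    $count = 0;
    $i     = 0;
    $len   = strlen($bytes);

    while ($i < $len) {
        $seq = substr($bytes, $i, 3);
        if ($seq === "\x1B\x24\x42") {          // ESC $ B
            $mode = 'kanji';
            $i += 3;                            // escape itself is not a character
        } elseif ($seq === "\x1B\x28\x42") {    // ESC ( B
            $mode = 'ascii';
            $i += 3;
        } else {
            // The same octet values mean different things depending on mode.
            $i += ($mode === 'kanji') ? 2 : 1;
            $count++;
        }
    }
    return $count;
}

// "a" followed by hiragana "あい" (JIS X 0208 codes 0x2422 and 0x2424)
$s = "a\x1B\x24\x42\x24\x22\x24\x24\x1B\x28\x42";
echo count_iso2022jp_chars($s), "\n"; // 3
echo strlen($s), "\n";                // 11
```

A byte-oriented function such as strlen() cannot know the mode, which is why it reports 11 for what is only three characters.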

With ISO-2022-JP, a single hiragana character most 
likely ends up as 8 bytes in a stream, because of the 
prepended "Shift-in" and appended "Shift-out" escape 
sequences needed to switch the interpretation mode to 
"JIS-kanji" and back to "ASCII" respectively. Two 
hiragana characters, on the other hand, would result in 
10 bytes, because those escape sequences are only needed 
when entering and leaving a chunk of multibyte 
"JIS-kanji" characters.
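The byte counts above can be checked with plain string arithmetic. This sketch assumes the usual escape sequences ESC $ B (0x1B 0x24 0x42, switch to JIS X 0208) and ESC ( B (0x1B 0x28 0x42, switch back to ASCII), plus the JIS X 0208 codes 0x2422 for あ and 0x2424 for い; the helper jis_chunk is hypothetical.

```php
<?php
// Hypothetical helper: wrap JIS X 0208 bytes in the escape sequences
// that enter "JIS-kanji" mode (ESC $ B) and return to ASCII (ESC ( B).
function jis_chunk($jis_bytes)
{
    return "\x1B\x24\x42" . $jis_bytes . "\x1B\x28\x42";
}

$one = jis_chunk("\x24\x22");          // あ
$two = jis_chunk("\x24\x22\x24\x24");  // あい: one chunk, escapes paid once

echo strlen($one), "\n"; // 8  = 3 + 2 + 3
echo strlen($two), "\n"; // 10 = 3 + 2 + 2 + 3
```

Wrapping each character in its own pair of escape sequences would still be valid, but would cost 16 bytes for the same two characters.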

If the problem you are experiencing is actually caused 
by wrong encoding detection, then setting 
mbstring.language to "Japanese" may fix it.
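Besides the php.ini directive (mbstring.language = Japanese), the same setting can be changed at runtime with mb_language(). A minimal sketch, assuming the mbstring extension is loaded:

```php
<?php
// Runtime counterpart of the php.ini line:
//   mbstring.language = Japanese
mb_language('Japanese');

// Called without arguments, mb_language() reports the current language.
echo mb_language(), "\n"; // Japanese
```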

Encoding detection is based on heuristics, so its 
behaviour may vary between releases.



------------------------------------------------------------------------

[2005-02-03 03:22:22] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip



------------------------------------------------------------------------

[2005-02-03 02:25:05] gullevek at gullevek dot org

One more comment:
the problem actually occurred because mb_detect_encoding detects
UTF-8 even if the string is ISO-2022-JP.
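One likely reason, shown in a small sketch: every byte of an ISO-2022-JP string is below 0x80 (ESC itself is 0x1B), so the same bytes also form valid ASCII and therefore valid UTF-8. The example assumes the mbstring extension is loaded; the explicit candidate list is only an illustration, and which candidate mb_detect_encoding() picks still depends on the PHP release.

```php
<?php
// "あ" as ISO-2022-JP: ESC $ B, 0x24 0x22, ESC ( B.
$jis = "\x1B\x24\x42\x24\x22\x1B\x28\x42";

// All eight bytes are 7-bit, so they are also well-formed UTF-8.
var_dump(mb_check_encoding($jis, 'UTF-8')); // bool(true)

// Naming ISO-2022-JP first and using strict mode narrows the candidates;
// the winner is still decided by the release's detection heuristics.
var_dump(mb_detect_encoding($jis, ['ISO-2022-JP', 'UTF-8'], true));
```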

------------------------------------------------------------------------

[2005-02-03 02:20:43] gullevek at gullevek dot org

Okay, perhaps it is not 100% a bug. The problem is that if you have
ISO-2022-JP encoded data and you don't have the default encoding set,
PHP doesn't read it correctly (because ISO-2022-JP is encoded very
differently). See the example below: enter two characters, one
single-byte (e.g. a) and one double-byte (e.g. あ). In the output
without the ISO-2022-JP argument you will see that the length is wrong.
But I don't know why 4.3.10 behaves differently from 4.3.9 ...

<?php
// Read the form fields from $_POST (instead of import_request_variables("p"));
// isset() avoids notices before the form has been submitted.
$string = isset($_POST['string']) ? $_POST['string'] : '';
if (isset($_POST['send']))
{
        echo "S: $string<br>";
        echo "D: ".mb_detect_encoding($string, "iso-2022-jp")."<br>";
        echo strlen($string)." -- without iso: ".mb_strlen($string)
                ." -- with iso: ".mb_strlen($string, "iso-2022-jp")."<br>";
}
?>
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-2022-JP">
</head>
<body>
<form method="post" name="foo" enctype="multipart/form-data">
<input type="text" name="string" size="50" value="<?php echo $string; ?>"><br>
<input type="submit" name="send" value="Send">
</form></body></html>
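Assuming the browser submits the input as ISO-2022-JP, as the page's charset requests, the length discrepancy can be reproduced without a form. In this sketch 'UTF-8' is passed explicitly to stand in for an internal encoding that is not ISO-2022-JP (the actual default differs between PHP versions), and the mbstring extension is assumed to be loaded.

```php
<?php
// "a" followed by "あ", as ISO-2022-JP bytes.
$jis = "a\x1B\x24\x42\x24\x22\x1B\x28\x42";

echo strlen($jis), "\n";                   // 9 bytes
echo mb_strlen($jis, 'UTF-8'), "\n";       // 9: every byte counted as a character
echo mb_strlen($jis, 'ISO-2022-JP'), "\n"; // 2: escape sequences are not characters

// Converting once on input lets the rest of the script work in UTF-8.
$utf8 = mb_convert_encoding($jis, 'UTF-8', 'ISO-2022-JP');
echo mb_strlen($utf8, 'UTF-8'), "\n";      // 2
```

Converting at the boundary is one design choice; the alternative is to pass "iso-2022-jp" to every mb_* call, as the script above does for its "with iso" output.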

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/31805
