#28220 [Fbk->NoF]: mb_strwidth() returns wrong width values for some Hangul characters.
ID: 28220
Updated by: [EMAIL PROTECTED]
Reported By: martin dot t dot kutschker at blackbox dot net
-Status: Feedback
+Status: No Feedback
Bug Type: mbstring related
PHP Version: Irrelevant
New Comment:

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

Previous Comments:

[2004-07-12 08:52:33] [EMAIL PROTECTED]

We're still waiting for feedback, so leave it at that state.

[2004-07-10 20:41:42] martin dot t dot kutschker at blackbox dot net

I never tried the original code (I only noticed the problem from reading the docs), so I did not test the diff. Anyway, I'm offline for two weeks, so I won't be able to give the fix a try for some time.

[2004-07-07 01:00:04] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

[2004-06-29 14:25:26] [EMAIL PROTECTED]

Try this patch and see if it works.

http://www.voltex.jp/patches/bug28220-preliminary.patch.diff

This patch is only applicable for PHP 4.3.2 or later.

~/src/php-4.3.7 $ patch -p0 -R < bug28220-preliminary.patch.diff

[2004-05-04 11:53:53] martin dot t dot kutschker at blackbox dot net

I rechecked EastAsianWidth and found two more wide chars. I also noticed that the range 2E80..4DB5 is in fact split by a single half-width filler space char.

1100..115F  Hangul Choseong
2329        LEFT-POINTING ANGLE BRACKET
232A        RIGHT-POINTING ANGLE BRACKET
2E80..303E  CJK and Kangxi radicals, ideographic chars
3041..4DB5  Hiragana, Katakana, Bopomofo and Hangul letters
4E00..D7A3  CJK ideographs, Yi and Hangul syllables
F900..FA6A  CJK compatibility ideographs
FE30..FE6B  presentation forms, punctuation, etc.
FF01..FF60  full-width Latin letters
FFE0        FULLWIDTH CENT SIGN
FFE1        FULLWIDTH POUND SIGN
FFE2        FULLWIDTH NOT SIGN
FFE3        FULLWIDTH MACRON
FFE4        FULLWIDTH BROKEN BAR
FFE5        FULLWIDTH YEN SIGN
FFE6        FULLWIDTH WON SIGN

Please also note that Unicode knows about "ambiguous" (A) chars. See these quotes from http://www.unicode.org/reports/tr11/

"In a broad sense, wide characters include W, F, and A (when in EA context), while narrow characters include N, Na, H, and A (when not in EA context)."

"Ambiguous characters behave like wide or narrow characters depending on context (language tag, script identification, associated font, source of data, or explicit markup; all can provide the context). If the context cannot be established reliably they should be treated as narrow characters by default."

So mb_strwidth() could try to auto-detect the context (e.g. by locale) or take an optional East Asian context argument.

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/28220
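
The suggestion in the 2004-05-04 comment can be sketched in a few lines. The following is an illustration only, written in Python rather than PHP, and it is not the mbstring implementation or the proposed patch. The names WIDE_RANGES, is_wide, str_width and the east_asian_context parameter are hypothetical. The range table is copied from the comment as posted and has not been checked against any particular Unicode release. Ambiguous (A) characters are looked up in Python's bundled Unicode database, which is newer than Unicode 4.0.

```python
import unicodedata

# Inclusive BMP ranges reported as wide/full-width in the comment above.
# FFE0..FFE6 collapses the seven individually listed full-width signs.
WIDE_RANGES = [
    (0x1100, 0x115F),
    (0x2329, 0x232A),
    (0x2E80, 0x303E),
    (0x3041, 0x4DB5),
    (0x4E00, 0xD7A3),
    (0xF900, 0xFA6A),
    (0xFE30, 0xFE6B),
    (0xFF01, 0xFF60),
    (0xFFE0, 0xFFE6),
]


def is_wide(code_point):
    """Return True if the code point falls in one of the listed ranges."""
    return any(lo <= code_point <= hi for lo, hi in WIDE_RANGES)


def str_width(text, east_asian_context=False):
    """Count 2 columns per wide character and 1 per other character.

    Ambiguous (A) characters count as wide only when the caller states
    that the text is in an East Asian context; otherwise they are narrow,
    which is the default recommended by the quoted TR11 passage.
    Combining marks and control characters are not treated specially.
    """
    width = 0
    for ch in text:
        if is_wide(ord(ch)):
            width += 2
        elif east_asian_context and unicodedata.east_asian_width(ch) == "A":
            width += 2
        else:
            width += 1
    return width


print(str_width("abc"))   # 3
print(str_width("한글"))   # 4, both syllables lie in 4E00..D7A3
print(str_width("\u303f"))  # 1, the half fill space that splits 2E80..4DB5
```

Passing the context explicitly keeps the function deterministic. Auto-detecting it from the locale, as the comment also suggests, would make the result depend on the environment.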
#28220 [Fbk->NoF]: mb_strwidth() returns wrong width values for some Hangul characters.
ID: 28220
Updated by: [EMAIL PROTECTED]
Reported By: martin dot t dot kutschker at blackbox dot net
-Status: Feedback
+Status: No Feedback
Bug Type: mbstring related
PHP Version: Irrelevant
New Comment:

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

Previous Comments:

[2004-06-29 14:25:26] [EMAIL PROTECTED]

Try this patch and see if it works.

http://www.voltex.jp/patches/bug28220-preliminary.patch.diff

This patch is only applicable for PHP 4.3.2 or later.

~/src/php-4.3.7 $ patch -p0 -R < bug28220-preliminary.patch.diff

[2004-05-04 11:53:53] martin dot t dot kutschker at blackbox dot net

I rechecked EastAsianWidth and found two more wide chars. I also noticed that the range 2E80..4DB5 is in fact split by a single half-width filler space char.

1100..115F  Hangul Choseong
2329        LEFT-POINTING ANGLE BRACKET
232A        RIGHT-POINTING ANGLE BRACKET
2E80..303E  CJK and Kangxi radicals, ideographic chars
3041..4DB5  Hiragana, Katakana, Bopomofo and Hangul letters
4E00..D7A3  CJK ideographs, Yi and Hangul syllables
F900..FA6A  CJK compatibility ideographs
FE30..FE6B  presentation forms, punctuation, etc.
FF01..FF60  full-width Latin letters
FFE0        FULLWIDTH CENT SIGN
FFE1        FULLWIDTH POUND SIGN
FFE2        FULLWIDTH NOT SIGN
FFE3        FULLWIDTH MACRON
FFE4        FULLWIDTH BROKEN BAR
FFE5        FULLWIDTH YEN SIGN
FFE6        FULLWIDTH WON SIGN

Please also note that Unicode knows about "ambiguous" (A) chars. See these quotes from http://www.unicode.org/reports/tr11/

"In a broad sense, wide characters include W, F, and A (when in EA context), while narrow characters include N, Na, H, and A (when not in EA context)."

"Ambiguous characters behave like wide or narrow characters depending on context (language tag, script identification, associated font, source of data, or explicit markup; all can provide the context). If the context cannot be established reliably they should be treated as narrow characters by default."

So mb_strwidth() could try to auto-detect the context (e.g. by locale) or take an optional East Asian context argument.

[2004-05-01 15:30:09] [EMAIL PROTECTED]

This is a valid bug.

# thanks Nuno.

[2004-04-29 18:48:17] martin dot t dot kutschker at blackbox dot net

Description:

The table describing the width of the characters is wrong if you compare it with the table for Unicode 4.0:

http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt

For the BMP the wide/full-width chars are:

1100..115F  Hangul Choseong
2E80..4DB5  CJK radicals and CJK Ideograph Extension A
4E00..D7A3  CJK Ideographs, Yi syll. and Hangul syll.
F900..FA6A  CJK compatibility ideographs
FE30..FE6B  presentation forms, punctuation, etc.
FF01..FF60  full-width Latin letters
FFE0        FULLWIDTH CENT SIGN
FFE1        FULLWIDTH POUND SIGN
FFE2        FULLWIDTH NOT SIGN
FFE3        FULLWIDTH MACRON
FFE4        FULLWIDTH BROKEN BAR
FFE5        FULLWIDTH YEN SIGN
FFE6        FULLWIDTH WON SIGN

I didn't check what the actual implementation does, but the docs are certainly wrong (if they mean Unicode code points).
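
A range from the description can be cross-checked against a Unicode database instead of by eye. The sketch below is an illustration, not part of the report. It is written in Python and uses the standard unicodedata module, whose data is newer than the Unicode 4.0 table cited above, so the exact list it prints may differ from what the reporter saw. The helper name non_wide_in_range is hypothetical.

```python
import unicodedata


def non_wide_in_range(first, last):
    """List assigned code points in [first, last] that are neither W nor F."""
    found = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.name(ch, None) is None:
            continue  # unnamed here means unassigned, so skip it
        if unicodedata.east_asian_width(ch) not in ("W", "F"):
            found.append(cp)
    return found


if __name__ == "__main__":
    # Check the 2E80..4DB5 range given in the original description.
    for cp in non_wide_in_range(0x2E80, 0x4DB5):
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.east_asian_width(ch)} {unicodedata.name(ch)}")
```

The list should include U+303F IDEOGRAPHIC HALF FILL SPACE, the filler space character that the 2004-05-04 comment says splits this range.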