ID:               28220
Updated by:       [EMAIL PROTECTED]
Reported By:      martin dot t dot kutschker at blackbox dot net
-Status:          Open
+Status:          Feedback
Bug Type:         mbstring related
PHP Version:      Irrelevant
New Comment:

Try this patch and see if it works.

http://www.voltex.jp/patches/bug28220-preliminary.patch.diff

This patch is only applicable to PHP 4.3.2 or later.

~/src/php-4.3.7 $ patch -p0 -R < bug28220-preliminary.patch.diff


Previous Comments:
------------------------------------------------------------------------

[2004-05-04 11:53:53] martin dot t dot kutschker at blackbox dot net

I rechecked EastAsianWidth and found two more wide chars. I also noticed
that the range 2E80..4DB5 is in fact split by a single half-width filler
space char.

1100..115F  Hangul Choseong
2329        LEFT-POINTING ANGLE BRACKET
232A        RIGHT-POINTING ANGLE BRACKET
2E80..303E  CJK and Kangxi radicals, ideographic chars
3041..4DB5  Hiragana, Katakana, Bopomofo and Hangul letters
4E00..D7A3  CJK ideographs, Yi and Hangul syllables
F900..FA6A  CJK compatibility ideographs
FE30..FE6B  presentation forms, punctuations, etc.
FF01..FF60  full-width Latin letters
FFE0        FULLWIDTH CENT SIGN
FFE1        FULLWIDTH POUND SIGN
FFE2        FULLWIDTH NOT SIGN
FFE3        FULLWIDTH MACRON
FFE4        FULLWIDTH BROKEN BAR
FFE5        FULLWIDTH YEN SIGN
FFE6        FULLWIDTH WON SIGN

Please also note that Unicode knows about "ambiguous" (A) chars. See
these quotes from http://www.unicode.org/reports/tr11/

"In a broad sense, wide characters include W, F, and A (when in EA
context), while narrow characters include N, Na, H, and A (when not in
EA context)."

"Ambiguous characters behave like wide or narrow characters depending on
context (language tag, script identification, associated font, source of
data, or explicit markup; all can provide the context). If the context
cannot be established reliably they should be treated as narrow
characters by default."

So mb_strwidth could try to auto-detect the context (e.g. by locale) or
take an optional east-asian context argument.

------------------------------------------------------------------------

[2004-05-01 15:30:09] [EMAIL PROTECTED]

This is a valid bug.
# thanks Nuno.
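
To illustrate the optional east-asian context argument suggested in the
[2004-05-04] comment, here is a minimal Python sketch. It is not the
mbstring implementation and not the patch above: the function name
str_width and its east_asian parameter are hypothetical, and the width
classes come from Python's unicodedata module rather than from PHP's
own table. Ambiguous (A) characters are treated as narrow unless the
caller says the context is East Asian, as TR11 recommends.

```python
import unicodedata


def str_width(text: str, east_asian: bool = False) -> int:
    """Hypothetical helper: sum of column widths for each code point.

    W and F count as 2 columns. A (ambiguous) counts as 2 only when
    east_asian is True. Everything else counts as 1; this sketch does
    not special-case combining or zero-width characters.
    """
    width = 0
    for ch in text:
        cls = unicodedata.east_asian_width(ch)
        if cls in ("W", "F"):
            width += 2
        elif cls == "A" and east_asian:
            width += 2
        else:
            width += 1
    return width
```

With this sketch, U+00B1 PLUS-MINUS SIGN (class A) is one column by
default and two columns when east_asian=True, while plain ASCII is
unaffected by the flag.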
------------------------------------------------------------------------

[2004-04-29 18:48:17] martin dot t dot kutschker at blackbox dot net

Description:
------------
The table describing the width of the characters is wrong if you compare
it with the table for Unicode 4.0:
http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt

For the BMP the wide/full-width chars are:

1100..115F  Hangul Choseong
2E80..4DB5  CJK radicals and CJK Ideograph Extension A
4E00..D7A3  CJK Ideographs, Yi syll. and Hangul syll.
F900..FA6A  CJK compatibility ideographs
FE30..FE6B  presentation forms, punctuations, etc.
FF01..FF60  full-width Latin letters
FFE0        FULLWIDTH CENT SIGN
FFE1        FULLWIDTH POUND SIGN
FFE2        FULLWIDTH NOT SIGN
FFE3        FULLWIDTH MACRON
FFE4        FULLWIDTH BROKEN BAR
FFE5        FULLWIDTH YEN SIGN
FFE6        FULLWIDTH WON SIGN

I didn't check what the actual implementation does, but the docs are
certainly wrong (if they mean Unicode codepoints).

------------------------------------------------------------------------

--
Edit this bug report at http://bugs.php.net/?id=28220&edit=1
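
For illustration only, the wide/full-width BMP ranges listed in the
[2004-05-04] comment can be written as a sorted range table with a
binary-search lookup. This is a Python sketch under stated assumptions,
not the code in mbstring: the names WIDE_RANGES, is_wide and
table_width are hypothetical, the ranges are copied from that comment
(Unicode 4.0, BMP only), and every code point outside the table is
counted as one column.

```python
from bisect import bisect_right

# Inclusive (first, last) code point ranges, copied from the
# 2004-05-04 comment. FFE0..FFE6 merges the seven single entries.
WIDE_RANGES = [
    (0x1100, 0x115F),  # Hangul Choseong
    (0x2329, 0x232A),  # left/right-pointing angle brackets
    (0x2E80, 0x303E),  # CJK and Kangxi radicals, ideographic chars
    (0x3041, 0x4DB5),  # Hiragana, Katakana, Bopomofo, Hangul letters
    (0x4E00, 0xD7A3),  # CJK ideographs, Yi and Hangul syllables
    (0xF900, 0xFA6A),  # CJK compatibility ideographs
    (0xFE30, 0xFE6B),  # presentation forms, punctuation
    (0xFF01, 0xFF60),  # full-width Latin letters
    (0xFFE0, 0xFFE6),  # full-width signs (cent .. won)
]
_STARTS = [first for first, _ in WIDE_RANGES]


def is_wide(cp: int) -> bool:
    """Return True if code point cp falls inside one of WIDE_RANGES."""
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= WIDE_RANGES[i][1]


def table_width(text: str) -> int:
    """Count 2 columns per wide code point and 1 for everything else."""
    return sum(2 if is_wide(ord(ch)) else 1 for ch in text)
```

Note that U+303F, the half-width filler mentioned in the comment, falls
in the gap between the third and fourth ranges and is therefore
reported as narrow.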