ID:               35711
Updated by:       [EMAIL PROTECTED]
Reported By:      matteo at beccati dot com
-Status:           Analyzed
+Status:           Wont fix
Bug Type:         mbstring related
Operating System: Debian GNU/Linux
PHP Version:      5.1.1
Assigned To:      hirokawa

Previous Comments:
------------------------------------------------------------------------

[2005-12-20 15:44:31] [EMAIL PROTECTED]

Please note that encoding detection is not always perfect. In particular,
when the string is too short, the detection may give the wrong result.

In your case this is not a bug; it is the specified behavior. UTF-8 is a
variable-length multibyte encoding: in its original definition a character
takes from one to six bytes (RFC 3629 later restricted this to four).
Please look at ext/mbstring/libmbfl/filter/mbfilter_utf8.c, around line 249.

0xe8 is a valid first byte of a 3-byte sequence. We cannot tell whether
0xe8 is ISO-8859-1 or UTF-8, because this byte is valid in both encodings.
In this case, the result is chosen according to the order defined by
mb_detect_order().

I suggest using a string of sufficient length for reliable encoding
detection.

------------------------------------------------------------------------

[2005-12-19 09:03:36] [EMAIL PROTECTED]

Rui, can you check this out please?

------------------------------------------------------------------------

[2005-12-19 09:00:50] matteo at beccati dot com

Oops, I just realized that I forgot the -u flag :)

Here is the downloadable patch:

http://beccati.com/download/mbstring-patch-20051219.txt

------------------------------------------------------------------------

[2005-12-19 08:48:47] [EMAIL PROTECTED]

Please provide any patches in unified diff format (like the first one),
and make them downloadable somewhere.

------------------------------------------------------------------------

[2005-12-16 23:50:13] matteo at beccati dot com

I've made a patch which seems to fix the issue. It basically checks the
filter status during judgement. The status seems to be != 0 only while the
filter is in the middle of matching a multibyte character. I also added a
fallback to the old judgement routine, in case no matching encoding is
found.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -u -r1.7.2.1 mbfilter.c
--- ext/mbstring/libmbfl/mbfl/mbfilter.c	5 Nov 2005 04:49:57 -0000	1.7.2.1
+++ ext/mbstring/libmbfl/mbfl/mbfilter.c	16 Dec 2005 22:46:26 -0000
@@ -575,12 +575,22 @@
 	for (i = 0; i < num; i++) {
 		filter = &flist[i];
-		if (!filter->flag) {
+		if (!filter->flag && !filter->status) {
 			encoding = filter->encoding;
 			break;
 		}
 	}
 
+	if (!encoding) {
+		for (i = 0; i < num; i++) {
+			filter = &flist[i];
+			if (!filter->flag) {
+				encoding = filter->encoding;
+				break;
+			}
+		}
+	}
+
 	/* cleanup */
 	/* dtors should be called in reverse order */
 	i = num;
 	while (--i >= 0) {

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/35711
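
For illustration only, the ambiguity of a lone 0xe8 byte and the two
selection rules discussed in this report can be sketched in a few lines of
standalone C. This is not libmbfl code: toy_filter, toy_utf8_feed,
toy_latin1_feed and toy_detect are hypothetical names, the UTF-8 check
covers only the 1- to 4-byte forms, and ISO-8859-1 is assumed to accept
every byte. The detection order is fixed to UTF-8 first, then ISO-8859-1.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a libmbfl identify filter. */
typedef struct {
    const char *encoding; /* name of the candidate encoding */
    int flag;             /* nonzero once an invalid byte has been seen */
    int status;           /* continuation bytes still expected; 0 at a character boundary */
} toy_filter;

/* Feed one byte to a toy UTF-8 validator (1- to 4-byte forms only). */
static void toy_utf8_feed(toy_filter *f, unsigned char c)
{
    if (f->status > 0) {
        if ((c & 0xC0) == 0x80) {
            f->status--;           /* valid continuation byte */
        } else {
            f->flag = 1;           /* sequence broken */
        }
    } else if (c < 0x80) {
        /* ASCII: nothing to track */
    } else if ((c & 0xE0) == 0xC0) {
        f->status = 1;             /* lead byte of a 2-byte sequence */
    } else if ((c & 0xF0) == 0xE0) {
        f->status = 2;             /* lead byte of a 3-byte sequence, e.g. 0xE8 */
    } else if ((c & 0xF8) == 0xF0) {
        f->status = 3;             /* lead byte of a 4-byte sequence */
    } else {
        f->flag = 1;               /* stray continuation or invalid lead byte */
    }
}

/* Assumption for this sketch: every byte is acceptable as ISO-8859-1. */
static void toy_latin1_feed(toy_filter *f, unsigned char c)
{
    (void)f;
    (void)c;
}

/*
 * Run both toy filters over s[0..n) and pick an encoding.
 * patched == 0: first candidate that never saw an invalid byte.
 * patched != 0: first prefer a candidate that also ended on a character
 *               boundary (status == 0), then fall back to the old rule.
 */
const char *toy_detect(const unsigned char *s, size_t n, int patched)
{
    toy_filter flist[2] = { { "UTF-8", 0, 0 }, { "ISO-8859-1", 0, 0 } };
    size_t i;
    int k;

    for (i = 0; i < n; i++) {
        toy_utf8_feed(&flist[0], s[i]);
        toy_latin1_feed(&flist[1], s[i]);
    }

    if (patched) {
        for (k = 0; k < 2; k++) {
            if (!flist[k].flag && !flist[k].status) {
                return flist[k].encoding;
            }
        }
    }
    for (k = 0; k < 2; k++) {
        if (!flist[k].flag) {
            return flist[k].encoding;
        }
    }
    return NULL;
}
```

With the single byte 0xe8 as input, the toy UTF-8 filter has seen nothing
invalid but is still waiting for two continuation bytes, so the unpatched
rule returns the first candidate in the order while the status-aware rule
falls through to ISO-8859-1. A complete sequence such as 0xe8 0xaa 0x9e is
reported as UTF-8 under both rules.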