mbstring does not support hexadecimal numeric entities in HTML code. For example:

    echo urlencode( mb_convert_encoding("&#x0415;", "UTF-8", "HTML-ENTITIES") );

displays %F2%AF%B8%9F rather than the expected %D0%95.

This bug was detected by Nick Wedd <[EMAIL PROTECTED]> and reported in the newsgroup comp.lang.php, Message-ID: <[EMAIL PROTECTED]>.

I found the bug in the file ext/mbstring/libmbfl/filters/mbfilter_htmlent.c and added these features:

- decode hex entities &#xHHHH;
- detect invalid digits
- detect missing digits
- detect values outside the range 0-0xffff

Invalid values are returned verbatim.

The right place for this patch would apparently be http://cvs.sourceforge.jp/cgi-bin/viewcvs.cgi/php-i18n/ but the project is no longer hosted there.

The patch for ext/mbstring/libmbfl/filters/mbfilter_htmlent.c follows:

173a174,217
> static int mbfl_decode_numeric_entity(char *s, int s_len)
> /*
>     s = numeric entity "ddd" or "xhhhh"
>     return: numeric value or -1 if not inside [0,0xffff] or invalid digits
> */
> {
>     int ent, pos, c, d;
>
>     ent = 0;
>
>     if (*s == 'x' || *s == 'X') {
>         /* hexadecimal base */
>         if ( s_len < 2 )
>             return -1; /* no digits found */
>         for (pos=1; pos<s_len; pos++) {
>             c = s[pos];
>             if (isdigit(c))
>                 d = c - '0';
>             else if (isxdigit(c))
>                 d = tolower(c) - 'a' + 10;
>             else
>                 return -1; /* invalid hex digit */
>             ent = (ent << 4) + d;
>             if (ent > 0xffff)
>                 return -1; /* too big */
>         }
>
>     } else {
>         /* decimal base */
>         if ( s_len < 1 )
>             return -1; /* no digits found */
>         for (pos=0; pos<s_len; pos++) {
>             c = s[pos];
>             if (! isdigit(c) )
>                 return -1; /* invalid dec char */
>             ent = ent*10 + (c - '0');
>             if (ent > 0xffff)
>                 return -1; /* too big */
>         }
>     }
>
>     return ent;
> }
>
192,193c236,246
<     for (pos=2; pos<filter->status; pos++) {
<         ent = ent*10 + (buffer[pos] - '0');
---
>     ent = mbfl_decode_numeric_entity(&buffer[2], filter->status - 2);
>     if( ent >= 0 ){
>         CK((*filter->output_function)(ent, filter->data));
>         filter->status = 0;
>         /*php_error_docref("ref.mbstring" TSRMLS_CC, E_NOTICE, "mbstring decoded '%s'=%d", buffer, ent);*/
>     } else {
>         /* failure */
>         buffer[filter->status++] = ';';
>         buffer[filter->status] = 0;
>         /* php_error_docref("ref.mbstring" TSRMLS_CC, E_WARNING, "mbstring cannot decode '%s'", buffer); */
>         mbfl_filt_conv_html_dec_flush(filter);
195,197d247
<     CK((*filter->output_function)(ent, filter->data));
<     filter->status = 0;
<     /*php_error_docref("ref.mbstring" TSRMLS_CC, E_NOTICE, "mbstring decoded '%s'=%d", buffer, ent);*/

Best regards,
 ___
/_|_\  Umberto Salsi
\/_\/  www.icosaedro.it
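
For anyone who wants to check the numbers outside of PHP, here is a minimal standalone sketch. It assumes the input entity was the hexadecimal form &#x0415;, so the characters after "&#" are "x0415". The names old_decode, new_decode and utf8_encode are made up for this illustration and are not part of libmbfl; old_decode mimics the loop removed by the patch, new_decode follows the patched logic (with a length check first and an unsigned char cast added for the ctype calls), and utf8_encode is a plain UTF-8 encoder used only to show the resulting bytes.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for the removed loop: every character after "&#"
   is treated as a decimal digit, with no validation at all. */
int old_decode(const char *s, int s_len)
{
    int ent = 0, pos;

    for (pos = 0; pos < s_len; pos++)
        ent = ent * 10 + (s[pos] - '0');
    return ent;
}

/* Hypothetical stand-in for the patched decoder: accepts "ddd" or "xhhhh",
   returns the value, or -1 on missing/invalid digits or values > 0xffff. */
int new_decode(const char *s, int s_len)
{
    int ent = 0, pos, c, d;

    if (s_len < 1)
        return -1;                      /* nothing to decode */
    if (s[0] == 'x' || s[0] == 'X') {
        if (s_len < 2)
            return -1;                  /* no hex digits found */
        for (pos = 1; pos < s_len; pos++) {
            c = (unsigned char) s[pos];
            if (isdigit(c))
                d = c - '0';
            else if (isxdigit(c))
                d = tolower(c) - 'a' + 10;
            else
                return -1;              /* invalid hex digit */
            ent = (ent << 4) + d;
            if (ent > 0xffff)
                return -1;              /* too big */
        }
    } else {
        for (pos = 0; pos < s_len; pos++) {
            c = (unsigned char) s[pos];
            if (!isdigit(c))
                return -1;              /* invalid decimal digit */
            ent = ent * 10 + (c - '0');
            if (ent > 0xffff)
                return -1;              /* too big */
        }
    }
    return ent;
}

/* Plain UTF-8 encoder for code points up to 0x1FFFFF (no validity checks);
   returns the number of bytes written to out. */
int utf8_encode(unsigned int cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```

With these definitions, old_decode("x0415", 5) yields 720415 (0xAFE1F), because 'x' - '0' is 72 and the remaining digits are appended in base 10; encoding that value gives the bytes F2 AF B8 9F. new_decode("x0415", 5) and new_decode("1045", 4) both yield 0x415, which encodes to D0 95.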