Re: [PHP-DEV] Re: Alternative mbstring implementation using ICU

Moriyoshi Koizumi Fri, 31 Jul 2009 16:04:39 -0700

Hi,

On Sat, Aug 1, 2009 at 1:37 AM, Stanislav Malyshev<s...@zend.com> wrote:
> Hi!
>
>>> mb_str* - shouldn't you in 6 just convert them to unicode and do all
>>> string
>>> operations with Unicode strings? Also, in 5 isn't there some intersection
>>> with grapheme_* functions?
>>
>> mb_strwidth() and mb_strimwidth() are not covered.
>
> True. I wonder what this function is useful for?


They calculate the total width of a string based on the "East Asian
Width" property, which is still useful as a rough measurement of the
width of the rendered string.
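
To make the idea concrete, here is a rough sketch of such a width
calculation. It is written in Python rather than PHP, purely for
illustration, and it is not mbstring's actual table: it assumes that
code points whose East Asian Width is Wide (W) or Fullwidth (F) take two
columns and that everything else, combining marks included, takes one.
The function name display_width is made up for this example.

```python
import unicodedata

def display_width(text: str) -> int:
    """Rough column count: W and F code points count as 2, all others as 1.

    Assumption for this sketch only; real renderers also treat
    combining marks and ambiguous-width characters specially.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

print(display_width("abc"))      # 3
print(display_width("日本語"))    # 6 (three Wide characters)
print(display_width("ｱｲｳ"))      # 3 (halfwidth katakana)
```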

>
>>> mb_output_handler - shouldn't setting the proper encoding in 6 do the
>>> same job?
>>> mb_convert_encoding - don't we already have a number of functions that do
>>> encoding conversions?
>>
>> I don't think It can gracefully handle characters that have no
>> corresponding entries in the target character set. I'm even thinking
>
> That's a common problem, IIRC PHP 6 converters have configurable error modes
> for that. Don't unicode_set_error_handler() and unicode_set_error_mode() do
> what you want?

I guess it isn't what I want. If my understanding is correct, a
handler set by unicode_set_error_handler() merely deals with the
aftermath and cannot interact with the converter.  There are good
reasons to support user-supplied mappings from characters in the PUA
(Private Use Area) to a legacy encoding such as Shift_JIS, rather than
just replacing such characters with placeholders.
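
As a minimal sketch of what I mean by a user-supplied mapping (again in
Python, only to illustrate the idea, not a proposal for the PHP API):
the converter consults a caller-provided table before falling back to
the standard codec. The function name encode_with_user_map and the
single table entry are hypothetical. Encoding one character at a time
is only safe here because Shift_JIS is a stateless encoding.

```python
def encode_with_user_map(text, user_map, encoding="shift_jis"):
    """Encode text, taking bytes from user_map for characters listed there.

    Characters that are neither in user_map nor in the target encoding
    still raise UnicodeEncodeError, as with a plain str.encode().
    """
    out = bytearray()
    for ch in text:
        if ch in user_map:
            out += user_map[ch]          # user-supplied mapping wins
        else:
            out += ch.encode(encoding)   # standard conversion
    return bytes(out)

# Hypothetical table: one PUA code point mapped to a user-defined
# Shift_JIS code.
PUA_TO_SJIS = {"\ue000": b"\xf0\x40"}

encoded = encode_with_user_map("あ\ue000", PUA_TO_SJIS)
print(encoded.hex())  # 82a0f040
```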

In addition to these, aren't there cases where one has to manipulate
Unicode strings on a per-code-point basis rather than a per-grapheme
basis, just as substr() does in PHP 6?
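
The difference shows up as soon as combining marks are involved. A
small illustration in Python, where strings are indexed by code point:
rough_clusters() below is a made-up helper that simply attaches
combining marks to the preceding character, which only approximates
real grapheme segmentation.

```python
import unicodedata

def rough_clusters(text):
    """Group each base character with the combining marks that follow it.

    A crude stand-in for grapheme segmentation, for illustration only.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = "e\u0301a"  # "e", COMBINING ACUTE ACCENT, "a"

print(len(s))                  # 3 code points
print(len(rough_clusters(s)))  # 2 clusters
print(s[:1] == "e")            # True: the slice drops the accent
```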

Regards,
Moriyoshi

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
