On Sat, Aug 1, 2009 at 9:02 AM, Stanislav Malyshev<s...@zend.com> wrote:
> Hi!
>
>> They calculate the total width of a string based on "east asian width"
>> property, which is still valid to give a rough measurement of the
>> rendered string.
>
> OK, I guess if it's some kind of special calculation that doesn't follow
> from others it should be preserved, there are tons of such special functions
> in PHP.
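
For anyone following along, here is a minimal sketch of the kind of
calculation I mean. It is written in Python rather than PHP, purely as an
illustration: the function name display_width is made up, and the rule
(Wide and Fullwidth characters count as two columns, everything else as
one) is a simplification, not the exact rule mbstring uses.

```python
import unicodedata

def display_width(text):
    """Rough column count based on the East Asian Width property.

    Assumption for this sketch: Wide (W) and Fullwidth (F) characters
    occupy two columns and every other character occupies one.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

print(display_width("abc"))
print(display_width("日本語"))
```

Combining marks are counted as one column each here, so the result is
only a rough measurement, as said above.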
>
>>> That's a common problem, IIRC PHP 6 converters have configurable error
>>> modes for that. Don't unicode_set_error_handler() and
>>> unicode_set_error_mode() do what you want?
>>
>> I guess it isn't what I want. If my understanding is correct, a
>> handler set by unicode_set_error_handler() merely deals with the
>> aftermath and cannot interact with the converter.  There are good
>
> That depends. For some error modes, it tells the converter to replace invalid
> chars with some other char or skip them. You can't, however, currently specify
> custom mappings (I'm not sure ICU allows that, but maybe it can be simulated...).
> Here the question is - is it really worth keeping a whole separate conversion
> system for just this, or can it be done with standard conversion, possibly
> somewhat tweaked?

It can be done through conversion error handlers. You can append an
encoded form of a codepoint for such unassigned characters to the
buffer within the handler.
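
To make that concrete, here is a rough sketch of the idea. It is Python,
not PHP 6 or ICU code, and the handler name append_codepoint and the
"&#x...;" notation are my own choices for illustration. The handler is
called when a character has no mapping in the target encoding, and it
hands back an escaped form of the code point, which the converter then
writes to the output in place of the character.

```python
import codecs

def append_codepoint(err):
    # Only encoding failures are handled; anything else is re-raised.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # The slice of the input that the target encoding cannot represent.
    unmapped = err.object[err.start:err.end]
    # Escaped form of each code point (the notation is an assumption).
    replacement = "".join("&#x{:X};".format(ord(ch)) for ch in unmapped)
    # Return the replacement and the position where encoding resumes.
    return replacement, err.end

codecs.register_error("append_codepoint", append_codepoint)

# U+1F600 has no mapping in Shift_JIS, so the handler is invoked for it.
encoded = "あ\U0001F600".encode("shift_jis", errors="append_codepoint")
print(encoded)
```

In this API the handler returns the replacement instead of writing to the
buffer itself, but the effect is the same.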

And yes, it's worth providing a separate conversion system.  You might
not be aware of it, but there are several distinct character sets, each
of which is usually represented with a specific encoding scheme.
Shift_JIS is one of those.

>> In addition to these, aren't there cases where one has to manipulate
>> Unicode strings on a per-coded-character basis rather than a
>> per-grapheme basis, just like substr() in PHP 6?
>
> In PHP 6 right now it's actually the only case; the grapheme functions are not
> even ported to PHP 6 yet (I know, not good) - but that's what the regular str*
> functions should be doing, right?

What I am mainly interested in is 5.4, or whatever will come before 6.
BTW, it would have been much better if there had been some coordination
between the developers of the mbstring and intl extensions.

Moriyoshi

> --
> Stanislav Malyshev, Zend Software Architect
> s...@zend.com   http://www.zend.com/
> (408)253-8829   MSN: s...@zend.com
>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
