And remember,

It's not just the number of times it's sent to ICU for conversion. It's
also the number of times your UTF-16 string has to be converted back
into UTF-8 afterwards. This is why Apple makes its UTF-16 strings
immutable: they are read-only, so the UTF-8 representation can be
cached afterward.
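
To make the caching point concrete, here is a minimal sketch in Python (not PHP or ICU code). The class name, its fields, and the encode_count counter are all hypothetical, made up for illustration. The only point is that a read-only UTF-16 string has to be re-encoded to UTF-8 once, no matter how many times it is output.

```python
class ImmutableUtf16String:
    """Hypothetical read-only string stored as UTF-16 with a lazily cached UTF-8 form."""

    __slots__ = ("_utf16", "_utf8_cache", "encode_count")

    def __init__(self, utf8_bytes: bytes):
        # One UTF-8 -> UTF-16 conversion when the string is loaded.
        self._utf16 = utf8_bytes.decode("utf-8").encode("utf-16-le")
        self._utf8_cache = None
        self.encode_count = 0  # how many UTF-16 -> UTF-8 conversions were done

    def to_utf8(self) -> bytes:
        # Because the UTF-16 data never changes, the UTF-8 form stays valid
        # once computed, so it is only ever produced once.
        if self._utf8_cache is None:
            self._utf8_cache = self._utf16.decode("utf-16-le").encode("utf-8")
            self.encode_count += 1
        return self._utf8_cache
```

Calling to_utf8() a second time returns the cached bytes without another conversion. With a mutable string, the cache would have to be thrown away on every write.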

Think of it this way:

1. Load a UTF-8 string from DB or file
2. Convert it to UTF-16
3. Perform ICU conversions 3-5 times
4. Page gets hit by memcache
5. UTF-16 is converted back to UTF-8
6. Something changes
   (Was the string cached?)
7. Need to spit out another UTF-8 version of the string again

And a persistent web application can be held in memory for many hours.
Are we converting back to UTF-8 every time? If so, it might be better to
wrap the string conversions just around the ICU calls.
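
A rough way to count this, as a back-of-the-envelope model in Python. The function names and the cost model are my own assumptions, not measurements: every conversion counts as one unit, and a cached UTF-8 copy is rebuilt only when the string changes.

```python
def conversions_utf16_internal(renders: int, changes: int, cached: bool) -> int:
    """String is kept as UTF-16 in memory for the life of the application."""
    to_utf16 = 1  # converted once when loaded from DB or file
    # With a cache, re-encode only for the first output and after each change;
    # without one, re-encode on every render.
    to_utf8 = (1 + changes) if cached else renders
    return to_utf16 + to_utf8


def conversions_wrapped_around_icu(renders: int, icu_calls_per_render: int) -> int:
    """String stays UTF-8; each ICU call converts to UTF-16 and back again."""
    return renders * icu_calls_per_render * 2
```

Under these assumptions, 1000 renders with 4 ICU calls each cost 8000 conversions when wrapping around ICU, 1001 with uncached UTF-16 strings, and only a handful with a cached UTF-8 copy. If most renders make no ICU calls at all, the wrapped approach drops to zero, which is why a real measurement is needed.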

I'd suggest selecting a real Unicode PHP app (but one that is as easy to
work with as can be found), one that has been written to use a Unicode
PHP module. Then take a single, representative page from it. By that I
mean the kind of page that gets accessed the most. So for IMDb that
would be a movie's page, etc. The smallest 'slice' of the app, not the
whole thing. Dummy out the other stuff.

Then convert that part (for rendering one page) to the current PHP6
Unicode scheme. Then we can see what's what.
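
The shape of such a measurement could look like the sketch below. It is Python rather than PHP, the sample page is made up, and the two render functions are stand-ins I invented: one passes the UTF-8 bytes straight through, the other round-trips through UTF-16. It says nothing about real PHP6 numbers; it only shows how to time one representative page in isolation.

```python
import timeit

# Made-up stand-in for one representative page (e.g. a movie's page).
PAGE_UTF8 = ("Amélie (2001): Le Fabuleux Destin d'Amélie Poulain\n" * 200).encode("utf-8")


def passthrough(page: bytes) -> bytes:
    # String comes in and goes out without any processing.
    return page


def utf16_round_trip(page: bytes) -> bytes:
    # String is converted to UTF-16 on the way in and back to UTF-8 on the way out.
    utf16 = page.decode("utf-8").encode("utf-16-le")
    return utf16.decode("utf-16-le").encode("utf-8")


def time_render(render, page: bytes, repeats: int = 200) -> float:
    # Total seconds spent rendering the same page `repeats` times.
    return timeit.timeit(lambda: render(page), number=repeats)


if __name__ == "__main__":
    for render in (passthrough, utf16_round_trip):
        print(render.__name__, time_render(render, PAGE_UTF8))
```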



On Tue, Mar 16, 2010 at 8:04 PM, Ferenc Kovacs <tyr...@gmail.com> wrote:
> On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev <s...@zend.com> wrote:
>> Hi!
>>
>>> On disk storage should probably be UTF-8 without any question? Windows
>>> use of widestrings for some files simple doubles up the on disk storage
>>
>> As file content, it's OK (and it'd be easy to add an option to specify content
>> transformation if we wanted), but prescribing filenames as UTF-8 would
>> probably be not workable, since different systems (and maybe even different
>> filesystems inside same OS?) can have different opinions on that.
>>
>>> '3' is not a very processor friendly number, so working with 4 even
>>> though wasteful on memory, does make perfect sense. How long is it since
>>
>> I'm not sure it does. Most of PHP strings are short, so memory loss would be
>> very significant. Also, take into account that CPU caches aren't as big as
>> the main memory, and not fitting your data into the cache is expensive.
>>
>>> we had a 640k limit on working memory? SERVERS should have a good amount
>>
>> It doesn't matter how much memory you have, in numbers. Until we find an
>> unlimited source of computer memory left by the aliens in Himalayas, memory
>> costs money. It doesn't matter how much memory do you have - however many
>> gigs you have, you'll be able to run 3 times less PHP processes in new
>> version on the same hardware than in old version, which means new PHP would
>> cost you more to run. "Memory is cheap" is a very misunderstood expression -
>> it's only cheap if you always have much more than you need.
>>
>>> Probably 90% of the time a string will come in and go out without
>>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>>
>> It might be great if we could do that. The problem might be that right now
>> AFAIK we don't have a good library to work with utf-8 strings (please
>> correct me if I'm wrong here).
> http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
> from ICU 3.6 changelog => The UTF-8 transformation functions and
> macros are faster.
> from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup
> so it seems that the ICU developers are trying to close the gap between
> UTF-16 and UTF-8 performance, so it might be a good idea to
> check out the current situation.
>
> Tyrael
>> --
>> Stanislav Malyshev, Zend Software Architect
>> s...@zend.com   http://www.zend.com/
>> (408)253-8829   MSN: s...@zend.com
>>
>> --
>> PHP Internals - PHP Runtime Development Mailing List
>> To unsubscribe, visit: http://www.php.net/unsub.php
