Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

Ferenc Kovacs Tue, 16 Mar 2010 13:05:16 -0700

On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev <[email protected]> wrote:
> Hi!
>
>> On disk storage should probably be UTF-8 without any question? Windows
>> use of widestrings for some files simple doubles up the on disk storage
>
> As file content, it's OK (and it'd be easy to add an option to specify content
> transformation if we wanted), but prescribing filenames as UTF-8 would
> probably be not workable, since different systems (and maybe even different
> filesystems inside same OS?) can have different opinions on that.
>
>> '3' is not a very processor friendly number, so working with 4 even
>> though wasteful on memory, does make perfect sense. How long is it since
>
> I'm not sure it does. Most PHP strings are short, so the memory overhead
> would be very significant. Also, take into account that CPU caches aren't as
> big as main memory, and not fitting your data into the cache is expensive.
>
>> we had a 640k limit on working memory? SERVERS should have a good amount
>
> It doesn't matter how much memory you have, in absolute numbers. Until we
> find an unlimited source of computer memory left by the aliens in the
> Himalayas, memory costs money. However many gigs you have, you'll be able to
> run only a third as many PHP processes with the new version as with the old
> one on the same hardware, which means the new PHP would cost you more to
> run. "Memory is cheap" is a very misunderstood expression -
> it's only cheap if you always have much more than you need.
>
>> Probably 90% of the time a string will come in and go out without
>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>
> It might be great if we could do that. The problem might be that right now
> AFAIK we don't have a good library to work with utf-8 strings (please
> correct me if I'm wrong here).
http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
From the ICU 3.6 changelog: "The UTF-8 transformation functions and
macros are faster."
From 4.2: "UTF-8 friendly internal data structure for Unicode data lookup".
So it seems the ICU developers are trying to close the performance gap
between UTF-16 and UTF-8, so it may be a good idea to check the
current situation.
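
To put rough numbers on the memory side of this, here is a minimal sketch
in Python (not PHP, and not using ICU) that compares how many bytes the
same short string takes in each encoding form. The helper name
encoded_sizes and the sample strings are made up for illustration, and
this counts only the encoded payload, not any per-string overhead the
engine would add.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32.

    The "-le" codec variants are used so that no byte order mark
    is added to the count.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Hypothetical sample strings: plain ASCII, Latin with one accent, CJK.
samples = ["index.php", "Kovács", "日本語"]

for s in samples:
    print(s, encoded_sizes(s))
```

For ASCII-heavy strings the fixed 4-byte form costs four times what UTF-8
does, while for the CJK sample UTF-16 is the smallest of the three, so
the answer depends on what the strings typically contain.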


Tyrael
> --
> Stanislav Malyshev, Zend Software Architect
> [email protected]   http://www.zend.com/
> (408)253-8829   MSN: [email protected]
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
