On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev <s...@zend.com> wrote:
> Hi!
>
>> On disk storage should probably be UTF-8 without any question? Windows
>> use of widestrings for some files simple doubles up the on disk storage
>
> As file content, it's OK (an it'd be easy to add option to specify content
> transformation if we wanted), but prescribing filenames as UTF-8 would
> probably be not workable, since different systems (and maybe even different
> filesystems inside same OS?) can have different opinions on that.
>
>> '3' is not a very processor friendly number, so working with 4 even
>> though wasteful on memory, does make perfect sense. How long is it since
>
> I'm not sure it does. Most of PHP strings are short, so memory loss would be
> very significant. Also, take into account that CPU caches aren't as big as
> the main memory, and not fitting your data into the cache is expensive.
>
>> we had a 640k limit on working memory? SERVERS should have a good amount
>
> It doesn't matter how much memory you have, in numbers. Until we find an
> unlimited source of computer memory left by the aliens in Himalayas, memory
> costs money. It doesn't matter how much memory do you have - however many
> gigs you have, you'll be able to run 3 times less PHP processes in new
> version on the same hardware than in old version, which means new PHP would
> cost you more to run. "Memory is cheap" is a very misunderstood expression -
> it's only cheap if you always have much more than you need.
>
>> Probably 90% of the time a string will come in and go out without
>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>
> It might be great if we could do that. The problem might be that right now
> AFAIK we don't have a good library to work with utf-8 strings (please
> correct me if I'm wrong here).
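To put rough numbers on the memory argument above, here is a minimal sketch that measures how many bytes a few short strings take in each encoding. The sample strings and the helper name encoded_sizes are made up for illustration, and this only counts payload bytes; it says nothing about how PHP itself would lay out its string structures.

```python
# Compare the storage cost of short strings under three Unicode encodings.
# The sample strings are illustrative assumptions, not data from PHP.
samples = ["id", "user_name", "Hello, world!", "Привет"]


def encoded_sizes(text):
    """Return payload byte counts for UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The "-le" variants are used so that no byte order mark is added.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


for s in samples:
    print(f"{s!r:18} {encoded_sizes(s)}")
```

For pure ASCII text the UTF-32 form is four times the size of the UTF-8 form, while for the Cyrillic sample UTF-8 and UTF-16 come out the same size and UTF-32 is twice that.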
http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html

From the ICU 3.6 changelog: "The UTF-8 transformation functions and macros are faster."
From 4.2: "UTF-8 friendly internal data structure for Unicode data lookup"

So it seems that the ICU developers are trying to close the gap between UTF-16 and UTF-8 performance, so it may be a good idea to check out the current situation.
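As a small illustration of working on UTF-8 bytes directly, without transcoding to UTF-16 first, here is a sketch that counts code points in a UTF-8 buffer. This is not how ICU does it; the function name count_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in a valid UTF-8 byte string without decoding it."""
    # In UTF-8 every continuation byte has the bit pattern 10xxxxxx.
    # Any other byte starts a new code point, so counting those is enough.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


print(count_code_points("Привет".encode("utf-8")))
```

Strings that only pass through untouched need no such work at all; the point is just that simple operations like this one do not require a conversion step.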
Tyrael

> --
> Stanislav Malyshev, Zend Software Architect
> s...@zend.com http://www.zend.com/
> (408)253-8829 MSN: s...@zend.com

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php