dreamcat four wrote:
On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine <les...@lsces.co.uk> wrote:
'3' is not a very processor-friendly number, so working with 4, even though
wasteful on memory, does make perfect sense. How long is it since we had a
640k limit on working memory? SERVERS should have a good amount of memory
for caching information anyway. So is UTF-16 the right approach for
processing wide strings? It needs special code to handle everything wider
than 16 bits, but at what gain really? If all core functionality is handled
as 32-bit characters, is there that much of an overhead over the additional
processing to get around strings of dissimilar sizes in UTF-16?

Just to reinforce some of Lester's points above.

4 bytes per character is never slower than 2 bytes per character... it's
faster if anything. Bear in mind that 4 bytes has been the de facto size
for all modern CPU registers / 32-bit microarchitectures since....
like... forever. Give a C compiler 4 bytes of data... it'll say: thank
you very much, and more of the same please! It keeps 'em happy ;)

Sure, UTF-16 can make sense. But only if your external representations
are also in UTF-16. So what are the default Unicode settings for MySQL,
PostgreSQL, etc.? Are they always set to UTF-8, or UTF-16?

Just do the same as them.

No MySQL GA version (that excludes the upcoming 5.5, which is not GA yet) can accept UTF-16 queries, although you can receive UTF-16 results. Strictly speaking, the GA releases that know about character sets (4.1, 5.0, 5.1) don't know anything about UTF-16, only UCS-2, which covers just the characters in the BMP. It is probable (I can't say definitely due to Oracle's recognition rules) that 5.5 will have proper UTF-16. UTF-16 has its advantages.
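
To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal Python sketch (not MySQL or PHP code; the helper name `code_units` is made up for this illustration). It counts how many code units one character takes in each encoding form. A character inside the BMP fits in a single 16-bit unit, which is all UCS-2 can express, while a character outside the BMP needs a surrogate pair in UTF-16.

```python
# Hypothetical helper for illustration: count the storage units a string
# needs in UTF-8 (bytes), UTF-16 (16-bit units) and UTF-32 (32-bit units).
def code_units(text):
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        # The "-le" variants are used so that no byte order mark is added.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }

# U+20AC EURO SIGN lies inside the BMP: one 16-bit unit is enough.
bmp = code_units("\u20ac")

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP: UTF-16 needs a
# surrogate pair (two 16-bit units), which UCS-2 cannot represent.
astral = code_units("\U0001D11E")

print(bmp)
print(astral)
```

This is the "special code" Lester mentions: anything that indexes or truncates UTF-16 by 16-bit units has to avoid splitting such a pair, whereas with 32-bit units every character is exactly one unit.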

If your Unicode data is mostly ASCII characters with a non-ASCII one here and there, then UTF-8 should be the choice: less disk space used, which means the HDD can read more data, which in turn means more table rows served per second. Converting in the client (PHP) is OK, as it scales: just throw in some more web servers. Scaling an RDBMS is a completely different story.
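
The space argument is easy to check with a small Python sketch. The sample string and the helper name `encoded_sizes` are assumptions chosen for illustration, and the numbers say nothing about how any particular database lays out its rows; they only show the raw encoded size of mostly-ASCII text.

```python
# Hypothetical helper for illustration: byte length of a string in each
# encoding. The "-le" variants avoid counting a byte order mark.
def encoded_sizes(text):
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

# Mostly ASCII, with a single non-ASCII character (U+00E9).
sample = "Caf\u00e9 menu: 2 espressos"
sizes = encoded_sizes(sample)

for enc, size in sizes.items():
    print(f"{enc}: {size} bytes for {len(sample)} characters")
```

For text like this, UTF-8 spends one byte on each ASCII character and two on the accented one, while UTF-16 and UTF-32 spend two and four bytes on every character. For text that is mostly CJK the comparison shifts, since those characters take three bytes in UTF-8 and two in UTF-16.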

Best,
Andrey

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
