php-i18n Digest 16 Aug 2004 12:49:37 -0000 Issue 244

Topics (messages 752 through 752):

Re: Converting "\u00F4" style characters
        752 by: Jaap van Ganswijk

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
At 2004-05-11 04:22, Asgeir Frimannsson wrote:
>        if ( $unicode < 128 ) {
>        
>            $utf8 = chr( $unicode );
>        
>        } elseif ( $unicode < 2048 ) {
>        
>            $utf8 .= chr( 192 +  ( ( $unicode - ( $unicode % 64 ) ) / 64 ) );

or as an alternative:

chr(0xc0|$unicode>>6)

>            $utf8 .= chr( 128 + ( $unicode % 64 ) );

chr(0x80|$unicode&0x3f)

>        
>        } else {
>        
>            $utf8 .= chr( 224 + ( ( $unicode - ( $unicode % 4096 ) ) / 4096 ) );

chr(0xe0|$unicode>>12)

>            $utf8 .= chr( 128 + ( ( ( $unicode % 4096 ) - ( $unicode % 64 ) ) / 64 ) );

chr(0x80|$unicode>>6&0x3f)

>            $utf8 .= chr( 128 + ( $unicode % 64 ) );

chr(0x80|$unicode&0x3f)

This way it's all done with bitwise operators on integers
(and not in floating point). Since this subroutine may
have to be called for up to every character in a document,
this may be quite a bit faster.
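
Put together, the bitwise alternatives might look roughly like the
sketch below. The function name codepoint_to_utf8 is made up for
illustration, and the sketch assumes $unicode is an integer code point
below 0x10000, since the quoted code only goes up to three bytes. The
extra parentheses are not required by PHP's operator precedence; they
are only there to make the grouping obvious.

```php
<?php
// Hypothetical helper (name invented for this example).
// Assumes $unicode is an integer code point below 0x10000.
function codepoint_to_utf8($unicode)
{
    if ($unicode < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($unicode);
    }
    if ($unicode < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xc0 | ($unicode >> 6))
             . chr(0x80 | ($unicode & 0x3f));
    }
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xe0 | ($unicode >> 12))
         . chr(0x80 | (($unicode >> 6) & 0x3f))
         . chr(0x80 | ($unicode & 0x3f));
}

echo bin2hex(codepoint_to_utf8(0xf4)), "\n";   // prints c3b4   (U+00F4)
echo bin2hex(codepoint_to_utf8(0x20ac)), "\n"; // prints e282ac (U+20AC)
```

Returning the bytes instead of appending to $utf8 is just a choice
made for this example; appending with .= as in the quoted code works
the same way.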

The code is also much shorter so easier to check and debug.
(Of course you can exchange the hex numbers for decimal ones,
but since these are special numbers within the
hexadecimal/binary system I prefer to write them in hex.)
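
To illustrate why these numbers are special, here is a small sketch
that prints each constant in hex, decimal and binary. The array name
$masks and the short descriptions are only for this example.

```php
<?php
// Constants used when building UTF-8 bytes, keyed by a short description.
// ($masks is a name made up for this illustration.)
$masks = array(
    'lead byte, 2-byte sequence' => 0xc0,
    'lead byte, 3-byte sequence' => 0xe0,
    'continuation byte marker'   => 0x80,
    'low six payload bits'       => 0x3f,
);

foreach ($masks as $what => $value) {
    // %02x = hex, %3d = decimal, %08b = binary padded to 8 digits
    printf("0x%02x = %3d = %08b  (%s)\n", $value, $value, $value, $what);
}
```

In binary the pattern is visible at a glance (11000000, 11100000,
10000000, 00111111), which the decimal forms 192, 224, 128 and 63
hide.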

Greetings,
Jaap

--- End Message ---
