Re: Encode, take three

Nick Ing-Simmons Wed, 13 Sep 2000 01:33:27 -0700
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
>=head1 NAME
>
>Encode - character encodings
>
>=head2 TERMINOLOGY
>
>       byte    a number in the range 0..255
>       char    a character in the range 0..maxint (at least 2**32-1)
>
>The marker [INTERNAL] marks Internal Implementation Details, in
>general meant only for those who think they know what they are doing;
>such details may change in future releases.
>
>=head2 bytes
>
>       bytes_to_utf8(STRING)
>
>The bytes in STRING are encoded in-place into UTF-8.  Returns the new
>size of STRING, or undef if there's a failure.  [INTERNAL] Also the
>UTF-8 flag is turned on.

Is this a C API or a Perl API?

If it is a Perl API, then converting to UTF-8 means that substr() is going
to give me a sequence of bytes which encode the string. As such, they
have to have the internal UTF8 flag turned off.
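
To illustrate the point, here is a minimal sketch. It assumes encode() from
the Encode module as it ships with present-day perls, which is not the draft
API quoted above, and the sample string is made up. The encoded result is a
plain byte string, so length() and substr() count bytes and the flag is off:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample data (an assumption): "cafe" with e-acute, a space, and U+263A.
my $str   = "caf\x{e9} \x{263A}";      # 6 characters
my $bytes = encode('UTF-8', $str);     # the UTF-8 encoding, as bytes

# Character count versus byte count.
printf "chars: %d, bytes: %d\n", length($str), length($bytes);

# The encoded form is byte data, so the internal flag is off.
printf "flag on encoded bytes: %s\n", (utf8::is_utf8($bytes) ? "on" : "off");

# substr() on the encoded form hands back raw bytes, here the two
# bytes that encode the e-acute.
my $first = substr($bytes, 3, 2);
printf "bytes for e-acute: %s\n", join(' ', map { sprintf "%02X", ord } split //, $first);
```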

>
>=head2 chars
>
>       chars_to_utf8(STRING)
>
>The chars in STRING are encoded in-place into UTF-8.  The chars are
>assumed to be encoded in ISO 8859-1 (Latin 1) or US-ASCII.

You took my name and used it exactly the opposite way to what I intended.
Maybe my name was not as clear as I thought.

My intent was that STRING is _ANY_ string in perl's internal representation.
The returned string is a sequence of bytes (0..255) which are the 
encoding of that string.

My names were meant to be used like this:

   sysread(Handle,$buffer,...);   # buffer seq of bytes 
   my $str = utf8_to_chars(substr($buffer,$start,$len));
   # now we have string of chars and we can use char ops ...
   my @words;
   foreach (split(/\s/,$str))
    {
     push(@words,ucfirst(lc($_)));
    }
   my $newstr = join(' ',@words);
   # get back byte stream that protocol needs
   my $bytes  = chars_to_utf8($newstr);
   syswrite(Handle,$bytes);  
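
The same round trip can be tried without a handle. This is only a sketch:
utf8_to_chars() and chars_to_utf8() are the hypothetical names from this
message, written here as thin wrappers over decode() and encode() from the
Encode module of present-day perls (an assumption), and the buffer contents
are invented:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical wrappers carrying the names proposed in this message.
sub utf8_to_chars { my ($bytes) = @_; return decode('UTF-8', $bytes); }
sub chars_to_utf8 { my ($chars) = @_; return encode('UTF-8', $chars); }

# Stand-in for what sysread() would deliver: a sequence of bytes.
my $buffer = "hello w\xC3\xB6rld";

# bytes -> chars, then ordinary character operations
my $str   = utf8_to_chars($buffer);
my @words = map { ucfirst(lc($_)) } split(/\s/, $str);
my $newstr = join(' ', @words);

# chars -> bytes, ready for syswrite()
my $bytes = chars_to_utf8($newstr);

printf "%d chars in, %d bytes out\n", length($str), length($bytes);
```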

You could have
   my $str = shiftJIS_to_chars();   # or bytes_to_chars($buffer,'shiftJIS')      
...
   my $bytes = chars_to_shiftJIS(); # or chars_to_bytes($str,'shiftJIS')
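
A sketch of the generic two-argument form, again assuming decode() and
encode() from the Encode module of present-day perls underneath.
bytes_to_chars() and chars_to_bytes() are hypothetical names, and the
encoding name is spelled 'shiftjis' as that module spells it:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical generic helpers; the encoding name goes straight to Encode.
sub bytes_to_chars { my ($bytes, $enc) = @_; return decode($enc, $bytes); }
sub chars_to_bytes { my ($chars, $enc) = @_; return encode($enc, $chars); }

# Sample data (an assumption): two characters, U+65E5 and U+672C.
my $str  = "\x{65E5}\x{672C}";

my $sjis = chars_to_bytes($str, 'shiftjis');   # the string as Shift_JIS bytes
my $utf8 = chars_to_bytes($str, 'UTF-8');      # same chars, different bytes
my $back = bytes_to_chars($sjis, 'shiftjis');  # and back to chars

printf "sjis: %d bytes, utf8: %d bytes, round trip %s\n",
    length($sjis), length($utf8), ($back eq $str ? "ok" : "failed");
```

The string of chars is the same in both cases; only the byte sequence
handed to the outside world depends on the encoding name.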


-- 
Nick Ing-Simmons <[EMAIL PROTECTED]>
Via, but not speaking for: Texas Instruments Ltd.