Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
>=head1 NAME
>
>Encode - character encodings
>
>=head2 TERMINOLOGY
>
> byte a number in the range 0..255
> char a character in the range 0..maxint (at least 2**32-1)
>
>The marker [INTERNAL] marks Internal Implementation Details, which are
>generally meant only for those who think they know what they are doing;
>such details may change in future releases.
>
>=head2 bytes
>
> bytes_to_utf8(STRING)
>
>The bytes in STRING are encoded in-place into UTF-8. Returns the new
>size of STRING, or undef on failure. [INTERNAL] The UTF-8 flag is
>also turned on.
Is this a C or a Perl API?
If it is a Perl API, then converting to UTF-8 means that substr() is going
to give me a sequence of bytes that encode the string. As such, the result
must have the internal UTF-8 flag turned off.
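For comparison, the two behaviours discussed here correspond roughly to the core built-ins `utf8::upgrade` and `utf8::encode` that later Perls (5.8 onwards) provide. These built-ins are an assumption of this sketch and are not part of the draft API quoted above. `utf8::upgrade` changes only the internal representation and turns the UTF-8 flag on, so `length` and `substr` still see characters. `utf8::encode` replaces each character with its UTF-8 bytes and turns the flag off, which is the behaviour argued for above.

```perl
use strict;
use warnings;

# A string holding the Latin-1 character e-acute (U+00E9).
my $upgraded = "caf\x{e9}";
my $encoded  = $upgraded;

# utf8::upgrade: internal representation becomes UTF-8 and the flag
# goes on, but the string still contains 4 characters.
utf8::upgrade($upgraded);
print utf8::is_utf8($upgraded) ? "flag on\n" : "flag off\n";
print length($upgraded), " chars\n";

# utf8::encode: each character is replaced by the bytes of its UTF-8
# encoding and the flag goes off, so substr() now sees bytes.
utf8::encode($encoded);
print utf8::is_utf8($encoded) ? "flag on\n" : "flag off\n";
print length($encoded), " bytes\n";
printf "last byte: 0x%02X\n", ord(substr($encoded, -1));
```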
>
>=head2 chars
>
> chars_to_utf8(STRING)
>
>The chars in STRING are encoded in-place into UTF-8. The chars are
>assumed to be encoded in ISO 8859-1 (Latin 1) or US-ASCII.
You took my name and used it in exactly the opposite way from what I intended.
Maybe my name was not as clear as I thought.
My intent was that STRING is _ANY_ string in Perl's internal representation.
The returned string is a sequence of bytes (0..255) which are the
encoding of that string.
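The two readings of `chars_to_utf8` can be contrasted using the `Encode` module that later shipped with Perl 5.8. Its use here is an assumption, since the names in this thread are only proposals. The draft's reading (Latin-1 input, converted in place) behaves much like `Encode::from_to`, which returns the new length in octets or undef on failure. The intended reading, where STRING may hold any character, matches `Encode::encode('UTF-8', ...)`, which returns a new byte string.

```perl
use strict;
use warnings;
use Encode qw(encode from_to);

# Draft reading: the input is Latin-1 and is converted to UTF-8 in place.
# from_to returns the converted length in octets, or undef on failure.
my $latin1 = "caf\xe9";
my $len = from_to($latin1, 'iso-8859-1', 'UTF-8');
print "in-place result: $len bytes\n";

# Intended reading: any Perl string (here U+263A WHITE SMILING FACE,
# outside 0..255) becomes the sequence of bytes that encodes it.
my $str   = "\x{263A}!";
my $bytes = encode('UTF-8', $str);
print length($str), " chars -> ", length($bytes), " bytes\n";
```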
My names were meant to be used like this:
sysread(Handle,$buffer,...);   # buffer is a sequence of bytes
my $str = utf8_to_chars(substr($buffer,$start,$len));
# now we have a string of chars and can use char ops ...
my @words;
foreach (split(/\s/,$str))
{
 push(@words,ucfirst(lc($_)));
}
my $newstr = join(' ',@words);
# get back the byte stream that the protocol needs
my $bytes = chars_to_utf8($newstr);
syswrite(Handle,$bytes);
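The round trip above can be made runnable as a sketch by implementing the proposed names as thin wrappers around `Encode::decode` and `Encode::encode`. The helpers `utf8_to_chars` and `chars_to_utf8` below are hypothetical, written only for this illustration, and not an existing Perl API. A fixed byte string also stands in for the `sysread` buffer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers named after the proposal in this thread,
# implemented on top of Encode (not an existing Perl API).
sub utf8_to_chars { return decode('UTF-8', $_[0]) }
sub chars_to_utf8 { return encode('UTF-8', $_[0]) }

# Stand-in for data read with sysread(): the UTF-8 bytes of "h\x{e9}llo w\x{d6}rld".
my $buffer = "h\xC3\xA9llo w\xC3\x96rld";

# Bytes -> characters, so lc/ucfirst operate on whole characters.
my $str = utf8_to_chars($buffer);
my @words;
foreach (split(/\s/, $str)) {
    push(@words, ucfirst(lc($_)));
}
my $newstr = join(' ', @words);

# Characters -> bytes, ready for syswrite().
my $bytes = chars_to_utf8($newstr);
printf "%d chars, %d bytes\n", length($newstr), length($bytes);
```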
You could also have:
my $str = shiftJIS_to_chars($buffer);   # or bytes_to_chars($buffer,'shiftJIS')
...
my $bytes = chars_to_shiftJIS($str);    # or chars_to_bytes($str,'shiftJIS')
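The same pattern extends to other encodings. As a sketch, and again assuming the later `Encode` module rather than the proposed names, `decode('shiftjis', ...)` plays the role of `bytes_to_chars($buffer,'shiftJIS')` and `encode('shiftjis', ...)` plays the role of `chars_to_bytes($str,'shiftJIS')`. `shiftjis` is the name under which `Encode` registers Shift_JIS.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+3042 is HIRAGANA LETTER A, followed by plain ASCII.
my $str = "\x{3042}abc";

# Corresponds to chars_to_bytes($str,'shiftJIS') in the proposal.
my $bytes = encode('shiftjis', $str);

# Corresponds to bytes_to_chars($buffer,'shiftJIS') in the proposal.
my $back = decode('shiftjis', $bytes);

printf "%d chars -> %d bytes, round trip %s\n",
    length($str), length($bytes), ($back eq $str ? 'ok' : 'FAILED');
```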
--
Nick Ing-Simmons <[EMAIL PROTECTED]>
Via, but not speaking for: Texas Instruments Ltd.