=head1 NAME
Encode - character encodings
=head2 TERMINOLOGY
byte a character in the range 0..255
char a character in the range 0..maxint (at least 2**32-1)
The marker [INTERNAL] marks Internal Implementation Details. These are
meant only for those who think they know what they are doing, and the
details may change in future releases.
=head2 bytes
bytes_to_utf8($string)
The bytes in $string are encoded in-place into UTF-8.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag is turned on.
utf8_to_bytes($string)
The UTF-8 in $string is decoded in-place into bytes.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] The UTF-8 flag of $string is not checked.
utf8_to_bytes_strict($string)
The UTF-8 in $string is decoded in-place into bytes.
Returns the new size of $string, or dies if the UTF-8 is malformed.
[INTERNAL] The UTF-8 flag of $string is not checked.
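
The byte-oriented round trip above can be tried out with Perl's core
utf8::upgrade() and utf8::downgrade(). This is only a sketch: treating
them as stand-ins for bytes_to_utf8() and utf8_to_bytes() is an
assumption, and their return values and failure modes are not guaranteed
to match the functions documented here.

```perl
use strict;
use warnings;

# Assumption: utf8::upgrade/utf8::downgrade (core Perl) stand in for
# bytes_to_utf8/utf8_to_bytes; they are not the functions documented above.
my $string = "caf\xE9";              # 4 bytes, the last one is above 127

# In-place conversion of the internal representation to UTF-8.
# Returns the number of octets the string occupies as UTF-8.
my $size = utf8::upgrade($string);
print "size as UTF-8: $size\n";      # 0xE9 needs two octets, so 5

my $flag = utf8::is_utf8($string) ? 1 : 0;
print "UTF-8 flag: $flag\n";

# In-place conversion back to bytes. With a true second argument,
# failure is reported by a false return value instead of an exception.
my $ok = utf8::downgrade($string, 1) ? 1 : 0;
print "downgrade ok: $ok\n";

# A char above 255 cannot become a byte, so this downgrade fails.
my $wide    = "\x{263A}";
my $wide_ok = utf8::downgrade($wide, 1) ? 1 : 0;
print "downgrade of wide char ok: $wide_ok\n";
```

Calling utf8::downgrade() without the second argument dies on failure
instead, which is roughly the behaviour the _strict variant describes.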
=head2 chars
chars_to_utf8($string)
The chars in $string are encoded in-place into UTF-8.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag is turned on.
utf8_to_chars($string)
The UTF-8 in $string is decoded in-place into chars.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] The UTF-8 flag of $string is not checked.
utf8_to_chars_strict($string)
The UTF-8 in $string is decoded in-place into chars.
Returns the new size of $string, or dies if the UTF-8 is malformed.
[INTERNAL] The UTF-8 flag of $string is not checked.
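
For chars above 255, the in-place encode and decode steps can be
sketched with the core utf8::encode() and utf8::decode(). Using them in
place of chars_to_utf8() and utf8_to_chars() is an assumption made for
illustration; in particular utf8::encode() returns nothing, so the new
size is taken with length().

```perl
use strict;
use warnings;

# Assumption: utf8::encode/utf8::decode (core Perl) are used as stand-ins
# for chars_to_utf8/utf8_to_chars.
my $string = "\x{263A}";             # one char, well above 255

utf8::encode($string);               # in-place: the char becomes UTF-8 octets
my $size = length($string);          # now counts octets
my $hex  = join ' ', map { sprintf '%02X', ord } split //, $string;
print "$size octets: $hex\n";        # 3 octets: E2 98 BA

# In-place decode; returns false if the octets are not valid UTF-8.
my $ok = utf8::decode($string) ? 1 : 0;
printf "ok=%d length=%d U+%04X\n", $ok, length($string), ord($string);

# A truncated sequence is reported as a failure and left untouched.
my $bad    = "\xE2\x98";
my $bad_ok = utf8::decode($bad) ? 1 : 0;
print "truncated input ok: $bad_ok\n";
```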
=head2 Testing For UTF-8
is_utf8_strict($string)
The data in $string is checked for well-formed-UTF-8-ness.
Returns true if the data is well-formed UTF-8, false otherwise.
[INTERNAL] The UTF-8 flag is not checked.
is_utf8($string)
[INTERNAL] Test whether the UTF-8 flag is turned on in the $string.
In other words, the data in $string is NOT checked for
well-formed-UTF-8-ness. If you want that, use is_utf8_strict().
Returns true if the flag is on, false otherwise.
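
The difference between checking the flag and checking the data can be
shown with a small sketch. The helper name is_well_formed_utf8() is
hypothetical, and building it on the core utf8::decode() is an
assumption; utf8::is_utf8() is the core function that reports the flag.

```perl
use strict;
use warnings;

# Hypothetical helper: checks the data, not the flag. It works on a copy,
# so the caller's string is left untouched.
sub is_well_formed_utf8 {
    my ($octets) = @_;
    return utf8::decode($octets) ? 1 : 0;
}

my $good = "\xC3\xA9";               # valid UTF-8 octets, flag off
my $bad  = "\xE9";                   # a lone Latin-1 byte, not valid UTF-8

my $good_data = is_well_formed_utf8($good);
my $bad_data  = is_well_formed_utf8($bad);
my $good_flag = utf8::is_utf8($good) ? 1 : 0;

print "good: data=$good_data flag=$good_flag\n";
print "bad:  data=$bad_data\n";

# A string holding a char above 255 has the flag on.
my $wide      = "\x{263A}";
my $wide_flag = utf8::is_utf8($wide) ? 1 : 0;
print "wide: flag=$wide_flag\n";
```

The first string is well-formed UTF-8 even though its flag is off, which
is exactly the case the two tests above are meant to tell apart.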
=head2 Toggling UTF-8-ness
on_utf8($string)
[INTERNAL] Turn on the UTF-8 flag in $string. The data in
$string is NOT checked for well-formed-UTF-8-ness. Do not
use frivolously: after turning the flag on, only you know
whether the data really is in UTF-8; Perl does not check.
Returns nothing.
off_utf8($string)
[INTERNAL] Turn off the UTF-8 flag in $string.
Do not use frivolously.
Returns nothing.
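
What toggling the flag does to an unchanged buffer can be sketched with
Encode::_utf8_on() and Encode::_utf8_off(), the internal functions that
the Encode module shipped with Perl provides. Treating them as
equivalents of on_utf8() and off_utf8() is an assumption.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: Encode::_utf8_on/_utf8_off play the role of on_utf8/off_utf8.
my $string = "\xE2\x98\xBA";         # three octets that happen to be valid UTF-8
my $before = length($string);        # 3: Perl sees three bytes

Encode::_utf8_on($string);           # flag only; the data is neither changed nor checked
my $during = length($string);        # 1: the same octets now read as one char
my $code   = ord($string);

Encode::_utf8_off($string);
my $after  = length($string);        # 3 again

printf "before=%d during=%d (U+%04X) after=%d\n",
    $before, $during, $code, $after;
```

Because nothing is validated, turning the flag on over octets that are
not valid UTF-8 leaves Perl with a corrupt string, which is why these
calls should not be used frivolously.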
=head2 UTF-16 and UTF-32 Encodings
utf16le_to_utf8($string)
The little-endian UTF-16 (UCS-2, 2-byte chunks) in $string is encoded
in-place into UTF-8. Returns the new size of $string, or undef if
there's a failure. [INTERNAL] Also the UTF-8 flag is turned on.
utf32le_to_utf8($string)
The little-endian UTF-32 (UCS-4, 4-byte chunks) in $string is encoded
in-place into UTF-8. Returns the new size of $string, or undef if
there's a failure. [INTERNAL] Also the UTF-8 flag is turned on.
utf16be_to_utf8($string)
The big-endian UTF-16 (UCS-2, 2-byte chunks) in $string is encoded
in-place into UTF-8. Returns the new size of $string, or undef if
there's a failure. [INTERNAL] Also the UTF-8 flag is turned on.
utf32be_to_utf8($string)
The big-endian UTF-32 (UCS-4, 4-byte chunks) in $string is encoded
in-place into UTF-8. Returns the new size of $string, or undef if
there's a failure. [INTERNAL] Also the UTF-8 flag is turned on.
utf8_to_utf16le($string)
The UTF-8 in $string is decoded in-place into little-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or undef
if there's a failure. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf32le($string)
The UTF-8 in $string is decoded in-place into little-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or undef
if there's a failure. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf16be($string)
The UTF-8 in $string is decoded in-place into big-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or undef
if there's a failure. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf32be($string)
The UTF-8 in $string is decoded in-place into big-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or undef
if there's a failure. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf16le_strict($string)
The UTF-8 in $string is decoded in-place into little-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf32le_strict($string)
The UTF-8 in $string is decoded in-place into little-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf16be_strict($string)
The UTF-8 in $string is decoded in-place into big-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed. [INTERNAL] The UTF-8 flag of $string is not
checked.
utf8_to_utf32be_strict($string)
The UTF-8 in $string is decoded in-place into big-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed. [INTERNAL] The UTF-8 flag of $string is not
checked.
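
The UTF-8 to UTF-16/UTF-32 conversions can be sketched with encode() and
decode() from the Encode module shipped with Perl. This is an
illustration under assumptions: these calls return new strings rather
than converting in-place, and the helper utf8_octets_to_utf16le_strict()
is a hypothetical name, not one of the functions documented above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Render a byte string as space-separated hex, for display only.
sub hex_octets { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $chars = "A\x{263A}";

my $utf16le = encode('UTF-16LE', $chars);   # 2-byte chunks, low byte first
my $utf32be = encode('UTF-32BE', $chars);   # 4-byte chunks, high byte first
print hex_octets($utf16le), "\n";           # 41 00 3A 26
print hex_octets($utf32be), "\n";           # 00 00 00 41 00 00 26 3A

my $back = decode('UTF-16LE', $utf16le);    # back to chars

# Hypothetical helper: UTF-8 octets to little-endian UTF-16, dying on
# malformed UTF-8 (FB_CROAK); LEAVE_SRC keeps the input unmodified.
sub utf8_octets_to_utf16le_strict {
    my ($octets) = @_;
    my $decoded = decode('UTF-8', $octets,
                         Encode::FB_CROAK | Encode::LEAVE_SRC);
    return encode('UTF-16LE', $decoded);
}

my $converted = utf8_octets_to_utf16le_strict("\xE2\x98\xBA");
my $died      = eval { utf8_octets_to_utf16le_strict("A\xFFB"); 1 } ? 0 : 1;
print hex_octets($converted), " died_on_bad_input=$died\n";
```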
=cut