=head1 NAME

Encode - character encodings

=head2 TERMINOLOGY

        byte    a character in the range 0..255
        char    a character in the range 0..maxint (at least 2**32-1)
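
The distinction can be seen with the stock Encode module. The following is a
minimal sketch, not part of the interface described here: it assumes only
C<Encode::encode> and C<List::Util::max>, both of which ship with Perl, and the
variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);
use List::Util qw(max);

# Two chars, both well above 255.
my $chars = "\x{263A}\x{1F600}";

# The same text as UTF-8: a string in which every element is a byte.
my $bytes = encode('UTF-8', $chars);

my $max_char = max(map { ord } split //, $chars);
my $max_byte = max(map { ord } split //, $bytes);

printf "chars: length %d, largest ordinal 0x%X\n", length($chars), $max_char;
printf "bytes: length %d, largest ordinal 0x%X\n", length($bytes), $max_byte;
```

The char string has length 2, while its UTF-8 form has length 7 and no element
above 255.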

The marker [INTERNAL] marks Internal Implementation Details, in general
meant only for those who think they know what they are doing.  Such
details may change in future releases.

=head2 bytes

        bytes_to_utf8($string)

The bytes in $string are encoded in-place into UTF-8.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag is turned on.
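
For comparison, the built-in C<utf8::upgrade> behaves much like the description
above: it works in place, returns an octet count, and leaves the UTF-8 flag on.
This sketch uses it as a stand-in, on the assumption that bytes_to_utf8() may
not be available; it is not a claim about how bytes_to_utf8() is implemented.

```perl
use strict;
use warnings;

# Four bytes; 0xE9 is e-acute when the bytes are read as Latin-1.
my $string = "caf\xE9";

# In place; returns the number of octets in the UTF-8 form.
my $size = utf8::upgrade($string);

printf "octets in UTF-8 form: %d\n", $size;
printf "length in chars:      %d\n", length($string);
printf "UTF-8 flag:           %s\n", utf8::is_utf8($string) ? 'on' : 'off';
```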

        utf8_to_bytes($string)

The UTF-8 in $string is decoded in-place into bytes.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] The UTF-8 flag of $string is not checked.

        utf8_to_bytes_strict($string)

The UTF-8 in $string is decoded in-place into bytes.
Returns the new size of $string, or dies if the UTF-8 is malformed.
[INTERNAL] The UTF-8 flag of $string is not checked.
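
The difference between the undef-returning and the dying variant can be
sketched with the built-in C<utf8::decode> and C<utf8::downgrade>. Both helper
names below are hypothetical, and unlike the functions described above they
return a new string rather than working in place and returning a size.

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 octets in, bytes (0..255) out, undef on failure.
sub utf8_octets_to_bytes {
    my ($octets) = @_;                       # work on a copy
    utf8::decode($octets)       or return;   # malformed UTF-8
    utf8::downgrade($octets, 1) or return;   # some char is above 255
    return $octets;
}

# Hypothetical strict variant: same conversion, but failure is fatal.
sub utf8_octets_to_bytes_strict {
    my ($octets) = @_;
    my $bytes = utf8_octets_to_bytes($octets);
    die "cannot convert UTF-8 to bytes\n" unless defined $bytes;
    return $bytes;
}

my $ok        = utf8_octets_to_bytes("caf\xC3\xA9");   # "caf\xE9"
my $too_wide  = utf8_octets_to_bytes("\xE2\x98\xBA");  # undef: U+263A > 255
my $malformed = utf8_octets_to_bytes("caf\xE9!");      # undef: bad sequence
my $died      = !eval { utf8_octets_to_bytes_strict("caf\xE9!"); 1 };

printf "ok: %d bytes\n", length($ok);
printf "too wide: %s\n",  defined $too_wide  ? 'converted' : 'undef';
printf "malformed: %s\n", defined $malformed ? 'converted' : 'undef';
printf "strict died: %s\n", $died ? 'yes' : 'no';
```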

=head2 chars

        chars_to_utf8($string)

The chars in $string are encoded in-place into UTF-8.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag is turned on.

        utf8_to_chars($string)

The UTF-8 in $string is decoded in-place into chars.
Returns the new size of $string, or undef if there's a failure.
[INTERNAL] The UTF-8 flag of $string is not checked.

        utf8_to_chars_strict($string)

The UTF-8 in $string is decoded in-place into chars.
Returns the new size of $string, or dies if the UTF-8 is malformed.
[INTERNAL] The UTF-8 flag of $string is not checked.
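
An in-place round trip of this kind can be tried today with the built-in
C<utf8::encode> and C<utf8::decode>. This is a sketch using those functions as
stand-ins; they differ from the functions above in their return values
(C<utf8::decode> returns a boolean, not a size).

```perl
use strict;
use warnings;

# Seven chars: "na", i-diaeresis, "ve", a space, and U+263A.
my $string = "na\x{EF}ve \x{263A}";
my $char_count = length($string);

# Chars to UTF-8, in place.  Each element of $string is now one octet.
utf8::encode($string);
my $octet_count = length($string);

# UTF-8 back to chars, in place.  Returns false on malformed input.
utf8::decode($string) or die "malformed UTF-8\n";

printf "chars: %d, octets: %d, chars again: %d\n",
    $char_count, $octet_count, length($string);
```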

=head2 Testing For UTF-8

        is_utf8_strict($string)

The data in $string is checked for well-formed-UTF-8-ness.
Returns true if the data is well-formed UTF-8, false otherwise.
[INTERNAL] The UTF-8 flag is not checked.

        is_utf8($string)

[INTERNAL] Test whether the UTF-8 flag is turned on in the $string.
In other words, the data in $string is NOT checked for
well-formed-UTF-8-ness.  If you want that, use is_utf8_strict().
Returns true if the flag is on, false otherwise.
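
The two tests answer different questions, which a short sketch can show. It
uses the built-in C<utf8::is_utf8> for the flag test and a hypothetical helper,
bytes_are_wellformed_utf8(), built on C<utf8::decode> for the data test; neither
is claimed to be how the functions above are implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the bytes form well-formed UTF-8.
sub bytes_are_wellformed_utf8 {
    my ($bytes) = @_;                  # copy, so the caller's string is untouched
    return utf8::decode($bytes) ? 1 : 0;
}

my $octets = "\xE2\x98\xBA";   # well-formed UTF-8 data, flag off
my $latin1 = "caf\xE9!";       # 0xE9 followed by "!" is not valid UTF-8
my $chars  = chr(0x263A);      # a char above 255, so the flag is on

printf "octets: flag %s, well-formed %d\n",
    (utf8::is_utf8($octets) ? 'on' : 'off'), bytes_are_wellformed_utf8($octets);
printf "latin1: flag %s, well-formed %d\n",
    (utf8::is_utf8($latin1) ? 'on' : 'off'), bytes_are_wellformed_utf8($latin1);
printf "chars:  flag %s\n", utf8::is_utf8($chars) ? 'on' : 'off';
```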

=head2 Toggling UTF-8-ness

        on_utf8($string)

[INTERNAL] Turn on the UTF-8 flag in $string.  The data in
$string is NOT checked for well-formed-UTF-8-ness.  Do not
use frivolously: after turning this on, only you know
that the data is in UTF-8; Perl doesn't.
Returns nothing.

        off_utf8($string)

[INTERNAL] Turn off the UTF-8 flag in $string.
Do not use frivolously.
Returns nothing.
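
The effect of toggling the flag without touching the data can be observed with
C<Encode::_utf8_on> and C<Encode::_utf8_off>, which the stock Encode module
provides for the same purpose. The sketch below uses them as stand-ins for
on_utf8() and off_utf8(); the same caution applies.

```perl
use strict;
use warnings;
use Encode ();

# Three bytes that happen to be well-formed UTF-8 for U+263A.
my $string = "\xE2\x98\xBA";
my $len_off = length($string);

# Flip the flag only; the data is not validated or changed.
Encode::_utf8_on($string);
my $len_on = length($string);
my $ord_on = ord($string);

# Flip it back; the same three bytes are seen again.
Encode::_utf8_off($string);
my $len_off_again = length($string);

printf "flag off: length %d\n", $len_off;
printf "flag on:  length %d, first ordinal 0x%X\n", $len_on, $ord_on;
printf "flag off: length %d\n", $len_off_again;
```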

=head2 UTF-16 and UTF-32 Encodings

        utf16le_to_utf8($string)

The little-endian UTF-16 (UCS-2, 2-byte chunks) in $string is encoded
in-place into UTF-8.  Returns the new size of $string, or undef if
there's a failure.  [INTERNAL] Also the UTF-8 flag is turned on.

        utf32le_to_utf8($string)

The little-endian UTF-32 (UCS-4, 4-byte chunks) in $string is encoded
in-place into UTF-8.  Returns the new size of $string, or undef if
there's a failure.  [INTERNAL] Also the UTF-8 flag is turned on.

        utf16be_to_utf8($string)

The big-endian UTF-16 (UCS-2, 2-byte chunks) in $string is encoded
in-place into UTF-8.  Returns the new size of $string, or undef if
there's a failure.  [INTERNAL] Also the UTF-8 flag is turned on.

        utf32be_to_utf8($string)

The big-endian UTF-32 (UCS-4, 4-byte chunks) in $string is encoded
in-place into UTF-8.  Returns the new size of $string, or undef if
there's a failure.  [INTERNAL] Also the UTF-8 flag is turned on.
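
The four conversions above share one calling convention: convert in place,
return the new size or undef. The stock C<Encode::from_to> follows the same
convention, so it serves here as a stand-in for utf16le_to_utf8(). One
difference to keep in mind: from_to() leaves the result as plain octets and
does not turn on the UTF-8 flag.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# "Hi" followed by U+263A, as little-endian UTF-16: six octets.
my $string = "H\x00i\x00\x3A\x26";

# Convert in place; returns the new length in octets, or undef on error.
my $size = from_to($string, 'UTF-16LE', 'UTF-8');
die "conversion failed\n" unless defined $size;

printf "new size: %d\n", $size;
printf "octets:   %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $string;
```

Swapping the encoding name for C<UTF-16BE>, C<UTF-32LE> or C<UTF-32BE> (with
suitably ordered input) covers the other three cases.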

        utf8_to_utf16le($string)

The UTF-8 in $string is decoded in-place into little-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or undef
if there's a failure.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf32le($string)

The UTF-8 in $string is decoded in-place into little-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or undef
if there's a failure.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf16be($string)

The UTF-8 in $string is decoded in-place into big-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or undef
if there's a failure.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf32be($string)

The UTF-8 in $string is decoded in-place into big-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or undef
if there's a failure.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf16le_strict($string)

The UTF-8 in $string is decoded in-place into little-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf32le_strict($string)

The UTF-8 in $string is decoded in-place into little-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf16be_strict($string)

The UTF-8 in $string is decoded in-place into big-endian UTF-16
(UCS-2, 2-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed.  [INTERNAL] The UTF-8 flag of $string is not
checked.

        utf8_to_utf32be_strict($string)

The UTF-8 in $string is decoded in-place into big-endian UTF-32
(UCS-4, 4-byte chunks). Returns the new size of $string, or dies if
the UTF-8 is malformed.  [INTERNAL] The UTF-8 flag of $string is not
checked.
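
The strict behaviour can be approximated with C<Encode::decode> and the
C<FB_CROAK> check value, which makes malformed input fatal. The helper name
below is hypothetical, and unlike the functions above it returns a new string
instead of converting in place.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper: UTF-8 octets in, big-endian UTF-16 octets out.
sub utf8_octets_to_utf16be_strict {
    my ($octets) = @_;                                # work on a copy
    my $chars = decode('UTF-8', $octets, FB_CROAK);   # dies on malformed input
    return encode('UTF-16BE', $chars);
}

# U+263A is E2 98 BA in UTF-8 and 26 3A in big-endian UTF-16.
my $utf16 = utf8_octets_to_utf16be_strict("\xE2\x98\xBA");

# 0xE9 followed by "!" is not valid UTF-8, so this call dies.
my $died = !eval { utf8_octets_to_utf16be_strict("\xE9!"); 1 };

printf "UTF-16BE octets: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $utf16;
printf "malformed input died: %s\n", $died ? 'yes' : 'no';
```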

=cut


-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
