RFC 294 (v1) Internally, data is stored as UTF8

Perl6 RFC Librarian Mon, 25 Sep 2000 13:06:14 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Internally, data is stored as UTF8

=head1 VERSION

  Maintainer: Simon Cozens <[EMAIL PROTECTED]>
  Date: 25 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 294
  Version: 1
  Status: Developing

=head1 ABSTRACT

We need to settle on an internal data format; this RFC proposes that
UTF8 should be that format.

=head1 DESCRIPTION

Perl 5.6's Unicode support has been hampered by the fact that it was
grafted onto the side of the old string support, and so it tried to
handle both Unicode-encoded and non-Unicode data in the same structures;
this made it an absolute swine to do any manipulation properly on these
strings.

This could all be made a lot easier if we stuck to one single data
format for internal representation, just as most other languages out
there do. If we're going to have decent Unicode support, it naturally
needs to be a UTF. So which one?

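UTF32 is just not going to fly. It's too big and bulky: four bytes for
every character, whatever the character is. UTF16 is sensible, but
there's probably a lot more legacy ASCII data out there than anything
else, and UTF8 stores ASCII unchanged at one byte per character, so it
makes sense to propose UTF8 as a halfway house.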
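
To put rough numbers on the size argument, here is a minimal sketch in
Python (not Perl, and not part of any proposed implementation) that
measures how many bytes the same text occupies in each UTF. The sample
strings and the C<encoded_sizes> helper are hypothetical, chosen only
for illustration; the byte order mark is left out so that only the
character data is counted.

```python
# Hypothetical sample strings, chosen only for illustration.
SAMPLES = {
    "ascii": "Hello, world",          # 12 characters, all ASCII
    "latin": "na\u00efve caf\u00e9",  # 10 characters, 2 of them accented
    "cjk": "\u65e5\u672c\u8a9e",      # 3 CJK characters
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32.

    The little-endian codec variants are used so that no byte order
    mark is added to the count.
    """
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
        "utf32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, encoded_sizes(text))
```

For the ASCII sample this gives 12, 24 and 48 bytes respectively, which
is the case the argument above rests on. The CJK sample goes the other
way (9 bytes in UTF8 against 6 in UTF16), so the choice is a trade-off
weighted towards legacy ASCII data rather than a win across the board.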

=head1 IMPLEMENTATION

We'll need to get data into Unicode, and I have an RFC about that (RFC
296); we need to handle data internally, and I have an RFC about that
too. This RFC merely settles on the fact that we need a single internal
data format for simplicity, and that it should be UTF8.

=head1 REFERENCES

The Unicode FAQ on UTFs and BOMs (an excellent introduction to what
UTFs are, what they look like and how they work):
http://www.unicode.org/unicode/faq/utf_bom.html

RFC 295: Normalisation and C<unicode::exact>

RFC ??: When UTF8 leaks out

RFC 300: C<use unicode::representation>

RFC 312: Unicode Combinatorix

RFC 296: Getting Data Into Unicode Is Not Our Problem

RFC ??: Unicode Locales

RFC ??: Abstract the Internal String Interaction