Re: Source data for perl encodings

James Mon, 08 Jan 2001 15:07:34 -0800
Nick Ing-Simmons wrote:
> >Unicode and the ISO folks have agreed to use the same codepoints.
> >
> >However, the ISO folks have not adopted Unicode's extensive property
> >and algorithm enhancements to the raw codepoint tables.
> >
> >So for now as long as Perl obeys the Unicode standard and TRs, then
> >ISO 10646 can be ignored.
> 
> And if it doesn't obey all the rules? - can we claim ISO 10646 even
> though we don't reach Unicode status?

Ok, I'll be more explicit about what standards compliance means.

As far as I know, ISO 10646 compliance consists of just being able to
represent the codepoints. Easy. (The standard is $585 per version, so
I personally don't know anybody who has read it, including me.)

I think ISO 10646 still defines a larger code space (31-bit UCS-4) than
Unicode's roughly 1.1 million code points, which is all that UTF-16
surrogates can reach. The surrogate specs keep being revised, but
allowing 6-byte UTF-8 sequences would support the extra code space.
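To put numbers on the code-space point, here is a rough sketch in Python rather than Perl. It assumes the original RFC 2279 definition of UTF-8, which allowed sequences of up to 6 bytes for 31-bit values; the function names are made up for illustration. UTF-16 surrogate pairs stop at U+10FFFF, and 4 UTF-8 bytes already cover that, so the 5- and 6-byte forms only matter for the extra ISO 10646 space.

```python
# Sketch: compare the code space reachable via UTF-16 surrogate pairs with
# the code space reachable via the original (RFC 2279) UTF-8 scheme.

# Payload bits available in an n-byte sequence under the RFC 2279 scheme.
UTF8_PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21, 5: 26, 6: 31}

def utf8_len_rfc2279(cp):
    """Hypothetical helper: bytes needed for code point cp (up to 6)."""
    if cp < 0:
        raise ValueError("negative code point")
    for n, bits in sorted(UTF8_PAYLOAD_BITS.items()):
        if cp < (1 << bits):
            return n
    raise ValueError("code point exceeds 31 bits")

def surrogate_pair(cp):
    """Hypothetical helper: split U+10000..U+10FFFF into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF use surrogate pairs")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

UTF16_MAX = 0x10FFFF                 # BMP plus 16 supplementary planes
print(UTF16_MAX + 1)                 # 1114112 code points (17 planes)
print(utf8_len_rfc2279(UTF16_MAX))   # 4: enough for everything UTF-16 can reach
print(utf8_len_rfc2279(0x7FFFFFFF))  # 6: needed for the top of the 31-bit space
print([hex(u) for u in surrogate_pair(0x1D11E)])  # ['0xd834', '0xdd1e']
```

For code points up to U+10FFFF the lengths agree with modern UTF-8; only the 31-bit range needs the longer forms.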

Unicode compliance consists of being able to represent the codepoints
and properties for the repertoires you want, and then saying which
repertoires and which Unicode standard version you support.
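Concretely, "saying which version you support" means the property tables are pinned to one release of the Unicode Character Database. A small illustration in Python rather than Perl: the standard unicodedata module reports the UCD version it was built with, plus per-codepoint properties. The report() and in_repertoire() helpers and the SUPPORTED range list are hypothetical, standing in for whatever repertoire a given release chooses to claim.

```python
import unicodedata

# Hypothetical declared repertoire: Basic Latin plus Latin-1 Supplement,
# i.e. roughly what a Western-European-only release might claim.
SUPPORTED = [(0x0000, 0x007F), (0x0080, 0x00FF)]

def report(text):
    """Hypothetical helper: (code point, General_Category, name) per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch),
             unicodedata.name(ch, "<unnamed>")) for ch in text]

def in_repertoire(text):
    """True if every character falls inside one of the SUPPORTED ranges."""
    return all(any(lo <= ord(ch) <= hi for lo, hi in SUPPORTED) for ch in text)

# The property data comes from one specific UCD version; document it.
print("UCD version:", unicodedata.unidata_version)
for row in report("A\u00e9\u0416"):
    print(*row)
# U+0041 Lu LATIN CAPITAL LETTER A
# U+00E9 Ll LATIN SMALL LETTER E WITH ACUTE
# U+0416 Lu CYRILLIC CAPITAL LETTER ZHE
print(in_repertoire("caf\u00e9"))  # True
print(in_repertoire("\u0416"))     # False: Cyrillic is outside the declared set
```

The Cyrillic letter still gets correct properties from the database, but a release that declares only the Latin ranges would document it as unsupported rather than quietly claim it.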

As an example, Perl 4.036 was probably Unicode compliant - for English
to the Unicode 1.0 standard!

Hopefully we're aiming a little higher with Perl 5.7, but the point
is you don't have to implement everything from Day 1, as long
as you document what conforms to the standard and what's missing.

> Not that we are going to deliberately break the rules but unless
> someone does an "audit" we will not be sure...

There's plenty of help available for an audit.

The Unicode Consortium and the IBM Unicode group are full of smart,
helpful people who, I'm sure, would love to review any Perl design or
user docs.

James.