Mark J. Reed wrote:
On Mon, May 18, 2009 at 9:11 AM, Austin Hastings
<austin_hasti...@yahoo.com> wrote:
If you haven't read the PDD, it's a good start.

<snip useful summary>

I get all that, really.  I still question the necessity of mapping
each grapheme to a single integer.  A single *value*, sure.
length($weird_grapheme) should always be 1, absolutely.  But why does
ord($weird_grapheme) have to be a *numeric* value?  If you convert to,
say, normalization form C and return a list of the scalar values so
obtained, that can be used in any context to reproduce the same
grapheme, with no worries about different processes coming up with
different assignments of arbitrary negative numbers to graphemes.

If you're doing arithmetic with the code points or scalar values of
characters, then the specific numbers would seem to matter.  I'm
looking for the use case where the fact that it's an integer matters
but the specific value doesn't.


There are a couple of cases. First of all, it doesn't have to be an integer. It needs to be a fixed size, and it needs to be orderable, so that we can store a bunch of them in an intelligent fashion and sort them easily.

With that said, integers meet the need exactly. Plus, there's the benefit that Unicode already has an "escape hatch" built into it for user-defined stuff. And that escape hatch is an integer.

The benefits are documented in the pod: they're fixed size, so we can scan over them forward and backward at low cost. They're easily distinguished (high bit set) so string code can special-case them quickly. They're orderable, comparable, etc. And best of all they contain no trans fat!
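
To make that concrete, here is a rough Python sketch of the idea, not Parrot's actual implementation. The GraphemeTable class and its method names are made up for illustration, and the cluster detection is deliberately naive (a base character plus any following combining marks), which is an assumption rather than full Unicode grapheme segmentation. Single code points are stored as themselves; anything that still takes more than one code point after NFC gets a negative ID from a per-table list.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical table mapping multi-code-point graphemes to negative ints."""

    def __init__(self):
        self._ids = {}    # tuple of code points -> negative ID
        self._seqs = []   # position (-id - 1) -> tuple of code points

    def encode(self, text):
        """Return one integer per grapheme; synthetic graphemes are negative."""
        text = unicodedata.normalize("NFC", text)
        clusters = []
        for ch in text:
            # Naive assumption: a combining mark attaches to the previous cluster.
            if clusters and unicodedata.combining(ch):
                clusters[-1].append(ord(ch))
            else:
                clusters.append([ord(ch)])
        cells = []
        for cluster in clusters:
            if len(cluster) == 1:
                cells.append(cluster[0])          # ordinary code point
            else:
                key = tuple(cluster)
                if key not in self._ids:
                    self._seqs.append(key)
                    self._ids[key] = -len(self._seqs)
                cells.append(self._ids[key])      # negative, so easy to spot
        return cells

    def decode(self, cells):
        """Rebuild the string from the integer cells."""
        return "".join(
            chr(c) if c >= 0 else "".join(map(chr, self._seqs[-c - 1]))
            for c in cells
        )


table = GraphemeTable()
cells = table.encode("aq\u0301b")   # q + combining acute has no precomposed form
print(cells)                        # [97, -1, 98]
print(len(cells))                   # 3, one fixed-size cell per grapheme
```

Every cell is the same size, so stepping forward or backward is plain indexing, and a sign test is enough to special-case the synthetic ones. Note that the negative IDs only mean something relative to the table that issued them, which is the concern Mark raises about different processes.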

=Austin

