On Tue, Apr 20, 2004 at 10:51:04PM -0700, Larry Wall wrote:
> 
> Yes, that's in the works.  The plan is to have four Unicode support levels.

> These would be declared by lexically scoped declarations:
> 
>     use bytes 'ISO-8859-1';
>     use codepoints;
>     use graphemes;
>     use letters 'Turkish';

> Note these just warp the defaults.  Underneath is still a strongly
> typed string system.  So you can say "use bytes" and know that the
> strings that *you* create are byte strings.  However, if you get in a
> string from another module, you can't necessarily process it as bytes.
> If you haven't specified how such a string is to be processed in
> your worldview, you're probably going to get an exception.  You might
> anyway, if what you specified is an impossible downconversion.
> 
> So yes, you can have "use bytes", but it puts more responsibility on
> you rather than less.  You might rather just specify the type of your
> particular string or array, and stay with codepoints or graphemes in
> the general case.  To the extent that we can preserve the abstraction
> that a string is just a sequence of integers, the values of which
> have some known relationship to Unicode, it should all just work.
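To make the levels concrete: here's a rough Python sketch (not Perl 6, and purely illustrative) of the byte/codepoint/grapheme distinction those declarations are drawn along. It uses "é" written two ways — precomposed, and as "e" plus a combining accent — which differ at the byte and codepoint levels but are the same grapheme.

```python
import unicodedata

composed = "\u00e9"     # 'é' precomposed: one codepoint
decomposed = "e\u0301"  # 'e' + combining acute: two codepoints, one grapheme

# Byte level: the UTF-8 lengths differ.
print(len(composed.encode("utf-8")))    # 2 bytes
print(len(decomposed.encode("utf-8")))  # 3 bytes

# Codepoint level: the strings are unequal sequences of integers.
print(len(composed), len(decomposed))
print(composed == decomposed)           # False

# Grapheme level: normalizing (here to NFC) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

So "a string is just a sequence of integers" holds at each level, but which integers — and how many — depends on which level you've declared.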

> : Is that right, or would there be a key_type property on hashes? More to
> : the point, is it worth it, or will I be further slowing down hash access
> : because it's special-cased in the default situation?
> 
> Hashes should handle various types of built-in key strings properly
> by default.

What is "properly" for strings? Is it to hash the "sequence of integers"
as if they're 32 bits wide even if they're narrower?  Is that sufficient?
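A Python sketch (again illustrative, not Perl 6) of why hashing the raw integer
sequence may not be sufficient: two strings that are equal at the grapheme level
can be distinct codepoint sequences, so a codepoint-level hash puts them in
different buckets unless keys are normalized first.

```python
import unicodedata

composed = "\u00e9"     # 'é' as one codepoint
decomposed = "e\u0301"  # same grapheme as two codepoints

# Codepoint-level keying: two distinct keys.
d = {}
d[composed] = 1
d[decomposed] = 2
print(len(d))  # 2

# A grapheme-aware hash would normalize before hashing: one key.
d2 = {}
for k in (composed, decomposed):
    d2[unicodedata.normalize("NFC", k)] = True
print(len(d2))  # 1
```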

Tim.