Re: A12: Strings

Tim Bunce Wed, 21 Apr 2004 03:44:51 -0700

On Tue, Apr 20, 2004 at 10:51:04PM -0700, Larry Wall wrote:
> 
> Yes, that's in the works.  The plan is to have four Unicode support levels.


> These would be declared by lexically scoped declarations:
> 
>     use bytes 'ISO-8859-1';
>     use codepoints;
>     use graphemes;
>     use letters 'Turkish';
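
As a rough illustration of how the four levels differ, here is a small Python sketch (not Perl 6, and not the planned implementation). The names level_lengths and turkish_lower are made up for this example, and the grapheme count is approximated by counting non-combining code points, which is much simpler than real grapheme segmentation.

```python
import unicodedata

def level_lengths(s: str) -> dict:
    """Length of the same string at three of the levels (illustrative only)."""
    return {
        # bytes: depends on the encoding chosen; UTF-8 is assumed here
        "bytes": len(s.encode("utf-8")),
        # codepoints: Python's str is already a sequence of code points
        "codepoints": len(s),
        # graphemes: approximation that folds combining marks into their base
        "graphemes": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }

def turkish_lower(s: str) -> str:
    """Hypothetical language-dependent lowercasing for the 'letters' level."""
    # Turkish maps dotless capital I to dotless small i, and dotted capital
    # I (U+0130) to ordinary small i, unlike the default Unicode mapping.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

lengths = level_lengths("e\u0301")   # 'e' followed by COMBINING ACUTE ACCENT
print(lengths)
print("DI\u015e".lower(), turkish_lower("DI\u015e"))
```

The same two-code-point string is three units long as UTF-8 bytes, two as code points, and one as a grapheme, and the lowercase form of "I" changes once a language is named.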

> Note these just warp the defaults.  Underneath is still a strongly
> typed string system.  So you can say "use bytes" and know that the
> strings that *you* create are byte strings.  However, if you get in a
> string from another module, you can't necessarily process it as bytes.
> If you haven't specified how such a string is to be processed in
> your worldview, you're probably going to get an exception.  You might
> anyway, if what you specified is an impossible downconversion.
> 
> So yes, you can have "use bytes", but it puts more responsibility on
> you rather than less.  You might rather just specify the type of your
> particular string or array, and stay with codepoints or graphemes in
> the general case.  To the extent that we can preserve the abstraction
> that a string is just a sequence of integers, the values of which
> have some known relationship to Unicode, it should all just work.
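
The "impossible downconversion" case can be seen in miniature in Python, used here only as an analogy: the helper name as_byte_string is invented for this sketch, and raising an exception is Python's behaviour, not a statement about what Perl 6 will do.

```python
def as_byte_string(s: str, encoding: str = "iso-8859-1") -> bytes:
    """Downconvert a code-point string to bytes in the given encoding.

    Raises UnicodeEncodeError when some character has no representation
    in the target encoding.
    """
    return s.encode(encoding)

# U+00E9 exists in ISO-8859-1, so this conversion succeeds.
ok = as_byte_string("caf\u00e9")

# U+0131 (dotless i) does not exist in ISO-8859-1, so this one fails.
try:
    as_byte_string("d\u0131\u015f")
    failed = False
except UnicodeEncodeError:
    failed = True

print(ok, failed)
```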

> : Is that right, or would there be a key_type property on hashes? More to
> : the point, is it worth it, or will I be further slowing down hash access
> : because it's special-cased in the default situation?
> 
> Hashes should handle various types of built-in key strings properly
> by default.

What is "properly" for strings? Is it to hash the "sequence of integers"
as if they're 32 bits wide, even if they're narrower?  Is that sufficient?
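
To make the question concrete, here is a minimal Python sketch of hashing every element as a 32-bit integer regardless of how narrowly it is stored. The function name hash_units and the choice of 32-bit FNV-1a are assumptions for illustration only; nothing in the thread says which hash function would be used.

```python
import struct
import unicodedata
from typing import Iterable

FNV_OFFSET = 0x811C9DC5   # 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193    # 32-bit FNV-1a prime

def hash_units(units: Iterable[int]) -> int:
    """Hash a sequence of integers, widening each one to 4 bytes first."""
    h = FNV_OFFSET
    for unit in units:
        # Pack as little-endian uint32 so storage width cannot affect the hash.
        for b in struct.pack("<I", unit):
            h ^= b
            h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h

# The same text held as ISO-8859-1 bytes and as code points hashes alike.
narrow = hash_units(b"caf\xe9")                   # iterating bytes yields ints
wide = hash_units(ord(ch) for ch in "caf\u00e9")
print(narrow == wide)

# Different code point sequences for the same visible text only hash alike
# once both are put into one normalization form (NFC here).
composed = hash_units(map(ord, unicodedata.normalize("NFC", "\u00e9")))
decomposed = hash_units(map(ord, unicodedata.normalize("NFC", "e\u0301")))
print(composed == decomposed)
```

Widening makes byte strings and code point strings agree on a key, but it says nothing about grapheme-level equivalence; in this sketch that only works because both inputs are normalized before hashing.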

Tim.
