On Tue, Apr 20, 2004 at 10:51:04PM -0700, Larry Wall wrote:
>
> Yes, that's in the works.  The plan is to have four Unicode support levels.
> These would be declared by lexically scoped declarations:
>
>     use bytes 'ISO-8859-1';
>     use codepoints;
>     use graphemes;
>     use letters 'Turkish';
>
> Note these just warp the defaults.  Underneath is still a strongly
> typed string system.  So you can say "use bytes" and know that the
> strings that *you* create are byte strings.  However, if you get in a
> string from another module, you can't necessarily process it as bytes.
> If you haven't specified how such a string is to be processed in
> your worldview, you're probably going to get an exception.  You might
> anyway, if what you specified is an impossible downconversion.
>
> So yes, you can have "use bytes", but it puts more responsibility on
> you rather than less.  You might rather just specify the type of your
> particular string or array, and stay with codepoints or graphemes in
> the general case.  To the extent that we can preserve the abstraction
> that a string is just a sequence of integers, the values of which
> have some known relationship to Unicode, it should all just work.
>
> : Is that right, or would there be a key_type property on hashes? More to
> : the point, is it worth it, or will I be further slowing down hash access
> : because it's special-cased in the default situation?
>
> Hashes should handle various types of built-in key strings properly
> by default.

What is "properly" for string? Is it to hash the "sequence of integers"
as if they're 32 bits wide even if they're less? Is that sufficient?

Tim.
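
To make the question concrete, here is a minimal sketch of what "hash
the sequence of integers as if they're 32 bits wide" could mean. It is
written in Python rather than Perl 6, it uses FNV-1a purely as a
stand-in hash function, and the names hash_widened and hash_raw are
hypothetical; none of this is taken from an actual Perl 6 or Parrot
design.

```python
# Illustrative sketch only: FNV-1a is a stand-in hash function, and the
# helper names below are hypothetical, not part of any Perl 6 design.

FNV_OFFSET_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a over a byte buffer."""
    h = FNV_OFFSET_32
    for b in data:
        h ^= b
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def hash_widened(values) -> int:
    """Hash a sequence of integers as if each element were 32 bits wide,
    regardless of how narrowly the string is actually stored."""
    buf = b"".join(v.to_bytes(4, "little") for v in values)
    return fnv1a_32(buf)


def hash_raw(values, width: int) -> int:
    """Hash the elements at their storage width (1 byte for a byte
    string, 4 bytes for a codepoint string)."""
    buf = b"".join(v.to_bytes(width, "little") for v in values)
    return fnv1a_32(buf)


# The same text held two ways: as an ISO-8859-1 byte string and as a
# sequence of Unicode code points. The integer values are identical.
byte_string = list("café".encode("latin-1"))
codepoint_string = [ord(c) for c in "café"]

# Widening first makes the hash independent of the storage width.
same_key = hash_widened(byte_string) == hash_widened(codepoint_string)

# Hashing at storage width feeds different byte buffers to the hash
# (4 bytes versus 16 bytes here), so agreement is not guaranteed.
raw_bytes_hash = hash_raw(byte_string, 1)
raw_codepoints_hash = hash_raw(codepoint_string, 4)
```

With widening, a byte string and a codepoint string holding the same
integers land in the same hash bucket, at the cost of touching four
bytes per element even for narrow strings. It only equates strings whose
integer sequences are identical, though: "é" as the single code point
U+00E9 and "e" followed by U+0301 are different sequences and would
still hash differently, which may matter under "use graphemes".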