Hey Philippe,

> Yes, but #= is blissfully unaware of normalization in Squeak/Pharo. In
> fact, AFAIK Squeak/Pharo is unaware of normalization altogether. From a
> short look at it, it doesn't even seem that case insensitivity works in
> Squeak/Pharo outside of Latin-1 (I could be wrong, though).

Yes, that's what I am thinking about. To be more explicit, suppose a "Unicode" series of characters gets into the image via the keyboard, a file, a socket... Once decoded, what could one do with it? Are all kinds of decoded character series going to be represented as instances of a single class, even though they have inherently different behavior?

> In addition, you probably don't want #= to do normalization "because
> performance". And even if you did, you probably still want a fast path
> for a ByteString receiver and a ByteString argument, in which case #size is safe.

Assuming all fixed-width strings (e.g. byte strings) always share the same encoding (e.g. the same code page), the size check for those seems OK to me.
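A minimal sketch of the trade-off being discussed, written in Python rather than Smalltalk for brevity. It assumes NFC as the normalization form and models a ByteString as a string whose code points all fit in the Latin-1 range; `is_byte_string` and `string_equal` are hypothetical names for illustration, not Squeak/Pharo API.

```python
import unicodedata


def is_byte_string(s: str) -> bool:
    # Hypothetical stand-in for a ByteString: every code point fits in
    # one byte, i.e. lies in the Latin-1 range (assumed single encoding).
    return all(ord(ch) < 256 for ch in s)


def string_equal(a: str, b: str) -> bool:
    """Normalization-aware equality with a fast path for byte strings.

    Assumes NFC as the normalization form (an illustrative choice).
    """
    if is_byte_string(a) and is_byte_string(b):
        # Every Latin-1 character is already in NFC and none of them is a
        # combining mark, so canonical equivalence reduces to code-point
        # equality and a size mismatch is a safe early exit.
        if len(a) != len(b):
            return False
        return a == b
    # General case: canonically equivalent strings may differ in length
    # (e.g. precomposed vs. decomposed accents), so compare the
    # normalized forms instead of checking sizes first.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

For example, `"caf\u00e9"` (4 code points) and `"cafe\u0301"` (5 code points) compare equal under this sketch, which is exactly why a plain #size pre-check is only safe on the byte-string path.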

Just to be clear, I am not celebrating all this complexity in the world... however, given that it's there, how are we going to deal with it? I'm concerned about the long-term consequences of making things more complex than they are by reinterpreting them. The problem I see is that ultimately programs just won't Work(TM).

Andres.
