On Sun, Sep 8, 2013 at 10:37 AM, Jonathan S. Shapiro <[email protected]> wrote:
> Whether it is unboxed or not, a non-statically-sized array must carry a
> length field. The reason that a separately stored payload isn't a big deal
> is:
>

All true... However, a separate payload incurs twice the cache misses on access, which seems "really really bad" for applications where strings are a large percentage of live objects. (I've assumed here that the GC has special support to avoid tracing the extra indirection pointer, which otherwise would also be a problem.)

> What this mainly serves to reveal is that the proper handling of
> international character data is a nightmarishly complex business, and the
> entire *concept* of a fixed-length character needs to be discarded in
> order to understand how international text really works. Once you realize
> that, your whole point of view on indexing encodings changes, because
> getting an O(1) indexing operation at the encoding layer doesn't really
> help you that much.
>

Agreed 110%.

>> (b) is it a problem that users can't author their own string-type-compatible
>> string-slice type? If so, how should it be fixed?
>
> It's an inevitability, because making everything virtual and relying on JIT
> to inline things isn't viable in a systems language. We can certainly
> assume that some implementations will use a JIT. Effective use of the
> language cannot rely on the availability of JIT.

Making everything virtual isn't the only option. The other option is some form of parametric type-instantiation. This is more limited, in that you can't make a new string type and hand it to code which has already solidified the string type. However, I don't think this is the major problem. The major problem in my mind is having mountains of useful library code suddenly become unusable when you decide the default UCS-2 binary representation is too much overhead for your application.
It's easy enough to make your own UTF-8 string (separate vs. embedded payload issues aside); however, the regex library can't work on it, even though it can produce a compatible stream of "char".
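
To make the parametric option concrete, here is a rough sketch in Rust (not BitC, and not a proposal for BitC syntax). The library routine is written against "a stream of char" rather than a fixed string type, so it gets instantiated separately for each string representation with no virtual dispatch. Every name here (Utf8String, Utf16String, count_char) is hypothetical; count_char merely stands in for real library code such as a regex matcher, and the 16-bit type stands in for the default UCS-2-style representation.

```rust
// Hypothetical user-defined string: payload stored as UTF-8 bytes.
struct Utf8String {
    bytes: Vec<u8>,
}

impl Utf8String {
    fn new(s: &str) -> Self {
        Utf8String { bytes: s.as_bytes().to_vec() }
    }

    // Decode the payload into a stream of chars.
    fn chars(&self) -> impl Iterator<Item = char> + '_ {
        // `new` only accepts valid UTF-8, so this cannot fail here.
        std::str::from_utf8(&self.bytes).expect("valid UTF-8").chars()
    }
}

// Hypothetical 16-bit string, standing in for the default representation.
struct Utf16String {
    units: Vec<u16>,
}

impl Utf16String {
    fn new(s: &str) -> Self {
        Utf16String { units: s.encode_utf16().collect() }
    }

    // Decode 16-bit code units into chars; unpaired surrogates become U+FFFD.
    fn chars(&self) -> impl Iterator<Item = char> + '_ {
        char::decode_utf16(self.units.iter().copied())
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
    }
}

// Stand-in for library code: generic over any stream of chars, so the
// compiler produces one specialized copy per string representation.
fn count_char<I: Iterator<Item = char>>(stream: I, needle: char) -> usize {
    stream.filter(|&c| c == needle).count()
}
```

Both count_char(a.chars(), 'x') for a Utf8String and count_char(b.chars(), 'x') for a Utf16String go through the same library source. The limitation mentioned above still applies: code that was already compiled against one concrete string type cannot accept the other.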
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
