On Sun, Sep 8, 2013 at 10:37 AM, Jonathan S. Shapiro <[email protected]> wrote:

> Whether it is unboxed or not, a non-statically-sized array must carry a
> length field. The reason that a separately stored payload isn't a big deal
> is:
>

All true. However, a separately stored payload can double the cache misses
on access, which seems "really really bad" for applications where strings
make up a large percentage of live objects. (I've assumed here that the GC
has special support to avoid tracing the extra indirection pointer, which
would otherwise also be a problem.)
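To make the layout difference concrete, here is a rough Rust sketch. The
names `SeparateStr` and `EmbeddedStr` and the 8-byte little-endian length
header are hypothetical, not BitC's actual representation. In the separate
case, reading a byte follows the reference to the header and then the
payload pointer into a second allocation; in the embedded case the length
and the first bytes of text share one allocation. (Rust's own `Box<[u8]>`
is a fat pointer that already carries its length; the explicit header just
models a runtime where a string reference is a single word.)

```rust
// Illustrative layouts only: hypothetical types, not BitC's representation.

/// Size of the length header stored in front of an embedded payload.
const HDR: usize = std::mem::size_of::<u64>();

/// Separate payload: the header object holds the length plus a pointer to a
/// second heap block with the bytes. A byte access dereferences the header,
/// then the payload: two allocations, so potentially two cache misses.
pub struct SeparateStr {
    len: usize,
    payload: Box<[u8]>,
}

impl SeparateStr {
    pub fn new(s: &str) -> Box<SeparateStr> {
        // Allocation 1: the header (Box<SeparateStr>).
        // Allocation 2: the payload (Box<[u8]>).
        Box::new(SeparateStr { len: s.len(), payload: s.as_bytes().into() })
    }

    pub fn byte_at(&self, i: usize) -> Option<u8> {
        if i < self.len { Some(self.payload[i]) } else { None }
    }
}

/// Embedded payload: a single heap block holds an 8-byte little-endian
/// length followed directly by the bytes, so the length and the first
/// bytes of text usually land in the same cache line.
pub struct EmbeddedStr {
    block: Box<[u8]>,
}

impl EmbeddedStr {
    pub fn new(s: &str) -> EmbeddedStr {
        let mut v = Vec::with_capacity(HDR + s.len());
        v.extend_from_slice(&(s.len() as u64).to_le_bytes());
        v.extend_from_slice(s.as_bytes());
        EmbeddedStr { block: v.into_boxed_slice() } // one allocation
    }

    pub fn len(&self) -> usize {
        let mut hdr = [0u8; HDR];
        hdr.copy_from_slice(&self.block[..HDR]);
        u64::from_le_bytes(hdr) as usize
    }

    pub fn byte_at(&self, i: usize) -> Option<u8> {
        if i < self.len() { Some(self.block[HDR + i]) } else { None }
    }
}
```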

> What this mainly serves to reveal is that the proper handling of
> international character data is a nightmarishly complex business, and the
> entire *concept* of a fixed-length character needs to be discarded in
> order to understand how international text really works. Once you realize
> that, your whole point of view on indexing encodings changes, because
> getting an O(1) indexing operation at the encoding layer doesn't really
> help you that much.
>

Agreed 110%.
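A small Rust illustration of why O(1) indexing by encoding unit buys
little: the same text has a different length depending on whether you count
UTF-8 bytes, UTF-16 code units, or Unicode scalar values, and none of those
is what a user thinks of as a character. `unit_counts`, `unit_at` and
`is_surrogate` are illustrative helpers, not a library API. Rust's standard
library has no grapheme segmentation, so user-perceived characters are only
noted in comments.

```rust
/// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values).
pub fn unit_counts(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

/// O(1) access to the i-th UTF-16 code unit of an already-encoded buffer.
/// Cheap, but the unit returned may be half of a surrogate pair rather than
/// anything a user would call a character.
pub fn unit_at(units: &[u16], i: usize) -> Option<u16> {
    units.get(i).copied()
}

/// True for the high and low surrogate ranges (0xD800..=0xDFFF), which
/// never stand alone as characters.
pub fn is_surrogate(u: u16) -> bool {
    (0xD800..=0xDFFF).contains(&u)
}
```

For example, `unit_counts("e\u{301}")` is `(3, 2, 2)` for what displays as
a single accented "e", and `unit_counts("\u{1F600}")` is `(4, 2, 1)`;
indexing the UTF-16 form of that emoji at unit 0 yields `0xD83D`, the high
half of a surrogate pair.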

>> (b) is it a problem that users can't author their own
>> string-type-compatible string-slice type? If so, how should it be fixed?
>
> It's an inevitability, because making everything virtual and relying on
> JIT to inline things isn't viable in a systems language. We can certainly
> assume that some implementations will use a JIT. Effective use of the
> language cannot rely on the availability of JIT.

Making everything virtual isn't the only option. The other option is some
form of parametric type instantiation. This is more limited, in that you
can't make a new string type and hand it to code that has already been
compiled against a fixed string type. However, I don't think this is the
major problem.
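A rough Rust sketch of the two options, with hypothetical function names:
`count_char` is parametric and gets monomorphized for each concrete char
iterator, so the loop has no indirect calls; `count_char_dyn` takes a trait
object and pays a vtable call per character. The dyn version accepts a
brand-new stream type without recompiling the function; the generic version
must be instantiated for that type, which is the limitation described above.

```rust
/// Parametric: one copy is compiled per concrete iterator type, so
/// `next()` can be inlined and no JIT is needed.
pub fn count_char<I: Iterator<Item = char>>(chars: I, target: char) -> usize {
    chars.filter(|&c| c == target).count()
}

/// Virtual: a single compiled body; every `next()` is an indirect call
/// through the trait object's vtable unless something devirtualizes it.
pub fn count_char_dyn(chars: &mut dyn Iterator<Item = char>, target: char) -> usize {
    let mut n = 0;
    for c in chars {
        if c == target {
            n += 1;
        }
    }
    n
}
```

For instance, `count_char("banana".chars(), 'a')` returns 3, and the same
call works for any other type that yields `char`, as long as the generic
function can be instantiated for it.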

The major problem in my mind is that mountains of useful library code
suddenly become unusable when you decide the default UCS-2 binary
representation is too much overhead for your application. It's easy enough
to make your own UTF-8 string (separate vs. embedded payload issues aside);
however, the regex library can't work on it, even though it can produce a
compatible stream of "char".
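A minimal Rust sketch of that scenario, with hypothetical names throughout:
`Utf8Str` is a user-defined UTF-8 string, `contains` stands in for a regex
engine (it is only a naive substring search) written against any cloneable
stream of `char`, and `contains_ucs2` stands in for a library that has fixed
its input representation as UTF-16 code units. The generic routine accepts
`Utf8Str::chars()` directly; the fixed one only accepts a `Utf8Str` after
transcoding and copying it.

```rust
/// User-defined string: raw UTF-8 bytes, validated on construction.
pub struct Utf8Str {
    bytes: Vec<u8>,
}

impl Utf8Str {
    pub fn from_bytes(bytes: Vec<u8>) -> Result<Utf8Str, std::str::Utf8Error> {
        std::str::from_utf8(&bytes)?; // reject invalid input up front
        Ok(Utf8Str { bytes })
    }

    /// Decodes the stored bytes into a stream of `char`.
    pub fn chars(&self) -> std::str::Chars<'_> {
        // Cannot fail: validity was checked in `from_bytes`.
        std::str::from_utf8(&self.bytes).expect("validated in from_bytes").chars()
    }
}

/// "Library" code written against any cloneable char stream (naive O(n*m)
/// substring search standing in for a regex engine).
pub fn contains<I>(text: I, needle: &str) -> bool
where
    I: Iterator<Item = char> + Clone,
{
    let mut start = text;
    loop {
        // Try to match `needle` at the current position.
        let mut probe = start.clone();
        if needle.chars().all(|n| probe.next() == Some(n)) {
            return true;
        }
        // Advance one char; give up once the input is exhausted.
        if start.next().is_none() {
            return false;
        }
    }
}

/// "Library" code hard-wired to one representation: it accepts only UTF-16
/// code units, so a `Utf8Str` must be transcoded into a new buffer first.
pub fn contains_ucs2(text: &[u16], needle: &[u16]) -> bool {
    needle.is_empty() || text.windows(needle.len()).any(|w| w == needle)
}
```

The `Clone` bound is what lets the matcher restart from the next position;
a backtracking matcher needs the same ability to revisit its input.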
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
