On Fri, 03 Dec 2010 14:40:30 -0500, Jerry Quinn wrote:
I tend to do a lot of transforming strings, but I need to track offsets
back to the original text to maintain alignment between the results and
the input. For that, indexes are necessary and we use them a lot.
In my daily usage of strings, I generally use a string as a whole, not
individual characters. But I do occasionally use it.
Keep in mind that indexing is still present; what is disabled is the
ability to index arbitrary code units. It sounds to me like this new
type would not affect your ability to store offsets (you can store an
index, use it later when referring to the string, etc., just like you
can now).
My string type does not allow writable strings. My plan was to give you
access to the underlying char[] and let you edit it that way. Letting
someone write a dchar into the middle of a UTF-8 string could cause lots
of problems (the new code point may not occupy the same number of code
units as the one it replaces), so I just disabled it by default.
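The hazard with writing a code point into the middle of UTF-8 data can be illustrated in Python, used here as a stand-in for D's char[]. The helper `replace_code_point` is hypothetical. Overwriting a single code unit in place leaves an incomplete multi-byte sequence. A correct replacement has to splice the buffer, which changes its length and shifts every later offset.

```python
def replace_code_point(buf: bytearray, offset: int, new: str) -> None:
    """Hypothetical helper: replace the code point starting at `offset`
    with `new`, splicing so the buffer stays valid UTF-8."""
    if offset < 0 or offset >= len(buf) or (buf[offset] & 0xC0) == 0x80:
        raise IndexError("offset is not the start of a code point")
    end = offset + 1
    while end < len(buf) and (buf[end] & 0xC0) == 0x80:
        end += 1  # skip continuation bytes of the old code point
    # The replacement may be longer or shorter than the old code point,
    # so every later offset shifts by the difference.
    buf[offset:end] = new.encode("utf-8")


# Naive in-place write: only the first byte of the 2-byte "é" fits.
naive = bytearray(b"cafe")
naive[3] = "é".encode("utf-8")[0]
try:
    naive.decode("utf-8")
except UnicodeDecodeError:
    print("naive overwrite produced invalid UTF-8")

# Splicing keeps the data valid, but the buffer grows by one byte.
buf = bytearray(b"cafe")
replace_code_point(buf, 3, "é")
print(buf.decode("utf-8"), len(buf))
```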
I'm not sure how that affects your 'transforming' work: are you actually
changing the data, or just transforming it lazily? I'm interested to hear
whether you think my string type would be a viable alternative.
Probably the right thing to do in this case is just to pay the cost of
using dchar everywhere, but if you're working with large enough
quantities of data, storage efficiency matters.
The huge advantage of UTF-8 is its backward compatibility with ASCII,
which matters when calling C functions.
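Both trade-offs can be checked quickly in Python, again as a stand-in for D. The function name `encoded_sizes` is hypothetical. UTF-32 stands in for a dchar[] (one 4-byte unit per code point). For ASCII text, UTF-8 takes a quarter of the space and is byte-for-byte identical to ASCII. UTF-8 also never produces a zero byte except for U+0000, so NUL-terminated C string functions keep working, whereas UTF-32 data is full of zero bytes.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts for the same text in UTF-8 and in a fixed-width
    4-byte-per-code-point encoding (UTF-32, the layout of a dchar[])."""
    return {
        "utf8": len(text.encode("utf-8")),
        # "utf-32-le" omits the byte-order mark that plain "utf-32" prepends.
        "utf32": len(text.encode("utf-32-le")),
    }


ascii_text = "offset 42"
print(encoded_sizes(ascii_text))  # {'utf8': 9, 'utf32': 36}

# ASCII compatibility: pure-ASCII text encodes identically in both.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))

# C compatibility: no embedded NUL bytes in UTF-8, plenty in UTF-32.
print(b"\x00" in "naïve café".encode("utf-8"))
print(b"\x00" in ascii_text.encode("utf-32-le"))
```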
-Steve