On Fri, 03 Dec 2010 14:40:30 -0500, Jerry Quinn <[email protected]> wrote:

I tend to do a lot of transforming strings, but I need to track offsets back to the original text to maintain alignment between the results and the input. For that, indexes are necessary and we use them a lot.

In my daily usage of strings, I generally treat a string as a whole rather than as individual characters, though I do occasionally index into one.

Let's also be clear that indexing is still present; what is disabled is the ability to index to arbitrary code units. It sounds to me like this new type would not affect your ability to store offsets (you can store an index and use it later when referring to the string, just like you can now).
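To illustrate the distinction, here is a minimal sketch using plain Phobos (std.utf) rather than the proposed string type, whose API is not shown in this message. The helper names codePointOffsets and codePointAt are made up for the example. It records the code-unit offset at which each code point starts, then decodes from a stored offset later:

```d
import std.utf : decode, stride;

// Hypothetical helper: collect the code-unit offset at which each
// code point of a UTF-8 string starts.
size_t[] codePointOffsets(string s)
{
    size_t[] offsets;
    size_t i = 0;
    while (i < s.length)
    {
        offsets ~= i;
        i += stride(s, i); // code units in the sequence starting at i
    }
    return offsets;
}

// Hypothetical helper: decode the code point that starts at a stored offset.
dchar codePointAt(string s, size_t offset)
{
    size_t i = offset;   // decode advances its index argument
    return decode(s, i);
}
```

Offsets obtained this way always land on the start of a code point, so they stay valid for later lookups. An arbitrary offset into the middle of a multi-byte sequence would not.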

My string type does not allow for writable strings. My plan was to allow you access to the underlying char[] and let you edit that way. Letting someone write a dchar into the middle of a UTF-8 string could cause lots of problems, so I just disabled it by default.
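One such problem is that the encoded width of the new dchar may differ from the width of the sequence it replaces, so an in-place write can clobber the neighboring code units. A rough sketch of a safe replacement on a raw buffer, again using only std.utf and a made-up helper name (replaceCodePoint), and assuming offset points at the start of a code point:

```d
import std.utf : encode, stride;

// Hypothetical helper: replace the code point starting at `offset`
// with `c`, returning a freshly built buffer.
char[] replaceCodePoint(const(char)[] s, size_t offset, dchar c)
{
    char[4] buf;
    immutable newLen = encode(buf, c);     // 1 to 4 code units for c
    immutable oldLen = stride(s, offset);  // width of the sequence replaced

    char[] result;
    result ~= s[0 .. offset];
    result ~= buf[0 .. newLen];
    result ~= s[offset + oldLen .. $];
    return result;
}
```

Replacing 'b' in "abc" with U+00E9 grows the buffer from 3 to 4 code units, which is why a simple s[i] = c style assignment cannot work in general.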

I'm not sure how that affects your 'transforming' work. Are you actually changing the data, or just lazily transforming it? I'm interested to hear whether you think my string type would be a viable alternative.

Probably the right thing to do in this case is just pay for the cost of using dchar everywhere, but if you're working with large enough quantities of data, storage efficiency matters.

The huge advantage of using utf-8 is backwards compatibility with ASCII for C functions.
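As a small sketch of what that compatibility looks like in practice, a UTF-8 string can be handed to a byte-oriented C function such as strlen after NUL-terminating it. The wrapper name cLength is made up for the example, and it assumes the string contains no embedded '\0':

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical wrapper: length in bytes as seen by C's strlen.
// Assumes s contains no embedded '\0'.
size_t cLength(string s)
{
    return strlen(toStringz(s));
}
```

For pure ASCII input the result equals the character count; for other text it is the number of UTF-8 code units, which matches s.length.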

-Steve
