On Thursday, 28 June 2012 at 09:58:02 UTC, Roman D. Boiko wrote:
Pedantically speaking, it is possible to index a string with about 50-51% memory overhead and get random access in O(1) time. The best-performing algorithms can do a random access in about 35-50 nanoseconds per operation for strings of up to tens of megabytes. For larger strings (tested up to 1 GB), or when other memory-intensive computations run at the same time, a random access takes up to 200 nanoseconds because of memory-access latency.
Such an index would support both random access to characters by their code point index and finding the code point index that corresponds to a given code unit index.
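To make the two lookups concrete, here is a minimal Python sketch of one common way to get bounded-time access: a sampled index over UTF-16 code units (as stored in a D `wstring`). It is not the scheme described in the post, whose details are not given; `CodePointIndex`, the helper functions and the sampling interval `SAMPLE = 64` are hypothetical. The index records the code unit offset of every `SAMPLE`-th code point and the number of code points that begin before every `SAMPLE`-th code unit, so each lookup scans only a bounded number of units past a stored entry. That bound makes the lookup O(1) for a fixed interval.

```python
from typing import List

SAMPLE = 64  # hypothetical sampling interval; lookups scan a bounded run past an index entry


def is_high_surrogate(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def to_utf16_units(s: str) -> List[int]:
    """Encode a Python str as a list of UTF-16 code units (like a D wstring)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


class CodePointIndex:
    """Hypothetical sampled index: code point index <-> code unit offset."""

    def __init__(self, units: List[int], sample: int = SAMPLE):
        self.units = units
        self.sample = sample
        self.starts: List[int] = []  # starts[k]: unit offset of code point k * sample
        self.counts: List[int] = []  # counts[k]: code points beginning before unit k * sample
        cp = 0
        for i, u in enumerate(units):
            if i % sample == 0:
                self.counts.append(cp)
            if not is_low_surrogate(u):  # a new code point begins at unit i
                if cp % sample == 0:
                    self.starts.append(i)
                cp += 1
        self.length = cp  # number of code points

    def unit_offset(self, cp_index: int) -> int:
        """Code point index -> offset of its first code unit."""
        if not 0 <= cp_index < self.length:
            raise IndexError(cp_index)
        block, rest = divmod(cp_index, self.sample)
        i = self.starts[block]
        for _ in range(rest):  # skip fewer than `sample` code points
            i += 2 if is_high_surrogate(self.units[i]) else 1
        return i

    def code_point_at(self, cp_index: int) -> str:
        """Random access to a character by its code point index."""
        i = self.unit_offset(cp_index)
        n = 2 if is_high_surrogate(self.units[i]) else 1
        raw = b"".join(u.to_bytes(2, "little") for u in self.units[i:i + n])
        return raw.decode("utf-16-le")

    def cp_index_of_unit(self, unit_index: int) -> int:
        """Code unit offset -> index of the code point that contains it."""
        if not 0 <= unit_index < len(self.units):
            raise IndexError(unit_index)
        block = unit_index // self.sample
        cp = self.counts[block]
        # Count code point starts from the block boundary up to unit_index inclusive.
        for u in self.units[block * self.sample:unit_index + 1]:
            if not is_low_surrogate(u):
                cp += 1
        return cp - 1


if __name__ == "__main__":
    idx = CodePointIndex(to_utf16_units("a\U0001F600b"), sample=2)
    print(idx.unit_offset(2), idx.cp_index_of_unit(2))  # prints 3 1
```

A smaller interval shortens the scan but adds index entries. Because Python lists hold full integer objects, this sketch says nothing about the memory overheads quoted in the post.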

If only the former is needed, the space overhead drops to 25% for 1K strings and to less than 15% for sizes from 16K to 1G (sizes are measured in code units; for a wstring the number of bytes is twice the number of code units). Strings of up to 2^64 code units would be supported.

This would also improve access speed significantly (by about 10% for small strings and roughly twofold for large ones).
