Timings should not differ much from random access in a UTF-32
string implementation, because of how these algorithms are
designed:
* only operations on 64-bit aligned words are performed
(addition, multiplication, bitwise and shift operations)
* there is no branching except at the very top level for very
large array sizes
* data is stored in a way that makes the algorithms cache-oblivious,
IIRC. The authors claim that very few cache misses are necessary
(1-2 per random access).
* after determining the code unit index for a given code point
index, further access is performed as usual inside an array, so
slicing only requires calculating the code unit indices of its
start and end.
* the original data arrays are not modified (unlike, for example,
compact representations of dstring).
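To make a few of the points above concrete, here is a minimal sketch, assuming a UTF-8 byte array and a plain rank/select index with one cumulative count per 64-bit word. This is NOT the authors' cache-oblivious structure, just a simplified stand-in; the class `CodePointIndex` and its methods are hypothetical names. It illustrates counting with branch-free 64-bit word arithmetic (additions, multiplication, masks, shifts), leaving the original array unmodified, and slicing by computing only two code unit indices. Unlike the described design, the in-word select step here uses a small loop rather than branch-free code.

```python
import bisect

MASK64 = (1 << 64) - 1


def popcount64(x):
    # Broadword (SWAR) popcount: only additions, one multiplication,
    # masks and shifts on a 64-bit word, with no data-dependent branches.
    x = x - ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    return ((x * 0x0101010101010101) & MASK64) >> 56


class CodePointIndex:
    """Hypothetical index mapping a code point index to a UTF-8 code unit index."""

    def __init__(self, data: bytes):
        self.data = data  # the original byte array is kept unmodified
        # Bit j of word w is 1 when byte w*64 + j starts a code point,
        # i.e. it is not a continuation byte of the form 10xxxxxx.
        self.words = []
        for w in range(0, len(data), 64):
            word = 0
            for j, b in enumerate(data[w:w + 64]):
                if b & 0xC0 != 0x80:
                    word |= 1 << j
            self.words.append(word)
        # prefix[w] = number of code point starts before word w.
        self.prefix = [0]
        for word in self.words:
            self.prefix.append(self.prefix[-1] + popcount64(word))
        self.length = self.prefix[-1]  # number of code points

    def code_unit_index(self, k):
        # Select: position of the k-th (0-based) code point start.
        if k == self.length:
            return len(self.data)  # one past the last code point
        if not 0 <= k < self.length:
            raise IndexError(k)
        w = bisect.bisect_right(self.prefix, k) - 1  # word containing it
        word = self.words[w]
        for _ in range(k - self.prefix[w]):
            word &= word - 1  # clear the lowest set bit
        return w * 64 + (word & -word).bit_length() - 1

    def slice(self, start, end):
        # Only two index lookups; the rest is ordinary array slicing.
        return self.data[self.code_unit_index(start):self.code_unit_index(end)]


if __name__ == "__main__":
    idx = CodePointIndex("añ€😀b".encode("utf-8"))
    print(idx.length)  # 5
    print(idx.slice(1, 4).decode("utf-8"))  # ñ€😀
```

After the index is built, a slice costs two select operations plus an ordinary byte-array slice, which mirrors the point that the rest of the access works as it would on a plain array.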