Steven D'Aprano:
So while you might save memory by using "UTF-24" instead of UTF-32, it would probably be slower because you would have to grab three bytes at a time instead of four, and the hardware probably does not directly support that.
Low-level string manipulation often deals with blocks larger than an individual character for speed: generally 32 or 64 bits at a time using the CPU, or 128 or 256 bits using the vector unit. There may then be entry/exit code to handle initial alignment to a block boundary and to deal with a smaller-than-block-size tail.
For an example of this kind of thing, see find_max_char in CPython's Objects/stringlib/find_max_char.h, which can examine a char* 32 or 64 bits at a time.
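The block-at-a-time idea can be sketched in Python (this is an illustration of the technique, not CPython's actual implementation; the function name and mask are my own). An all-ASCII check only needs to know whether any byte has its high bit set, so eight bytes can be tested at once against a mask, with a byte-wise loop for the tail:

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte is < 0x80, checking 8 bytes per step."""
    HIGH_BITS = 0x8080808080808080  # high bit of each of 8 bytes
    i = 0
    # Main loop: examine one 64-bit block per iteration.
    while i + 8 <= len(data):
        if int.from_bytes(data[i:i + 8], "little") & HIGH_BITS:
            return False
        i += 8
    # Tail: fewer than 8 bytes remain, check them individually.
    return all(b < 0x80 for b in data[i:])

print(is_ascii(b"hello world"))          # all ASCII
print(is_ascii("café".encode("utf-8")))  # contains bytes >= 0x80
```

A real C implementation would also align the pointer first so the 64-bit loads land on natural boundaries, which is the entry/exit code mentioned above.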
24-bit is likely to be a win in many circumstances due to decreased memory traffic. A 12-bit implementation may also be worthwhile, as the low 0x1000 code points of Unicode contain Latin (with extensions), Greek, Cyrillic, Arabic, Hebrew, and most Indic scripts.
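A minimal sketch of such a "UTF-24" packing, three little-endian bytes per code point (hypothetical helper names; no real encoding standard is implied), shows the 25% saving over UTF-32:

```python
def encode_utf24(s: str) -> bytes:
    """Pack each code point into 3 little-endian bytes."""
    return b"".join(ord(c).to_bytes(3, "little") for c in s)

def decode_utf24(data: bytes) -> str:
    """Inverse of encode_utf24: read 3 bytes per code point."""
    return "".join(chr(int.from_bytes(data[i:i + 3], "little"))
                   for i in range(0, len(data), 3))

s = "A\u0416\U0001F600"          # Latin, Cyrillic, and an astral code point
packed = encode_utf24(s)
print(len(packed))               # 3 bytes per code point
print(len(s.encode("utf-32-le")))  # 4 bytes per code point
print(decode_utf24(packed) == s)
```

The decode side illustrates the cost Steven mentions: each code point needs a 3-byte fetch that no common CPU load instruction matches directly.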
Neil -- http://mail.python.org/mailman/listinfo/python-list