On Wed, 19 Jan 2011 16:03:11 +0000 (UTC)
Tim Harig <user...@ilthio.net> wrote:
>
> For many operations, it is just much faster and simpler to use a single
> character based container opposed to having to process an entire byte
> stream to determine individual letters from the bytes or to having
> adaptive size containers to store the data.

You *have* to "process the entire byte stream" in order to determine
boundaries of individual letters from the bytes if you want to use a
"character based container", regardless of the exact representation.
Once you do that, it shouldn't be very costly to compute the actual code
points. So, "much faster" sounds a bit dubious to me; especially if you
factor in the cost of memory allocation, and the fact that a larger
container will fit less easily in a data cache.

> That said, and more importantly, many
> variable length byte streams may not have alternate representations as
> unicode does.

This whole thread is about UTF-8 (see title) so I'm not sure what kind
of relevance this is supposed to have.

-- 
http://mail.python.org/mailman/listinfo/python-list
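
To make the point about boundaries and code points concrete, here is a
rough sketch in Python. The function names (utf8_boundaries,
utf8_code_points) and the sample string are made up for illustration, and
the decoder assumes well-formed UTF-8 with no validation, so treat it as
a demonstration of the idea rather than a real decoder. Finding the
boundaries already touches every byte; turning each sequence into a code
point only adds a few shifts and masks on top of that.

```python
def utf8_boundaries(data):
    """Return the byte offsets at which each character starts."""
    # Every byte starts a character unless it is a continuation
    # byte, i.e. has the bit pattern 10xxxxxx.
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def utf8_code_points(data):
    """Decode well-formed UTF-8 into a list of code points.

    Illustration only: assumes valid input and does no error checking.
    """
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte gives the sequence length and the first payload bits.
        if lead < 0x80:
            length, cp = 1, lead
        elif lead >> 5 == 0b110:
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, cp = 3, lead & 0x0F
        else:
            length, cp = 4, lead & 0x07
        # Each continuation byte contributes six more bits.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        points.append(cp)
        i += length
    return points


# Hypothetical sample: 1-, 2-, 3- and 4-byte characters mixed together.
text = "na\u00efve \u20ac5 \U0001d11e"
encoded = text.encode("utf-8")

print(utf8_boundaries(encoded))
print(utf8_code_points(encoded) == [ord(c) for c in text])
# Compare storage: variable-width UTF-8 vs. fixed-width UTF-32.
print(len(encoded), len(text.encode("utf-32-le")))
```

The last line prints the size of the same text in UTF-8 and in a fixed
4-bytes-per-character encoding, which is the memory and cache cost
mentioned above.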