On the subject of Unicode string processing...
I'm not a Perl internals hacker, and I'm more of a passive reader of
these lists than an active contributor.
With that caveat, may I humbly point out a design document for
what I think is a clean C library supporting the use of mixed
encoding forms. I've prototyped this design and believe it to be
practical. My advice to implementors: don't attempt it without
simultaneously writing comprehensive unit tests for every function --
the number of details to get right is absolutely numbing.
http://www.regexps.com/letter/strings.html
Because I'm not a Perl internals hacker, I apologize if this
turns out to be a complete red herring. On the other hand, there
are some other Unicode goodies available on the same site.
-t