With ICU installed, we now have fairly complete support for Unicode string manipulation (at the byte and codepoint levels).

Still to do: string_bitwise_{or,and,xor}.

What should happen if the charsets or encodings don't match, or if the encoding is utf8 or utf16/ucs2, ...?

I think there are basically two options:

1) throw exceptions *)
   (which combinations are valid?)
2) just do it and mark the resulting bit mess as binary

*) any allowed operations would still produce binary strings (except maybe latin1 <op> latin1 -> latin1).
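
To make the two options concrete, here is a rough sketch in Python rather than in the actual string code. Everything in it is hypothetical: TaggedString, EncodingMismatch, bitwise() and the strict flag are names made up for illustration, and padding the shorter operand with zero bytes is an assumption, not a decision. With strict=True it behaves like option 1 (any mismatch throws), with strict=False like option 2 (operate on the raw bytes and mark the result binary). In both modes latin1 <op> latin1 stays latin1, as in the footnote.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical stand-in for a string buffer plus its encoding tag."""
    data: bytes
    encoding: str  # e.g. "latin1", "utf8", "utf16", "binary"


class EncodingMismatch(Exception):
    """Raised in strict mode when the operand encodings differ."""


def bitwise(a, b, op, strict):
    # Option 1: refuse to combine operands whose encodings differ.
    if strict and a.encoding != b.encoding:
        raise EncodingMismatch(f"{a.encoding} <op> {b.encoding}")

    # Assumption: the shorter operand is padded with zero bytes.
    n = max(len(a.data), len(b.data))
    left = a.data.ljust(n, b"\x00")
    right = b.data.ljust(n, b"\x00")
    raw = bytes(op(x, y) for x, y in zip(left, right))

    # Footnote rule: only latin1 <op> latin1 keeps its encoding;
    # every other result is marked as binary.
    if a.encoding == b.encoding == "latin1":
        return TaggedString(raw, "latin1")
    return TaggedString(raw, "binary")


# Option 2 in action: mismatched encodings, result is tagged binary.
mixed = bitwise(
    TaggedString("é".encode("utf-8"), "utf8"),
    TaggedString("é".encode("utf-16-le"), "utf16"),
    operator.xor,
    strict=False,
)
```

Note that even utf8 <op> utf8 comes out as binary here, since the result bytes are not guaranteed to be valid utf8 any more.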

Any thoughts?

leo

