On 2010-03-09 18:31, C. Benson Manica wrote:
> On Mar 9, 12:24 pm, "Richard Brodie" <r.bro...@rl.ac.uk> wrote:
>> "C. Benson Manica" <cbman...@gmail.com> wrote in
>> message news:98375575-1071-46af-8ebc-f3c817b47...@q23g2000yqd.googlegroups.com...
>>
>>> The strings come from the same place, i.e. they're exclusively
>>> normal ASCII characters.
>>
>> In this case then converting them to/from UTF-8 is a no-op, so
>> it makes no difference at all.
>
> Except to the database library, which seems perfectly happy to send an
> 8-character UTF-8 string to the database as 16 raw characters...

In that case I think you mean UTF-16 or UCS-2 instead of UTF-8. UTF-16
uses 2 or 4 bytes per character, UCS-2 always uses 2 bytes per
character, and UTF-8 uses 1 to 4 bytes per character (exactly 1 for
ASCII). If your texts are in a Western language, the high-order byte of
each character will be zero for most characters; you could check for
that (but note that the zero byte may come first or second in the byte
stream, depending on the byte order).

HTH,
Roel

-- 
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
  -- Isaac Asimov

Roel Schroeven
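
P.S. A minimal sketch of the zero-byte check described above, in case it helps to see it in Python. The sample string and the helper name guess_utf16_byte_order are made up for illustration, and the check is only a heuristic that assumes the data is mostly ASCII text; it says nothing about what your database library actually does.

```python
# Hypothetical 8-character ASCII sample, standing in for the real strings.
text = "ABCDEFGH"

utf8 = text.encode("utf-8")         # 1 byte per ASCII character
utf16le = text.encode("utf-16-le")  # 2 bytes per character, low byte first
utf16be = text.encode("utf-16-be")  # 2 bytes per character, high byte first

print(len(utf8), len(utf16le), len(utf16be))


def guess_utf16_byte_order(data):
    """Heuristic: return 'utf-16-le' or 'utf-16-be' if every second byte
    of `data` is zero, else None. Only meaningful for ASCII-range text."""
    if len(data) < 2 or len(data) % 2:
        return None
    even_bytes, odd_bytes = data[0::2], data[1::2]
    if not any(odd_bytes):
        # Zero bytes in second position: low byte comes first.
        return "utf-16-le"
    if not any(even_bytes):
        # Zero bytes in first position: high byte comes first.
        return "utf-16-be"
    return None


for raw in (utf8, utf16le, utf16be):
    print(repr(raw), guess_utf16_byte_order(raw))
```

For pure ASCII input the UTF-8 bytes are identical to the ASCII bytes, so a 16-byte result for an 8-character string points at a 2-byte encoding somewhere along the way.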