Regarding cleaning of mixed string encodings in the discography search engine
http://www.xfeedme.com/discs/discography.html

Following </F>'s suggestion I came up with this:

    import codecs

    utf8enc = codecs.getencoder("utf8")
    utf8dec = codecs.getdecoder("utf8")
    iso88591dec = codecs.getdecoder("iso-8859-1")

    def checkEncoding(s):
        # Try UTF-8 first; fall back to ISO-8859-1, which accepts any byte.
        try:
            (uni, dummy) = utf8dec(s)
        except UnicodeError:
            (uni, dummy) = iso88591dec(s, 'ignore')
        # Re-encode so that everything ends up as UTF-8.
        (out, dummy) = utf8enc(uni)
        return out

This works nicely for Nordic stuff like "björgvin halldórsson - gunnar Þórðarson", but Russian seems to turn into garbage and I have no idea about Chinese. Unless someone has any other ideas I'm giving up now.

   -- Aaron Watters

===
In theory, theory is the same as practice.
In practice it's more complicated than that.
  -- folklore
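
P.S. One possible reason for the Russian garbage, offered as a guess rather than a diagnosis: ISO-8859-1 decoding never fails, so bytes that are really in a Cyrillic single-byte encoding such as KOI8-R would decode "successfully" into accented Latin letters. The sketch below (Python 3, bytes in, UTF-8 bytes out) makes the fallback configurable. The names `to_utf8` and `fallbacks` are hypothetical and not part of the search engine. Which encoding a given source really uses is an assumption the caller has to supply, since strict decoding alone usually cannot tell one single-byte encoding from another.

```python
def to_utf8(raw, fallbacks=("iso-8859-1",)):
    """Return the bytes in `raw` re-encoded as UTF-8.

    `raw` is tried as UTF-8 first, then as each encoding named in
    `fallbacks`, in order. The first strict decode that succeeds wins.
    `fallbacks` is a hypothetical parameter: the caller picks it per
    data source (for example ("koi8-r",) for a Russian feed).
    """
    for name in ("utf-8",) + tuple(fallbacks):
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    # Nothing decoded cleanly: keep what we can instead of raising.
    return raw.decode("utf-8", "replace").encode("utf-8")


if __name__ == "__main__":
    nordic = "björgvin halldórsson".encode("iso-8859-1")
    russian = "Алла".encode("koi8-r")
    print(to_utf8(nordic).decode("utf-8"))
    print(to_utf8(russian, fallbacks=("koi8-r",)).decode("utf-8"))
```

With the default fallback the KOI8-R input still comes out as Latin garbage, which matches the behaviour described above. It only comes out right when the caller names the correct encoding. Multi-byte Chinese encodings would need the same treatment, with the right codec name supplied per source.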