Hi experts, I'm new to Python. I'm writing crawlers to extract data from some Chinese websites, and I've run into an encoding problem.
I have a unicode object that looks like u'\xd6\xd0\xce\xc4'. Its contents are actually GB2312-encoded bytes, but I have no idea how to convert it back to UTF-8.

The problem is easy to reproduce. This works:

============================
>>> su = u"中文".encode('gb2312')
>>> su
'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
中文    -> (same as the original string)
============================

But this doesn't. Why?

===========================
>>> su = u'\xd6\xd0\xce\xc4'
>>> su
u'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
===========================

Thank you
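
A note on the likely cause, plus a possible workaround. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the ASCII codec, which would explain the UnicodeEncodeError above. The sketch below assumes that every code point in the unicode object is really one raw GB2312 byte (so u'\xd6' stands for the byte 0xD6). If the data was mangled some other way, this will not apply. The variable names raw, text and utf8_bytes are only illustrative.

```python
# -*- coding: utf-8 -*-
# Assumption: each code point in `su` is one raw GB2312 byte that was
# mistakenly stored in a unicode object.
su = u'\xd6\xd0\xce\xc4'

# latin-1 maps code points 0-255 to the identical byte values,
# so encoding with it recovers the original raw bytes unchanged.
raw = su.encode('latin-1')

# Now decode those bytes with the encoding they were really in.
text = raw.decode('gb2312')

# Finally, encode the proper unicode text as UTF-8.
utf8_bytes = text.encode('utf-8')

print(repr(utf8_bytes))
```

If the crawler is what produces these objects, it may be cleaner to decode the downloaded bytes with 'gb2312' at the point where the page is read, so that the wrong unicode object is never created.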