I'm trying to import text from email I've received, run some regular
expressions on it, and save the text into a database. I'm trying to
figure out how to handle the issue of character sets. I've had some
problems with my regular expressions on email that uses non-ASCII
character sets. Korean text, for example, is full of sequences like
'=3D=21'. This doesn't look like Unicode (or am I wrong?), so does
anyone know how I should handle it? Do I need to do anything special
when passing text with non-ASCII characters to re, MySQLdb, or any other
libraries? Is it better to save the text as-is in my db along with its
character set, or should I convert all text to some default encoding
like UTF-8? Any advice? Thanks.
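
For reference, sequences like '=3D=21' match the quoted-printable
transfer encoding ('=3D' is '=' and '=21' is '!'), which is separate from
the character set itself. Below is a minimal sketch of one possible
approach, assuming Python 3 and the standard library email package:
undo the transfer encoding first, then decode the bytes using the
charset the message declares. The sample message, the helper name
body_to_text, and the latin-1 fallback are all hypothetical and only
there for illustration.

```python
import email
import re

# Hypothetical sample: a Korean body declared as EUC-KR and sent as
# quoted-printable, so every non-ASCII byte (and '=') appears as =XX.
RAW = (
    b"From: someone@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="euc-kr"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=C7=D1=B1=DB =3D=21\r\n"
)


def body_to_text(raw_bytes, fallback_charset="latin-1"):
    """Return the text parts of a raw message as one Unicode string."""
    msg = email.message_from_bytes(raw_bytes)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip attachments and multipart containers
        # decode=True undoes quoted-printable or base64 and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Use the declared charset; the fallback is an assumption.
        charset = part.get_content_charset() or fallback_charset
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # The declared charset is unknown to Python.
            chunks.append(payload.decode(fallback_charset, errors="replace"))
    return "".join(chunks)


text = body_to_text(RAW)

# Regular expressions then run on the decoded string, not the raw bytes.
has_hangul = re.search("[\uac00-\ud7a3]", text) is not None

# One option for storage: a single encoding such as UTF-8 for every row.
utf8_bytes = text.encode("utf-8")
```

With this approach the database only ever sees one encoding, and the
original charset could still be stored in a separate column if it
matters later.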
--
Michael <[EMAIL PROTECTED]>
http://kavlon.org