I'm trying to import text from email I've received, run some regular expressions on it, and save the text into a database. I'm trying to figure out how to handle character sets. My regular expressions have had problems with email in unusual character sets. Korean text, for example, seems to be filled with a lot of '=3D=21' type of stuff. This doesn't look like Unicode (or am I wrong?), so does anyone know how I should handle it?

Do I need to do anything special when passing text with non-ASCII characters to re, MySQLdb, or any other libraries? Is it better to save the text as-is in my db along with its character set, or should I try to convert all text to some default format like UTF-8? Any advice? Thanks.
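
For reference, sequences like '=3D=21' are what the quoted-printable transfer encoding looks like ('=3D' is an escaped '=' and '=21' an escaped '!'), so they are an encoding layer on top of the character set rather than Unicode itself. Below is a minimal sketch of one way to undo both layers, assuming Python 3 and its standard email package; the function name extract_text is made up for illustration, and it only looks at the plain-text body.

```python
import email
from email import policy


def extract_text(raw_bytes):
    """Return the plain-text body of a raw email as a Unicode str.

    Hypothetical helper: it relies on the standard library to undo the
    Content-Transfer-Encoding (e.g. quoted-printable) and then decode
    the bytes using the charset declared in the Content-Type header.
    """
    # policy.default makes the parser return an EmailMessage object.
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)

    # Pick the text/plain part; works for multipart and simple messages.
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return ""

    # get_content() returns decoded text, not raw bytes.
    return body.get_content()
```

Once the body is a str, regular expressions see real characters rather than escape sequences, and converting to UTF-8 for storage is a matter of calling .encode("utf-8") or letting the database driver do it. Whether to also keep the original charset name is a design choice this sketch does not settle.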

--
Michael <[EMAIL PROTECTED]>
http://kavlon.org

