Hi, I need to convert a lot of text from e-mails and MS Word documents into plain text, which will be fed into a Python script and then e-mailed. I want the final product to be plain ASCII text, i.e. no em dashes, curly quotes and so forth.
One big problem currently is that when I copy and paste characters like curly quotes or em dashes from OpenOffice.org into gedit or kate, they show up obviously incorrect, e.g. as a capital A with a bar on top.

When I look at them in Python strings, these are some examples:

\xe2\x80\x99 (single quote)
\xe2\x80[\x9c\x9d] (RE matching the opening and closing double quotes)
\x93 (another curly quote)

Is there a utility program in Linux to convert these characters? Or is there a library in Python that will do it for me (instead of me using REs to substitute them)?

Many thanks,
Damon

--
Damon Lynch <[EMAIL PROTECTED]>
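
For reference, here is a minimal sketch of one way the conversion might be done with the Python standard library alone. It rests on assumptions not stated above: that the three-byte sequences are UTF-8, that a lone \x93 comes from Windows-1252 (cp1252), and that dropping any character without an ASCII equivalent is acceptable. The names PUNCT, TABLE and to_ascii are made up for illustration, and the mapping is deliberately incomplete.

```python
import unicodedata

# Hypothetical mapping of common "smart" punctuation to ASCII; extend as needed.
PUNCT = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # ellipsis
    "\u00a0": " ",                   # non-breaking space
}
TABLE = str.maketrans(PUNCT)


def to_ascii(raw: bytes) -> str:
    """Decode raw bytes and reduce the text to plain ASCII."""
    try:
        # Assumption: most input is UTF-8 (e.g. \xe2\x80\x99).
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is cp1252 (e.g. a lone \x93).
        text = raw.decode("cp1252", errors="replace")
    # Replace the punctuation listed in PUNCT.
    text = text.translate(TABLE)
    # Split accented letters into base letter + combining mark.
    text = unicodedata.normalize("NFKD", text)
    # Drop whatever still has no ASCII representation.
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii(b"It\xe2\x80\x99s \xe2\x80\x9cquoted\xe2\x80\x9d text"))
    print(to_ascii(b"\x93another\x94 example"))
```

Trying UTF-8 first and falling back to cp1252 is only a guess at the encoding, so text that mixes both encodings in a single string would need more careful handling.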