Hi,

I need to convert a lot of text from e-mails and MS Word documents
into plain text, to be fed into a Python script and then e-mailed.
I want the final product to be plain ASCII text, i.e. no fancy
em dashes, curly quotes and so forth.

One big problem at the moment is that when I copy and paste characters
like curly quotes or em dashes from OpenOffice.org into gedit or kate,
they show up as obviously wrong characters, e.g. a capital A with a bar
on top.  Looking at them in Python strings, these are some examples:
\xe2\x80\x99 (single quote)
\xe2\x80[\x9c\x9d] (RE of opening and closing double quote)
\x93 (another curly quote)

Is there a utility program for Linux to convert these characters?  Or is
there a library in Python that will do it for me (instead of me using
regular expressions to substitute them)?
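
To make the question concrete, here is a rough sketch of the kind of
conversion I am after, written for Python 3.  It rests on assumptions I
have not confirmed: that the three-byte sequences above are UTF-8 and
that the lone \x93 is Windows-1252.  The names PUNCTUATION, decode_bytes
and to_ascii are made up for illustration, and the mapping table is only
a starting point.

```python
# Hypothetical mapping from "fancy" punctuation to plain ASCII.
# Extend it with whatever other characters turn up in the input.
PUNCTUATION = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}


def decode_bytes(raw):
    """Decode raw bytes as UTF-8, falling back to Windows-1252.

    The fallback is an assumption about where stray bytes such as
    0x93 come from; it applies to the whole input, not byte by byte.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def to_ascii(text):
    """Replace known punctuation, then force the result to ASCII."""
    for fancy, plain in PUNCTUATION.items():
        text = text.replace(fancy, plain)
    # Anything still outside ASCII becomes "?" rather than raising.
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii(decode_bytes(b"It\xe2\x80\x99s a \xe2\x80\x9ctest\xe2\x80\x9d")))
    print(to_ascii(decode_bytes(b"\x93quoted")))
```

A known limitation of this sketch is that input mixing UTF-8 and
Windows-1252 bytes in one string would be decoded entirely as
Windows-1252, so the UTF-8 sequences in it would still come out wrong.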

Many thanks,
Damon
-- 
Damon Lynch <[EMAIL PROTECTED]>

