cool_go_blue wrote:
> I try to read a word document as follows:
>
> app = win32com.client.Dispatch('Word.Application')
> doc = app.Documents.Open('D:\myfile.doc')
> print doc.Content.Text
>
> I receive the following error:
>
> Traceback (most recent call last):
>   File "D:\projects\Myself\MySVD\src\ReadWord.py", line 11, in <module>
>     print doc.Content.Text
>   File "D:\Softwares\Python27\lib\encodings\cp1252.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_table)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\uf06d'
> in position 4397: character maps to <undefined>
>
You are reading the Word document just fine. The problem is printing it to your console. The document contains Unicode characters that cannot be represented in your console's 8-bit code page (cp1252, according to the traceback), so you need to tell Python how to handle the conversion from Unicode to 8-bit. Try this:

    print doc.Content.Text.encode('cp1252', 'replace')

That will print ? wherever a character cannot be encoded.

U+F06D is not a standard character. It's in the Unicode "Private Use Area", so it's possible this is some special code internal to Word.

--
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.

_______________________________________________
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32
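For anyone hitting the same UnicodeEncodeError, here is a minimal sketch of the encode-with-'replace' advice, plus a way to locate Private Use Area characters such as U+F06D. It assumes Python 3 (on Python 2.7 the same calls work on unicode objects, using u"" literals). The helper names `console_safe` and `private_use_chars` and the sample string are hypothetical, and the sample stands in for `doc.Content.Text` so the snippet runs without Word or win32com.

```python
import unicodedata


def console_safe(text, encoding="cp1252"):
    """Return text with every character the target code page cannot
    represent replaced by '?', the same substitution that
    text.encode(encoding, 'replace') performs."""
    return text.encode(encoding, "replace").decode(encoding)


def private_use_chars(text):
    """List (index, 'U+XXXX') for each Private Use Area character in text."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # 'Co' = private use code point
    ]


if __name__ == "__main__":
    # Hypothetical stand-in for doc.Content.Text (no Word/COM needed here).
    sample = "Mean \uf06d = 5 \u2264 6"
    print(console_safe(sample))       # Mean ? = 5 ? 6
    print(private_use_chars(sample))  # [(5, 'U+F06D')]
```

The replacement only affects what is printed; the document text itself is unchanged. To keep every character, writing the text to a file opened with `io.open(path, 'w', encoding='utf-8')` avoids the cp1252 limitation altogether.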