cool_go_blue wrote:
> I try to read a word document as follows:
>
> app = win32com.client.Dispatch('Word.Application')
> doc = app.Documents.Open('D:\myfile.doc')
> print doc.Content.Text
>
> I receive the following error:
>
> Traceback (most recent call last):
>   File "D:\projects\Myself\MySVD\src\ReadWord.py", line 11, in <module>
>     print doc.Content.Text
>   File "D:\Softwares\Python27\lib\encodings\cp1252.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_table)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\uf06d'
> in position 4397: character maps to <undefined>
>

You are reading the Word document just fine.  The issue is printing it
to your terminal.  The document contains Unicode characters that can't
be represented in your terminal's encoding (cp1252), so the implicit
conversion from Unicode to 8-bit fails.  You need to tell Python how to
handle that conversion.  Try this:

    print doc.Content.Text.encode('cp1252','replace')

That will print ? wherever a character can't be encoded.
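
If you want to see what the other error handlers do, here is a minimal
sketch.  The helper name console_safe and the sample string are made up
for illustration, and it is written to run on Python 3 as well as 2.7.
It encodes to the target code page and decodes back, so the result is
text that the terminal can always print:

```python
from __future__ import print_function


def console_safe(text, encoding='cp1252', errors='replace'):
    """Return text with characters the encoding can't represent substituted.

    console_safe is a hypothetical helper, not part of win32com or the
    standard library.
    """
    # Encode with an explicit error handler, then decode back to text.
    return text.encode(encoding, errors).decode(encoding)


# Made-up sample containing the character from the traceback.
sample = u'width: 5 \uf06dm'

print(console_safe(sample))                             # width: 5 ?m
print(console_safe(sample, errors='ignore'))            # width: 5 m
print(console_safe(sample, errors='backslashreplace'))  # width: 5 \uf06dm
```

'backslashreplace' is handy while debugging, because it shows which
code point caused the trouble instead of hiding it behind a ?.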

U+F06D is not an assigned character.  It's in the "private use" area
(U+E000 through U+F8FF), so it has no standard meaning, and it's
possible this is some special code to Word.
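
To find out where such characters sit in the text, something like the
following might help.  It is only a sketch: find_private_use is a name I
made up, and it relies on unicodedata reporting the general category
'Co' for private-use code points.

```python
import unicodedata


def find_private_use(text):
    """List (index, code point) pairs for private-use characters in text.

    find_private_use is a hypothetical helper for illustration.
    """
    return [(i, 'U+%04X' % ord(ch))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == 'Co']  # 'Co' = private use


# Made-up sample; with the real document you would pass doc.Content.Text.
print(find_private_use(u'5 \uf06dm'))
```

Once you know the positions, you can look at the surrounding text in
Word to see what the character is supposed to be.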

-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.

_______________________________________________
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32
