Re: [python-win32] UnicodeEncodingError when print a doc file

Tim Roberts Tue, 14 Jun 2011 18:04:13 -0700

cool_go_blue wrote:
> Thanks. It works. Actually, what I want to do is to parse the whole
> document. How can I retrieve the list of words in the
> document? I use the following code:
>
> for word in doc.Content.Text.encode("cp1252", "replace"):
>     print word
>
> It seems that word is each a character.
>


No, what you are getting back is a Python string.  When you iterate
over a string, you get its characters one at a time.  This is basic Python.

If your words are all separated by whitespace, you can use split, which
with no arguments breaks the string on any run of spaces, tabs, or line
breaks:

    for word in doc.Content.Text.encode("cp1252","replace").split():
        print word
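
To make the difference concrete, here is a minimal sketch.  The sample
string is an assumption standing in for doc.Content.Text, so nothing here
touches Word, and it uses print() calls so it runs on Python 3 as well:

```python
# Hypothetical sample text standing in for doc.Content.Text.
text = u"The quick  brown\tfox"

# Iterating over a string yields one character at a time.
chars = [ch for ch in text]

# split() with no arguments breaks on any run of whitespace,
# so the double space and the tab produce no empty entries.
words = text.split()

print(chars[:3])
print(words)
```

The first list holds single characters; the second holds whole words.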

Note, however, that you don't need to convert it to an 8-bit character
set until you want to print it.  If you are going to process these
words, then you might as well leave them in Unicode.
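
A rough sketch of that approach, assuming a hard-coded Unicode string in
place of doc.Content.Text (the sample text and the helper name
encode_for_console are made up for illustration):

```python
# Hypothetical sample text; in the real script this would be
# doc.Content.Text, which is already a Unicode string.
text = u"na\u00efve caf\u00e9 \u4e2d\u6587 done"

# Do all the processing on the Unicode words.
words = text.split()
lengths = [len(w) for w in words]  # counts characters, not bytes


def encode_for_console(word, encoding="cp1252"):
    """Convert one word to 8-bit text only at output time.

    Characters the code page cannot represent become '?'.
    """
    return word.encode(encoding, "replace")


# Encode at the last moment, just before printing.
encoded = [encode_for_console(w) for w in words]
for item in encoded:
    print(item)
```

Only the final step loses information, so any counting, comparing, or
searching done before it sees the original characters.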

-- 
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.

_______________________________________________
python-win32 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-win32
