Thanks. I just found that all item numbers, such as 1.1.1, are gone. How can I
get these numbers? Also, if all the items are in a table, how can I get the
contents of all items and ignore the table structure? Thanks.

--- On Tue, 6/14/11, Tim Roberts <t...@probo.com> wrote:

From: Tim Roberts <t...@probo.com>
Subject: Re: [python-win32] UnicodeEncodingError when print a doc file
To: "python-win32@python.org" <python-win32@python.org>
Date: Tuesday, June 14, 2011, 9:02 PM

cool_go_blue wrote:
> Thanks. It works. Actually, what I want to do is to parse the whole
> document. How can I retrieve the list of words in the
> document? I use the following code:
>
> for word in doc.Content.Text.encode("cp1252", "replace"):
>     print word
>
> It seems that each word is just a single character.
>

No, what you are getting back is a Python string.  When you enumerate
through a string, you get characters.  This is basic Python.

If your words are all separated by spaces, you can use split:

    for word in doc.Content.Text.encode("cp1252","replace").split():
        print word

Note, however, that you don't need to convert it to an 8-bit character
set until you want to print it.  If you are going to process these
words, then you might as well leave them in Unicode.
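
As a minimal sketch of that advice (written in Python 3 syntax, with a
plain string standing in for doc.Content.Text, and with the helper name
printable() being hypothetical rather than part of any library), you can
split and count the words while they are still Unicode and encode only
at the point of output:

```python
import sys
from collections import Counter

def printable(word, encoding="cp1252"):
    """Return word with characters the target encoding lacks replaced by '?'."""
    # Round-trip through the target encoding so the result is still text.
    return word.encode(encoding, "replace").decode(encoding)

# Stand-in for doc.Content.Text (assumption: the real value is a Unicode string).
text = u"na\u00efve caf\u00e9 \u4e2d\u6587 caf\u00e9"

# Process the words while they are still Unicode.
words = text.split()
counts = Counter(words)

# Convert only when printing, using the console encoding if it is known.
console_encoding = sys.stdout.encoding or "cp1252"
for word in words:
    print(printable(word, console_encoding))
```

The counting step sees the original characters, so two different
non-cp1252 words are never merged into the same run of question marks.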

-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.

_______________________________________________
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32