Re: about Python doc reader

norseman Thu, 14 May 2009 09:24:15 -0700

Tim Golden wrote:

norseman wrote:

I did try these.


Doc at once:

outputs two x'0D' and then the file text, then appends x'0D' x'0D' x'0A' x'0D' x'0A' to the end, even though the source file itself has no EOL.

(EOL is End Of Line, aka newline.)

That's  cr cr             There are two blank lines at the beginning.
        cr cr lf cr lf    There is no EOL in the source.
                          Any idea what those are about?
One crlf is probably from Python's print text statement, but the other?
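One way to make a run like cr cr lf cr lf easier to reason about is to split it into EOL tokens, treating cr lf as one Windows EOL and a lone cr or lf as one Apple or Unix EOL. A minimal Python 3 sketch; the function name eol_tokens is hypothetical, and the sample bytes are the ones quoted above, not read from the actual file:

```python
def eol_tokens(data):
    """Split the EOL bytes in `data` into 'CRLF', 'CR' and 'LF' tokens.

    A CR immediately followed by LF counts as one Windows EOL; a lone CR
    is the classic Apple EOL and a lone LF the Unix one. Any other bytes
    are skipped, so only the line endings are reported.
    """
    tokens = []
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\r\n":
            tokens.append("CRLF")
            i += 2
        elif data[i:i + 1] == b"\r":
            tokens.append("CR")
            i += 1
        elif data[i:i + 1] == b"\n":
            tokens.append("LF")
            i += 1
        else:
            i += 1
    return tokens


# Trailing bytes as reported above: cr cr lf cr lf
print(eol_tokens(b"\r\r\n\r\n"))  # ['CR', 'CRLF', 'CRLF']
```

Read greedily this way, the tail is one lone cr followed by two cr lf pairs.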

The "lines =" version
prepends   [u'\r', u'\r', u"   to the beginning of the output
and appends   \r"]x'0D'x'0A'   to the end, even though there is no EOL in the source.

That output is understood:    u'\r'  is the Apple EOL,
and the crlf is probably from print lines.

Not clear what you're doing to get there. This is the (wrapped) output from my interpreter, using Word 2003. As you can see, a new doc gives one "\r", nothing more.

<dump>

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.
 >>> import win32com.client
 >>> word = win32com.client.gencache.EnsureDispatch ("Word.Application")
 >>> doc = word.Documents.Add ()
 >>> print repr (doc.Range ().Text)
u'\r'
 >>>

</dump>

==============
The original "do it this way" snippets were:
<code>
import win32com.client

doc = win32com.client.GetObject ("c:/temp/temp.doc")
text = doc.Range ().Text

</code>

Note that this will give you a unicode object with \r line-delimiters.
You could read para by para if that were more useful:

<code>
import win32com.client

doc = win32com.client.GetObject ("c:/temp/temp.doc")
lines = [p.Range () for p in doc.Paragraphs]

</code>
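Because doc.Range().Text comes back as a single unicode string with \r between paragraphs (a new, empty document gives just u'\r', per the dump above), much the same list can be built by splitting that string yourself instead of walking doc.Paragraphs. A minimal Python 3 sketch, assuming the text ends with Word's final paragraph mark; word_text_to_lines and word_text_to_newlines are hypothetical helpers, and the sample strings stand in for what Word returns:

```python
def word_text_to_lines(text):
    """Turn Word's CR-delimited text into a list of paragraph strings.

    Assumes every paragraph, including the last, ends with a single
    carriage return, so the empty string left after the final CR is dropped.
    """
    lines = text.split("\r")
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def word_text_to_newlines(text):
    """Normalise Word's CR paragraph delimiters to LF for printing or saving."""
    return "\n".join(word_text_to_lines(text))


print(word_text_to_lines("\r"))                         # ['']
print(word_text_to_lines("first para\rsecond para\r"))  # ['first para', 'second para']
```

Converting to \n before printing also keeps lone \r characters out of the redirected output.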


and I added:

print text    after the "text =" line above

print lines   after the "lines =" line above

then ran the file using   python test.py >letmesee
and viewed letmesee in hex.
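A hex editor works for this; for a quick look at only the first and last few bytes of the redirected file, a short script is enough. A minimal Python 3 sketch; hex_ends and hex_ends_of_file are hypothetical helpers, letmesee is just the output file named above, and the demo bytes are made up to mirror the report (with "text" standing in for the document's content):

```python
def hex_ends(data, count=8):
    """Return the first and last `count` bytes of `data` as hex strings."""
    head = " ".join("%02X" % b for b in data[:count])
    tail = " ".join("%02X" % b for b in data[-count:]) if data else ""
    return head, tail


def hex_ends_of_file(path, count=8):
    """Same as hex_ends, but reads the bytes from a file on disk."""
    # Binary mode: no newline translation, so CR and LF show up as-is.
    with open(path, "rb") as f:
        return hex_ends(f.read(), count)


# Demo on in-memory bytes; on the real file use hex_ends_of_file("letmesee").
print(hex_ends(b"\r\rtext\r\r\n\r\n", count=5))
# ('0D 0D 74 65 78', '0D 0D 0A 0D 0A')
```

Opening the file in binary mode matters here: reading it in text mode could translate line endings and hide exactly the bytes being investigated.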



Steve



--
http://mail.python.org/mailman/listinfo/python-list
