Re: can't open word document after string replacements

2006-10-25 Thread Duncan Booth
Frederic Rentsch [EMAIL PROTECTED] wrote:

 DOC files contain housekeeping info which becomes inconsistent if you 
 change text. Possibly you can exchange stuff of equal length but that 
 wouldn't serve your purpose. RTF files let you do substitutions and they 
 save a lot of space too. But I kind of doubt whether RTF files can 
 contain pictures.

They wouldn't be a lot of use as a document file format if they couldn't 
contain pictures. RTF files can contain just about anything; they can even 
embed other non-RTF objects. Whether RTF applications other than Word can 
actually handle all of the tags is, of course, another question.
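To make this concrete, here is a minimal Python 3 sketch of how a picture sits inside an RTF file. The `\pict` group with the `\pngblip` control word holds the image bytes as hexadecimal text, so the file stays plain ASCII. The function name and sizes are illustrative, and a file saved by Word carries more control words than this.

```python
import binascii


def rtf_with_png(png_bytes, width_twips, height_twips):
    """Return a minimal RTF document that embeds one PNG picture."""
    # RTF stores picture data as hexadecimal text inside a {\pict ...} group.
    hex_data = binascii.hexlify(png_bytes).decode('ascii')
    return (
        '{\\rtf1\\ansi\n'
        'Picture below:\\par\n'
        '{\\pict\\pngblip\\picwgoal%d\\pichgoal%d %s}\n'
        '}' % (width_twips, height_twips, hex_data)
    )


if __name__ == '__main__':
    # Only the 8-byte PNG signature, not a complete image.
    print(rtf_with_png(b'\x89PNG\r\n\x1a\n', 1440, 720))
```

`\picwgoal` and `\pichgoal` are in twips (1440 per inch). Because the picture is just hex text, a text substitution elsewhere in the file does not disturb it.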

-- 
http://mail.python.org/mailman/listinfo/python-list


can't open word document after string replacements

2006-10-24 Thread Antoine De Groote
Hi there,

I have a Word document containing pictures and text. This document 
holds several 'ABCDEF' strings which serve as a placeholder for names. 
Now I want to replace these occurrences with names in a list (members). I 
open both input and output file in binary mode and do the 
transformation. However, I can't open the resulting file; Word just 
tells me that there was an error. Does anybody know what I am doing wrong?

Oh, and is this approach pythonic anyway? (I have a strong Java background.)

Regards,
antoine


import os

members = somelist

os.chdir(somefolder)

doc = file('ttt.doc', 'rb')
docout = file('ttt1.doc', 'wb')

counter = 0

for line in doc:
    while line.find('ABCDEF') > -1:
        try:
            line = line.replace('ABCDEF', members[counter], 1)
            docout.write(line)
            counter += 1
        except:
            docout.write(line.replace('ABCDEF', '', 1))
    else:
        docout.write(line)

doc.close()
docout.close()



Re: can't open word document after string replacements

2006-10-24 Thread Daniel Dittmar
Antoine De Groote wrote:
 I have a Word document containing pictures and text. This document 
 holds several 'ABCDEF' strings which serve as a placeholder for names. 
 Now I want to replace these occurrences with names in a list (members). I 
 open both input and output file in binary mode and do the 
 transformation. However, I can't open the resulting file; Word just 
 tells me that there was an error. Does anybody know what I am doing wrong?

The Word document format probably contains some length information about 
paragraphs etc. If you change a string to another one of a different 
length, this length information will no longer match the data and the 
document structure will be hosed.
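The effect can be illustrated with a toy format. The Python 3 sketch below uses a made-up record layout (a 4-byte length field followed by the text), which is *not* the real .doc layout; it only shows why a same-length substitution keeps the stored length valid while a longer name silently invalidates it.

```python
import struct

# Hypothetical toy layout (NOT the real Word .doc format): a 4-byte
# little-endian length field followed by that many bytes of text.


def pack_record(text):
    return struct.pack('<I', len(text)) + text


def length_matches(data):
    # True if the stored length field still agrees with the payload size.
    (stored,) = struct.unpack_from('<I', data, 0)
    return stored == len(data) - 4


record = pack_record(b'Dear ABCDEF,')
same_length = record.replace(b'ABCDEF', b'Jones!')   # 6 bytes -> 6 bytes
longer = record.replace(b'ABCDEF', b'Antoine')       # 6 bytes -> 7 bytes
print(length_matches(same_length), length_matches(longer))  # True False
```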

Possible solutions:
1. Use OLE automation (via the Python win32 package) to open the file in 
Word and use Word's search and replace. Your script could then directly 
print the document, which you probably have to do anyway.

2. Export the template document to RTF. This is a text format and can be 
more easily manipulated with Python.
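A minimal sketch of option 2, assuming the template has been saved as RTF and read into a Python 3 string. It fills each placeholder in turn and escapes the names, since `\`, `{` and `}` are special in RTF and non-ASCII characters are written as `\uN?` escapes. The function names are hypothetical. One caveat: when Word saves RTF it may break a placeholder up with formatting or revision-tracking control words, so check that `ABCDEF` appears contiguously in the saved file.

```python
def rtf_escape(text):
    """Escape text for the body of an RTF document (sketch)."""
    out = []
    for ch in text:
        if ch in '\\{}':
            out.append('\\' + ch)          # RTF special characters
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit UTF-16 code unit, followed by one
            # fallback character ('?') for readers without Unicode support.
            data = ch.encode('utf-16-le')
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], 'little')
                if unit > 32767:
                    unit -= 65536
                out.append('\\u%d?' % unit)
    return ''.join(out)


def fill_placeholders(rtf, placeholder, names):
    """Replace each occurrence of placeholder, in order, with the next name."""
    pieces = rtf.split(placeholder)
    if len(pieces) - 1 > len(names):
        raise ValueError('more placeholders than names')
    result = [pieces[0]]
    for name, piece in zip(names, pieces[1:]):
        result.append(rtf_escape(name))
        result.append(piece)
    return ''.join(result)


if __name__ == '__main__':
    template = '{\\rtf1\\ansi Dear ABCDEF, and ABCDEF.}'
    print(fill_placeholders(template, 'ABCDEF', ['Zoë', 'A{B}']))
    # {\rtf1\ansi Dear Zo\u235?, and A\{B\}.}
```

Reading and writing the RTF file with `encoding='latin-1'` round-trips any stray 8-bit bytes unchanged.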

 for line in doc:

I don't think that what you get here is actually a line of your document. 
Due to the binary nature of the format, it is an arbitrary chunk.
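A small Python 3 illustration: iterating over a file opened in binary mode splits the data after every `b'\n'` byte, and in binary content such a byte can just as well be part of a number or a picture. The data below is made up.

```python
import io
import struct

# Made-up binary content: a 2-byte little-endian field holding 10,
# which is stored as the byte 0x0A, the same value as b'\n'.
data = struct.pack('<H', 10) + b'Dear ABCDEF' + b'\x89PNG\r\n\x1a\n'

# Iterating over a binary file object yields pieces ending at each b'\n'.
chunks = list(io.BytesIO(data))
for chunk in chunks:
    print(repr(chunk))
# b'\n'
# b'\x00Dear ABCDEF\x89PNG\r\n'
# b'\x1a\n'
```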

Daniel


Re: can't open word document after string replacements

2006-10-24 Thread Bruno Desthuilliers
Antoine De Groote wrote:
 Hi there,
 
 I have a Word document containing pictures and text. This document
 holds several 'ABCDEF' strings which serve as a placeholder for names.
 Now I want to replace these occurrences with names in a list (members).

Do you know that MS Word already provides this kind of feature?

 I
 open both input and output file in binary mode and do the
 transformation. However, I can't open the resulting file; Word just
 tells me that there was an error. Does anybody know what I am doing wrong?

Hand-editing a non-documented binary format may lead to undesirable
results...

 Oh, and is this approach pythonic anyway? 

The pythonic approach is usually to start looking for existing
solutions... In this case, using Word's builtin features and Python/COM
integration would be a better choice IMHO.

 (I have a strong Java
 background.)

Nobody's perfect !-)

 Regards,
 antoine
 
 
 import os
 
 members = somelist
 
 os.chdir(somefolder)
 
 doc = file('ttt.doc', 'rb')
 docout = file('ttt1.doc', 'wb')
 
 counter = 0
 
 for line in doc:

Since you opened the file as binary, you should use file.read() instead.
Ever wondered what your 'lines' look like ?-)

 while line.find('ABCDEF') > -1:

.doc is a binary format. You may find such a byte sequence in its
content in places that are *not* text content.

 try:
 line = line.replace('ABCDEF', members[counter], 1)
 docout.write(line)

You're writing back the whole chunk on each iteration. No surprise the
resulting document is corrupted.
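To see the duplication, here is the posted loop ported to Python 3 bytes and run on an in-memory sample, next to a version that writes every byte exactly once. The port reads the `else` as belonging to the `while` (the indentation was lost in the post) and leaves out the `try`/`except`. `replace_each_once` is a hypothetical name, and it is only meant for text formats such as RTF or XML, not for the binary .doc file.

```python
import io
import re


def original_loop(data, members):
    """The loop from the original post, ported to Python 3 bytes."""
    # Assumes the `else` belongs to the `while`; try/except omitted.
    out = io.BytesIO()
    counter = 0
    for line in io.BytesIO(data):
        while line.find(b'ABCDEF') > -1:
            line = line.replace(b'ABCDEF', members[counter], 1)
            out.write(line)      # the whole chunk is written again here...
            counter += 1
        else:
            out.write(line)      # ...and once more when the loop ends
    return out.getvalue()


def replace_each_once(data, members):
    """Give each placeholder the next name; leftovers become empty."""
    names = iter(members)
    return re.sub(b'ABCDEF', lambda match: next(names, b''), data)


if __name__ == '__main__':
    sample = b'Dear ABCDEF and ABCDEF\n'
    print(original_loop(sample, [b'Ann', b'Bob']))
    # b'Dear Ann and ABCDEF\nDear Ann and Bob\nDear Ann and Bob\n'
    print(replace_each_once(sample, [b'Ann', b'Bob']))
    # b'Dear Ann and Bob\n'
```

Blanking leftover placeholders mirrors what the `except` branch in the post tries to do.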

 counter += 1

seq = list("abcd")
for indice, item in enumerate(seq):
  print("%02d : %s" % (indice, item))


 except:
 docout.write(line.replace('ABCDEF', '', 1))
 else:
 docout.write(line)
 
 doc.close()
 docout.close()
 



-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in '[EMAIL PROTECTED]'.split('@')])"


Re: can't open word document after string replacements

2006-10-24 Thread Jon Clements

Errr, I wouldn't even attempt to do this; how do you know each
'line' isn't going to be split arbitrarily, and that 'ABCDEF' doesn't
happen to be part of an image? As you've noted, this is binary data, so
you can't assume anything about it. Doing it this way is a Bad Idea
(tm).

If you want to do something like this, why not use templated HTML, or
possibly templated PDFs? Or heaven forbid, Word's mail-merge facility?
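For the templated-HTML route, Python's standard library is enough. A minimal sketch, assuming one letter per name; the template text and `render_letters` are hypothetical, and `html.escape` keeps characters such as `&` or `'` in a name from breaking the markup.

```python
import html
from string import Template

# Hypothetical letter template; $name marks the placeholder.
LETTER = Template('<html><body><p>Dear $name,</p></body></html>')


def render_letters(names):
    """Return one HTML document per name, with the name HTML-escaped."""
    return [LETTER.substitute(name=html.escape(n)) for n in names]


if __name__ == '__main__':
    for page in render_letters(['Ann', "O'Brien & Sons"]):
        print(page)
```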


(I think MS Office documents are effectively self-contained file
systems, so there is probably some module out there which can
read/write them).
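For Word 97-2003 files that guess is right. A .doc is stored in the OLE2 Compound File Binary format, a small file system of named streams, and every such file begins with the same 8-byte signature. A quick Python 3 check on the first bytes of a file might look like the sketch below. Actually reading the streams needs a parser such as the third-party `olefile` package.

```python
import sys

# Signature at offset 0 of every OLE2 / Compound File Binary file.
OLE2_SIGNATURE = bytes.fromhex('D0CF11E0A1B11AE1')


def looks_like_ole2(header):
    """True if the given leading bytes start with the OLE2 signature."""
    return header[:8] == OLE2_SIGNATURE


if __name__ == '__main__':
    for path in sys.argv[1:]:
        with open(path, 'rb') as f:
            print(path, looks_like_ole2(f.read(8)))
```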

Jon.



Re: can't open word document after string replacements

2006-10-24 Thread Antoine De Groote
Thank you all for your comments.

I ended up saving the Word document as XML and then using (a slightly 
modified version of) the script from my original post. For those 
interested, there was also a problem with encodings.
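The modified script isn't shown. As a hypothetical sketch of the same idea on an XML export, the approach is to decode the file once, replace the placeholders in text, escape each name for XML, and encode again with the encoding the file declares (UTF-8 is assumed here). Mixing byte strings in one encoding into a file declared in another is one common way to end up with an encoding problem. As with RTF, Word may split a placeholder across several `<w:t>` text runs.

```python
from xml.sax.saxutils import escape


def fill_xml_placeholders(xml_bytes, placeholder, names, encoding='utf-8'):
    """Replace each placeholder, in order, with an XML-escaped name."""
    # Work on text, not bytes; `encoding` is assumed to match the
    # encoding declared in the file's <?xml ...?> header.
    text = xml_bytes.decode(encoding)
    pieces = text.split(placeholder)
    if len(pieces) - 1 > len(names):
        raise ValueError('more placeholders than names')
    out = [pieces[0]]
    for name, piece in zip(names, pieces[1:]):
        out.append(escape(name))        # escapes &, < and >
        out.append(piece)
    return ''.join(out).encode(encoding)
```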

Regards,
antoine


Re: can't open word document after string replacements

2006-10-24 Thread Antoine De Groote
Bruno Desthuilliers wrote:
 Antoine De Groote wrote:
 Hi there,

 I have a Word document containing pictures and text. This document
 holds several 'ABCDEF' strings which serve as a placeholder for names.
 Now I want to replace these occurrences with names in a list (members).
 
 Do you know that MS Word already provides this kind of feature?


No, I don't. Sounds interesting... What is this feature called?

 


[OT] Re: can't open word document after string replacements

2006-10-24 Thread Bruno Desthuilliers
Antoine De Groote wrote:
 Bruno Desthuilliers wrote:
 Antoine De Groote wrote:
 Hi there,

 I have a Word document containing pictures and text. This document
 holds several 'ABCDEF' strings which serve as a placeholder for names.
 Now I want to replace these occurrences with names in a list (members).

 Do you know that MS Word already provides this kind of feature?
 
 
 No, I don't. Sounds interesting... What is this feature called?

I don't know what it's called in English, but in French it's (well, it
was the last time I used MS Word, which was quite some time ago...) "fusion
de documents" (literally, "document merging").





Re: can't open word document after string replacements

2006-10-24 Thread Steve Holden
Antoine De Groote wrote:
 Bruno Desthuilliers wrote:
 
Antoine De Groote wrote:

Hi there,

I have a Word document containing pictures and text. This document
holds several 'ABCDEF' strings which serve as a placeholder for names.
Now I want to replace these occurrences with names in a list (members).

Do you know that MS Word already provides this kind of feature?
 
 
 
 No, I don't. Sounds interesting... What is this feature called?
 
Mail-merge, I believe.

However, if your document can be adequately represented in RTF 
(rich-text format) then you could consider doing string replacements on 
that. I invoice the PyCon sponsors using this rather inelegant technique.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [OT] Re: can't open word document after string replacements

2006-10-24 Thread Richie Hindle

[Antoine]
 I have a Word document containing pictures and text. This document
 holds several 'ABCDEF' strings which serve as a placeholder for names.
 Now I want to replace these occurrences with names in a list (members).

[Bruno]
 I don't know what it's called in English, but in French it's (well, it
 was the last time I used MS Word, which was quite some time ago...) "fusion
 de documents".

Mail Merge?
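If mail merge fits, Python can still prepare the recipient list, because Word can take its merge data from a delimited text file. A minimal sketch using the standard `csv` module; the column name `Name` is hypothetical and would become the merge field placed where `ABCDEF` used to be.

```python
import csv
import io


def write_merge_source(names, stream):
    """Write a one-column CSV data source: a header row, then one row per name."""
    writer = csv.writer(stream)
    writer.writerow(['Name'])          # hypothetical merge field name
    for name in names:
        writer.writerow([name])        # csv quotes values containing commas


if __name__ == '__main__':
    buf = io.StringIO()
    write_merge_source(['Ann', 'Smith, John'], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=''`, as the `csv` documentation recommends.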

-- 
Richie Hindle
[EMAIL PROTECTED]


Re: can't open word document after string replacements

2006-10-24 Thread Frederic Rentsch
DOC files contain housekeeping info which becomes inconsistent if you 
change text. Possibly you can exchange stuff of equal length but that 
wouldn't serve your purpose. RTF files let you do substitutions and they 
save a lot of space too. But I kind of doubt whether RTF files can 
contain pictures.

Frederic


