At 04:28 AM 10/12/2005, Øyvind wrote:
>Hello, and thank you for the answers so far.
>
>The documents can be huge, as in 4-5000 pages, and there are up to 7*5000
>words that need to be replaced. (It is, as you have pointed out, a
>translation between languages, but for a very specialised branch of patents.
>Therefore there are no translators that know most of these words. They
>will be changed, and thereafter, someone might spend a few weeks/months
>getting it correct.)
>
>I don't really think changing it to rtf is a solution. The formatting is
>very important.

You don't lose formatting with RTF. What did you not like about my proposal to run the Words collection through a dictionary?

For what it's worth, I created a 405-word dictionary and ran a 1000-word document through it. Every word in the document was in the dictionary. On my machine, which is at least 3 years old (1 GHz CPU, I think), it took 16 seconds. Here's the code if you want to experiment with it. I still think it is the fastest solution. But YMMV.

    import time
    import win32com.client
    from translations import t  # I assume you have a dictionary of words & translations named t

    w = win32com.client.Dispatch('word.application')
    d = w.documents.Open('c:/foo.doc')

    def main():
        s = time.perf_counter()
        wds = d.words.count + 1  # Word collections are 1-based
        for i in range(1, wds):
            word = d.words(i)
            try:
                word.text = t[word.text]
            except Exception:
                # word is not a dictionary key, or Word refused the assignment
                pass
        print(time.perf_counter() - s)

    main()

The time is proportional to the number of words in the document. The size of the dictionary should not radically affect the time. The reason for the try is that some words in my sample document were in links, and reassigning the text failed. It also covers the case where the word is not a dictionary key.

Most of the time is spent looping and accessing the words. About 1% goes to looking in the dictionary. The rest goes to reassigning the text.

>The company does have Word macros today that do the job. But, as you might
>imagine, they are very hard to maintain and have lots of 'issues'. It started
>out with a few words 6-7 years ago, and has grown.
>
>Will there be an increase in speed if I pull out all the text, run it thru
>regex and thereafter do a Word Search and Replace of those words that
>Regex finds, instead of doing a complete Search and Replace in Word?
>
>Thanks in advance.
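
On the regex question quoted above: a minimal sketch of the "find out which words occur first" step, under stated assumptions. It assumes the document text has already been pulled out into a plain string, that every dictionary key is a single word, and that matching is case-sensitive. The function name words_to_replace and the sample data are hypothetical. Whether this is actually faster than a full Search and Replace in Word has not been measured here; it only shrinks the list of words that Word would then be asked to replace.

```python
import re

def words_to_replace(text, translations):
    """Return the entries of `translations` whose keys occur in `text`.

    Assumes single-word keys and case-sensitive matching.
    """
    # One regex pass collects the distinct words present in the text.
    present = set(re.findall(r"\w+", text))
    # Keep only the dictionary entries that can match at all.
    return {src: dst for src, dst in translations.items() if src in present}

# Hypothetical sample data, not taken from the real word lists.
translations = {"claim": "CLAIM_X", "embodiment": "EMBODIMENT_X", "widget": "WIDGET_X"}
text = "The claim describes one embodiment. A second claim follows."

needed = words_to_replace(text, translations)
print(sorted(needed))  # ['claim', 'embodiment']
```

Only the keys in needed would then have to go through Word's own Search and Replace, so the cost of that step scales with the words actually present rather than with the full 7*5000 list.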
_______________________________________________
Python-win32 mailing list
Python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32