Hello,

Last year I made the migration from Java to Python and have been having lots of fun. Just this month I was tasked with a COM programming effort, and since none of us on the team are COM programmers, we decided to do the work in Python and the task was assigned to me. Of course, in our ignorance of COM programming, we ran into a few snags.
The project I am on requires that we go through 20,000 Word documents and perform an AutoSummary on each one. I have something that kind of works, but it has some issues. Basically, the code I wrote does the following:

1. Open the Word document.
2. Do the AutoSummary.
3. Save the results to a flat file for later parsing.
4. Close the Word document.

The problem is that this just seems very inefficient. It sometimes neglects to close the Word documents, so my computer ends up loaded with tons of open Word documents (fortunately I have restricted the number of runs it goes through). Also, yesterday I actually managed to break the COM server (or whatever it may be called).

I'm sure there is a better way of doing things, but finding a COM programmer who really understands COM is turning out to be harder than I thought. My hope is that I can get some pointers or even some code fixes so I can make progress on this effort. If this is not the place to post this sort of request, please direct me to where I should go.

For reference, here is the code as it stands:

"""
This is a test script for playing with various COM Word API items
using the win32com Python library from the ActivePython installation.
It works in ActivePython, likely nowhere else.

Notes:
1. This file needs major cleanup.
2. This broke Word as a COM server. Need to understand COM better.
3. Perhaps run multiple instances of this file?
"""
import os
import datetime  # For performance analysis
import time
from win32com.client import gencache  # basic win32com objects

# COM constants that must be established
wdSummaryModeCreateNew = 0x3
WORD = 'Word.Application'

# Other constants
breakOut = 10  # How many docs to check before ending. Set to -1 to ensure the entire system is slurped.

# Separators used in output
docSepB = '\n' + '<document>' + '\n'   # Used to break documents inside of the output file
docSepE = '\n' + '</document>' + '\n'  # Used to break documents inside of the output file
sumSepB = '\n' + '<summary>' + '\n'    # Used to break summary percentage displays inside of a document section in the output file
sumSepE = '\n' + '</summary>' + '\n'   # Used to break summary percentage displays inside of a document section in the output file

dumpfile = open('test.txt', 'w')


class Word:
    """ I represent all the fun of playing with a MS Word document."""

    def __init__(self):
        """ I initialize the COM object library for Word. """
        self.app = gencache.EnsureDispatch(WORD)
        self.summaryPercentages = (5, 10, 18, 25)
        self.errors = 0

    def open(self, doc):
        """ I open the Word file to be autosummarized. """
        self.app.Documents.Open(FileName=doc)

    def autoSummarize(self, Length=30, Mode=wdSummaryModeCreateNew, UpdateProperties=True):
        """ I do the autosummary and return the content.
        This actually creates a new tmp Word file."""
        try:
            self.app.ActiveDocument.AutoSummarize(Length, Mode, UpdateProperties)
            # Use self here, not the module-level `word` instance.
            return self.app.ActiveDocument.Content.Text
        except Exception:
            self.errors += 1
            return ''

    def close(self):
        """ I close the Word document."""
        self.app.ActiveDocument.Close(SaveChanges=False)


if __name__ == '__main__':
    print('*' * 80)
    word = Word()
    startTime = datetime.datetime.now()
    count = 0
    for root, dirs, files in os.walk('C:/wordData/'):
        for file in files:
            # In case we get a non-Word doc or a Word temp file that somehow got saved.
            if file.lower().endswith('.doc') and not file.startswith('~'):
                fileName = os.path.join(root, file)
            else:
                continue
            print('File ' + fileName)
            dumpfile.write(docSepB)
            dumpfile.write(fileName + '\n')
            for value in word.summaryPercentages:
                word.open(fileName)
                print(value)
                dumpfile.write(sumSepB)
                dumpfile.write('Length: ' + str(value) + '\n')
                try:
                    data = str(word.autoSummarize(Length=value))
                except Exception:
                    data = ''
                if len(data.strip()):
                    dumpfile.write(data)
                else:
                    dumpfile.write('No Summary')
                dumpfile.write(sumSepE)
                word.close()
                time.sleep(1)
            dumpfile.write(docSepE)  # closing of the doc
            dumpfile.write('*' * 80 + '\n')
            word.close()
            time.sleep(3)
            count += 1
            if count == breakOut:
                break
        if count == breakOut:
            break
    dumpfile.close()
    print('Done: ' + str(datetime.datetime.now() - startTime))
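One possible way to tighten the open / summarize / save / close steps is sketched below. It is not a tested fix for the script above, only an illustration of one idea: keep the object returned by Documents.Open and close exactly that object in a finally block, instead of closing whatever ActiveDocument happens to be. The function name summarize_document and the constant WD_SUMMARY_MODE_CREATE_NEW are made up for this sketch. It also assumes, as the script above does, that the summary document created by AutoSummarize becomes the active document.

```python
WD_SUMMARY_MODE_CREATE_NEW = 3  # same value as wdSummaryModeCreateNew in the script


def summarize_document(app, path, percentages=(5, 10, 18, 25)):
    """Return {percentage: summary text} for one file and close what was opened.

    `app` is expected to behave like the Word.Application COM object
    (hypothetical helper, not part of win32com).
    """
    summaries = {}
    # Keep a handle on the source document instead of relying on ActiveDocument.
    doc = app.Documents.Open(FileName=path)
    try:
        for length in percentages:
            summary_doc = None
            try:
                doc.AutoSummarize(length, WD_SUMMARY_MODE_CREATE_NEW, True)
                # Assumption: the newly created summary document is now active.
                summary_doc = app.ActiveDocument
                summaries[length] = summary_doc.Content.Text
            except Exception:
                # Record an empty summary and move on to the next percentage.
                summaries[length] = ''
            finally:
                # Close the temporary summary document only if one was created.
                if summary_doc is not None:
                    summary_doc.Close(SaveChanges=False)
    finally:
        # Runs even if a summary failed, so the source document is not left open.
        doc.Close(SaveChanges=False)
    return summaries
```

A caller would presumably create the application once with gencache.EnsureDispatch('Word.Application'), call summarize_document for each path found by os.walk, write the returned text to the dump file, and call app.Quit() when the run is finished. Opening each file once rather than once per percentage also removes three of the four Documents.Open calls per file.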