And now for something completely different... I see a lot of COM examples for driving Excel from Python, and I quickly made the same program output to Excel. What if the input file were a Word document? Where can I find information about manipulating Word documents, or what could I add to make the same program work for Word?
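For the Word side, here is a minimal sketch of what the reading step might look like, assuming Windows with pywin32 and Word installed. `Word.Application`, `Documents.Open`, `Content.Text`, `Close` and `Quit` belong to Word's COM object model; the function names `read_word_tokens` and `tokens_from_word_text` are made up for illustration. It also assumes Word hands back paragraph marks as '\r' and end-of-cell marks as '\x07', so the cell marks are turned into spaces before splitting.

```python
def tokens_from_word_text(text):
    """Split text taken from a Word document into whitespace-separated tokens."""
    # Assumption: table cells end with '\x07', which str.split() does not
    # treat as whitespace, so replace it with a space first.
    return text.replace('\x07', ' ').split()


def read_word_tokens(doc_path):
    """Open a Word document through COM and return its tokens.

    Windows only; needs pywin32 and an installed copy of Word.
    """
    # Imported here so the pure-text helper above can be used anywhere.
    from win32com.client import Dispatch

    word = Dispatch("Word.Application")
    word.Visible = 0
    try:
        doc = word.Documents.Open(doc_path)
        try:
            text = doc.Content.Text   # whole document body as one string
        finally:
            doc.Close(SaveChanges=0)  # 0 = do not save changes
    finally:
        word.Quit()
    return tokens_from_word_text(text)
```

With something like `tokens = read_word_tokens("c:\\text_samples\\chem_1.doc")` (a hypothetical path), the rest of the program could presumably stay as it is, since it only needs the list of tokens.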
Again, thanks a lot. I'll start hitting some books about this sort of text manipulation.

The Excel add-on:

import codecs
import re
from win32com.client import Dispatch

path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"
infile = codecs.open(path, 'r', 'utf8')
outfile = codecs.open(path2, 'w', 'utf8')  # opened but not written to in this version

NR_RE = re.compile(r'^\d+-\d+-\d+$')  # pattern for EINECS number

tokens = infile.read().split()

def iter_elements(tokens):
    product = []
    for tok in tokens:
        # an EINECS-style number after at least four tokens starts a new record
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]  # merge the name words into one field
            yield product
            product = []
        product.append(tok)
    # the last record needs the same merge before it is yielded
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()
c = 1  # current worksheet row

for element in iter_elements(tokens):
    xlApp.ActiveSheet.Cells(c,1).Value = element[0]
    xlApp.ActiveSheet.Cells(c,2).Value = element[1]
    xlApp.ActiveSheet.Cells(c,3).Value = element[2]
    xlApp.ActiveSheet.Cells(c,4).Value = element[3]
    c = c + 1

xlApp.ActiveWorkbook.Close(SaveChanges=1)
xlApp.Quit()
del xlApp

infile.close()
outfile.close()