Hi all, I'm trying to copy a bunch of Microsoft Word documents that contain Unicode characters into UTF-8 text files. Everything works fine at the beginning: the Word documents get converted, and new UTF-8 text files with the same names get created. But when I try to copy the data, I keep getting "TypeError: coercing to Unicode: need string or buffer, instance found". I'm probably copying the Word document wrong. What can I do?
Thanks,
Patrick

import os, codecs, glob, shutil, win32com.client
from win32com.client import Dispatch

input = 'C:\\text_samples\\source\\*.doc'
output_dir = 'C:\\text_samples\\source\\output'
FileFormat=win32com.client.constants.wdFormatText

for doc in glob.glob(input):
    doc_copy = shutil.copy(doc,output_dir)
    WordApp = Dispatch("Word.Application")
    WordApp.Visible = 1
    WordApp.Documents.Open(doc)
    WordApp.ActiveDocument.SaveAs(doc, FileFormat)
    WordApp.ActiveDocument.Close()
    WordApp.Quit()

for doc in glob.glob(input):
    txt_split = os.path.splitext(doc)
    txt_doc = txt_split[0] + '.txt'
    txt_doc = codecs.open(txt_doc,'w','utf-8')
    shutil.copyfile(doc,txt_doc)

--
http://mail.python.org/mailman/listinfo/python-list
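One possible direction, as a minimal sketch rather than a confirmed fix: the TypeError looks like it comes from the last line of the script, where shutil.copyfile receives the object returned by codecs.open instead of a path string. shutil.copyfile also copies bytes unchanged, so it would not re-encode anything. The sketch below reads the saved text with a source encoding and writes it back out as UTF-8. The helper names transcode_to_utf8 and txt_path_for are hypothetical, and the default source encoding cp1252 is an assumption about what Word wrote when saving as plain text; adjust it to match the actual files.

```python
import io
import os


def txt_path_for(doc_path):
    """Return the path of doc_path with its extension replaced by .txt."""
    root, _ext = os.path.splitext(doc_path)
    return root + ".txt"


def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Read src_path as text in src_encoding and write it to dst_path as UTF-8.

    Both arguments are path strings, not open file objects.
    src_encoding="cp1252" is an assumption, not something the post confirms.
    """
    # Decode the source bytes into text using the assumed encoding.
    with io.open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    # Encode the text as UTF-8 on the way out.
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

In the second loop of the script this would replace the codecs.open and shutil.copyfile pair with something like transcode_to_utf8(doc, txt_path_for(doc)), assuming doc already holds the plain text that Word saved.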