Using bash on Debian Etch. If word_doc = sys.argv[1] and it's a file name containing a space, like My\ Word.doc, this function reads My and Word as two separate files unless the second '%s' is quoted. It took me a lot of trial and error to discover this. Is this the most elegant way to do it? I was using popen originally, then saw some threads suggesting that subprocess cures the spaces-in-path problem.

import subprocess

def get_MSWordDoc_text(word_doc):
    """Harvests text from an MSWord doc using antiword."""
    antiword = "/usr/bin/antiword"
    # Note the extra single quotes around the second '%s'
    # without these quotes, bash chokes on paths with spaces in them
    # says can't open My ; can't open Word
    # using new subprocess module, the extra quotes shouldn't be necessary?
    # but I could not get it to work
    # see Beazley 2nd Ed. page 340
    p = subprocess.Popen("%s '%s'" % (antiword, word_doc), shell=True, stdout=subprocess.PIPE)
    doc_text = p.stdout.read()
    return doc_text

thx,
rd
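
For comparison, here is a minimal sketch of the list-argument form of subprocess.Popen, which may be what those threads had in mind. When the command is given as a list and shell is left at its default of False, no shell parses the command line, so a path with spaces arrives as a single argument and needs no extra quotes. The helper name get_text_via_command and the antiword keyword parameter are hypothetical, introduced only for illustration; the antiword path is the one from the function above.

```python
import subprocess


def get_text_via_command(command, path):
    """Run `command path` without a shell and return whatever it writes to stdout."""
    # Each list element is handed to the program as exactly one argument,
    # so a space inside `path` is not split and needs no quoting.
    p = subprocess.Popen([command, path], stdout=subprocess.PIPE)
    # communicate() reads stdout to EOF and waits for the process to exit.
    out, _ = p.communicate()
    return out


def get_MSWordDoc_text(word_doc, antiword="/usr/bin/antiword"):
    """Harvest text from an MSWord doc using antiword (path assumed as above)."""
    return get_text_via_command(antiword, word_doc)
```

Called as get_MSWordDoc_text(sys.argv[1]), this should accept "My Word.doc" unchanged. It also avoids a problem with the hand-quoted version, which would break again on a file name that itself contains a single quote.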