Hi; I am trying to find words in a document that are identical to any word in a vocabulary list, so that I can replace each such word with special markup. Let's say the word is "dharma". I don't want to replace the first few letters of, say, "dharmawuhirfuhi"; only whole words should match.

To make matters more difficult, if the word "adharma" is found in the document, I need to replace that with special markup, too. (In Sanskrit, "a" preceding a word negates the word.) But I don't want to replace "adharma" and then go on to replace the "dharma" inside "adharma", which would give me nested markup.

I tried separating out all the words in the line (I go through the doc line by line), but then, of course, I lost all the punctuation! So now I have this code:

    for word in vocab:
        aword = "a" + word
        try:
            line = re.sub(aword, pu_four + aword + pu_five + aword + pu_six, line)
        except:
            pass
        try:
            line = re.sub(word, pu_one + word + pu_two + word + pu_three, line)
        except:
            pass

which, of course, ends up breaking all the above! Can someone send me a shovel to dig my way out of this mess?

TIA,
Victor
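
One possible way out, offered as a minimal sketch rather than a definitive fix: build a single pattern for the whole vocabulary, anchor it with \b word boundaries so that "dharmawuhirfuhi" is left alone, capture the optional leading "a", and choose the markup in a replacement function. Because re.sub makes one pass and never rescans text it has already replaced, "adharma" cannot end up with nested markup. The values given to the pu_* strings and the function name mark_up are assumptions made for illustration; the real markup strings are not shown in the post. Matching here is case-sensitive.

```python
import re

# Hypothetical placeholder markup; the real pu_* values are not given in the post.
pu_one, pu_two, pu_three = "[[", "|", "]]"
pu_four, pu_five, pu_six = "{{", "|", "}}"


def mark_up(line, vocab):
    """Wrap whole-word vocabulary matches (and their "a"-prefixed forms) in markup."""
    if not vocab:
        # An empty alternation would match at every word boundary, so bail out.
        return line

    # re.escape guards against vocabulary words containing regex metacharacters.
    alternatives = "|".join(re.escape(w) for w in vocab)

    # \b requires a word boundary on both sides; group 1 is the optional "a".
    pattern = re.compile(r"\b(a?)(" + alternatives + r")\b")

    def repl(match):
        found = match.group(0)
        if match.group(1):
            # Negated form such as "adharma": same layout as the aword branch.
            return pu_four + found + pu_five + found + pu_six
        # Plain vocabulary word: same layout as the word branch.
        return pu_one + found + pu_two + found + pu_three

    # One pass over the line: replaced text is never scanned again.
    return pattern.sub(repl, line)


print(mark_up("dharma, adharma and dharmawuhirfuhi.", ["dharma"]))
```

Punctuation survives because the line is never split into words; only the matched spans are rewritten. If the vocabulary contained both a word and its "a"-prefixed form as separate entries, this sketch would treat the prefixed form as a negation, which may or may not be what is wanted.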