On 11.11.2010 21:33, Paul Watson wrote:
On 2010-11-11 08:07, chad wrote:
Let's say that I have an article. What I want to do is read in this
file and have the program skip over ever instance of the words "the",
"and",  "or", and "but". What would be the general strategy for
attacking a problem like this?

I realize that you may need or want to do this in Python. This would be trivial in an awk script.

There are several ways to do this.

skip = ('the', 'and', 'or', 'but')
words = []
with open('some.txt') as f:
    for line in f:
        # keep each whitespace-separated word that is not in the skip list
        words.extend(w for w in line.split() if w not in skip)
print(words)

If some.txt contains your original question, it prints this (punctuation stays attached to the words, so the quoted '"the",' and friends are kept):
["Let's", 'say', 'that', 'I', 'have', 'an', 'article.', 'What', 'I', 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'program', 'skip', 'over', 'ever', 'instance', 'of', 'words', '"the",', '"and",', '"or",', '"but".', 'What', 'would', 'be', 'general', 'strategy', 'for', 'attacking', 'a', 'problem', 'like', 'this?']
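
The split() approach compares words exactly, so a capitalized "The" or a quoted '"the",' with punctuation attached slips through. If that matters, one possible variation is to compare a normalized copy of each word. This is only a sketch: filter_words and SKIP are names I made up for illustration, and stripping string.punctuation from both ends is an assumption about what counts as "the same word".

```python
import string

SKIP = {'the', 'and', 'or', 'but'}  # a set gives fast membership tests

def filter_words(text, skip=SKIP):
    """Return the words of text whose normalized form is not in skip."""
    kept = []
    for word in text.split():
        # Compare on a normalized copy: surrounding punctuation removed, lower case.
        core = word.strip(string.punctuation).lower()
        if core not in skip:
            kept.append(word)  # keep the word as it appeared in the text
    return kept

sample = 'The words "the", "and", "or", and "but". What would be the plan?'
print(filter_words(sample))
```

The kept words still carry their original punctuation and case; only the comparison uses the cleaned-up form.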

But this is just _one_ way to get there.
Faster solutions could be based on a regex:
import re
skip = ('the', 'and', 'or', 'but')
word_re = re.compile(r'\w+')  # a word is a run of letters, digits or underscores
with open('some.txt') as f:
    print([w for w in word_re.findall(f.read()) if w not in skip])

This gives the following result (you lose the punctuation, and "Let's" is split in two):
['Let', 's', 'say', 'that', 'I', 'have', 'an', 'article', 'What', 'I', 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'program', 'skip', 'over', 'ever', 'instance', 'of', 'words', 'What', 'would', 'be', 'general', 'strategy', 'for', 'attacking', 'a', 'problem', 'like', 'this']
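
If you want the text back rather than a list of words, another regex option is to delete the unwanted words with a single substitution. A rough sketch, not the only way to do it: strip_words and skip_re are illustrative names of my own, and matching case-insensitively and swallowing the whitespace after each removed word are assumptions about the desired behaviour.

```python
import re

SKIP = ('the', 'and', 'or', 'but')

# \b word boundaries keep "the" from matching inside "another" or "theory".
# The trailing \s* also removes the whitespace that followed the dropped word.
skip_re = re.compile(r'\b(?:%s)\b\s*' % '|'.join(map(re.escape, SKIP)),
                     re.IGNORECASE)

def strip_words(text):
    """Return text with every whole-word occurrence of a SKIP word removed."""
    return skip_re.sub('', text)

print(strip_words('The cat and the dog, or another theory.'))
```

Punctuation and the remaining spacing are left as they were, so the result still reads like the original article.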

But there are so many ways to do it ...


-- 
http://mail.python.org/mailman/listinfo/python-list
