On 11/11/10 09:07, chad wrote:
> Let's say that I have an article. What I want to do is read in
> this file and have the program skip over every instance of the
> words "the", "and", "or", and "but". What would be the
> general strategy for attacking a problem like this?
I'd keep a file of "stop words" and read them into a set
(normalizing case in the process). Then, as I go through each
word in the target file, I'd check whether the case-normalized
version of the word is in the stop-word set and skip it if it
is. It might look something like this:
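
  def normalize_word(s):
      # Strip surrounding whitespace and fold case so "The" matches "the"
      return s.strip().upper()

  # stop_words.txt holds one stop word per line
  with open('stop_words.txt') as f:
      stop_words = set(normalize_word(word) for word in f)

  with open('data.txt') as f:
      for line in f:
          for word in line.split():
              if normalize_word(word) in stop_words:
                  continue
              process(word)  # placeholder for whatever you do with kept words
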
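One thing the loop above doesn't handle: line.split() leaves punctuation attached, so "the," or "(and" would slip past the stop-word check. Here is a minimal sketch of one way around that, assuming it's acceptable to trim leading and trailing punctuation before comparing. The filter_words helper and the sample data are made up for illustration and are not part of any library:

```python
import string

def normalize_word(s):
    # Trim whitespace, then leading/trailing punctuation, then fold case
    return s.strip().strip(string.punctuation).upper()

def filter_words(lines, stop_words):
    # Yield each word (as originally written) whose normalized form
    # is not in stop_words
    for line in lines:
        for word in line.split():
            if normalize_word(word) not in stop_words:
                yield word

# Hypothetical in-memory data standing in for stop_words.txt and data.txt
stop_words = set(normalize_word(w) for w in ["the", "and", "or", "but"])
sample = ["The cat and the dog,", "but not the bird."]
kept = list(filter_words(sample, stop_words))
print(kept)
```

Since filter_words accepts any iterable of lines, you can hand it an open file object just as well as a list.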
-tkc
--
http://mail.python.org/mailman/listinfo/python-list