Python would be good for this, but if you just want a chuck-and-rumble solution, this might be enough:
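
    bash $ wget -r -e robots=off -l 0 -c -t 3 http://www.cnn.com/
    bash $ grep -r "Micheal.*" ./www.cnn.com/

(wget has no --ignore-robots option; -e robots=off is what turns off robots.txt handling.)

Or you could do a wget/Python mix like:

    import os
    import re

    # Mirror the site first, ignoring robots.txt
    os.system("wget -r -e robots=off -l 0 -c -t 3 http://www.cnn.com/")

    re_iraq = re.compile(r"iraq .+?", re.IGNORECASE)

    # Walk every file under ./www.cnn.com/ and print what matched
    for dirpath, dirnames, filenames in os.walk("./www.cnn.com"):
        for name in filenames:
            with open(os.path.join(dirpath, name), errors="ignore") as f:
                iraqs = re_iraq.findall(f.read())
            print(iraqs)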
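
If you want the file name next to each hit, the way grep -r reports it, the search step could be wrapped in a small generator. This is only a sketch: find_matches and its parameters are names made up for illustration, the pattern in the example is just one possible choice, and it assumes wget has already mirrored the site into a local directory such as ./www.cnn.com.

```python
import os
import re


def find_matches(root, pattern, flags=re.IGNORECASE):
    """Yield (path, matches) for each file under root that contains pattern.

    Hypothetical helper for illustration, not part of any library.
    """
    regex = re.compile(pattern, flags)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="ignore" drops bytes that do not decode as text (images etc.)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                matches = regex.findall(handle.read())
            if matches:
                yield path, matches


if __name__ == "__main__":
    # Assumes the mirror already exists in ./www.cnn.com
    for path, matches in find_matches("./www.cnn.com", r"iraq\s+\w+"):
        print(path, matches)
```

Files with no hits are skipped, so the output stays short even on a large mirror. If the directory does not exist, os.walk simply yields nothing and the loop prints nothing.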