Dear mailing list members,

According to the website http://www.gnu.org/software/wget/ it is OK to send help requests to this mailing list. I have the following problem:
I am trying to crawl the search results of a news website using *wget*. The website is *www.voanews.com*. After typing in my *search keyword* and clicking search on the website, it proceeds to the results. There I can specify a *"to" and a "from" date* and hit search again. After this the URL becomes:

  http://www.voanews.com/search/?st=article&k=mykeyword&df=10%2F01%2F2013&dt=09%2F20%2F2013&ob=dt#article

and the actual content of the results is what I want to download. To achieve this I created the following wget command:

  wget --reject=js,txt,gif,jpeg,jpg \
       --accept=html \
       --user-agent=My-Browser \
       --recursive --level=2 \
       www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article

Unfortunately, the crawler doesn't download the search results. It only follows the upper link bar, which contains the "Home, USA, Africa, Asia, ..." links, and saves the articles they link to. *It seems like the crawler doesn't check the search result links at all.*

*What am I doing wrong, and how can I modify the wget command so that it downloads only the search result links (and of course the pages they link to)?*

Thank you for any help...
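PS: One detail I am not sure about is the shell quoting: the URL contains & and # characters, which bash would interpret itself rather than pass on to wget, so presumably the URL has to be quoted. A minimal sketch of the invocation I have in mind, assuming bash ("germany" and the dates are just my example values):

  wget --reject=js,txt,gif,jpeg,jpg \
       --accept=html \
       --user-agent=My-Browser \
       --recursive --level=2 \
       'http://www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article'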