Thank you. Updating to 1.19 fixed the problem. Version 1.12 came from the Scientific Linux 6 repository; I didn't realize it was so old. Installing 1.19 was easy: just ./configure; make; make install.
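For the record, here is roughly what that build looked like. This is a sketch, not a recipe: the tarball name and download URL are from memory, so check https://ftp.gnu.org/gnu/wget/ for the current release, and the last step needs root if you keep the default /usr/local prefix.

  # download and unpack the source (version/file name assumed; verify on the GNU ftp site)
  wget https://ftp.gnu.org/gnu/wget/wget-1.19.tar.gz
  tar xzf wget-1.19.tar.gz
  cd wget-1.19

  # standard autoconf build and install
  ./configure
  make
  make install    # run as root for the default /usr/local prefix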
Thanks again.

Daniel Feenberg

On Tue, May 15, 2018 at 5:34 AM, Darshit Shah <[email protected]> wrote:

> Hi,
>
> You are using a very old version of Wget. v1.12 was released in 2009 if I
> remember correctly.
>
> The current version of Wget doesn't seem to have any issues with the
> parsing of that robots.txt. I just tried it locally and it downloads no
> files at all.
>
> Please update your version of Wget.
>
> * Daniel Feenberg <[email protected]> [180514 16:51]:
> >
> > I have the following wget command line:
> >
> > wget -r http://wwwdev.nber.org/
> >
> > http://wwwdev.nber.org/robots.txt is:
> >
> > User-agent: *
> > Disallow: /
> >
> > User-Agent: W3C-checklink
> > Disallow:
> >
> > However wget fetches thousands of pages from wwwdev.nber.org. I would
> > have thought nothing would be found. (This is a demonstration, obviously
> > in real life I'd have a more detailed robots.txt to control the process).
> >
> > Obviously too, I don't understand something about wget or robots.txt.
> > Can anyone help me out?
> >
> > This is GNU Wget 1.12 built on linux-gnu.
> >
> > Thank you
> > Daniel Feenberg
> >
>
> --
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
>
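PS: for anyone who wants to confirm that the newer build honors robots.txt, a quick check. The -d/--debug flag and the robots setting are standard wget options; the exact wording of the debug output (and hence the grep pattern below) is my assumption, so adjust as needed.

  # with robots.txt honored, the recursive crawl should stop almost immediately;
  # the debug output mentions the robots.txt fetch and the paths it excludes
  wget -d -r http://wwwdev.nber.org/ 2>&1 | grep -i robots

  # the opposite behavior, if you ever need to ignore robots.txt deliberately
  wget -e robots=off -r http://wwwdev.nber.org/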
