I have the following wget command line:
wget -r http://wwwdev.nber.org/
The contents of http://wwwdev.nber.org/robots.txt are:
User-agent: *
Disallow: /
User-Agent: W3C-checklink
Disallow:
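(In case it matters, the contents above can be confirmed straight from the server with something like

  wget -O - -S http://wwwdev.nber.org/robots.txt

where -O - writes the retrieved file to standard output and -S shows the server's response headers, so a redirect or a differently served file would be visible.)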
However, wget fetches thousands of pages from wwwdev.nber.org. I would have
expected it to fetch nothing, since the robots.txt above disallows everything
for every user agent except W3C-checklink. (This is just a demonstration; in
real life I'd have a more detailed robots.txt to control the process.)
Clearly I'm misunderstanding something about wget or robots.txt. Can anyone
help me out?
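If it would help to diagnose this, I can re-run with debug output and filter for mentions of the robots handling, along the lines of (assuming this build of wget was compiled with debug support):

  wget -r -d http://wwwdev.nber.org/ 2>&1 | grep -i robots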
This is GNU Wget 1.12 built on linux-gnu.
Thank you
Daniel Feenberg