On Fri, 17 Nov 2023, Michael Ewan wrote:
You may be getting caught by robots.txt; try setting the user agent header,
e.g. -U agent-string
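For context on the robots.txt suggestion: robots.txt rules can name specific user agents, so the same URL may be allowed for one client and refused for another. The sketch below uses Python's standard urllib.robotparser as a stand-in; it is not Wget's own robots handling, and the robots.txt contents and example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group aimed at Wget, one default group.
robots_txt = """\
User-agent: Wget
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

page = "https://example.com/docs/index.html"  # hypothetical URL

# Python's parser picks a group when the group's User-agent name
# (case-insensitive) appears in the product token of the agent string.
wget_allowed = rp.can_fetch("Wget/1.21", page)       # matches the "Wget" group
browser_allowed = rp.can_fetch("Mozilla/5.0", page)  # falls back to "*"
print(wget_allowed, browser_allowed)  # False True
```

The point is only that the agent name a client presents can decide which robots.txt group applies to it.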
Michael,
The wget man page doesn't tell me what to use for agent-string. All it
says is:
    --user-agent=agent-string
    Identify as agent-string to the HTTP server.

    The HTTP protocol allows the clients to identify themselves using a
    "User-Agent" header field. This enables distinguishing the WWW software,
    usually for statistical purposes or for tracing of protocol violations.
    Wget normally identifies as Wget/version, version being the current
    version number of Wget.

    However, some sites have been known to impose the policy of tailoring the
    output according to the "User-Agent"-supplied information. While this is
    not such a bad idea in theory, it has been abused by servers denying
    information to clients other than (historically) Netscape or, more
    frequently, Microsoft Internet Explorer. This option allows you to change
    the "User-Agent" line issued by Wget. Use of this option is discouraged,
    unless you really know what you are doing.

    Specifying empty user agent with --user-agent="" instructs Wget not to
    send the "User-Agent" header in HTTP requests.
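The man page's description of the "User-Agent" header can be illustrated outside Wget. This is a minimal sketch using Python's urllib.request, not Wget itself; the URL and agent string are hypothetical placeholders, and nothing is sent over the network.

```python
from urllib.request import Request

url = "https://example.com/"               # hypothetical URL
agent = "Mozilla/5.0 (X11; Linux x86_64)"  # hypothetical agent string

# Build a request carrying an explicit User-Agent header, roughly analogous
# to what wget --user-agent=... does for its own requests. Nothing is sent.
req = Request(url, headers={"User-Agent": agent})

# urllib normalizes header names with str.capitalize(), so look up "User-agent".
print(req.get_header("User-agent"))

# With no explicit value, the Request itself carries no User-agent header;
# when it is actually sent, urllib's opener supplies "Python-urllib/x.y",
# much as Wget defaults to Wget/version.
plain = Request(url)
print(plain.get_header("User-agent"))  # None
```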
Since I don't really know what I'm doing, I'll take the authors' advice and
not use this option. :-)
Thanks,
Rich