On Fri, 17 Nov 2023, Michael Ewan wrote:

You may be getting caught by robots.txt; try setting the user agent header,
i.e. -U agent-string

Michael,

The wget man page doesn't inform me how to identify the agent-string. All it
says is:
       --user-agent=agent-string
           Identify as agent-string to the HTTP server.

           The HTTP protocol allows the clients to identify themselves using a
           "User-Agent" header field.  This enables distinguishing the WWW software,
           usually for statistical purposes or for tracing of protocol violations.  Wget
           normally identifies as Wget/version, version being the current version number
           of Wget.

           However, some sites have been known to impose the policy of tailoring the
           output according to the "User-Agent"-supplied information.  While this is not
           such a bad idea in theory, it has been abused by servers denying information to
           clients other than (historically) Netscape or, more frequently, Microsoft
           Internet Explorer.  This option allows you to change the "User-Agent" line
           issued by Wget.  Use of this option is discouraged, unless you really know what
           you are doing.

           Specifying empty user agent with --user-agent="" instructs Wget not to send the
           "User-Agent" header in HTTP requests.
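
For what it's worth, the agent-string is not something to look up; as far as
the man page text above goes, it is simply whatever text should be sent in
the "User-Agent" header. A minimal sketch of passing one, assuming a
hypothetical helper name (fetch_with_agent) and placeholder values in the
usage comment, none of which come from this thread:

```shell
# Hypothetical helper: fetch a URL while sending a caller-chosen User-Agent.
#   $1 = URL to fetch
#   $2 = string to send in the "User-Agent" header
fetch_with_agent() {
    # --user-agent=STRING is the long form of -U STRING.
    wget --user-agent="$2" "$1"
}

# Example call (placeholder URL and agent string):
#   fetch_with_agent "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"
```

Whether a different agent string changes what a given server returns depends
entirely on that server.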

Since I don't really know what I'm doing I'll take the authors' advice and
not use this option. :-)

Thanks,

Rich
