Hi,

-A / -R are applied before a file is downloaded, and in http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/index.html all the subdirectories are linked as files, not as subdirectories (a trailing / would indicate a subdirectory). So with -A 'zip', links such as "unweighted" are rejected before they are fetched, and wget never descends into them.
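A rough shell model of that decision, to show why the trailing slash matters. This is a simplified sketch, not wget's actual code: `file_part` and `passes_accept` are hypothetical names, the HTML exception is reduced to a plain `.html`/`.htm` suffix check, and the recursion-depth condition wget attaches to that exception is left out.

```shell
#!/bin/sh
# Simplified model (assumption, not wget source) of how wget 1.x decides
# whether a link found during recursion passes -A before fetching it.

# file_part URL: the part after the last "/" (empty for "dir/").
file_part() {
  printf '%s\n' "${1##*/}"
}

# passes_accept URL SUFFIX: succeeds if the link would be fetched.
passes_accept() {
  name=$(file_part "$1")
  case $name in
    "") return 0 ;;            # trailing slash: directory, -A/-R not applied
    *.html|*.htm) return 0 ;;  # HTML suffix: fetched so its links can be followed
    *"$2") return 0 ;;         # matches the -A suffix, e.g. "zip"
  esac
  return 1                     # e.g. "unweighted": rejected, never fetched
}
```

For example, `passes_accept "http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/unweighted" zip` fails, while the same URL with a trailing `/` succeeds.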
Indeed, wget should send a HEAD request before applying -A / -R, and only apply these filter options if the resulting MIME type is not text/html or text/css. So this looks like a bug that should be fixed. My time is currently very limited, so maybe someone can jump in and give it a try?

You could check whether Homebrew provides wget2. Wget2 handles this correctly and should do what you expect.

Regards, Tim

On 16.04.20 20:07, Fahiem Bacchus wrote:
> Hi, I am creating an scientific archive containing problem sets and want to
> post wget instructions for downloading the problem sets.
>
> 1. wget -r -nd -erobots=off
> http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/unweighted
> -A 'zip'
> Works, it descends to the subdirectories under unweighted, and
> retrieves the zip files in contained in each subdirectory.
> 2. wget -r -nd -erobots=off
> http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/ -A 'zip'
> Does not work it stops after rejecting the index.html file in
> master-set.
> 3. wget -r -nd -erobots=off
> http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/
> Kind of works, it gets all of the files, but does not restrict itself
> to the zip files.
>
> Maybe I don't understand the options? But it looks like a bug in the
> interaction of the -A flag and descending into
> subdirectories?
>
> thanks
> Fahiem Bacchus
>
> Here is the site
> http://www.cs.toronto.edu/maxsat-lib/
>
> With directory structure:
> master-instances
> master-set
> unweighted
> CircuitDebuggingProblems
> CircuitDebuggingProblems.zip
> .... many other subdirs each containing a zip
> weighted
> many subdirs each containing a zip
> ms-evals
> original
>
> I also tried a -l 10 flag...did not help.
>
> Version info:
> ============
> GNU Wget 1.20.3 built on darwin18.6.0.
>
> -cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
> +ntlm +opie -psl +ssl/openssl
>
> Wgetrc:
> /usr/local/etc/wgetrc (system)
> Locale:
> /usr/local/Cellar/wget/1.20.3_1/share/locale
> Compile:
> clang -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc"
> -DLOCALEDIR="/usr/local/Cellar/wget/1.20.3_1/share/locale" -I.
> -I../lib -I../lib -I/usr/local/opt/openssl@1.1/include -DNDEBUG -g
> -O2
> Link:
> clang -DNDEBUG -g -O2 -lidn2 -L/usr/local/opt/openssl@1.1/lib -lssl
> -lcrypto -ldl -lz ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
> -liconv -lintl -Wl,-framework -Wl,CoreFoundation -lunistring
>
> Copyright (C) 2015 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://www.gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Originally written by Hrvoje Niksic <hnik...@xemacs.org>.
> Please send bug reports and questions to <bug-wget@gnu.org>.
> =========
> /usr/local/etc/wgetrc
> --------------------------
> ###
> ### Sample Wget initialization file .wgetrc
> ###
>
> ## You can use this file to change the default behaviour of wget or to
> ## avoid having to type many many command-line options. This file does
> ## not contain a comprehensive list of commands -- look at the manual
> ## to find out what you can put into this file. You can find this here:
> ## $ info wget.info 'Startup File'
> ## Or online here:
> ## https://www.gnu.org/software/wget/manual/wget.html#Startup-File
> ##
> ## Wget initialization file can reside in /usr/local/etc/wgetrc
> ## (global, for all users) or $HOME/.wgetrc (for a single user).
> ##
> ## To use the settings in this file, you will have to uncomment them,
> ## as well as change them, in most cases, as the values on the
> ## commented-out lines are the default values (e.g. "off").
> ##
> ## Command are case-, underscore- and minus-insensitive.
> ## For example ftp_proxy, ftp-proxy and ftpproxy are the same.
>
>
> ##
> ## Global settings (useful for setting up in /usr/local/etc/wgetrc).
> ## Think well before you change them, since they may reduce wget's
> ## functionality, and make it behave contrary to the documentation:
> ##
>
> # You can set retrieve quota for beginners by specifying a value
> # optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
> # default quota is unlimited.
> #quota = inf
>
> # You can lower (or raise) the default number of retries when
> # downloading a file (default is 20).
> #tries = 20
>
> # Lowering the maximum depth of the recursive retrieval is handy to
> # prevent newbies from going too "deep" when they unwittingly start
> # the recursive retrieval. The default is 5.
> #reclevel = 5
>
> # By default Wget uses "passive FTP" transfer where the client
> # initiates the data connection to the server rather than the other
> # way around. That is required on systems behind NAT where the client
> # computer cannot be easily reached from the Internet. However, some
> # firewalls software explicitly supports active FTP and in fact has
> # problems supporting passive transfer. If you are in such
> # environment, use "passive_ftp = off" to revert to active FTP.
> #passive_ftp = off
>
> # The "wait" command below makes Wget wait between every connection.
> # If, instead, you want Wget to wait only between retries of failed
> # downloads, set waitretry to maximum number of seconds to wait (Wget
> # will use "linear backoff", waiting 1 second after the first failure
> # on a file, 2 seconds after the second failure, etc. up to this max).
> #waitretry = 10
>
>
> ##
> ## Local settings (for a user to set in his $HOME/.wgetrc). It is
> ## *highly* undesirable to put these settings in the global file, since
> ## they are potentially dangerous to "normal" users.
> ##
> ## Even when setting up your own ~/.wgetrc, you should know what you
> ## are doing before doing so.
> ##
>
> # Set this to on to use timestamping by default:
> #timestamping = off
>
> # It is a good idea to make Wget send your email address in a `From:'
> # header with your request (so that server administrators can contact
> # you in case of errors). Wget does *not* send `From:' by default.
> #header = From: Your Name <username@site.domain>
>
> # You can set up other headers, like Accept-Language. Accept-Language
> # is *not* sent by default.
> #header = Accept-Language: en
>
> # You can set the default proxies for Wget to use for http, https, and ftp.
> # They will override the value in the environment.
> #https_proxy = http://proxy.yoyodyne.com:18023/
> #http_proxy = http://proxy.yoyodyne.com:18023/
> #ftp_proxy = http://proxy.yoyodyne.com:18023/
>
> # If you do not want to use proxy at all, set this to off.
> #use_proxy = on
>
> # You can customize the retrieval outlook. Valid options are default,
> # binary, mega and micro.
> #dot_style = default
>
> # Setting this to off makes Wget not download /robots.txt. Be sure to
> # know *exactly* what /robots.txt is and how it is used before changing
> # the default!
> #robots = on
>
> # It can be useful to make Wget wait between connections. Set this to
> # the number of seconds you want Wget to wait.
> #wait = 0
>
> # You can force creating directory structure, even if a single is being
> # retrieved, by setting this to on.
> #dirstruct = off
>
> # You can turn on recursive retrieving by default (don't do this if
> # you are not sure you know what it means) by setting this to on.
> #recursive = off
>
> # To always back up file X as X.orig before converting its links (due
> # to -k / --convert-links / convert_links = on having been specified),
> # set this variable to on:
> #backup_converted = off
>
> # To have Wget follow FTP links from HTML files by default, set this
> # to on:
> #follow_ftp = off
>
> # To try ipv6 addresses first:
> #prefer-family = IPv6
>
> # Set default IRI support state
> #iri = off
>
> # Force the default system encoding
> #localencoding = UTF-8
>
> # Force the default remote server encoding
> #remoteencoding = UTF-8
>
> # Turn on to prevent following non-HTTPS links when in recursive mode
> #httpsonly = off
>
> # Tune HTTPS security (auto, SSLv2, SSLv3, TLSv1, PFS)
> #secureprotocol = auto
>
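For whoever picks this up, here is a rough shell prototype of the behaviour suggested at the top of this reply: look at the Content-Type first (from a HEAD request) and only apply the -A suffix to things that are not text/html or text/css. It is a sketch under assumptions, not wget or wget2 code: `content_type` and `should_fetch` are hypothetical helpers, and the HEAD request is done with curl rather than wget.

```shell
#!/bin/sh
# Prototype (assumption) of "HEAD first, then apply -A only to non-HTML/CSS".

# content_type URL: send a HEAD request (-I), follow redirects such as
# "dir" -> "dir/" (-L), discard the headers and print only the Content-Type.
content_type() {
  curl -sIL -o /dev/null -w '%{content_type}' "$1"
}

# should_fetch URL CONTENT_TYPE SUFFIX: succeeds if the URL should be fetched.
should_fetch() {
  case $2 in
    text/html*|text/css*) return 0 ;;  # may contain links: never filtered
  esac
  case ${1##*/} in
    *"$3") return 0 ;;                 # leaf file matching the -A suffix
  esac
  return 1                             # leaf file not matching: skip it
}
```

Usage, with a live HEAD request: `u=http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/unweighted; should_fetch "$u" "$(content_type "$u")" zip && echo follow`. The extra HEAD round trip per link is the cost of this approach; it is only needed for links whose type cannot be guessed from the URL.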