Guillaume Turri wrote:
> In fact, why is this option treated after a download?

When mirroring, all HTML files have to be downloaded (whether or not you ultimately want to keep them) in order to find all the interesting files. For example:

    wget http://www.somesite.com/index.html --mirror --accept=pdf

says to accept (that is, keep) only PDF files that are referenced on the site. If wget didn't download the HTML files to look for links to PDF files, it would not download anything at all.

There is a strong case to be made for ignoring files that do not have a known HTML suffix (although that might not always be the right action, as a file with any suffix could potentially contain HTML content). Downloading .zip or .tar files with --accept=pdf doesn't make much sense (to the human watching wget run, anyway), but getting the logic right will be tricky.

Tony
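
P.S. A rough sketch of the suffix logic discussed above, in Python rather than wget's own C, and not how wget actually implements it. The names HTML_SUFFIXES, suffix_of and plan are hypothetical, and the suffix list is an assumption made for illustration. It separates the two decisions: whether a URL has to be fetched at all, and whether the fetched file is kept.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical set of suffixes treated as "probably HTML" (an assumption,
# not wget's real list).
HTML_SUFFIXES = {"html", "htm"}


def suffix_of(url):
    """Return the lowercase file suffix of a URL's path, or "" if it has none."""
    name = posixpath.basename(urlparse(url).path)
    if "." not in name:
        return ""
    return name.rsplit(".", 1)[1].lower()


def plan(url, accept):
    """Return (download, keep) for a URL, given a set of accepted suffixes."""
    suffix = suffix_of(url)
    keep = suffix in accept
    # HTML must be fetched even when it will be deleted afterwards, because
    # its links have to be scanned. URLs with no suffix (such as directory
    # indexes) are treated as possible HTML as well.
    download = keep or suffix in HTML_SUFFIXES or suffix == ""
    return download, keep
```

With accept = {"pdf"}, index.html is downloaded but not kept, a .pdf is downloaded and kept, and a .zip is skipped entirely. The tricky part mentioned above remains: a URL with an unrecognized suffix may still serve HTML, and this sketch would miss any links inside it.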