RE: [Bug-wget] How to ignore link like "index.html?lang=ja"?

Tony Lewis Thu, 03 Jun 2010 17:48:11 -0700

Guillaume Turri wrote:

> In fact, why is this option treated after a download?


When mirroring, all HTML files have to be downloaded (whether or not you
ultimately want to keep them) in order to find all the interesting files.
For example:

wget http://www.somesite.com/index.html --mirror --accept=pdf

says to accept (that is, keep) only PDF files that are referenced on the
site. If wget didn't download the HTML files to look for links to PDF files,
it would not download anything at all.
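To make the two phases concrete, here is a toy model of the behaviour described above. It is not wget's actual code. The in-memory `SITE` map, the `mirror()` and `suffix()` helpers and the example URLs are all hypothetical. Every reachable file is fetched so that links can be harvested from it, and the accept list is applied only afterwards to decide what stays on disk.

```python
from collections import deque
from urllib.parse import urlsplit
import posixpath

# Hypothetical in-memory "site": URL -> links found in that file.
# Only the HTML pages contain links; other files map to an empty list.
SITE = {
    "http://www.somesite.com/index.html": [
        "http://www.somesite.com/docs.html",
        "http://www.somesite.com/archive.zip",
    ],
    "http://www.somesite.com/docs.html": [
        "http://www.somesite.com/manual.pdf",
    ],
    "http://www.somesite.com/archive.zip": [],
    "http://www.somesite.com/manual.pdf": [],
}


def suffix(url):
    """Lowercase suffix of the URL path; any query string is ignored."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def mirror(start, accept):
    """Toy model of --mirror --accept as described in this post:
    fetch everything reachable, follow links from every fetched file,
    and keep only files whose suffix is in `accept`."""
    fetched, kept = [], []
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        fetched.append(url)               # downloaded so links can be found
        for link in SITE.get(url, []):    # harvest links (HTML pages only here)
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if suffix(url) in accept:
            kept.append(url)              # accepted: stays on disk
        # not accepted: would be deleted after link extraction
    return fetched, kept


fetched, kept = mirror("http://www.somesite.com/index.html", {"pdf"})
print(len(fetched), kept)
```

With `accept={"pdf"}` all four files are fetched but only `manual.pdf` is kept. The PDF is reachable only through `index.html` and `docs.html`, which is why those pages must be downloaded even though they are discarded afterwards.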

There is a strong case to be made for ignoring files that do not have a
known HTML suffix (although that might not always be the right action, as
any suffix could potentially carry HTML content). Downloading .zip or .tar
files with --accept=pdf doesn't make much sense (to the human watching wget
run, anyway), but getting the logic right will be tricky.
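A minimal sketch of the pre-download filter suggested above, under stated assumptions: `HTML_SUFFIXES`, `suffix()` and `worth_fetching()` are hypothetical names, and the suffix set is an illustrative guess rather than anything wget defines. A link is fetched only if its suffix is accepted or looks like it could carry HTML. The query string is stripped before the suffix is checked, so a link such as `index.html?lang=ja` still counts as HTML.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical set of suffixes assumed to carry HTML. The empty string covers
# paths with no suffix, such as "http://host/dir/". Not wget's actual list.
HTML_SUFFIXES = {"html", "htm", ""}


def suffix(url):
    """Lowercase suffix of the URL path; any query string is ignored."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def worth_fetching(url, accept):
    """Fetch a link only if we want to keep it, or if it might be an
    HTML page containing links to files we want to keep."""
    s = suffix(url)
    return s in accept or s in HTML_SUFFIXES


accept = {"pdf"}
for url in [
    "http://www.somesite.com/manual.pdf",
    "http://www.somesite.com/archive.zip",
    "http://www.somesite.com/index.html?lang=ja",
    "http://www.somesite.com/page.php",
]:
    print(url, worth_fetching(url, accept))
```

This skips `archive.zip` as intended, but it also skips `page.php`, even though such a script may well return HTML with links to PDF files. That is the trade-off mentioned above: any fixed suffix list will either fetch too much or miss pages.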

Tony
