On Friday, 7 October 2016 15:40:55 CEST Dale R. Worley wrote:
> Tim Ruehsen <tim.rueh...@gmx.de> writes:
> > the changes in recur.c are not acceptable. They circumvent too many checks
> > like host-spanning, excludes and even --https-only.
>
> I suppose it depends on what you consider the semantics to be.
> Generally, I look at it if I've specified to download http://x/y/z and
> http://x/y/z redirects to http://a/b/c, if http://x/y/z passes the tests
> I've specified, then the page should be downloaded; the fact that it's
> redirected to http://a/b/c is incidental. Most checks *should* be
> circumvented.
>
> I guess I'd make exceptions for --https-only, which is presumably
> placing a requirement on *how* the pages should be fetched, and probably
> the robots check, as that's a policy statement by the server.

If you are redirected to another host/domain, it is wget policy not to follow
the redirect unless the user explicitly allows it (--span-hosts or --domains).
Your case is a redirection within the same domain, which my patch considers to
be OK (even if that redirection contains an explicitly unwanted path/
component). Even that might be dangerous as a default behavior, which is why I
want to see some more opinions.

We could add another CLI option for fine-tuning here.

Tim
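
For illustration only, the redirect policy discussed above could be sketched roughly as follows. This is Python, not wget's C code in recur.c, and it is not the patch itself. The function `redirect_allowed` and its parameters `span_hosts`, `domains` and `https_only` are hypothetical names that merely mirror the command-line options mentioned in this thread, and "same domain" is simplified to "same host name".

```python
from urllib.parse import urlsplit


def redirect_allowed(original_url, redirect_url,
                     span_hosts=False, domains=(), https_only=False):
    """Decide whether a redirect target may be fetched (simplified sketch)."""
    src = urlsplit(original_url)
    dst = urlsplit(redirect_url)

    # --https-only constrains *how* pages are fetched, so it is never bypassed.
    if https_only and dst.scheme != "https":
        return False

    src_host = (src.hostname or "").lower()
    dst_host = (dst.hostname or "").lower()

    # Redirect within the same host: accepted, even if the new path would
    # otherwise have been excluded (the debatable default from this thread).
    if dst_host == src_host:
        return True

    # Redirect to another host: only if the user explicitly allowed it,
    # either by listing the domain or by enabling host spanning.
    for d in domains:
        d = d.lower().lstrip(".")
        if dst_host == d or dst_host.endswith("." + d):
            return True
    return span_hosts
```

With these assumptions, `redirect_allowed("http://x/y/z", "http://x/b/c")` is accepted, while `redirect_allowed("http://x/y/z", "http://a/b/c")` is rejected unless `span_hosts=True` or a matching entry in `domains` is given. An additional option for fine-tuning would amount to making the same-host branch configurable.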