Re: [Bug-wget] Feature request: option to not download rejected files
> ...it would be more useful to avoid downloading rejected files altogether...

Hmm, after a bit more digging, I see this isn't a new request:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217243

Is anyone working on this?
[Bug-wget] Feature request: option to not download rejected files
Hi!

First of all, I find wget very useful, so thank you to everyone who has contributed to it!

I gather that the rejection list (--reject and --reject-regex) is used to decide which downloaded files to keep. While that's sometimes useful, at other times it would be more useful to avoid downloading rejected files altogether. For example, you might reject any file whose URL contains a question mark, to avoid duplicates caused by endless combinations of query parameters. It would put far less strain on the server to download only the main version of each page and not all of its variations.

Someone even went as far as writing a quick hack to add this functionality for themselves:
https://stackoverflow.com/questions/12704197/wget-reject-still-downloads-file

It would be much nicer if this were built in, in a more robust and extensible way.

Thanks,
Zoë.
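To make the server-load argument concrete, here is a minimal C sketch contrasting the two policies described above, assuming the example rule "reject any URL containing a question mark". The names url_is_rejected() and count_requests() are hypothetical and not part of wget; the sketch only counts how many requests each policy would send for the same list of discovered URLs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate for the example rule: reject any URL that
 * contains a '?', i.e. anything carrying a query string. */
static bool url_is_rejected(const char *url)
{
    return strchr(url, '?') != NULL;
}

/* Count the requests sent for a list of discovered URLs.
 * filter_before_download == false: fetch every URL and discard rejects
 *   afterwards, so each rejected URL still costs one request.
 * filter_before_download == true: a rejected URL is never requested. */
static size_t count_requests(const char *const urls[], size_t n,
                             bool filter_before_download)
{
    size_t requests = 0;
    for (size_t i = 0; i < n; i++) {
        if (filter_before_download && url_is_rejected(urls[i]))
            continue;   /* skipped: nothing is sent to the server */
        requests++;     /* fetched (and possibly deleted later) */
    }
    return requests;
}
```

For a page plus two parameter variants such as "page?sort=asc" and "page?sort=desc&p=2", the first policy sends three requests and the second sends one; with many parameter combinations the gap grows accordingly.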
Re: [Bug-wget] Feature request: option to not download rejected files
For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era patch for 1.19.1 (one of the few versions I managed to get to compile on OS X; I'm on a Mac, and not the best programmer):

  recur.c:578
  - if (blacklist_contains (blacklist, url))
  + if (blacklist_contains (blacklist, url) || !acceptable (url))

It's not ideal, but it seems to work as a temporary fix. Hopefully it helps someone else who needs this functionality.

Cheers,
Zoë.
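The idea behind the one-line patch can be sketched in isolation: a discovered URL is dropped before it is queued if it has already been seen or if it fails the accept/reject rules. In the minimal C sketch below, in_blacklist() and passes_accept_reject() are hypothetical stand-ins for wget's blacklist_contains() and acceptable(), whose real signatures and behaviour differ; the '?' rule is only an example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for blacklist_contains(): a linear scan over the
 * URLs that have already been seen. */
static bool in_blacklist(const char *const seen[], size_t n_seen,
                         const char *url)
{
    for (size_t i = 0; i < n_seen; i++)
        if (strcmp(seen[i], url) == 0)
            return true;
    return false;
}

/* Hypothetical stand-in for acceptable(): example rule only, rejecting
 * any URL that carries a query string. */
static bool passes_accept_reject(const char *url)
{
    return strchr(url, '?') == NULL;
}

/* Same shape as the patched condition: skip the URL (never download it)
 * if it was already seen OR it fails the accept/reject rules. */
static bool should_skip(const char *const seen[], size_t n_seen,
                        const char *url)
{
    return in_blacklist(seen, n_seen, url) || !passes_accept_reject(url);
}
```

Checking the rules at this point, rather than after the file has been written, is what saves the request: the URL never reaches the download step at all.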
Re: [Bug-wget] Feature request: option to not download rejected files
On 06/29/2018 03:20 PM, Zoe Blade wrote:
> For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era
> patch for 19.1 (one of the few versions I managed to get to compile in OS X;
> I'm on a Mac, and not the best programmer):
>
> recur.c:578
> - if (blacklist_contains (blacklist, url))
> + if (blacklist_contains (blacklist, url) || !acceptable (url))
>
> It's not ideal, but it seems to solve the problem as a temporary fix.
> Hopefully it might help someone else who needs this functionality.

Hi Zoë,

we recently had a discussion ("Why does -A not work", 20.6.2018) in which I confirmed that --reject-regex works as a filter on detected URLs.

By the way, the OP of that thread wanted the opposite of your request: for --reject-regex to download and parse HTML, and delete it afterwards if it matches the rejected regex. In Wget2 there is an extra option for this, --filter-urls. --filter-mime-type may also be worth a look.

It would be best if you could provide a small example or reproducer. A hand-crafted HTML file is fine.

Regards,
Tim
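"Works as a filter on detected URLs" means the regex is applied to each link found during recursion, before anything is fetched. The following minimal C sketch shows that idea using the POSIX regcomp()/regexec() API; filter_urls() is a hypothetical helper written for illustration, not wget's actual code, and the pattern "[?]" (a literal question mark) is just the example from this thread.

```c
#include <regex.h>
#include <stddef.h>

/* Keep only the URLs that do NOT match reject_pattern (POSIX extended
 * regex). Kept URLs are compacted to the front of urls[] in their
 * original order. Returns the number kept, or -1 if the pattern does
 * not compile. */
static long filter_urls(const char *urls[], size_t n,
                        const char *reject_pattern)
{
    regex_t re;
    if (regcomp(&re, reject_pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        /* regexec() returns 0 on a match; matching URLs are dropped
         * and would therefore never be downloaded. */
        if (regexec(&re, urls[i], 0, NULL, 0) != 0)
            urls[kept++] = urls[i];
    }
    regfree(&re);
    return (long)kept;
}
```

Applied to a list of detected links with "[?]", every URL carrying a query string is removed before any request is made, which matches the behaviour Zoë asked for.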