Re: [Bug-wget] Feature request: option to not download rejected files
On 07/03/2018 12:48 PM, Zoe Blade wrote:
>> In Wget2 there is an extra option for this, --filter-urls.
>
> Thank you Tim, this sounds like exactly what I was after! (It's especially
> important when you have wget logged in as a user, to be able to tell it not
> to go to the logout page.) Though if that feature could be ported to the
> original wget, with its WARC support etc, that'd be useful. I guess I'll
> stick with my hacked version for now.

WARC for wget2 is on the list, maybe as an extra library project.

Thanks for your feedback - I wasn't aware of WARC users out there ;-)

Regards, Tim
Re: [Bug-wget] Feature request: option to not download rejected files
> In Wget2 there is an extra option for this, --filter-urls.

Thank you Tim, this sounds like exactly what I was after! (It's especially important when you have wget logged in as a user, to be able to tell it not to go to the logout page.) Though if that feature could be ported to the original wget, with its WARC support etc, that'd be useful. I guess I'll stick with my hacked version for now.

Thanks,
Zoë.
Re: [Bug-wget] Feature request: option to not download rejected files
On 06/29/2018 03:20 PM, Zoe Blade wrote:
> For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era
> patch for 19.1 (one of the few versions I managed to get to compile in OS X;
> I'm on a Mac, and not the best programmer):
>
> recur.c:578
> - if (blacklist_contains (blacklist, url))
> + if (blacklist_contains (blacklist, url) || !acceptable (url))
>
> It's not ideal, but it seems to solve the problem as a temporary fix.
> Hopefully it might help someone else who needs this functionality.

Hi Zoë,

we recently had a discussion (20.6.2018 "Why does -A not work") where I confirmed that --reject-regex works like a filter for detected URLs.

BTW, the OP wanted --reject-regex to download+parse HTML (and delete thereafter if matching the rejected regex) - so the opposite from your request.

In Wget2 there is an extra option for this, --filter-urls. Maybe --filter-mime-type is also worth a look.

Best would be if you can provide a small example / reproducer. It can also be a hand-crafted HTML file.

Regards, Tim
Re: [Bug-wget] Feature request: option to not download rejected files
For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era patch for 19.1 (one of the few versions I managed to get to compile in OS X; I'm on a Mac, and not the best programmer):

recur.c:578
- if (blacklist_contains (blacklist, url))
+ if (blacklist_contains (blacklist, url) || !acceptable (url))

It's not ideal, but it seems to solve the problem as a temporary fix. Hopefully it might help someone else who needs this functionality.

Cheers,
Zoë.
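For readers who want to see the idea behind the one-line change in isolation, here is a minimal standalone sketch. It is not wget's code: `url_set`, `url_set_contains`, `url_acceptable` and `should_skip` are hypothetical names invented for illustration, and the reject rule is assumed to be a POSIX extended regex, which may differ from how wget's own `acceptable` decides. The only point it makes is that a URL is dropped before any download if it was already visited or if it fails the accept/reject check.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the set of already-visited URLs
   (the role the blacklist plays in the patch above). */
struct url_set {
    const char **items;
    size_t count;
};

/* Linear search is enough for a sketch; a real crawler would use a hash table. */
bool url_set_contains(const struct url_set *set, const char *url)
{
    for (size_t i = 0; i < set->count; i++) {
        if (strcmp(set->items[i], url) == 0)
            return true;
    }
    return false;
}

/* Assumption: a URL is acceptable unless it matches the reject pattern.
   regexec() returns 0 on a match, so "not 0" means "not rejected". */
bool url_acceptable(const regex_t *reject, const char *url)
{
    return regexec(reject, url, 0, NULL, 0) != 0;
}

/* Same shape as the patched condition: skip the URL, without fetching it,
   if it was already visited OR if it is not acceptable. */
bool should_skip(const struct url_set *visited, const regex_t *reject,
                 const char *url)
{
    return url_set_contains(visited, url) || !url_acceptable(reject, url);
}
```

A caller would compile the reject pattern once with `regcomp` (for example `"/logout"` with `REG_EXTENDED | REG_NOSUB`), call `should_skip` on each URL found during recursion, and release the pattern with `regfree` at the end. With that pattern, a logout link is skipped before it is ever requested, which is the behaviour asked for in this thread.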
Re: [Bug-wget] Feature request: option to not download rejected files
> ...it would be more useful to avoid downloading rejected files altogether...

Hmm, after a bit more digging, I see this isn't a new request:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217243

Is anyone working on this?