On 06/29/2018 03:20 PM, Zoe Blade wrote:
> For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era
> patch for 19.1 (one of the few versions I managed to get to compile in OS X;
> I'm on a Mac, and not the best programmer):
>
> recur.c:578
> - if (blacklist_contains (blacklist, url))
> + if (blacklist_contains (blacklist, url) || !acceptable (url))
>
> It's not ideal, but it seems to solve the problem as a temporary fix.
> Hopefully it might help someone else who needs this functionality.

Hi Zoë,

we recently had a discussion (20.6.2018, "Why does -A not work") where I
confirmed that --reject-regex works as a filter on detected URLs.

BTW, the OP there wanted --reject-regex to still download and parse the HTML
(and delete it afterwards if it matches the rejected regex), so the opposite
of your request.

In Wget2 there is an extra option for this, --filter-urls. Maybe
--filter-mime-type is also worth a look.

It would be best if you could provide a small example / reproducer. It can
also be a hand-crafted HTML file.

Regards, Tim