Re: [Bug-wget] Feature request: option to not download rejected files

2018-07-03 Thread Tim Rühsen
On 07/03/2018 12:48 PM, Zoe Blade wrote:
>> In Wget2 there is an extra option for this, --filter-urls.
> 
> Thank you Tim, this sounds like exactly what I was after!  (It's especially 
> important when you have wget logged in as a user, to be able to tell it not 
> to go to the logout page.)  Though if that feature could be ported to the 
> original wget, with its WARC support etc., that'd be useful.  I guess I'll 
> stick with my hacked version for now.

WARC for wget2 is on the list, maybe as an extra library project.

Thanks for your feedback - I wasn't aware of WARC users out there ;-)

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Feature request: option to not download rejected files

2018-07-03 Thread Zoe Blade
> In Wget2 there is an extra option for this, --filter-urls.

Thank you Tim, this sounds like exactly what I was after!  (It's especially 
important when you have wget logged in as a user, to be able to tell it not to 
go to the logout page.)  Though if that feature could be ported to the original 
wget, with its WARC support etc., that'd be useful.  I guess I'll stick with my 
hacked version for now.

Thanks,
Zoë.


Re: [Bug-wget] Feature request: option to not download rejected files

2018-06-29 Thread Tim Rühsen
On 06/29/2018 03:20 PM, Zoe Blade wrote:
> For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era 
> patch for 1.19.1 (one of the few versions I managed to get to compile on OS X; 
> I'm on a Mac, and not the best programmer):
> 
> recur.c:578
> -  if (blacklist_contains (blacklist, url))
> +  if (blacklist_contains (blacklist, url) || !acceptable (url))
> 
> It's not ideal, but it seems to solve the problem as a temporary fix.  
> Hopefully it might help someone else who needs this functionality.

Hi Zoë,

We recently had a discussion (20.6.2018, "Why does -A not work") in which I
confirmed that --reject-regex works like a filter for detected URLs.

BTW, the OP wanted --reject-regex to download and parse HTML (and delete it
afterwards if it matches the reject regex), which is the opposite of your
request.

In Wget2 there is an extra option for this, --filter-urls. Maybe
--filter-mime-type is also worth a look.

It would be best if you could provide a small example / reproducer. It can
also be a hand-crafted HTML file.

Regards, Tim





Re: [Bug-wget] Feature request: option to not download rejected files

2018-06-29 Thread Zoe Blade
For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era 
patch for 1.19.1 (one of the few versions I managed to get to compile on OS X; 
I'm on a Mac, and not the best programmer):

recur.c:578
-  if (blacklist_contains (blacklist, url))
+  if (blacklist_contains (blacklist, url) || !acceptable (url))

It's not ideal, but it seems to solve the problem as a temporary fix.  
Hopefully it might help someone else who needs this functionality.

Cheers,
Zoë.


Re: [Bug-wget] Feature request: option to not download rejected files

2018-06-29 Thread Zoe Blade
> ...it would be more useful to avoid downloading rejected files altogether...

Hmm, after a bit more digging, I see this isn't a new request: 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217243  Is anyone working on 
this?