Hi Nils,

On 06/20/2018 06:16 AM, Nils Gerlach wrote:
> Hi there,
>
> in #wget on freenode I was suggested to write this to you:
> I tried using wget to get some images:
> wget -nd -rH -Dcomicstriplibrary.org -A
> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*" -p -e
> robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
> I wanted to download the images only but wget was not following any of the
> links so I got that much more into -A. But it still does not follow the
> links.
> Page numbers of the search result contain "page" in the link, links to the
> big pictures i want wget to download contain "display". Both are given in
> -A and are seen in the html-document wget gets. Neither is followed by wget.
>
> Why does this not work at all? Website is public, anybody is free to test.
> But this is not my website!

-A / -R work only on the file name, not on the path. The docs (man page) are not very explicit about this.

Instead, try --accept-regex / --reject-regex, which act on the complete URL. Shell wildcards won't work there, though. For your example, this means replacing '.' with '\.' and '*' with '.*'.

To download those nemo jpegs:

  wget -d -rH -Dcomicstriplibrary.org --accept-regex ".*little-nemo.*n\.jpeg" -p -e robots=off 'http://comicstriplibrary.org/search?search=little+nemo' --regex-type=posix

Regards, Tim
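
P.S. The wildcard-to-regex substitution described above ('.' becomes '\.', '*' becomes '.*') can be done mechanically. The following is only a rough sketch: glob_to_regex is a hypothetical helper name, not part of wget, and it handles just those two characters (no '?' or bracket expressions).

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): turn a simple shell wildcard
# into a POSIX regex by applying the two substitutions from this mail.
# The dots are escaped first, so the dots introduced by '.*' stay unescaped.
glob_to_regex() {
    printf '%s\n' "$1" | sed -e 's/\./\\./g' -e 's/\*/.*/g'
}

glob_to_regex 'little-nemo*s.jpeg'   # prints: little-nemo.*s\.jpeg
```

Since the regex options look at the complete URL rather than the file name, the result would typically get a leading '.*' before being passed to --accept-regex, as in the command above.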