It does not delete any HTML files or anything else. Either a file is accepted and kept, or it is saved anyway and stays forever. With the tip about --accept and --accept-regex I can get wget to traverse the links, but it does not go deep enough to get the *l.jpgs. I tried to increase -l, but to no avail. It seems to go only one link deep. And it deletes nothing.

2018-06-20 16:58 GMT+02:00 Tim Rühsen <tim.rueh...@gmx.de>:

> Hi Niels,
>
> please always answer to the mailing list (no problem if you CC me, but
> not needed).
>
> It was just an example for POSIX regexes - it's up to you to work out
> the details ;-) Or maybe there is a volunteer reading this.
>
> The implicitly downloaded HTML pages should be removed after parsing
> when you use --accept-regex. Except the explicitly 'starting' page from
> your command line.
>
> Regards, Tim
>
> On 06/20/2018 04:28 PM, Nils Gerlach wrote:
> > Hi Tim,
> >
> > I am sorry but your command does not work. It only downloads the
> > thumbnails from the first page and follows none of the links. Open
> > the link in a browser. Click on the pictures to get a larger picture.
> > There is a link "high quality picture" the pictures behind those
> > links are the ones i want to download.
> > Regex being ".*little-nemo.*n\l.jpeg". And not only the first page
> > but from the other search result pages, too.
> > Can you work that one out? Does this work with wget? Best result
> > would be if the visited html-pages were deleted by wget. But if they
> > stay I can delete them afterwards. But automatism would be better,
> > that's why I am trying to use wget ;)
> >
> > Thanks for the information on the filename and path, though.
> >
> > Greetings
> >
> > 2018-06-20 16:13 GMT+02:00 Tim Rühsen <tim.rueh...@gmx.de>:
> >
> >> Hi Nils,
> >>
> >> On 06/20/2018 06:16 AM, Nils Gerlach wrote:
> >>> Hi there,
> >>>
> >>> in #wget on freenode I was suggested to write this to you:
> >>> I tried using wget to get some images:
> >>> wget -nd -rH -Dcomicstriplibrary.org -A
> >>> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*"
> >>> -p -e robots=off
> >>> 'http://comicstriplibrary.org/search?search=little+nemo'
> >>> I wanted to download the images only but wget was not following any
> >>> of the links so I got that much more into -A. But it still does not
> >>> follow the links.
> >>> Page numbers of the search result contain "page" in the link, links
> >>> to the big pictures i want wget to download contain "display". Both
> >>> are given in -A and are seen in the html-document wget gets. Neither
> >>> is followed by wget.
> >>>
> >>> Why does this not work at all? Website is public, anybody is free
> >>> to test. But this is not my website!
> >>
> >> -A / -R works only on the filename, not on the path. The docs (man
> >> page) is not very explicit about it.
> >>
> >> Instead try --accept-regex / --reject-regex which acts on the
> >> complete URL - but shell wildcard's won't work.
> >>
> >> For your example this means to replace '.' by '\.' and '*' by '.*'.
> >>
> >> To download those nemo jpegs:
> >> wget -d -rH -Dcomicstriplibrary.org --accept-regex
> >> ".*little-nemo.*n\.jpeg" -p -e robots=off
> >> 'http://comicstriplibrary.org/search?search=little+nemo'
> >> --regex-type=posix
> >>
> >> Regards, Tim
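
For anyone following along, the substitution rule from the quoted message (replace '.' by '\.' and '*' by '.*') could be sketched as a small shell helper. This is only an illustration, not anything wget provides: the function name glob_to_regex and the sample URL are made up, and grep -E stands in for a POSIX extended regex matcher, so wget's own matching may behave differently in detail.

```shell
# Hypothetical helper: turn a simple shell wildcard into a POSIX regex.
# Dots are escaped first; otherwise the dots introduced by '.*' would
# get escaped as well.
glob_to_regex() {
    printf '%s\n' "$1" | sed -e 's/\./\\./g' -e 's/\*/.*/g'
}

# Made-up URL, for illustration only (not taken from the real site).
url='http://comicstriplibrary.org/display/little-nemo-19051015-l.jpeg'

regex=$(glob_to_regex '*little-nemo*l.jpeg')
printf 'regex: %s\n' "$regex"

# Test the converted pattern against the complete URL, which is what
# --accept-regex is said to act on in the quoted message.
if printf '%s\n' "$url" | grep -Eq "$regex"; then
    echo "complete URL matches"
else
    echo "complete URL does not match"
fi
```

The resulting pattern would then be passed to --accept-regex together with --regex-type=posix, as in the quoted command. Only '.' and '*' are handled here; other wildcard characters such as '?' or '[...]' are left untouched.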