Hi Matt,

this works as expected with wget2 built from the latest git master. Which reminds me that we urgently need a new release.
If you want to build wget2 from a tarball (which is more hassle-free than building from git master), follow the instructions at https://gitlab.com/gnuwget/wget2/#downloading-and-building-from-tarball. Don't forget to install the prerequisites beforehand.
Feel free to ask here if you run into trouble.

Regards, Tim

On 18.08.21 02:08, Matt Huszagh wrote:
Hello,

I'm trying to archive a single webpage for offline viewing with wget2. To accomplish this, I'm invoking the following command:

```
wget2 --robots=off --page-requisites --adjust-extension --convert-links=on http://www.ke5fx.com/k22.htm
```

From reading the help menu, it's my understanding that wget2 should download everything needed to display this page (from page-requisites) and convert the links to these resources to point to the local copies (with convert-links). However, this is not the behavior I observe. For example, the HTML for several of the images shows up as

```
<i>Click on photos below to enlarge</i>
<hr>
<br clear=all>
<a href="http://www.ke5fx.com/k22/ext_large.jpg"><img src="http://www.ke5fx.com/k22/ext_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/int_large.jpg"><img src="http://www.ke5fx.com/k22/int_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/bfg_large.jpg"><img src="http://www.ke5fx.com/k22/bfg_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<hr>
```

The downloaded directory structure does, however, appear correct:

```
$ tree www.ke5fx.com
www.ke5fx.com
├── k22
│   ├── bfg_sm.jpg
│   ├── compare.png
│   ├── ext_sm.jpg
│   ├── HP_k22_s21_s12.gif
│   ├── int_sm.jpg
│   ├── k22_s11.png
│   └── k22_s21_s12.gif
└── k22.htm.html
```

Moreover, doing the same thing with wget works as I'd expect:

```
wget -e --robots=off --page-requisites --adjust-extension --convert-links http://www.ke5fx.com/k22.htm
```

```
<hr>
<br clear=all>
<a href="http://www.ke5fx.com/k22/ext_large.jpg"><img src="k22/ext_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/int_large.jpg"><img src="k22/int_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/bfg_large.jpg"><img src="k22/bfg_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<hr>
```

When I attempt the same wget2 command with a Wikipedia page, I get different results:

```
wget2 --robots=off --page-requisites --adjust-extension --convert-links=on https://en.wikipedia.org/wiki/EPROM
```

```
<div class="thumb tleft"><div class="thumbinner" style="width:252px;"><a href="/wiki/File:EPROM_Intel_C1702A.jpg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg" decoding="async" width="250" height="130" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/375px-EPROM_Intel_C1702A.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/500px-EPROM_Intel_C1702A.jpg 2x" data-file-width="1275" data-file-height="665" /></a>
<div class="thumbcaption"><div class="magnify"><a href="/wiki/File:Eprom.jpg" class="internal" title="Enlarge"></a></div>An Intel 1702A EPROM, one of the earliest EPROM types (1971), 256 by 8 bit. The small quartz window admits UV light for erasure.</div></div></div>
```

Rather than the src pointing to a remote URL or a local file, it points to a nonexistent "//upload.wikimedia.org/...". It's worth mentioning that wget doesn't give me the expected behavior here either: the image files reference remote URLs rather than local paths.

Am I misusing wget2 somehow? If so, what are the correct flags to achieve what I want?

Thanks
Matt
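A note on the "//upload.wikimedia.org/..." value quoted above: a reference that starts with `//` is a scheme-relative URL, so a browser resolves it against the scheme of the page that contains it. The sketch below uses Python's standard `urllib.parse.urljoin` to illustrate this. It is only an illustration, not how wget2 or wget resolve links internally; the helper name `resolve` and the local `file://` path are hypothetical.

```python
from urllib.parse import urljoin

# The src value from the Wikipedia HTML quoted in the message.
SRC = ("//upload.wikimedia.org/wikipedia/commons/thumb/3/39/"
       "EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg")


def resolve(page_url: str, reference: str) -> str:
    """Resolve a reference found in a page against that page's URL.

    Hypothetical helper for illustration; a thin wrapper around urljoin.
    """
    return urljoin(page_url, reference)


# Served from the web, the reference inherits the page's https scheme.
online = resolve("https://en.wikipedia.org/wiki/EPROM", SRC)

# Opened from disk (hypothetical path), it inherits the file scheme
# instead, which does not point at any downloaded copy.
offline = resolve("file:///tmp/en.wikipedia.org/wiki/EPROM.html", SRC)

print(online)
print(offline)
```

Under these assumptions the reference is valid while the page is viewed online, but it stops working once the saved page is opened from disk, which matches the symptom described in the message.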