Greetings,

I define making a "really complete, really offline archive" of a web page as downloading that page AND ALL the pieces a browser needs to display it as it normally appears online, even on a computer that is totally offline: images, CSS, JavaScript, avatar icons, everything, no matter where those other files originally were on the internet.

I have noticed several times in the past that the wget options that are supposed to do this (--span-hosts, --mirror, -k, and so on) do not always work as advertised. The latest case is the one I just documented at https://github.com/pirate/ArchiveBox/issues/276 where, for example, wget does make local copies of the JavaScript files linked from the HTML page, but does NOT modify the HTML to point to them instead of to their original servers.

As far as I can tell, ArchiveBox uses wget with the same options listed in the countless tutorials about "how to make offline copies with wget" (a sketch of the kind of command I mean is in the P.S. below), so either those tutorials are all wrong, or there are bugs or intrinsic limits inside wget itself.

What do you think is happening in that case? And, in general, how do you use wget to ALWAYS make a "really complete, really offline archive" of a web page?

Thanks,
Marco
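
P.S. For reference, here is a minimal sketch of the kind of invocation those tutorials recommend (-k and -H are the short forms of --convert-links and --span-hosts). I am not claiming this is exactly what ArchiveBox runs, and https://example.com/page.html just stands in for whatever page is being archived:

  # --mirror            recursive download with timestamping, infinite depth
  # --page-requisites   also fetch the images, CSS and JS needed to render each page
  # --convert-links     rewrite links in the saved HTML to point at the local copies
  # --span-hosts        allow page requisites to be fetched from other hosts (CDNs, avatar servers, ...)
  # --adjust-extension  save pages/stylesheets with proper .html/.css extensions
  # --no-parent         do not climb above the starting directory
  # -e robots=off       ignore robots.txt for the archive run
  wget --mirror --page-requisites --convert-links --span-hosts \
       --adjust-extension --no-parent -e robots=off \
       https://example.com/page.html

If everything worked as advertised, the <script src=...> and <link href=...> URLs in the saved HTML would end up rewritten to the local copies; in the issue above they do not.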
