URL: <https://savannah.gnu.org/bugs/?67488>
Summary: Consider saving the URL that was fetched
Group: GNU Wget
Submitter: eokoochu
Submitted: Tue 09 Sep 2025 03:33:36 PM GMT
Category: Feature Request
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: None
Operating System: GNU/Linux
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: Tue 09 Sep 2025 03:33:36 PM GMT By: Eo Koochu <eokoochu>
When archiving a webpage, this command is quite useful:
$ wget -P "$dir" -E -H -k -K -p "$url"
The annoying thing is that it leaves no record of what URL was fetched.
Storing that information would be useful in itself, and it would also remedy
another problem: all the content is scattered into a tree of files, so it is
not obvious which file we need to tell the browser to open later.
I have written a wrapper script for wget that writes a file “url.txt”
which then contains the URL that was fetched. It’s very useful for later
working out which file in the tree the browser needs to open. It’s a hack
though. Ideally wget should store the URL in a way that solves both problems,
so we have metadata of what was fetched and therefore what to open with a
browser. And since webpages often change, it might be useful to record the
date of the snapshot somewhere too.
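To make the idea concrete, here is a minimal sketch of such a wrapper. It is not the actual script mentioned above. The function name archive_page, the WGET override variable, and the two-line layout of url.txt are assumptions made for illustration. The wget options are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; names and file layout are assumptions.
# WGET may be overridden, e.g. to substitute a stub while testing.
WGET="${WGET:-wget}"

archive_page() {
    dir=$1
    url=$2
    mkdir -p "$dir" || return 1

    # Same options as the command in the report.
    "$WGET" -P "$dir" -E -H -k -K -p "$url"
    status=$?

    # Record what was requested and when (UTC). This is written even when
    # wget exits non-zero, since a single failed page requisite is enough
    # to cause that.
    {
        printf 'url: %s\n' "$url"
        printf 'saved date: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    } > "$dir/url.txt"

    return "$status"
}
```

It would be called as: archive_page "$dir" "$url". Note that this only records the URL that was requested. It does not tell us which file in the tree corresponds to it, which is why support inside wget itself would be preferable.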
For reference, there is a Firefox extension called SingleFile that saves a
webpage and all the objects needed to render it in a single file. When it does
that, it adds a comment near the top of the HTML file that contains the URL. E.g.:
<!DOCTYPE html> <html lang=en data-color-mode=auto data-light-theme=light
data-dark-theme=dark data-a11y-animated-images=system
data-a11y-link-underlines=true class=js-focus-visible data-js-focus-visible
data-turbo-loaded style><!--
Page saved with SingleFile
url: https://savannah.gnu.org/bugs/?group=wget
saved date: Tue Sep 09 2025 17:21:47 GMT+0200 (Central European Summer Time)
--><meta charset=utf-8>
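Metadata in that form is easy to read back later. As a rough sketch, assuming the comment keeps a line starting with "url: " as in the example above (the function name saved_url is made up for illustration):

```shell
# Print the first "url: " line of a saved page, without the label.
# Leading whitespace before "url:" is tolerated (an assumption).
saved_url() {
    sed -n 's/^[[:space:]]*url: //p' "$1" | head -n 1
}
```

If wget wrote something similar, a script could map a saved file back to the address it came from.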
