Hi Triston,

On 13.07.2018 18:52, Triston Line wrote:
> Hi Tim,
>
> Excellent answer thank you very much for this info, "-N" or
> "--timestamping" sounds like a much better way to go, however if I'm
> converting links, using wget (1) I think I've read somewhere and noticed
> that two separate commands running in series wouldn't be able to
> continue due to the links from the previous session/command-instance?
> More clearly, I've read that the primary reason continuing from a fault
> is impossible is due to the fact that converting links to mirror isn't
> something that can be continued and the links are only valid for that
> session. Sounds silly to me because you're just formatting <a href> tags
> from my understanding but there's probably a bit more to it.

Well, the links/URLs in the converted file are adapted to your local
directory structure (relative). Depending on the wget directory options
in use, you cannot reconstruct the original URLs.

What we would need is some metadata for each downloaded file, e.g. the
original URL, the referrer URL, ... We have had such data for a while
(see the --xattr option) - *if* your filesystem supports it. So we
*could* use this metadata where it is available. That would be a new
feature to be implemented.

> I have used max-threads in the past and I've tried a suggestion for
> xargs on one of the stack exchange forums, so I do toy with those
> settings while testing out my friend's servers at UBC. Government on the
> other hand I might get in a bit of trouble if I'm loading them during
> working hours (Gosh knows I don't wanna come in at some ungodly hour
> (e.g. 3 am) with the network-services team to toy around with their
> stuff at different sites or perform intranet backups around different
> sites from my local).
>
> " The server then only sends payload/data if it has a newer version of
> that document, else it responds with 304 Not Modified." This is 400
> Bytes to respond with the last modification date of a file?

No, we send the GET request with the local file's timestamp (in the
If-Modified-Since header). If the server has a newer version, it sends
it together with a 200 OK, else it sends 304 Not Modified with an empty
body.

Just give it a try. If you see that everything is re-downloaded, stop
and try again with '-N --no-if-modified-since'. This makes wget send a
HEAD request first. Depending on the timestamp info, wget may then issue
a GET request (or nothing if the local file is up-to-date).

But even the HEAD method can fail if the server sends wrong timestamps.
I have seen servers that always send the current date instead of the
file's date (e.g. for dynamic / on-the-fly generated web pages).

Regards, Tim
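
To make the conditional request described above concrete, here is a minimal Python sketch of the 200-versus-304 decision. It is not wget's code and makes no network calls; the function names `if_modified_since_header` and `server_status` are hypothetical, and the server side is reduced to a plain timestamp comparison.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def if_modified_since_header(local_mtime: float) -> str:
    """Format a local file's mtime (seconds since the epoch) as an HTTP date."""
    dt = datetime.fromtimestamp(int(local_mtime), tz=timezone.utc)
    return format_datetime(dt, usegmt=True)


def server_status(if_modified_since: str, server_mtime: float) -> int:
    """Hypothetical server-side check (an assumption, not a real server):
    200 if the server copy is newer than the client's date, else 304."""
    client_dt = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop fractional seconds.
    server_dt = datetime.fromtimestamp(int(server_mtime), tz=timezone.utc)
    return 200 if server_dt > client_dt else 304


local_mtime = 1531500720  # arbitrary example timestamp
header = if_modified_since_header(local_mtime)

print(header)
print(server_status(header, local_mtime))       # same age on the server
print(server_status(header, local_mtime + 60))  # server copy is newer
```

A server that always reports the current date as the file's date will look newer on every run in this model, which matches the "everything is re-downloaded" symptom mentioned above.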
