On 2/12/19 12:21 AM, Darshit Shah wrote:
> * Tim Rühsen <[email protected]> [190211 13:45]:
>> You are right, --if-modified-since changes -N behavior in case a file is
>> incomplete. --if-modified-since can't easily be fixed since the 304
>> response does not include file size information.
>>
>> As you suggest, we should disable this option by default or at least
>> discuss the options we have.
>
> That's correct. While the lack of a Content-Length header on a 304 response
> causes problems, we can't rely on it to exist even for normal 200 / 206
> responses.
>
> Let me try to aggregate some of the possible options (I'm not saying any of
> these are particularly a good idea):
>
> 1. Write file to a tmpfile and on successful download, move it to the real
>    location.
>    This option has multiple problems. Firstly, people don't expect Wget to
>    write to a tmp file. This can be problematic, especially when people try to
>    play streaming data without a -O. But for the purposes of dealing with -N
>    and --if-modified-since, this is the best option.
>
> 2. Issue a utime() call after every write() in order to set the mtime again to
>    something older than the one reported by the server.
>    In this, we would need to issue a utime() after each call to write() in
>    order to reset its mtime to an earlier time. After the file is fully
>    downloaded, set the mtime to the actual one as provided by the server. This
>    introduces an issue where Wget is issuing too many system calls. And with
>    Wget2, it might get really bad due to downloading ~30+ files in parallel.
>    I'm also unsure of how the kernel handles races between write() and utime()
>    calls. We don't want to set the mtime of the file and have it overwritten
>    by the previous write() call. This might be a valid option, especially
>    since it is cross platform. However, the performance impact would need to
>    be evaluated.
>
> 3. Only enable If-Modified-Since when xattr is available.
>    The idea here is simple: on systems where xattr is possible, store either
>    an old timestamp or a completion flag in the attributes. Use this metadata
>    to issue an If-Modified-Since header. If xattr is not available or the
>    attributes are not found, use the HEAD+GET approach.
>
>
> Are there any other options that I've missed?

4. Do not use --if-modified-since by default with -N - let the user
control it.

We only have an issue if the -N download gets interrupted and should be
continued later. This is often not the case - like in my personal
interactive '-r -N' scenarios. Of course it's error-prone for users who
aren't aware of it. But you asked for other options.
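
For comparison, option 1 from the list above (write to a temp file, move it into place on success) could look roughly like the sketch below. This is plain POSIX, not Wget code: save_atomically() is a hypothetical helper, and it takes the whole body from memory only to keep the example short, whereas a real download would stream.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of Wget): write `len` bytes of `data` to a
 * temporary file next to `path` and rename() it into place only after all
 * bytes were written. Returns 0 on success, -1 on error. An interrupted run
 * leaves at most a stray temp file, never a truncated `path`. */
static int save_atomically(const char *path, const void *data, size_t len)
{
    assert(path != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
    if (n < 0 || (size_t) n >= sizeof(tmp))
        return -1;

    int fd = mkstemp(tmp); /* creates the temp file with mode 0600 */
    if (fd == -1)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t) w;
    }

    /* Only a completely written file becomes visible under the final name. */
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The temp file sits in the same directory as the target so that rename() stays on one filesystem. The drawback mentioned above remains: nothing shows up under the final name until the download has finished.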
Didn't I solve that issue for Wget2 already?
From src/wget.c (http_receive_response):
    if (resp->last_modified) {
        /* If program was aborted, we store file times one second less than
         * the server time.
         * So a later download with -N would start over instead of leaving
         * incomplete data.
         * Or a later download with -c -N would continue with a
         * IF-MODIFIED-SINCE: HTTP header. */
        if (config.xattr && !terminate)
            write_xattr_last_modified(resp->last_modified, context->outfd);
        set_file_mtime(context->outfd, resp->last_modified - terminate);
    }
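
To illustrate why the one-second offset helps, here is a small stand-alone sketch. The helper names are made up for this mail and are not Wget2 functions. The server check is a simplified model of If-Modified-Since handling: it assumes a 304 is sent when the resource's Last-Modified is not later than the date in the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: the mtime to store on the local file. A completed
 * download gets the server's Last-Modified time, an interrupted one gets a
 * time one second earlier, as in the snippet above. */
static time_t local_mtime_for(time_t server_mtime, bool interrupted)
{
    /* Assumption: a zero or negative Last-Modified is treated as missing,
     * similar to the `if (resp->last_modified)` check above. */
    assert(server_mtime > 0);
    return interrupted ? server_mtime - 1 : server_mtime;
}

/* Simplified server model: answer 304 Not Modified when the resource has
 * not changed after the date sent in If-Modified-Since. */
static bool server_would_send_304(time_t server_mtime, time_t if_modified_since)
{
    return server_mtime <= if_modified_since;
}
```

With these two helpers, a complete file (local mtime equal to the server time) gets a 304 and is skipped, while an interrupted file (local mtime one second older) does not get a 304, so the transfer happens again.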
Regards, Tim
>
>> On 2/10/19 2:42 PM, Lawrence Wade wrote:
>>> Hi Tim,
>>>
>>> Okay. Using the OpenSUSE-packaged wget (1.19.5) that comes with Leap 15.0:
>>>
>>> $ wget -r -N 192.168.2.100:8080
>>> ...
>>> Reusing existing connection to 192.168.2.100:8080.
>>> HTTP request sent, awaiting response... 304 Not Modified
>>> File ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4’ not modified on server. Omitting download.
>>>
>>> This file is incomplete in my local copy.
>>>
>>> Trying again as you suggest,
>>>
>>> $ wget -r -N --no-if-modified-since 192.168.2.100:8080
>>> ...
>>> --2019-02-10 08:35:14-- http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
>>> Reusing existing connection to 192.168.2.100:8080.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 38044195 (36M) [application/octet-stream]
>>> The sizes do not match (local 8643456) -- retrieving.
>>> --2019-02-10 08:35:14-- http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
>>> Reusing existing connection to 192.168.2.100:8080.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 38044195 (36M) [application/octet-stream]
>>> Saving to: ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4’
>>> ...
>>>
>>> And it appears to work as expected. Won't this change to the behaviour
>>> of -N option subtly break a lot of scripts which rely on wget?
>>>
>>> Thanks so much, Tim. I do have an answer and a workaround though my
>>> concerns remain.
>>>
>>> Lawrence Wade
>>> Ottawa, Canada
>>>
>>> On Sun, Feb 10, 2019 at 2:11 AM Lawrence Wade <[email protected]>
>>> wrote:
>>>>
>>>> Hi Everyone,
>>>>
>>>> This might be a corroboration of this
>>>> http://lists.gnu.org/archive/html/bug-wget/2018-10/msg00049.html
>>>> and this
>>>> https://bugs.launchpad.net/ubuntu/+source/wget/+bug/1715481
>>>>
>>>> I use wget to backup my cellphone running Palapa Web Server, and it
>>>> has worked well for me for years. Since upgrading to OpenSUSE Leap 15,
>>>> I have been having corrupted files.
>>>>
>>>> My method is
>>>> $ wget -r -N 192.168.2.100:8080
>>>> and if the connection is interrupted for any reason, the next time I
>>>> call wget it would complete any incomplete files. And since Leap 15, I
>>>> have been getting gradually corrupted backups. I was tearing my hair
>>>> out looking at wgetrc and other things.
>>>>
>>>> With one long file that I knew was incomplete, I got a Not Modified -
>>>> omitting download, even though I knew the file sizes were different
>>>> between the server and wget's copy - though the wget man page
>>>> explicitly states that if the file sizes do not match, -N will trigger
>>>> a download.
>>>>
>>>> I tried on OpenSUSE 42.3 (wget 1.14) and the incomplete file triggered
>>>> a download, even though wgetrc was identical.
>>>>
>>>> Again, on Leap 15, I compiled 1.20.1 (latest), 1.17.1, and then
>>>> finally with 1.16.3 the behaviour went back to what I expected (and I
>>>> got my corrupted phone backups fixed).
>>>>
>>>> Was a bug possibly introduced in 1.17 with the support for
>>>> --if-modified-since?
>>>>
>>>> Version shipping with OpenSUSE Leap 15:
>>>> GNU Wget 1.19.5 built on linux-gnu.
>>>> +cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
>>>> +ntlm +opie +psl +ssl/openssl
>>>>
>>>> Last version I tried where "wget -r -N" works as expected:
>>>> GNU Wget 1.16.3 built on linux-gnu.
>>>> +digest +https +ipv6 -iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
>>>>
>>>> I'm open to the possibility that there may be something else causing
>>>> this bug, I have not found many mentions of it, but then again it is
>>>> subtle. You get pretty confident when you just let wget do its thing,
>>>> so there may be a lot of incomplete files out there... :)
>>>>
>>>> Thanks so much for your help. I can provide any other info that would
>>>> be helpful.
>>>>
>>>> Lawrence Wade
>>>> Ottawa, Canada
>>>
>>
>
>
>
