[Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
This patch adds an option `--start-pos' for specifying the starting position
of a download, for both HTTP and FTP.  When specified, the newly added option
overrides `--continue'.  Apart from that, no existing code should be affected.

Signed-off-by: Yousong Zhou
---

Hi,

I found myself needing this feature when I was trying to tunnel the download
of a big file (several gigabytes) from a remote machine back to local through
a somewhat flaky connection.  It's a pain both for the server and for local
network users if we have to re-fetch the part that was already downloaded
whenever the connection hangs or breaks.  Specifying a 'Range: ' header is not
an option for wget (the integrity check in the code would fail), and curl is
not fast enough.  So I decided to make this patch in the hope that it can also
be useful to someone else.

                yousong

 doc/ChangeLog |    4 ++++
 doc/wget.texi |   14 ++++++++++++++
 src/ChangeLog |    9 +++++++++
 src/ftp.c     |    2 ++
 src/http.c    |    2 ++
 src/init.c    |    1 +
 src/main.c    |    1 +
 src/options.h |    1 +
 8 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..166ea08 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou
+
+	* options.h: Add option --start-pos to specify start position of
+	a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
       /* Decide whether or not to restart.  */
       if (con->cmd & DO_LIST)
         restval = 0;
+      else if (opt.start_pos)
+        restval = opt.start_pos;
       else if (opt.always_rest
           && stat (locf, &st) == 0
           && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
       /* Decide whether or not to restart.  */
       if (force_full_retrieve)
         hstat.restval = hstat.len;
+      else if (opt.start_pos)
+        hstat.restval = opt.start_pos;
       else if (opt.always_rest
           && got_name
           && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",        &opt.spanhost,          cmd_boolean },
   { "spider",           &opt.spider,            cmd_boolean },
+  { "startpos",         &opt.start_pos,         cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",          NULL,                   cmd_spec_timeout },
   { "timestamping",     &opt.timestamping,      cmd_boolean },
diff --git a/src/main.c b/src/main.c
index 19d7253..4fbfaee 100644
--- a/src/main.c
+++ b/src/main.c
@@ -281,6 +281,7 @@ static struct cmdline_option option_data[] =
     { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
     { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
     { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+    { "start-pos", 0, OPT_VALUE, "startpos", -1 },
     { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
     { "timeout", 'T', OPT_VALUE, "timeout", -1 },
     { "timestamp
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites...(where browsers work)
Daniel Kahn Gillmor wrote:
> A) if the client already has the root CA's cert, there is no need to
> transmit it
>
> B) alternately, if the client does not already have the root CA's cert,
> then it has no reason to trust the root CA's cert, so why bother
> transmitting it?
--- perfect sense.

The reason I spoke up is that this was the 2nd or 3rd time since upgrading.
kernel.org also comes back with problems:

wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.11.3.xz
--2013-12-20 15:49:09--  https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.11.3.xz
Resolving web-proxy (web-proxy)... 192.168.4.1
Connecting to web-proxy (web-proxy)|192.168.4.1|:8118... connected.
WARNING: cannot verify www.kernel.org's certificate, issued by ‘/C=IL/O=StartCom Ltd./OU=Secure Digital Certificate Signing/CN=StartCom Class 2 Primary Intermediate Server CA’:
  Unable to locally verify the issuer's authority.
Proxy request sent, awaiting response... 200 OK
Length: 67900 (66K) [application/x-xz]
Saving to: ‘patch-3.11.3.xz’

I can't find the 3rd request right now, so I'm not sure what its problem
was... but 3 problems in as many days and I begin to think wget isn't
accessing the right security files (the "out of touch" bit...) ;-)
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites...(where browsers work)
On 12/20/2013 05:12 PM, L Walsh wrote:
> Daniel Kahn Gillmor wrote:
>> openssl s_client -connect collaboration.opengroup.org:443
>
> openssl s_client -connect collaboration.opengroup.org:443
> CONNECTED(0003)
> depth=2 C = US, O = "The Go Daddy Group, Inc.", OU = Go Daddy Class 2
> Certification Authority
> verify error:num=19:self signed certificate in certificate chain
> verify return:0
[...]
> Verify return code: 19 (self signed certificate in certificate chain)
> ---
> -
>
> I'm not well versed in reading certs, but is the problem that
> godaddy's cert looks 'self-signed'?

Nope, we expect the certificate for a root CA to be self-signed.  Godaddy's
cert there is a root CA's cert.  The error report there is that the opengroup
server is needlessly including the root CA's cert in its list of certs.

The only things servers need to send are:

 0) their end-entity ("EE") cert (the cert that belongs to the server itself)

 1) the cert of the intermediate CA that signed the EE cert, if any

 2) the cert of the intermediate CA that signed the first intermediate CA
    cert, if any, etc...

and so on, up to, but not including, the root CA's cert.

Why isn't the root CA's cert necessary?  Because:

 A) if the client already has the root CA's cert, there is no need to
    transmit it

 B) alternately, if the client does not already have the root CA's cert,
    then it has no reason to trust the root CA's cert, so why bother
    transmitting it?

You can probably find a copy of godaddy's root cert in your filesystem (e.g.
in the ca-certificates package in debian, there is
/usr/share/ca-certificates/mozilla/Go_Daddy_Class_2_CA.crt).  Tell openssl
s_client that this is an acceptable root authority (e.g. via the -CAfile
option), and it should connect fine:

 openssl s_client \
   -CAfile /usr/share/ca-certificates/mozilla/Go_Daddy_Class_2_CA.crt \
   -connect collaboration.opengroup.org:443

For wget, the comparable option is --ca-certificate.

hth,

	--dkg
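The rule above (send the EE cert and the intermediates, stop before the root) can be sketched as a small C function.  This is only an illustration under a simplifying assumption: a cert is treated as a root when its subject and issuer names are equal, whereas a real TLS stack checks signatures.  `struct cert`, `is_self_signed` and `certs_to_send` are hypothetical names, not OpenSSL or wget API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, name-only view of a certificate. */
struct cert {
    const char *subject;
    const char *issuer;
};

/* Simplification: a cert counts as self-signed when subject == issuer. */
int is_self_signed(const struct cert *c)
{
    return strcmp(c->subject, c->issuer) == 0;
}

/* Given a chain ordered EE first, return how many leading certs the
   server needs to send: everything up to, but not including, the
   first self-signed (root) cert. */
size_t certs_to_send(const struct cert *chain, size_t n)
{
    size_t i;

    assert(chain != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (is_self_signed(&chain[i]))
            break;
    return i;
}
```

Applied to the three-cert chain reported for collaboration.opengroup.org (wildcard cert, Go Daddy Secure Certification Authority, Go Daddy Class 2 Certification Authority), the function returns 2: the third cert is the root and need not be transmitted.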
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites...(where browsers work)
Daniel Kahn Gillmor wrote:
> openssl s_client -connect collaboration.opengroup.org:443

openssl s_client -connect collaboration.opengroup.org:443
CONNECTED(0003)
depth=2 C = US, O = "The Go Daddy Group, Inc.", OU = Go Daddy Class 2 Certification Authority
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/O=*.opengroup.org/OU=Domain Control Validated/CN=*.opengroup.org
   i:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07969287
 1 s:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07969287
   i:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
 2 s:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
   i:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFYTCCBEmgAwIBAgIHB91EOjiUOjANBgkqhkiG9w0BAQUFADCByjELMAkGA1UE
BhMCVVMxEDAOBgNVBAgTB0FyaXpvbmExEzARBgNVBAcTClNjb3R0c2RhbGUxGjAY
BgNVBAoTEUdvRGFkZHkuY29tLCBJbmMuMTMwMQYDVQQLEypodHRwOi8vY2VydGlm
aWNhdGVzLmdvZGFkZHkuY29tL3JlcG9zaXRvcnkxMDAuBgNVBAMTJ0dvIERhZGR5
IFNlY3VyZSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTERMA8GA1UEBRMIMDc5Njky
ODcwHhcNMTIxMDAyMDg1NDU1WhcNMTcxMDAyMDg1NDU1WjBXMRgwFgYDVQQKDA8q
Lm9wZW5ncm91cC5vcmcxITAfBgNVBAsTGERvbWFpbiBDb250cm9sIFZhbGlkYXRl
ZDEYMBYGA1UEAwwPKi5vcGVuZ3JvdXAub3JnMIIBIjANBgkqhkiG9w0BAQEFAAOC
AQ8AMIIBCgKCAQEAzUXFPU0nOq9uC9eewV3T8q6qt/N9jhuuSiZ7BTmvkV47VE3e
WBTWnRSxF5GOs/SV2oUo4qF9vYtZVURPjXeZ0FL2n0GeSYBtH4scChcMBa4IbOhF
2h0l4SL0dF0SSaJmElOFdg/pHFIHhU9cGN2AOKbHW71BnKVvVu80lLc01kvlUYZ3
P3r000FFL1Z2uH+fBpF4QxJfbPKcPDvdrwnGOGcnLJnSm8TuNuAn5uXw4AN6/jkd
UgYphp0IqpdMiAuQe9Pa+WjghWH+Ot7rhfWm2Cu+7mFd8ix67T58Re/Pdt9+v8+w
viVWUMKh+1V4ZMEgpM4Wt1cR7JUF7lf4Xcj9IwIDAQABo4IBvDCCAbgwDwYDVR0T
AQH/BAUwAwEBADAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDgYDVR0P
AQH/BAQDAgWgMDMGA1UdHwQsMCowKKAmoCSGImh0dHA6Ly9jcmwuZ29kYWRkeS5j
b20vZ2RzMS03Ny5jcmwwUwYDVR0gBEwwSjBIBgtghkgBhv1tAQcXATA5MDcGCCsG
AQUFBwIBFitodHRwOi8vY2VydGlmaWNhdGVzLmdvZGFkZHkuY29tL3JlcG9zaXRv
cnkvMIGABggrBgEFBQcBAQR0MHIwJAYIKwYBBQUHMAh0dHA6Ly9vY3NwLmdv
ZGFkZHkuY29tLzBKBggrBgEFBQcwAoY+aHR0cDovL2NlcnRpZmljYXRlcy5nb2Rh
ZGR5LmNvbS9yZXBvc2l0b3J5L2dkX2ludGVybWVkaWF0ZS5jcnQwHwYDVR0jBBgw
FoAU/axhMpNsRdbi7oVfmrrndplozOcwKQYDVR0RBCIwIIIPKi5vcGVuZ3JvdXAu
b3Jngg1vcGVuZ3JvdXAub3JnMB0GA1UdDgQWBBTwOK+cZzMoC8P0rbAuhXBio5Dt
xDANBgkqhkiG9w0BAQUFAAOCAQEAH05lag39y+BUPlOZa+fibAV7q2RWiMfe+3XG
9J6Cfbnd51FpX6HLfrC30/WHhkVkGuAlrtMaewoyJ/HveRaw1qO5UrtlELaQSu5e
s5pNRBcFQA8PyHn7n/Nxzohf69zuuPQZA3yiGfoFlucGSubq+z6+B/2Q16hSILBW
dIF1SAaSKT+CdHkzoX9CWpftst1hu30HmaRk4ELfR8mZszcTB33XNEXKhuA3rJHu
7A+FU6YInd3wUsjjqzxdNPvZo7f6XH3y7WduVpI1JuG+y9Oi+HVHzF32QSFOwX4S
qRtix+03WSyZ9QGATRTdyn7av5US4mxj18nkTiXJosiDV5zjLA==
-----END CERTIFICATE-----
subject=/O=*.opengroup.org/OU=Domain Control Validated/CN=*.opengroup.org
issuer=/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07969287
---
No client certificate CA names sent
---
SSL handshake has read 4364 bytes and written 517 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DHE-RSA-AES256-SHA
    Session-ID: 487454B12E7EAD451BF1B134B5D64ED9BD276942E1698972405B7C38370D9962
    Session-ID-ctx:
    Master-Key: B71914B309EE9378995E72F6C43F177897BF98363C5774A0D5B9B04440153A942653FDBF5C8C9E1D3652666A3067ED2D
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1387577358
    Timeout   : 300 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---
-

I'm not well versed in reading certs, but is the problem that godaddy's cert
looks 'self-signed'?
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) http websites...(where browsers work)
On Friday, 20 December 2013, 13:54:12, Mike Frysinger wrote:
> On Friday 20 December 2013 12:03:43 L Walsh wrote:
> > Perhaps wget isn't using the new location?
>
> openssl manages its cert locations itself, not wget.  file a bug for your
> distro.

You are right.  What I wrote before about /etc/ssl/certs applies to
Wget+gnutls only.  Sorry.

Tim
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) http websites...(where browsers work)
On Friday, 20 December 2013, 09:03:43, L Walsh wrote:
> But at the end of the update script, I notice a message:
> if ($foundignored)
> {
>    print STDERR "\n* = CA Certificates in /etc/ssl/certs are only seen by
> some legacy applications.
> To install CA-Certificates globally move them to /etc/pki/trust/ancors
> instead!\n"; }
>
> Perhaps wget isn't using the new location?

Wget uses /etc/ssl/certs by default.  If the distribution uses a different
directory, the package maintainer should change the default directory, either
by providing a patch or by specifying the directory in /etc/wgetrc.

Have a look at /etc/ssl/certs and /etc/pki/trust/anchors and see which of them
fits your needs.  Assuming you want /etc/pki/trust/anchors as the certificate
directory, put it into /etc/wgetrc (or into ~/.wgetrc):

cadirectory=/etc/pki/trust/anchors

BTW, the 'Go Daddy' certs are named Go_Daddy_* here (Debian SID).

It is a good idea to submit a bug report for the wget package of your
distribution (if it hasn't already been done by someone else).

Regards, Tim
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) http websites...(where browsers work)
On Friday 20 December 2013 12:03:43 L Walsh wrote:
> Perhaps wget isn't using the new location?

openssl manages its cert locations itself, not wget.  file a bug for your
distro.

-mike
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) http websites...(where browsers work)
mancha wrote:
> L Walsh writes:
>> I recently started using 1.14 of wget included with my distro's updates:
>> GNU Wget 1.14 built on linux-gnu.
>>
>> Trouble is, it gives security warnings on almost every https site I
>> access.  I can't think of 1 where I didn't have to override the security
>> warning (and this time, I just put it in my .wgetrc file).
>>
>> So why does wget get all these errors when my browsers don't?
>
> It appears your wget is built against the openssl library.  For https
> certificate verification to work in wget automagically as it does in the
> major browsers, openssl needs a properly configured root certificate store
> (default location: /etc/ssl/certs).

I have the latest ca-certificates for opensuse 13.1 installed:

rpm -ql ca-certificates
/etc/ca-certificates
/etc/ca-certificates/update.d
/etc/pki
/etc/pki/trust
/etc/pki/trust/anchors
/etc/pki/trust/blacklist
/etc/ssl/ca-bundle.pem
/etc/ssl/certs
/usr/lib/ca-certificates
/usr/lib/ca-certificates/update.d
/usr/lib/ca-certificates/update.d/certbundle.run
/usr/lib/ca-certificates/update.d/etc_ssl.run
/usr/lib/ca-certificates/update.d/java.run
/usr/lib/ca-certificates/update.d/openssl.run
/usr/sbin/update-ca-certificates
/usr/share/doc/packages/ca-certificates
/usr/share/doc/packages/ca-certificates/COPYING
/usr/share/doc/packages/ca-certificates/README
/usr/share/man/man8/update-ca-certificates.8.gz
/usr/share/pki
/usr/share/pki/trust
/usr/share/pki/trust/anchors
/usr/share/pki/trust/blacklist
/var/lib/ca-certificates
/var/lib/ca-certificates/ca-bundle.pem
/var/lib/ca-certificates/java-cacerts
/var/lib/ca-certificates/openssl
/var/lib/ca-certificates/pem
--
It shows files in /etc/ssl as well as other places.

But at the end of the update script, I notice a message:

if ($foundignored)
{
   print STDERR "\n* = CA Certificates in /etc/ssl/certs are only seen by some legacy applications.
To install CA-Certificates globally move them to /etc/pki/trust/ancors instead!\n";
}

Perhaps wget isn't using the new location?

> Check your distrib's documentation/support forums/mailing lists for how to
> set this up.  It might be a package that you can easily install (for
> example, Debian and derivatives call theirs "ca-certificates").
>
> This is not a wget issue proper.
>
> --mancha
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) http websites...(where browsers work)
Daniel Stenberg writes:
>
> On Fri, 20 Dec 2013, mancha wrote:
>
> > This is not a wget issue proper.
>
> If it only warns and still continues and gets the content, I would still
> call it a problem.

I believe it continues because of an explicit user override of the default
behavior (--no-check-certificate).  The reporter can confirm that, of course.

--mancha
Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) http websites...(where browsers work)
On Fri, 20 Dec 2013, mancha wrote:

> This is not a wget issue proper.

If it only warns and still continues and gets the content, I would still call
it a problem.

-- 
 / daniel.haxx.se