[Bug-wget] How to specify output file name (not -O for logging messages)

2013-01-31 Thread L Walsh

I wanted to try downloading a file and wanted to specify the name for it to be
saved as.

How do I do that?

The name (the HTTP address) was too long for a file system on Linux.





Re: [Bug-wget] How to specify output file name (not -O for logging messages)

2013-02-01 Thread L Walsh

Ah... Thanks!... my confusion!


Dmitry Bogatov wrote:

There was a conversation with L Walsh:

I wanted to try downloading a file and wanted to specify the
name for it to be saved as.
How do I do that?


wget --output-document
PS. In your header you talk about -O as log messages. In fact,
-o (small letter) is for log messages, and -O (capital one) is the
output document.
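
For reference, the two options side by side (hypothetical URL; -o is
short for --output-file, -O for --output-document):

    wget -O page.html http://example.com/some/very/long/path   # save the body as page.html
    wget -o wget.log  http://example.com/some/very/long/path   # write log messages to wget.log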




Re: [Bug-wget] Save 3 byte utf8 url

2013-02-15 Thread L Walsh




Ángel González wrote:

On 07/02/13 15:06, bes wrote:

Hi,

I found a bug in wget's interpreting and saving of a percent-encoded 3-byte
UTF-8 URL.

Example:
1. Create a URL with "—". This is U+2014 (EM DASH). Its percent-encoded
UTF-8 form is "%E2%80%94".
2. Try to wget it: wget "http://example.com/abc—d" or, directly,
wget "http://example.com/abc%E2%80%94d"
3. Wget saves this URL to the file "abc\342%80%94d". Expected is
"abc%E2%80%94d". This is a bug.


The problem is that it checks whether it's a printable character in latin1.
There is a bug report at https://savannah.gnu.org/bugs/index.php?37564
An option would be to use --restrict-file-names=nocontrol to get the em
dash in the filename instead of the percent-encoded version.
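
The workaround above, spelled out (--restrict-file-names=nocontrol is an
existing wget option; the URL is the one from the report):

    wget --restrict-file-names=nocontrol "http://example.com/abc%E2%80%94d"
    # saves as "abc—d" rather than an escaped or percent-encoded name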

---
Do you mean printable character in the current locale?
Or can it not do UTF-8 at all?


latin1 is going the way of the dodo... most sites still use it, but
HTML5 is supposed to be UTF-8.

If it found "González" on a file would it be able to save it correctly?




Re: [Bug-wget] Save 3 byte utf8 url

2013-02-16 Thread L Walsh



Ángel González wrote:



Or can it not do UTF-8 at all?

latin1 is going the way of the dodo... most sites still use it, but
HTML5 is supposed to be UTF-8.
http://www.whatwg.org/specs/web-apps/current-work/#urls refers to
http://url.spec.whatwg.org/ and it does set the encoding by default to
UTF-8. But I think that refers to /encoding/ a character, not to figuring
out which encoding was used in a URL.

---
Aren't URLs usually referenced by getting them from
within a webpage?  So, _if_ the source of the webpage was UTF-8
encoded, wouldn't the URLs also be encoded that way?

I notice in FF I can choose the messed-up version or the
real version in 'about:config' with these settings:
network.standard-url.encode.query-utf8 (default is false, but I set it to
TRUE; I have yet to encounter a website that DOESN'T understand UTF-8).

and the other -- (and this is the one that gives you real vs. %%):
network.standard-url.escape-utf8 (default=true, meaning do %%), but
changing that to false will send UTF-8 'over the wire' (and change what
you 'Copy', if you copy the URL from the address bar).


Example:  With the latter setting at its default, if I type in
http://www.last.fm/music/梶浦由記

I'll get taken to a page where the addr-bar LOOKS that way
(assuming the 1st setting, above, is TRUE), but if I try to
cut/paste, I get
"http://www.last.fm/music/%E6%A2%B6%E6%B5%A6%E7%94%B1%E8%A8%98".

However, if I have the 2nd setting at its non-default, 'FALSE' (meaning
don't encode UTF-8 as %%), then going to that page and cut/pasting
gives me: http://www.last.fm/music/梶浦由記.

If I save that page from my browser on Windows 7 ...
the file is saved correctly (as viewed from either Explorer
or a Cygwin X11 window, like a terminal).  But if I view it
from an old DOS-compat-style window like the one that comes up with
'cmd.exe' -- there I get '' as it can't display UTF-8.


Unfortunately, I know of no native Microsoft win32 command-line program
that will display the chars correctly even though you CAN set the
terminal / MS console for UTF-8 with
mode[.com] con: cp select=65001, but MS's driver for codepage
65001 is (IMO) deliberately broken to prevent people from
using UTF-8 (which was the chosen standard for Unicode, over MS's
preferred UCS-2 solution, which they often "rebrand", usually
falsely, as UTF-16).  A large number of their legacy programs that
don't natively understand UTF-8 don't work beyond the first Unicode
plane (the BMP) -- i.e. they are only UCS-2 compatible -- just 16 bits
per character.

Most don't *really* handle UTF-16, which can take two 16-bit code
units (a surrogate pair) to represent a character from the full
Unicode standard.




We could assume it's the same charset as the document, but what to do
with documents that have no charset (because of wrong configuration, or
because they are scripts, images...)?

---
User choice or option?  -- I think you are supposed to
try a utf-8 decode on the object first, as if the document ISN'T
UTF-8, it will fail, but the reverse is not true -- if you try to
decode as latin1, all codes from 0x20-0xff are valid display codes,
so the decode algorithm can't fail.  But with UTF-8, any char
over 0x7f, has to be a 2 byte sequence where both should have
the high bit set.  All of the UTF-8 'continuation' bytes have 0b10 in
conforming (standard) UTF-8.
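
A quick way to apply that "try UTF-8 first" heuristic from a shell
(iconv exits non-zero on malformed UTF-8, while a latin1 decode can
never fail; $file is a placeholder):

    if iconv -f UTF-8 -t UTF-8 < "$file" > /dev/null 2>&1; then
        echo "valid UTF-8 -- treat it as such"
    else
        echo "not UTF-8 -- fall back to latin1"
    fi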


Seems easier to treat as utf-8 if it contains utf-8 sequences. That 
still needs a transformation of filenames, though.

---
On linux, if their locale is UTF-8, then not.  or even on
Windows under cygwin -- if their locale is UTF-8, then not.  But if
they have an 8-bit locale -- you'd have to use %encoding to get
everything.  No guarantees that the UTF-8 filenames they download
can be recoded into any 8-bit character set.  But on windows --
I'd decode to UTF-16 and use that -- since at least the filename
will look correct if they browse it in a desktop application
or if they use an X11 Terminal like they could from the cygwin
collection






If it found "González" on a file would it be able to save it correctly?


wget is always able to download the URLs; the only difference is whether
they "look nice" on your system.

---
Or whether they can be saved at all -- some Google addresses are longer
than the maximum filename length.


A URL like http://example.org/González in UTF-8 would be encoded as
http://example.org/Gonz%c3%a1lez so wget would think those are the
characters Ã (0xC3) and ¡ (0xA1), saving it "as is". So if my filenames
are UTF-8 (eg. Linux) I will see it as González; if they are latin1 (eg.
Windows, using windows-1252) I will see it as GonzÃ¡lez.
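
You can reproduce that mojibake by hand: the following prints the UTF-8
bytes of "á" the way a latin1 system would display them:

    printf 'Gonz\xc3\xa1lez' | iconv -f latin1 -t utf-8
    # -> GonzÃ¡lez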


Oh joy!  (*sigh*)





Re: [Bug-wget] A possible wget bug?

2013-07-13 Thread L Walsh

I'm downloading and would like to exclude several paths from the
download. I put the command in a script (enclosed below) but it keeps
doing the download. Am I messing something up? Am I not understanding
the syntax?

My intent is to NOT download any of the i386, epel, macports,
postgresql, or the ubuntu directories. I tried just coding
i386,epel,macports,postgresql,ubuntu but that didn't work either. 


Any help you can give me in this would be wonderful.

---

I am doing something similar with openSUSE distributions -- I like to keep the
current versions I am using on my local disk.  I too had things that
I wanted to exclude.  So I first came up with a list of extensions and
patterns that I wanted to exclude, using the normal rsync-style wildcard
matching:

ignore_exts='*.mirrorlist*,*.i586*,*.iso,*.asc,*.torrent,*.md5,*.drpm'
ignore_pats='*index.html?C=?;O=?*,*Addon-Lang*iso*,*LiveCD*iso*,*NET*iso*,*/i586/.*'

That simplified the invocation statement.  The key was getting the exclude
statements to ignore everything I didn't want.  I found once those were
right, the rest came easier.
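
A minimal sketch of how such lists might be fed to wget (hypothetical
mirror URL; -R/--reject takes comma-separated filename patterns, and
-X/--exclude-directories takes directory patterns):

    wget -r -np -R "$ignore_exts" -X '*/i586' \
         "http://download.example.org/distribution/leap/"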





[Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites... (where browsers work))

2013-12-19 Thread L Walsh

I recently started using 1.14 of wget included with my distro's updates:
GNU Wget 1.14 built on linux-gnu.

+digest +https +ipv6 +iri +large-file +nls +ntlm +opie +ssl/openssl

Wgetrc:
/home/law/.wgetrc (user)
/etc/wgetrc (system)
Locale: /usr/share/locale
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib
-fmessage-length=0 -grecord-gcc-switches -O2 -Wall
-D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables
-fasynchronous-unwind-tables -g
Link: gcc -fmessage-length=0 -grecord-gcc-switches -O2 -Wall
-D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables
-fasynchronous-unwind-tables -g -lproxy /usr/lib64/libssl.so
/usr/lib64/libcrypto.so /usr/lib64/libz.so -ldl -lz -lz -lidn
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a


-

Trouble is, it gives security warnings on almost every https
site I access.

I can't think of one where I didn't have to override the security
warning (and this time, I just put the override in my .wgetrc file).

So why does wget get all these errors when my browsers don't?



Like here is pulling a single doc from the POSIX folks at
the Open Group.  Anyone have an idea why certs from GoDaddy would
not verify properly?

Thanks...

wget "https://collaboration.opengroup.org/pegasus/pp/documents/29166/ReleaseNotes.htm"
--2013-12-19 20:38:25-- 
https://collaboration.opengroup.org/pegasus/pp/documents/29166/ReleaseNotes.htm

Resolving collaboration.opengroup.org (collaboration.opengroup.org)... 
64.79.149.150
Connecting to collaboration.opengroup.org 
(collaboration.opengroup.org)|64.79.149.150|:443... connected.
WARNING: cannot verify collaboration.opengroup.org's certificate, issued by 
‘/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, 
Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure 
Certification Authority/serialNumber=07969287’:

  Self-signed certificate encountered.
HTTP request sent, awaiting response... 302 Found
Location: 
https://sso.opengroup.org/IDBUS/PROD/PHP-PLATO/JOSSO/SSO/REDIR?josso_back_to=https://collaboration.opengroup.org/josso/josso-php-partnerapp/josso-security-check.php&josso_cmd=login_optional&josso_partnerapp_host=collaboration.opengroup.org&josso_partnerapp_id=plato 
[following]
--2013-12-19 20:38:26-- 
https://sso.opengroup.org/IDBUS/PROD/PHP-PLATO/JOSSO/SSO/REDIR?josso_back_to=https://collaboration.opengroup.org/josso/josso-php-partnerapp/josso-security-check.php&josso_cmd=login_optional&josso_partnerapp_host=collaboration.opengroup.org&josso_partnerapp_id=plato

Resolving sso.opengroup.org (sso.opengroup.org)... 64.79.149.147
Connecting to sso.opengroup.org (sso.opengroup.org)|64.79.149.147|:443... 
connected.
WARNING: cannot verify sso.opengroup.org's certificate, issued by 
‘/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, 
Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure 
Certification Authority/serialNumber=07969287’:

  Self-signed certificate encountered.
HTTP request sent, awaiting response... 302 Found
Location: 
https://collaboration.opengroup.org/josso/josso-php-partnerapp/josso-security-check.php 
[following]
--2013-12-19 20:38:26-- 
https://collaboration.opengroup.org/josso/josso-php-partnerapp/josso-security-check.php

Reusing existing connection to collaboration.opengroup.org:443.
HTTP request sent, awaiting response... 302 Found
Location: /pegasus/pp/documents/29166/ReleaseNotes.htm [following]
--2013-12-19 20:38:26-- 
https://collaboration.opengroup.org/pegasus/pp/documents/29166/ReleaseNotes.htm

Reusing existing connection to collaboration.opengroup.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 103075 (101K) [text/html]
Saving to: ‘ReleaseNotes.htm’



Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites... (where browsers work))

2013-12-20 Thread L Walsh



mancha wrote:

L Walsh <w...@tlinx.org> writes:


I recently started using 1.14 of wget included with my distro's updates:
GNU Wget 1.14 built on linux-gnu.

Trouble is, it gives security warnings on almost every https
site I access.

I can't think of 1 where I didn't have to override the security
warning (and this time, I just put it in my .wgetrc file).

So why does wget get all these errors when my browsers don't?


It appears your wget is built against the openssl library. For https
certificate verification to work in wget automagically as it does in
the major browsers, openssl needs a properly configured root
certificate store (default location: /etc/ssl/certs).


I have the latest ca-certificates for opensuse 13.1 installed:

rpm -ql ca-certificates

/etc/ca-certificates
/etc/ca-certificates/update.d
/etc/pki
/etc/pki/trust
/etc/pki/trust/anchors
/etc/pki/trust/blacklist
/etc/ssl/ca-bundle.pem
/etc/ssl/certs
/usr/lib/ca-certificates
/usr/lib/ca-certificates/update.d
/usr/lib/ca-certificates/update.d/certbundle.run
/usr/lib/ca-certificates/update.d/etc_ssl.run
/usr/lib/ca-certificates/update.d/java.run
/usr/lib/ca-certificates/update.d/openssl.run
/usr/sbin/update-ca-certificates
/usr/share/doc/packages/ca-certificates
/usr/share/doc/packages/ca-certificates/COPYING
/usr/share/doc/packages/ca-certificates/README
/usr/share/man/man8/update-ca-certificates.8.gz
/usr/share/pki
/usr/share/pki/trust
/usr/share/pki/trust/anchors
/usr/share/pki/trust/blacklist
/var/lib/ca-certificates
/var/lib/ca-certificates/ca-bundle.pem
/var/lib/ca-certificates/java-cacerts
/var/lib/ca-certificates/openssl
/var/lib/ca-certificates/pem
--
It shows files in /etc/ssl as well as other places.

But at the end of the update script, I notice a message:
if ($foundignored)
{
  print STDERR "\n* = CA Certificates in /etc/ssl/certs are only seen by some legacy applications.
To install CA-Certificates globally move them to /etc/pki/trust/anchors instead!\n";
}

Perhaps wget isn't using the new location?
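
One way to test that theory is to point wget at the new bundle
explicitly (--ca-certificate is an existing wget option; the path is
from the rpm listing above):

    wget --ca-certificate=/var/lib/ca-certificates/ca-bundle.pem \
         "https://collaboration.opengroup.org/pegasus/pp/documents/29166/ReleaseNotes.htm"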

Check your distrib's documentation/support forums/mailing lists
for how to set this up. It might be a package that you can easily
install (for example, Debian and derivatives call theirs
"ca-certificates").

This is not a wget issue proper.

--mancha




Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites... (where browsers work))

2013-12-20 Thread L Walsh



Daniel Kahn Gillmor wrote:


 openssl s_client -connect collaboration.opengroup.org:443

openssl s_client -connect collaboration.opengroup.org:443
CONNECTED(0003)
depth=2 C = US, O = "The Go Daddy Group, Inc.", OU = Go Daddy Class 2 
Certification Authority

verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/O=*.opengroup.org/OU=Domain Control Validated/CN=*.opengroup.org
   i:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, 
Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure 
Certification Authority/serialNumber=07969287
 1 s:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, 
Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure 
Certification Authority/serialNumber=07969287

   i:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification 
Authority
 2 s:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification 
Authority
   i:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification 
Authority
---
Server certificate
-BEGIN CERTIFICATE-
MIIFYTCCBEmgAwIBAgIHB91EOjiUOjANBgkqhkiG9w0BAQUFADCByjELMAkGA1UE
BhMCVVMxEDAOBgNVBAgTB0FyaXpvbmExEzARBgNVBAcTClNjb3R0c2RhbGUxGjAY
BgNVBAoTEUdvRGFkZHkuY29tLCBJbmMuMTMwMQYDVQQLEypodHRwOi8vY2VydGlm
aWNhdGVzLmdvZGFkZHkuY29tL3JlcG9zaXRvcnkxMDAuBgNVBAMTJ0dvIERhZGR5
IFNlY3VyZSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTERMA8GA1UEBRMIMDc5Njky
ODcwHhcNMTIxMDAyMDg1NDU1WhcNMTcxMDAyMDg1NDU1WjBXMRgwFgYDVQQKDA8q
Lm9wZW5ncm91cC5vcmcxITAfBgNVBAsTGERvbWFpbiBDb250cm9sIFZhbGlkYXRl
ZDEYMBYGA1UEAwwPKi5vcGVuZ3JvdXAub3JnMIIBIjANBgkqhkiG9w0BAQEFAAOC
AQ8AMIIBCgKCAQEAzUXFPU0nOq9uC9eewV3T8q6qt/N9jhuuSiZ7BTmvkV47VE3e
WBTWnRSxF5GOs/SV2oUo4qF9vYtZVURPjXeZ0FL2n0GeSYBtH4scChcMBa4IbOhF
2h0l4SL0dF0SSaJmElOFdg/pHFIHhU9cGN2AOKbHW71BnKVvVu80lLc01kvlUYZ3
P3r000FFL1Z2uH+fBpF4QxJfbPKcPDvdrwnGOGcnLJnSm8TuNuAn5uXw4AN6/jkd
UgYphp0IqpdMiAuQe9Pa+WjghWH+Ot7rhfWm2Cu+7mFd8ix67T58Re/Pdt9+v8+w
viVWUMKh+1V4ZMEgpM4Wt1cR7JUF7lf4Xcj9IwIDAQABo4IBvDCCAbgwDwYDVR0T
AQH/BAUwAwEBADAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDgYDVR0P
AQH/BAQDAgWgMDMGA1UdHwQsMCowKKAmoCSGImh0dHA6Ly9jcmwuZ29kYWRkeS5j
b20vZ2RzMS03Ny5jcmwwUwYDVR0gBEwwSjBIBgtghkgBhv1tAQcXATA5MDcGCCsG
AQUFBwIBFitodHRwOi8vY2VydGlmaWNhdGVzLmdvZGFkZHkuY29tL3JlcG9zaXRv
cnkvMIGABggrBgEFBQcBAQR0MHIwJAYIKwYBBQUHMAh0dHA6Ly9vY3NwLmdv
ZGFkZHkuY29tLzBKBggrBgEFBQcwAoY+aHR0cDovL2NlcnRpZmljYXRlcy5nb2Rh
ZGR5LmNvbS9yZXBvc2l0b3J5L2dkX2ludGVybWVkaWF0ZS5jcnQwHwYDVR0jBBgw
FoAU/axhMpNsRdbi7oVfmrrndplozOcwKQYDVR0RBCIwIIIPKi5vcGVuZ3JvdXAu
b3Jngg1vcGVuZ3JvdXAub3JnMB0GA1UdDgQWBBTwOK+cZzMoC8P0rbAuhXBio5Dt
xDANBgkqhkiG9w0BAQUFAAOCAQEAH05lag39y+BUPlOZa+fibAV7q2RWiMfe+3XG
9J6Cfbnd51FpX6HLfrC30/WHhkVkGuAlrtMaewoyJ/HveRaw1qO5UrtlELaQSu5e
s5pNRBcFQA8PyHn7n/Nxzohf69zuuPQZA3yiGfoFlucGSubq+z6+B/2Q16hSILBW
dIF1SAaSKT+CdHkzoX9CWpftst1hu30HmaRk4ELfR8mZszcTB33XNEXKhuA3rJHu
7A+FU6YInd3wUsjjqzxdNPvZo7f6XH3y7WduVpI1JuG+y9Oi+HVHzF32QSFOwX4S
qRtix+03WSyZ9QGATRTdyn7av5US4mxj18nkTiXJosiDV5zjLA==
-END CERTIFICATE-
subject=/O=*.opengroup.org/OU=Domain Control Validated/CN=*.opengroup.org
issuer=/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, 
Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure 
Certification Authority/serialNumber=07969287

---
No client certificate CA names sent
---
SSL handshake has read 4364 bytes and written 517 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol  : TLSv1
Cipher: DHE-RSA-AES256-SHA
Session-ID: 487454B12E7EAD451BF1B134B5D64ED9BD276942E1698972405B7C38370D9962
Session-ID-ctx:
Master-Key: 
B71914B309EE9378995E72F6C43F177897BF98363C5774A0D5B9B04440153A942653FDBF5C8C9E1D3652666A3067ED2D

Key-Arg   : None
PSK identity: None
PSK identity hint: None
SRP username: None
Start Time: 1387577358
Timeout   : 300 (sec)
Verify return code: 19 (self signed certificate in certificate chain)
---
-

I'm not well versed in reading certs, but is the problem that
GoDaddy's cert looks 'self-signed'?
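
A way to check whether the chain verifies once a trusted root is
supplied (-CAfile is a standard openssl s_client option; the bundle
path is the SUSE one from earlier in the thread):

    openssl s_client -connect collaboration.opengroup.org:443 \
            -CAfile /etc/ssl/ca-bundle.pem < /dev/null
    # "Verify return code: 0 (ok)" would mean only the local root store was missing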



Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites... (where browsers work))

2013-12-20 Thread L Walsh



Daniel Kahn Gillmor wrote:



 A) if the client already has the root CA's cert, there is no need to
transmit it

 B) alternately, if the client does not already have the root CA's cert,
then it has no reason to trust the root CA's cert, so why bother
transmitting it?

---
Makes perfect sense.

The reason I spoke up is this was the 2nd or 3rd time since upgrading.

kernel.org also comes back with probs:


wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.11.3.xz
--2013-12-20 15:49:09-- 
https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.11.3.xz

Resolving web-proxy (web-proxy)... 192.168.4.1
Connecting to web-proxy (web-proxy)|192.168.4.1|:8118... connected.
WARNING: cannot verify www.kernel.org's certificate, issued by ‘/C=IL/O=StartCom 
Ltd./OU=Secure Digital Certificate Signing/CN=StartCom Class 2 Primary 
Intermediate Server CA’:

  Unable to locally verify the issuer's authority.
Proxy request sent, awaiting response... 200 OK
Length: 67900 (66K) [application/x-xz]
Saving to: ‘patch-3.11.3.xz’

I can't find the 3rd request right now, so I'm not sure what its problem
was... but 3 problems in as many days, and I begin to think wget isn't
accessing the right security files (the "out of touch" bit...) ;-)






Re: [Bug-wget] wget seems to be "out of touch" with security (fails on most (all?) https websites... (where browsers work))

2013-12-21 Thread L Walsh



mancha wrote:

L Walsh <w...@tlinx.org> writes:


I recently started using 1.14 of wget included with my distro's updates:
GNU Wget 1.14 built on linux-gnu.

Trouble is, it gives security warnings on almost every https
site I access.

I can't think of 1 where I didn't have to override the security
warning (and this time, I just put it in my .wgetrc file).

So why does wget get all these errors when my browsers don't?


It appears your wget is built against the openssl library. For https
certificate verification to work in wget automagically as it does in
the major browsers, openssl needs a properly configured root
certificate store (default location: /etc/ssl/certs).


-
What format of file does wget require?

I noticed in /etc/ssl/certs:
  README.RootCerts:
 The OpenSSL project does not (any longer) include root CA certificates.
(and a suggestion to go read an FAQ -- not in the same dir; I'd have to
find it).  Other than that -- a bunch of .pem files, but only for local
daemons (likely self-signed... imaps stuff, mostly).
---
I noticed firefox points at /etc/pki/nssdb, where I see
cert9.db, key4.db and pkcs11.txt (all dated Dec 9)...
would wget be able to read those?   That seems to be where the
current cert store is... but it's not in PEM format.
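
A hedged sketch: NSS's certutil tool (shipped with the NSS utilities)
can list and export entries from that store as PEM; "Some CA" below is
a placeholder nickname:

    certutil -L -d sql:/etc/pki/nssdb                            # list stored nicknames
    certutil -L -d sql:/etc/pki/nssdb -n "Some CA" -a > ca.pem   # export one as PEM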

(FWIW -- one would think SUSE would have set this up
in advance for their distro version of wget... but I guess
that'd be too much "like right"...sigh)



Re: [Bug-wget] RFE: --norc

2014-02-14 Thread L Walsh


Darshit Shah wrote:

On Fri, Feb 14, 2014 at 8:25 AM, Pierre Fortin  wrote:


Just a thought...  have you tried: --config=/dev/null ?


Only recently did we introduce this feature to Wget.
In the current git sources, commit b9e5c introduces
the --no-config option, which
will ignore any wgetrc files on your machine. It will be a part of the next
release.
=
Pierre, good suggestion... and Darshit -- excellent! ;-)
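
The two approaches side by side (the second needs a build containing
commit b9e5c):

    wget --config=/dev/null http://example.com/   # read an empty config file
    wget --no-config        http://example.com/   # skip wgetrc files entirely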




Re: [Bug-wget] [Bug-Wget] Issues with Metalink support

2014-04-05 Thread L Walsh



Darshit Shah wrote:

I was trying to download a large ISO (>4GB) through a metalink file.

The first thing that struck me was: The file is first downloaded to
/tmp and then moved to the location.

Is there any specific reason for this?


Sorry for the long delay answering this, but I thought
I would mention a specific reason that this is done
on Windows (one that may apply to Linux in varying degrees,
depending on the filesystem type used and file-system activity).

To answer the question: there is a reason, but
its importance will be specific to each user's use case.
It is consistent with how some files from the internet are
downloaded, copied or extracted on windows.

I.e. IE will download things to a tmp dir (usually
under the user's home dir on windows), then
move it into place when it is done.  This prevents partly
transfered files from appearing in the destination.

Downloading this way can, also, *allow* for allocating
sufficient contiguous space at the destination in 1
allocation, and then copying the file
into place -- this allows for less fragmentation at the
final destination.  This is more true with larger
files and slower downloads that might stretch over several
or more minutes.  Other activity on the disk
is likely and if writes occur, they might happen in the
middle of where the downloaded file _could_ have had
contiguous space.

So putting a file that is likely to be fragmented as it
is downloaded due to other processes running, into
a 'tmp' location, can allow for knowing the full size
and allocating the full amount for the file so it can
be contiguous on disk.

It can't allocate the full amount for the file at
the destination until it has the whole thing locally, since
if the download is interrupted, the destination would contain
a file that looks to be the right size, but would have
an incomplete download in it.

Anyway -- the behavior of copying to a tmp location is a useful
feature to have -- IF you have the space.  It would be
a "nice" (not required) feature if there were an option for
how to do this (i.e. store the file directly on download, or
use a tmpdir and then move (or copy) the file into the
final location).
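
A minimal sketch of that tmp-then-move pattern in shell terms ($url
and $dest are placeholders):

    tmp=$(mktemp /tmp/dl.XXXXXX)
    wget -O "$tmp" "$url" && mv "$tmp" "$dest"   # $dest only appears once the download completed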

Always going direct is safest if user is tight on diskspace,
but has the deficit of often causing more disk fragmentation.

(FWIW, I don't really care one way or the other, but wanted
to tell you why it might be useful)...

Cheers!
Linda



Re: [Bug-wget] [Bug-Wget] Issues with Metalink support

2014-04-05 Thread L Walsh



Random Coder wrote:
On Sat, Apr 5, 2014 at 4:09 PM, L Walsh <w...@tlinx.org> wrote:


I.e. IE will download things to a tmp dir (usually
under the user's home dir on windows), then
move it into place when it is done.  This prevents partly
transfered files from appearing in the destination.


IE does not download to a tmp folder.

---

It depends on timing, what version of IE, and probably
the phase of the moon, but here's an abbreviated trace of me downloading
the Linux kernel into C:\tmp\download.  I annotate what's going on in
the left column... you can see almost 50% of the file was downloaded
into a tmp file before it switched to the final destination, where it
wrote 1M chunks instead of the previous 4-12K chunks.

6:17:13,IEXPLORE,CreateFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N","Desired Access: 
Read Attributes, OpenResult: Opened"
6:17:13,IEXPLORE,CreateFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Desired 
Access: Generic Write, Read Attributes, OpenResult: Created"
6:17:13,IEXPLORE,SetAllocationInformationFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","AllocationSize: 
78,399,152"
6:17:13,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
0, Length: 704, Priority: Normal"
6:17:13,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
704, Length: 1,944"
6:17:13,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
2,648, Length: 8,192"
6:17:13,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
10,840, Length: 4,096"

...
6:17:23,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
36,207,192, Length: 4,096"
6:17:23,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
36,211,288, Length: 4,096"
6:17:23,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
36,215,384, Length: 16,384"


I've typed in the save pathname now:
6:17:23,explorer,808","CreateFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz","Desired Access: Read Attributes, 
OpenResult: Opened"

6:17:23,explorer,808","CloseFile",OK ,"C:\tmp\download\linux-3.14.tar.xz",""
6:17:23,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
36,231,768, Length: 4,096"

...
6:17:23,IEXPLORE,WriteFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
36,461,144, Length: 4,096"

...

opens "partial file in same directory":

6:17:23,IEXPLORE,CreateFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Desired Access: Generic 
Write, OpenResult: Opened"


copies from 1st tmp to final location tmp, but in 1MB increments
6:17:23,IEXPLORE,ReadFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
0, Length: 1,048,576, Priority: Normal"
6:17:23,IEXPLORE,WriteFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Offset: 0, Length: 
1,048,576, Priority: Normal"

...
6:17:23,IEXPLORE,ReadFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
35,651,584, Length: 817,752"
6:17:23,IEXPLORE,WriteFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Offset: 35,651,584, 
Length: 817,752"
6:17:23,IEXPLORE,ReadFile","END OF 
FILE","C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Offset: 
36,469,336, Length: 1,048,576"


deletes first tmp, and now saves directly to "partial" at destination:
6:17:23,IEXPLORE,SetDispositionInformationFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz","Delete: 
True"
6:17:23,IEXPLORE,CloseFile",OK 
,"C:AppData\Local\MS\Win\\BNZE234N\linux-3.14.tar[1].xz",""


only 1M writes:
6:17:23,IEXPLORE,WriteFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Offset: 36,469,336, 
Length: 1,048,576"
6:17:24,IEXPLORE,WriteFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Offset: 37,517,912, 
Length: 1,048,576"
6:17:24,IEXPLORE,WriteFile",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Offset: 38,566,488, 
Length: 1,048,576"
6:17:24,explorer,QueryDirectory",OK 
,"C:\tmp\download\linux-3.14.tar.xz.w5aj0r5.partial","Filter: 
linux

Re: [Bug-wget] [Bug-Wget] Issues with Metalink support

2014-04-07 Thread L Walsh



Darshit Shah wrote:
Wget could, in theory, use fallocate() for linux, posix_fallocate() for 
other posix-compliant systems and SetFileInformationByHandle (is this 
available on older versions of Windows?) for Windows systems. It isn't 
going out of the way by a large extent but ensures Wget plays well on 
each system. However, this is going to lead to way too many code paths 
and ifdef statements, and personally speaking, I'd rather we use only 
posix_fallocate() everywhere and the Windows SysCalls for Windows.


Hey, that'd be fine with me -- OR, if the length is not known,
allocating 1-meg chunks at a time and truncating at the final
write.  If performance were an issue, I'd fork off the truncation
in the background -- I do something similar in a file util that can
delete duplicates; the deletions I do with async I/O in the
background so they won't slow down the primary function.
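
A rough illustration (not wget's actual code) of allocate-then-truncate
using stock Linux tools; file name and sizes are made up:

    fallocate -l 1048576 demo.part   # reserve a 1 MiB chunk up front
    truncate  -s 700     demo.part   # at the end, trim to the bytes actually written
    ls -ls demo.part                 # block count confirms the excess was released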

I don't usually have a problem with fragmentation on Linux,
as I run xfs, which will do some pre-allocation for you (more in recent
kernels with its "speculative preallocation"), AND for those who
have degenerate use cases or who are anal-retentive (*cough*) there
is a file-system reorganizer that can be run when needed or as a nightly
cronjob...  So this isn't really a problem for me -- I was answering
the question because MS took preventative measures to try to slow
down disk fragmentation, as NTFS (and FAT for that matter)
will suffer when it gets bad, like many file systems.  Most don't protect
themselves to the extremes that xfs does to prevent it.

A sane middle ground, like using the posix pre-alloc calls,
seems reasonable -- or preallocating
larger spaces when downloading large files.

I.e. you probably don't want to allocate a meg for each little
1k file on a mirror, but if you see the file size is large (size known),
or have downloaded a meg or more, then preallocation w/a truncate
starts to make some sense...

I was just speaking up to answer the question you posed about
why someone might copy to one place and then another... it wasn't meant
to create a problem so much as to give some insight into why it might be done.





Re: [Bug-wget] [Bug-Wget] Issues with Metalink support

2014-04-08 Thread L Walsh



Steven M. Schweda wrote:


   In some cases, on some operating systems (VMS, for example), UnZip
can pre-allocate disk space when extracting an archive member.  It's
not generally done, because the methods used tend to be OS-specific.

---
Do the posix calls; if the OS is compliant, it works; if not, we're no
worse off than today.




   I'll let you decide what Wget should be doing, but I'd be careful
about faulty analogies to other programs.

-
   I wouldn't call them faulty analogies.  In the cases I've seen
w/7-zip, it's extracting from a network drive onto the local drive.
While it is true my network drive is faster than hard disks of 8-10 years
ago, it's still 'downloading' from the net onto the local machine, so...
not sure why you'd call that faulty.   The theme in common is how many
writes from other processes are likely to come in and reserve space in
the middle of your download.


FWIW -- I just tried 7z now, and extracting a 6G file to C:/tmp --
it *did go direct*. 


Thing is, some of the things I remember have changed over the years.
So it's hard to say with any given version what does what without
retesting.

For this subject, when downloading in parallel -- if the final size
is known, it sounds like pre-allocating the file would be a good thing.

I know transmission (a torrent client) at least makes that an option
(don't remember if it is default or not) so as to not cause fragmentation --
and its fill pattern might not be extremely different from running,
say, several TCP downloads that fill the file from different locations.





[Bug-wget] Probs downloading secure content on Cygwin/Windows 7/64

2015-08-28 Thread L Walsh


wget "https://get.adobe.com/flashplayer/download/?installer=FP_18_for_Firefox_-_NPAPI&os=Windows%207&browser_type=Gecko&browser_dist=Firefox&p=mss"
--2015-08-28 11:17:19--  
https://get.adobe.com/flashplayer/download/?installer=FP_18_for_Firefox_-_NPAPI&os=Windows%207&browser_type=Gecko&browser_dist=Firefox&p=mss

Resolving webproxy (webproxy)... 192.168.4.1, 192.168.3.1
Connecting to webproxy (webproxy)|192.168.4.1|:8118... connected.
ERROR: The certificate of ‘get.adobe.com’ is not trusted.
ERROR: The certificate of ‘get.adobe.com’ hasn't got a known issuer.
-
I went into my web browser (which doesn't seem to have an issue with the
cert), looked at the security info for the page, and exported the security
cert chain to a ".crt" file.


In Windows, I could click on that to install the cert into Windows' local
store, and it was "imported successfully".

But it seems wget still doesn't know how to use the native
machine's cert store.

Shouldn't it be able to use the native host's cert store automatically,
or is there some extra magic words / switches I should have known to
use?
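
If the exported chain is in PEM format, one switch that should work is
--ca-certificate (an existing wget option; the file name is hypothetical):

    wget --ca-certificate=adobe-chain.crt \
         "https://get.adobe.com/flashplayer/download/?installer=FP_18_for_Firefox_-_NPAPI&os=Windows%207&browser_type=Gecko&browser_dist=Firefox&p=mss"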

;-/

Ever since cert checking was turned on in wget, the only way I've been
able to d/l secure stuff is to tell it to ignore the security, which seems
like it might be counter-productive.

Seems a lot like the standard security problem of making things so
difficult to use that people simply create an alias to never check
security -- which can't be better than before, when I wasn't taught to
turn off security (not that I usually do, but it seems like that's the
direction I'm being "hurded")...

;-)

help?

version info:
law.Bliss> wget --version
GNU Wget 1.16.1 built on cygwin.

+digest +https +ipv6 +iri +large-file +nls +ntlm +opie -psl +ssl/gnutls

Wgetrc:
  /Users/law.Bliss/.wgetrc (user)
  /etc/wgetrc (system)
Locale:
  /usr/share/locale
Compile:
  gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
  -DLOCALEDIR="/usr/share/locale" -I.
  -I/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1/src -I../lib
  -I/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1/lib
  -I/usr/include/uuid -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS
  -DNDEBUG -ggdb -O2 -pipe -Wimplicit-function-declaration
  
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/build=/usr/src/debug/wget-1.16.1-1 

  
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1=/usr/src/debug/wget-1.16.1-1 


Link:
  gcc -I/usr/include/uuid -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS
  -DNDEBUG -ggdb -O2 -pipe -Wimplicit-function-declaration
  
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/build=/usr/src/debug/wget-1.16.1-1 

  
-fdebug-prefix-map=/usr/src/wget-1.16.1-1.x86_64/src/wget-1.16.1=/usr/src/debug/wget-1.16.1-1 


  -liconv -lintl -lpcre -luuid -lnettle -lgnutls -lz -lintl -liconv
  -lp11-kit -lgmp -lhogweed -lgmp -lnettle -ltasn1 -lp11-kit -lz -lz
  -lidn ftp-opie.o gnutls.o http-ntlm.o ../lib/libgnu.a

Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic .
Please send bug reports and questions to .





Re: [Bug-wget] Probs downloading secure content on Cygwin/Windows 7/64

2015-08-31 Thread L Walsh



Tim Ruehsen wrote:

Hi,

in addition to Ander Juaristi's posting:

Have you installed the package 'ca-certificates' ?
Check with 'ls -la /etc/ssl/certs/'

In my CygWin environment wget 1.16.3 downloads your example URL.

---
Thanks...

I have 1.16.1:

law.Bliss> wget --version
GNU Wget 1.16.1 built on cygwin.

+digest +https +ipv6 +iri +large-file +nls +ntlm +opie -psl +ssl/gnutls

---
But in /etc/ssl, I have:

tree -fFl

/etc/ssl
└── certs -> /usr/ssl/certs/
   ├── README.RootCerts*
   ├── ca-bundle.crt -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem*
   ├── ca-bundle.trust.crt -> 
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt*
   ├── demo/
   │   ├── ca-cert.pem*
   │   ├── dsa-ca.pem*
   │   ├── dsa-pca.pem*
   │   └── pca-cert.pem*
   └── expired/
   └── ICE.crl*

3 directories, 8 files
=
The README.RootCerts says:
The OpenSSL project does not (any longer) include root CA certificates.
Please check out the FAQ:
 * How can I set up a bundle of commercial root CA certificates?
--- 
but I'm not seeing any FAQ at this point.
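
In the meantime, pointing wget at the bundle shown in the tree above
should work (tls-ca-bundle.pem is the p11-kit extracted store; example
URL from earlier in this thread):

    wget --ca-certificate=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem \
         "https://get.adobe.com/flashplayer/"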


hmmm..


 



Re: [Bug-wget] Getting started -- Q: what is wget2 (vs. wget?)

2016-05-30 Thread L Walsh

What's the difference between wget2 and wget?

*kick me in the head*,  ok, but I didn't even know there was a wget2.
*sigh*


Tim Rühsen wrote:
I see the most development has moved to wget2 




Re: [Bug-wget] retrieval failure:Forbidden? for UTF-8-URL in wget that works on FF and IE

2016-06-08 Thread L Walsh



Tim Rühsen wrote:

On Wednesday 08 June 2016 11:47:46 L. A. Walsh wrote:

I tried:

wget "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB"

But I get an Error "403: Forbidden" (tried w/ and w/o proxy) -- same.

But cut/paste the same URL into IE11 or
PaleMoon (a 64-bit FF derivative), and it works.

Any idea why or what I might do to get it to work?


Basically, from '#' on (fragment part of URL) nothing is relevant for the HTTP 
request. This is what Firefox 46 sends to localhost:8080 (I started a netcat 
'nc -l -p 8080' to make sure).


Sounds like FF46 is broken.

---
It works in PaleMoon, IE11 and Google Chrome.  All of them fetch
the URL when pasted into the address box.




If I do a 'telnet translate.google.com 80' and paste the above (just with 
'Host: translate.google.com' and an empty line at the end):

-
AFAIK, telnet doesn't support anything but ascii.



My guess is, that google does not like User-Agent 'wget', now trying with 
Firefox's User-Agent:
$ wget -d -U "Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 
Firefox/46.0" http://translate.google.com


And zack ... that works. Give it a try.

Regards, Tim


Oh frack!   I thought the days of client-censoring were over
when MS stopped doing it.   Now Google is the new MS??  Frack.
That sucks.





Re: [Bug-wget] retrieval failure:Forbidden? for UTF-8-URL in wget that works on FF and IE

2016-06-08 Thread L Walsh



Eli Zaretskii wrote:

Date: Wed, 08 Jun 2016 11:47:46 -0700
From: "L. A. Walsh" 

I tried:

wget "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB"

But I get an Error "403: Forbidden" (tried w/ and w/o proxy) -- same.


On what OS and with which version of wget?



Linux, wget 1.16;
under Cygwin, wget 1.16.1, I get:
fetching https://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB
--2016-06-08 13:12:41--  https://translate.google.com/
Resolving translate.google.com (translate.google.com)... 216.58.194.174, 
2607:f8b0:4005:804::200e
Connecting to translate.google.com 
(translate.google.com)|216.58.194.174|:443... connected.
ERROR: The certificate of ‘translate.google.com’ is not trusted.
ERROR: The certificate of ‘translate.google.com’ hasn't got a known issuer.
--2016-06-08 13:12:42--  https://translate.google.com/
Connecting to translate.google.com 
(translate.google.com)|216.58.194.174|:443... connected.
ERROR: The certificate of ‘translate.google.com’ is not trusted.
ERROR: The certificate of ‘translate.google.com’ hasn't got a known issuer.
law.Bliss> wget --version
GNU Wget 1.16.1 built on cygwin.
 
Note -- message on linux also says:

fetching https://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB
--2016-06-08 13:13:57--  https://translate.google.com/
Resolving web-proxy (web-proxy)... 192.168.3.1
Connecting to web-proxy (web-proxy)|192.168.3.1|:8118... connected.
Proxy request sent, awaiting response... 403 Forbidden
2016-06-08 13:13:57 ERROR 403: Forbidden.

--2016-06-08 13:13:57--  https://translate.google.com/
Reusing existing connection to translate.google.com:443.
Proxy request sent, awaiting response... 403 Forbidden
2016-06-08 13:13:57 ERROR 403: Forbidden.

Converted 0 files in 0 seconds.

--- both versions "claim" it is fetching the full URL. -- 


looks like Google is taking up MS's old habits and censoring various clients.
"Do evil"(c)Google 2016.








Re: [Bug-wget] ANN: Wget2 development shifting to GitLab

2017-05-31 Thread L Walsh

Darshit Shah wrote:

Hi,

As many of you are aware, we've been working on Wget2 (aka Wget 2.x) 
for some time now. We have also been using GitHub in general for 
collaborating our efforts on Wget2. However, today I'd like to 
announce that we will instead be moving all our efforts on Wget2 to a 
new home on GitLab.


   Is the permission denied normal?


 git clone git@gitlab.com:gnuwget/wget2.git

Cloning into 'wget2'...
The authenticity of host 'gitlab.com ... added gitlab.com to known hosts.

Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Is that the right syntax?

I don't see a download option for another format...
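
SSH clones require a public key registered with GitLab; an anonymous
read-only clone works over HTTPS (same repository path as the SSH form
above):

    git clone https://gitlab.com/gnuwget/wget2.git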