[Bug-wget] trouble with self signed certificates --ca-directory=directory

2012-03-29 Thread drayon
I'm having a really frustrating time with wget:

Version/compile details running on Mac OS X 10.6.8
==
GNU Wget 1.13.4 built on darwin11.3.0.

+digest +https +ipv6 -iri +large-file -nls +ntlm +opie +ssl/openssl 

Wgetrc: 
/usr/local/etc/wgetrc (system)
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc" 
-DLOCALEDIR="/usr/local/share/locale" -I. -I../lib -I../lib -O2 
-Wall 
Link: gcc -O2 -Wall -liconv -lssl -lcrypto -lz -ldl -lz ftp-opie.o openssl.o 
http-ntlm.o ../lib/libgnu.a 
==

Command issued in terminal:
==
wget https://forums.mvgroup.org/
--2012-03-29 10:20:39--  https://forums.mvgroup.org/
Resolving forums.mvgroup.org... 87.241.99.41
Connecting to forums.mvgroup.org|87.241.99.41|:443... connected.
ERROR: cannot verify forums.mvgroup.org's certificate, issued by 
`/O=MVGroup/CN=forums.mvgroup.org':
  Self-signed certificate encountered.
==

I exported the Certificate "forums.mvgroup.org.pem" to 
/System/Library/OpenSSL/certs/forums.mvgroup.org.pem

If I open the text file the following data is inside
-----BEGIN CERTIFICATE-----
MIIB1TCCAT4CCQDWXiQIRDVVdDANBgkqhkiG9w0BAQUFADAvMRAwDgYDVQQKEwdN
Vkdyb3VwMRswGQYDVQQDExJmb3J1bXMubXZncm91cC5vcmcwHhcNMTIwMzI4MTgz
NDQzWhcNMTMwMzI4MTgzNDQzWjAvMRAwDgYDVQQKEwdNVkdyb3VwMRswGQYDVQQD
ExJmb3J1bXMubXZncm91cC5vcmcwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGB
ALizUiY+TJ0JWqhwD8q0q75Fg15tV2W2FHFp9ysMw+4seMSiLI2R/h/aBc2XaAqG
sMb47wKYpqqtJWBkgFuOTZhpaIcr+nPQsLUGAT1hxVz+/fzytZP5XfzXlNq5CKcP
FQ3HwDqx61UXTngWDTBZIe8y/IuAvvvQJOICEL2QXPU5AgMBAAEwDQYJKoZIhvcN
AQEFBQADgYEAn3A7/1weG631+zO5fYiN9pl7tw1p6SC/fBuYr6yA/QmfiwYykv8V
rCqnQvlUNzY8PRIByRBYkezB3QEuaK0hpTCtp/ueBr9z/qVZhmcyXTm78HLDKopc
ft7z6DFRXtOYGhqy73Jax0cERNgA3LqzGE9RXNC31Hl3jVunGJAyfyc=
-----END CERTIFICATE-----

I then issued the following command: (--certificate=file)

wget --certificate=forums.mvgroup.org.pem 
https://forums.mvgroup.org/index.php?showtopic=2827
--2012-03-29 10:56:08--  https://forums.mvgroup.org/index.php?showtopic=2827
OpenSSL: error:0906D06C:PEM routines:PEM_read_bio:no start line
OpenSSL: error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib
Disabling SSL due to encountered errors.
===
I assume "--certificate=forums.mvgroup.org.pem" looks for this "file" in the 
current terminal directory? Or do we include the full path, i.e.
wget --certificate=/System/Library/OpenSSL/certs/forums.mvgroup.org.pem
===

Ok so in Terminal I change directory to '/System/Library/OpenSSL/certs'
then issue:
sudo wget --ca-certificate=forums.mvgroup.org.pem 
https://forums.mvgroup.org/index.php?showtopic=2827

Success (note sudo since this is a system directory).

wget manual says "Without this option Wget looks for CA certificates at the 
system-specified locations, chosen at OpenSSL installation time". So why on OS 
X does SSL NOT look in '/System/Library/OpenSSL/certs'? I can't find a config 
file or correct command to set to this directory as the default to look for 
certificates.

I also tried ‘--ca-directory=directory’ as follows:

wget --ca-directory=/System/Library/OpenSSL/certs/ 
https://forums.mvgroup.org/index.php?showtopic=2827

terminal reports
==
Resolving forums.mvgroup.org... 87.241.99.41
Connecting to forums.mvgroup.org|87.241.99.41|:443... connected.
ERROR: cannot verify forums.mvgroup.org's certificate, issued by 
`/O=MVGroup/CN=forums.mvgroup.org':
  Self-signed certificate encountered.
To connect to forums.mvgroup.org insecurely, use `--no-check-certificate'.
==

I think this must be a bug or wrong usage, because logically this command tells 
wget to tell openssl to look in '/System/Library/OpenSSL/certs/' for a 
certificate. Yet it keeps failing unless we specifically tell wget the exact 
file, relative to the current directory; it fails if the current directory 
doesn't contain the cert.

Please clarify, and perhaps the manual should show working examples for options 
like ‘--ca-directory=directory’.

Regards

Re: [Bug-wget] trouble with self signed certificates --ca-directory=directory

2012-03-29 Thread Ángel González
On 29/03/12 04:45, drayon wrote:
> Having the most head wrenching time with wget:
>
> Version/compile details running on Mac OS X 10.6.8
> ==
> GNU Wget 1.13.4 built on darwin11.3.0.
> (...)
>
> I then issued the following command: (--certificate=file)
> 
> wget --certificate=forums.mvgroup.org.pem 
> https://forums.mvgroup.org/index.php?showtopic=2827
> --2012-03-29 10:56:08--  https://forums.mvgroup.org/index.php?showtopic=2827
> OpenSSL: error:0906D06C:PEM routines:PEM_read_bio:no start line
> OpenSSL: error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib
> Disabling SSL due to encountered errors.
> ===
> I assume "--certificate=forums.mvgroup.org.pem" looks for this "file" in the 
> current terminal directory? or do we include the full path? ie
> wget --certificate=/System/Library/OpenSSL/certs/forums.mvgroup.org.pem
> ===
It looks for it in the current folder. You can also call it from a
different folder by specifying the full path.
But note that wget did read the file here: the error is "PEM
routines:PEM_read_bio:no start line"; had the file been missing, it
would be "system library:fopen:No such file or directory".



> Ok so in Terminal I change directory to '/System/Library/OpenSSL/certs'
> then issue:
> sudo wget --ca-certificate=forums.mvgroup.org.pem 
> https://forums.mvgroup.org/index.php?showtopic=2827
>
> Success (note sudo since this is a system directory).
You shouldn't need sudo here just for running it in this folder (wget
wouldn't be able to save the download there, but you could use for
instance -O /tmp/forum).

It's strange that it worked for you, as I wasn't able to get it to work
using just --ca-certificate.

> wget manual says "Without this option Wget looks for CA certificates at the 
> system-specified locations, chosen at OpenSSL installation time". So why on 
> OS X does SSL NOT look in '/System/Library/OpenSSL/certs'? I can't find a 
> config file or correct command to set to this directory as the default to 
> look for certificates.
>
> Also I use ‘--ca-directory=directory’ as
>
> wget --ca-directory=/System/Library/OpenSSL/certs/ 
> https://forums.mvgroup.org/index.php?showtopic=2827
>
> terminal reports
> ==
> Resolving forums.mvgroup.org... 87.241.99.41
> Connecting to forums.mvgroup.org|87.241.99.41|:443... connected.
> ERROR: cannot verify forums.mvgroup.org's certificate, issued by 
> `/O=MVGroup/CN=forums.mvgroup.org':
>   Self-signed certificate encountered.
> To connect to forums.mvgroup.org insecurely, use `--no-check-certificate'.
> ==
>
> I think this must be a bug or wrong usage because logically this command 
> tells wget to tell openssl to look in '/System/Library/OpenSSL/certs/' for a 
> certificate but it keeps failing unless we specifically tell wget the exact 
> file based on the current directory else it fails if current directory doesnt 
> contain a cert.
Note that the wget manual also says "the file name is based on a hash
value derived from the certificate.  This is achieved by
processing a certificate directory with the `c_rehash' utility supplied
with OpenSSL.".
In this case, running c_rehash creates a symlink from
3cc93452.0 to forums.mvgroup.org.pem.

Using wget with --ca-directory does work for me if there is such a link,
but fails otherwise.
I suppose wget is also trying to open it at
/System/Library/OpenSSL/certs/3cc93452.0, so if you create such a symlink
there it should also work.




Re: [Bug-wget] Bug on latest wget (1.3.14)

2012-03-29 Thread Tim Ruehsen
Just some more info:
The bug is reproducible with the latest trunk version.

The problem seems to be empty queries like in main.css (original):
src: url('/TLBB/fbinir/mult/stagsans-book-webfont.eot');^M
src: url('/TLBB/fbinir/mult/stagsans-book-webfont.eot?#iefix') 
format('embedded-opentype'),^M

BTW, empty queries are absolutely legal (RFC 2396: query = *uric).

The downloader downloads
/TLBB/fbinir/mult/stagsans-book-webfont.eot
and
/TLBB/fbinir/mult/stagsans-book-webfont.eot?
and saves them both into the same file
Saving to: 'accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot'

I assume the hashmaps 'dl_url_file_map' and 'dl_file_url_map' are now out of 
sync.
The link conversion then can't find a local file for the first download and 
thus converts that URL to a complete (absolute) URL instead of a local file name.

Here is some debug output where you can see it (look for 'complete', which 
should be local):

Scanning accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1 (from http://accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1)
Loaded accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1 (size 99449).
accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1: merge('http://accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1', '/TLBB/fbinir/mult/stagsans-book-webfont.eot') -> http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot
appending 'http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot' to urlpos.
Found URI: [url('/TLBB/fbinir/mult/stagsans-book-webfont.eot')] at 2404 [/TLBB/fbinir/mult/stagsans-book-webfont.eot]
accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1: merge('http://accionistaseinversores.bbva.com/TLBB/fbinir/css/main.css?v=1', '/TLBB/fbinir/mult/stagsans-book-webfont.eot?#iefix') -> http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot?#iefix
appending 'http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot?' to urlpos.
Found URI: [url('/TLBB/fbinir/mult/stagsans-book-webfont.eot?#iefix')] at 2462 [/TLBB/fbinir/mult/stagsans-book-webfont.eot?#iefix]

will convert url http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot to complete
URI encoding = 'ANSI_X3.4-1968'
will convert url http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot? to local accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-book-webfont.eot
URI encoding = 'ANSI_X3.4-1968'

Tim Ruehsen

On Thursday 29 March 2012, Alejandro Supu wrote:
> Hi,
> 
> I have found a bug on the latest version of the http client, wget 1.3.14
> 
> This is how to reproduce it:
> 
> If we save the page:
> http://accionistaseinversores.bbva.com/TLBB/tlbb/bbvair/esp/index.jsp with
> the following parameters: wget -k -p
> http://accionistaseinversores.bbva.com/TLBB/tlbb/bbvair/esp/index.jsp
> 
> On the saved "main.css" file
> (\accionistaseinversores.bbva.com\TLBB\fbinir\css), there are files that
> point to the remote files instead of the saved ones! For example, on lines
> 57, 68 and 79, it points to
> http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-light-webfont.eot
> instead of ../mult/stagsans-book-webfont.eot and this file was
> saved to local... There are other files with the same behaviour.
> 
> If you search the string "http" within the CSS file, you will find all the
> pointed files to remote instead of the local SAVED ones.
> 
> Please, tell me anything related to this bug or when it will be corrected.
> 
> THANKS!



Re: [Bug-wget] Bug on latest wget (1.3.14)

2012-03-29 Thread Tim Rühsen
In url.c / url_file_name() an empty query is not used for the filename:

  /* Append "?query" to the file name. */
  u_query = u->query && *u->query ? u->query : NULL;

Should it be patched here?
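
To make the collision concrete, here is a minimal sketch, assuming a simplified stand-in for wget's URL structure. The names `mini_url`, `mini_file_name` and `names_collide` are hypothetical and do not exist in wget; only the `u_query` expression is copied from the snippet above. It shows that a URL without a query and the same URL with an empty query end up with the same local file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the URL fields that matter here;
   this is not wget's actual struct url. */
struct mini_url
{
  const char *path;   /* path part, e.g. "mult/font.eot" */
  const char *query;  /* NULL if absent, "" if the URL ends in '?' */
};

/* Build a local file name the way the quoted snippet does:
   "?query" is appended only when the query is non-empty. */
void
mini_file_name (const struct mini_url *u, char *buf, size_t size)
{
  const char *u_query = u->query && *u->query ? u->query : NULL;

  assert (size > 0);
  if (u_query)
    snprintf (buf, size, "%s?%s", u->path, u_query);
  else
    snprintf (buf, size, "%s", u->path);
}

/* Return 1 if the two URLs map to the same local file name. */
int
names_collide (const struct mini_url *a, const struct mini_url *b)
{
  char name_a[256], name_b[256];

  mini_file_name (a, name_a, sizeof name_a);
  mini_file_name (b, name_b, sizeof name_b);
  return strcmp (name_a, name_b) == 0;
}
```

Calling `names_collide` on `{ "mult/font.eot", NULL }` and `{ "mult/font.eot", "" }` returns 1, which matches the two downloads being saved into one file; whether the file name or the URL bookkeeping is the right place to fix this is the open question.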

Kind regards

Tim Rühsen




Re: [Bug-wget] patch to activate itimer support

2012-03-29 Thread Giuseppe Scrivano
Tim Ruehsen writes:

>
> === modified file 'src/utils.c'
> --- src/utils.c   2012-03-25 15:49:55 +
> +++ src/utils.c   2012-03-28 11:10:57 +
> @@ -59,6 +59,11 @@
>  # endif
>  #endif
>  
> +/* Needed for itimer support in alarm_set() and alarm_cancel() */
> +# ifdef HAVE_SYS_TIME_H
> +#  include <sys/time.h>
> +# endif
> +

Thanks!  I have done a small change and committed it.

Giuseppe



Re: [Bug-wget] timeout question (regarding the code)

2012-03-29 Thread Giuseppe Scrivano
Tim Ruehsen writes:

> Hi,
>
> the wget man page says a timeout value of 0 means 'forever'.
> Even if seldom used, 0 seems to be a legal value.

it can't be a legal value.  It means the value you are waiting for is
immediately available.  That is not possible when you are waiting for
something coming from the network.

Cheers,
Giuseppe



Re: [Bug-wget] timeout question (regarding the code)

2012-03-29 Thread Micah Cowan
On 03/29/2012 11:23 AM, Giuseppe Scrivano wrote:
> Tim Ruehsen  writes:
> 
>> Hi,
>>
>> the wget man page says a timeout value of 0 means 'forever'.
>> Even if seldom used, 0 seems to be a legal value.
> 
> it can't be a legal value.  It means the value you are waiting for is
> immediately available.  That is not possible when you are waiting for
> something coming from the network.

His point would seem to be that this meaning differs from the one
assigned to it in the manpage, and as originally intended.

I believe it was meant to be analogous to -l 0, which is equivalent to
-l inf.

-mjc




Re: [Bug-wget] timeout question (regarding the code)

2012-03-29 Thread Giuseppe Scrivano
Micah Cowan writes:

> On 03/29/2012 11:23 AM, Giuseppe Scrivano wrote:
>> Tim Ruehsen  writes:
>> 
>>> Hi,
>>>
>>> the wget man page says a timeout value of 0 means 'forever'.
>>> Even if seldom used, 0 seems to be a legal value.
>> 
>> it can't be a legal value.  It means the value you are waiting for is
>> immediately available.  That is not possible when you are waiting for
>> something coming from the network.
>
> His point would seem to be that this meaning differs from the one
> assigned to it in the manpage, and as originally intended.
>
> I believe it was meant to be analogous to -l 0, which is equivalent to
> -l inf.

Sorry that I wasn't clear.  The documentation says "Setting a timeout to
0 disables it altogether".  That is the correct behaviour; if the code
doesn't do it, then it is a bug... and patches are welcome :-)
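
As an illustration of the two readings of 0 discussed in this thread, here is a minimal sketch, assuming the wait is done with POSIX select(). The helper `timeout_for_select` is hypothetical and is not taken from wget's source. With select(), a NULL timeout pointer blocks indefinitely, while a zeroed struct timeval returns immediately, so a "disabled" timeout has to be mapped to NULL rather than passed through as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper, not wget code: translate a user-supplied timeout
   in seconds into the pointer one would pass to select().

   0 means "timeout disabled", so return NULL, which select() treats as
   "block indefinitely".  Any positive value is split into seconds and
   microseconds in *STORAGE, and STORAGE itself is returned. */
struct timeval *
timeout_for_select (double seconds, struct timeval *storage)
{
  assert (seconds >= 0);
  assert (storage != NULL);

  if (seconds == 0)
    return NULL;

  storage->tv_sec = (long) seconds;
  storage->tv_usec = (long) ((seconds - (long) seconds) * 1000000);
  return storage;
}
```

A caller would then write something like `select (fd + 1, &readfds, NULL, NULL, timeout_for_select (t, &tv))`, where `fd`, `readfds`, `t` and `tv` are the caller's own variables.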

Giuseppe



Re: [Bug-wget] patch to fix some types of warnings

2012-03-29 Thread Giuseppe Scrivano
Hello Tim,

Tim Ruehsen writes:

> function declaration isn't a prototype [-Wstrict-prototypes]
> no previous prototype for 'convert_links_in_hashtable' [-Wmissing-prototypes]
> suggest braces around empty body in an 'else' statement [-Wempty-body]
>
> please apply it to the repository.

Please provide a ChangeLog entry for these changes.  Look at other
entries in the ChangeLog file to see how it should be done.

>else
> +{
>  /* Error in expiration spec.  Assume default (cookie doesn't
> expire, but valid only for this session.)  */
> -;
> +}
>  }
>else if (TOKEN_IS (name, "max-age"))
>  {
> @@ -434,8 +435,9 @@
>cookie->secure = 1;
>  }
>else
> +{
>  /* Ignore unrecognized attribute. */
> -;
> +}

I would rather move these comments near the if and explain what happens
in the particular case.  An empty branch is quite ugly.
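
A minimal sketch of that style, assuming a hypothetical attribute handler rather than wget's actual cookie parser (`mini_cookie` and `apply_attribute` are made-up names): the comment about ignored input sits next to the if chain, and there is no empty else branch left for -Wempty-body to complain about.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced cookie record; not wget's struct cookie. */
struct mini_cookie
{
  int secure;
  int http_only;
};

/* Apply one attribute NAME to COOKIE.  Unrecognized attributes are
   deliberately ignored, so the chain below has no trailing else. */
void
apply_attribute (struct mini_cookie *cookie, const char *name)
{
  assert (cookie != NULL && name != NULL);

  if (strcmp (name, "secure") == 0)
    cookie->secure = 1;
  else if (strcmp (name, "httponly") == 0)
    cookie->http_only = 1;
}
```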


> +# ifndef HAVE_LIBUUID
>  /* Fills uuid_str with a UUID based on random numbers.
> (See RFC 4122, UUID version 4.)
>  
> @@ -612,6 +613,7 @@
>  uuid_data[10], uuid_data[11], uuid_data[12], uuid_data[13], 
> uuid_data[14],
>  uuid_data[15]);
>  }
> +#endif

Please provide it as a separate patch.

Thanks for your work!

Giuseppe



Re: [Bug-wget] Batch retrieval does not recover from extended pause

2012-03-29 Thread Ángel González
Hello Pekka,
Thanks for your report.

gethttp() isn't the easiest function to follow, with its 1243 lines, but
I think everything important is happening at the bottom. The file is
created in line 2855, but not stored into output_stream (it's only used
by -O). Then it fails at read_response_body() and retries to open the
file in exclusive mode, instead of continuing to write to it. I think the
difference is made in line 2817, where you usually
fopen (hs->local_file, "ab"); on retries, but as in this case it timed
out without reading anything, it goes through the fopen_excl() path.

I think this would fix it, although it's not at all clear from the code
that in such a case the file was just created by us with size 0. It
probably deserves a comment.

=== modified file 'src/http.c'
--- src/http.c   2012-02-25 10:58:21 +
+++ src/http.c   2012-03-29 20:58:42 +
@@ -2827,7 +2827,7 @@
 }
   else if (ALLOW_CLOBBER || count > 0)
 {
-  if (opt.unlink && file_exists_p (hs->local_file))
+  if ((opt.unlink || count > 0) && file_exists_p (hs->local_file))
 {
   int res = unlink (hs->local_file);
   if (res < 0)