Re: [Bug-wget] lex compile problem on AIX 7.1

2017-10-03 Thread Tim Rühsen
On Dienstag, 3. Oktober 2017 18:31:45 CEST Avinash Sonawane wrote:
> On Mon, Oct 2, 2017 at 8:56 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> > On 10/02/2017 10:00 AM, l...@langs.se wrote:
> >> Hi!
> >> 
> >> I get the following error when compiling wget 1.19.1 on AIX 7.1:
> >> 
> >> make all-am
> >> CC connect.o
> >> CC convert.o
> >> CC cookies.o
> >> CC ftp.o
> >> lex -ocss.c
> >> 0: Warning: 1285-300 The o flag is not valid.
> >> 0: Warning: 1285-300 The s flag is not valid.
> >> 0: Warning: 1285-300 The s flag is not valid.
> >> 0: Warning: 1285-300 The . flag is not valid.
> >> 
> >> Seems the LEX arguments are not valid?
> >> 
> >> Any suggestions?
> > 
> > Hi,
> > 
> > some 'lex' versions *must have* a space after -o, some others *must not
> > have* a space there. We decided not to use a space since this covers
> > most build environments.
> 
> I am concerned about the `-o` option itself. As per the POSIX.1-2008
> [0], lex only supports -t, -n and -v. Though agreed flex is the de
> facto implementation of lex which supports `-o`, I think we should
> stick to the POSIX standard and shouldn't use `-o`.
> 
> I see 2 possible approaches:
> 1. use lex.yy.c, default file generated or
> 2. use `-t >` to have the desired custom filename.

We had that several times in the past - and no solution fits in all situations. 
I don't remember the details right now, just the "don't touch it" conclusion.

Search the mailing list archives... but don't waste your time.
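
(For the record, the two invocation styles in question look roughly like this,
assuming the scanner source is css.l as in wget's build: the flex-style form
'lex -ocss.c css.l' with no space after -o, versus the POSIX-portable
'lex -t css.l > css.c'.)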

Regards, Tim

signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] Signature verification support in wget?

2017-08-30 Thread Tim Rühsen
Hi Ludo,

thanks for heads up :-)

Darshit just opened an issue at https://gitlab.com/gnuwget/wget2/issues/266.


If you don't mind, I would add your suggestions there.


With Best Regards, Tim



On 08/30/2017 02:52 PM, Ludovic Courtès wrote:
> Hello!
> 
> Following the GNU Hackers Meeting there was a discussion about the
> ability to add signature verification support directly in wget, which
> I’ll try to summarize here to get the ball rolling.
> 
> Darshit was suggesting having this:
> 
>   wget --verify-signature \
> https://ftp.gnu.org/gnu/recutils/recutils-1.7.tar.gz
> 
> whereby wget would automatically download recutils-1.7.tar.gz.sig and
> run gpgv or similar.  Having something along these lines would be great
> because it could help make things “secure by default”, as the marketing
> folks would say.  :-)
> 
> The devil is in the detail though, and I was wondering whether having
> that feature within wget might raise another set of issues, and
> whether/how these could be solved.  Here are some examples:
> 
>   • Is the file named .sig, .sign, or .asc?
> 
>   • Is it the compressed tarball that’s signed or the uncompressed one
> (as on kernel.org)?
> 
>   • For GNU specifically, should we somehow honor the keyring that’s
> published on ftp.gnu.org?
> 
>   • What should wget do when a file is signed by an unknown OpenPGP key?
> Should it offer to import it in the user’s keyring?  Or abort?
> 
>   • How would --verify-signature report errors in a way that is
> intelligible to the user?
> 
> We dealt with some of these in the “guix import”¹ and “guix refresh”²
> tools.  For example, the kernel.org and GNU updaters and importers work
> slightly differently due to the different conventions being used.  These
> commands also have a --key-download option to specify how unknown
> OpenPGP keys should be handled.
> 
> It might be that the answer is that this feature is too “high level” for
> wget after all, or that it should be made available in the form of wget2
> plugins specifically tailored to one web site’s infrastructure
> (kernel.org, gnu.org), or that we’d have to live with wget supporting
> only one specific convention.
> 
> Thoughts?
> 
> Ludo’.
> 
> ¹ https://www.gnu.org/software/guix/manual/html_node/Invoking-guix-import.html
> ² 
> https://www.gnu.org/software/guix/manual/html_node/Invoking-guix-refresh.html
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Signature verification support in wget?

2017-08-30 Thread Tim Rühsen
On 08/30/2017 05:14 PM, Darshit Shah wrote:
> * Unknown PGP keys are always an interesting problem due to the
> various differences in how people would like to deal with it. By
> default, I would suggest that Wget reports a warning about a missing
> key and continues. There would be yet another switch
> "--import-missing-keys" which would cause Wget2 to attempt to
> contact keyservers and import the key.

Your emailer formats very interesting ;-)

Let's extend the warning with a clear command example of how to retrieve the
missing key. Also, there are complications when the proper fingerprint
isn't known.
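
(E.g. something along the lines of 'gpg --recv-keys <FINGERPRINT>', or, to
keep it disposable, 'gpg --no-default-keyring --keyring ./wget-tmp.kbx
--recv-keys <FINGERPRINT>' - the keyring file name here is only illustrative.)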

Is there a way to download a key temporarily and throw it away if the
signature doesn't match ? Because maybe the content+signature is correct
but the downloaded key is wrong...

With Best Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Compiling against libwget built with ubsan

2017-08-29 Thread Tim Rühsen
On 08/29/2017 05:03 PM, Avinash Sonawane wrote:
> Hello!
> 
> $ cat foo.c
> #include <stdio.h>
> #include <wget.h>
> 
> int main(int argc, char **argv)
> {
> char foo[] = "FOO";
> printf("%s\n", wget_strtolower(foo));
> 
> return 0;
> }
> 
> $ gcc -lwget foo.c -o foo
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_divrem_overflow_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_add_overflow_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_negate_overflow_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_out_of_bounds_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_mul_overflow_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_vla_bound_not_positive_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_load_invalid_value_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_float_cast_overflow_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_type_mismatch_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_sub_overflow_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_nonnull_arg_abort'
> //usr/local/lib/libwget.so: undefined reference to
> `__ubsan_handle_shift_out_of_bounds_abort'
> collect2: error: ld returned 1 exit status
> 
> If I compile libwget without --fsanitize-ubsan then the object files
> get linked as expected. But how to use libwget when it's compiled with
> ubsan?
> 
> PS - I know this is not a libwget question per-se. But a library
> linker issue. Nevertheless, any pointers will be highly appreciated!

If you compile wget2/libwget with -fsanitize=undefined, then you have to
use this flag later as well, so that the right libubsan gets linked.

gcc -fsanitize=undefined -lwget foo.c -o foo


Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Problems building wget2

2017-09-27 Thread Tim Rühsen
Hi Josef,

please see README.md on how to build wget2.

If you installed from git repo (https://gitlab.com/gnuwget/wget2.git) then

./bootstrap

./configure

make

make check


With Best Regards, Tim



On 09/27/2017 02:41 PM, Josef Moellers wrote:
> I was trying to see if a bug I fixed in wget also happens to be in
> wget2, but I have problems building wget2!
> 
> When I call "autoconf" to build the configure script, I get these error
> messages:
> 
> configure.ac:12: error: possibly undefined macro: AM_INIT_AUTOMAKE
> configure.ac:104: error: possibly undefined macro: AM_PROG_CC_C_O
> configure.ac:267: error: possibly undefined macro: AS_IF
> configure.ac:291: error: possibly undefined macro: AC_DEFINE
> configure.ac:300: error: possibly undefined macro: AM_CONDITIONAL
> configure.ac:398: error: possibly undefined macro: AC_SEARCH_LIBS
> configure.ac:400: error: possibly undefined macro: AC_MSG_WARN
> 
> I have tried on a SLES12-SP3 and a Kubuntu 16.04
> 
> Josef
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] cipher_list string when using OpenSSL

2017-10-19 Thread Tim Rühsen
Hi Jeffrey,

thanks for heads up !

Does OpenSSL meanwhile have a PFS for their cipher list ?

Currently it looks like each and every client has to amend its
cipher list from time to time. Instead, this should be done in the
library, so that new library versions automatically make the client code more
secure. GnuTLS does it.
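
For illustration only - this is the kind of hand-maintained call every
OpenSSL-based client ends up carrying somewhere (a minimal sketch, not
wget's actual openssl.c; the cipher string is the one suggested below):

#include <openssl/ssl.h>

/* Apply a hand-maintained cipher string to an OpenSSL context.
   Returns 1 if at least one cipher could be selected, 0 otherwise. */
static int
apply_cipher_list (SSL_CTX *ctx)
{
  return SSL_CTX_set_cipher_list (ctx, "HIGH:!aNULL:!RC4:!MD5:!SRP:!PSK:!kRSA");
}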


That's one reason why we (wget developers) already discussed
dropping OpenSSL support completely. The background is that the OpenSSL
code in Wget has no maintainer. We take (small) patches every now and
then, but there is no expert here for review or active progress.


Having your random seeding issue in mind, there seem to be even more
reasons to drop that OpenSSL code.

If there is someone here who wants to maintain the OpenSSL code of Wget
- you are very welcome (Let us know) ! In the meantime I'll ask the
other maintainers about their opinion.


With Best Regards, Tim



On 10/19/2017 12:57 AM, Jeffrey Walton wrote:
> Hi Everyone,
> 
> I believe this has some room for improvement (from src/openssl.c):
> 
> "HIGH:MEDIUM:!RC4:!SRP:!PSK:!RSA:!aNULL@STRENGTH"
> 
> I think it would be a good idea to provide a `--cipher_list` option to
> allow the user to specify it. It might also be prudent to allow the
> string to be specified in `.wgetrc`.
> 
> Regarding the default string, it's 2017, and this is probably closer to
> what should be used by default:
> 
> "HIGH:!aNULL:!RC4:!MD5:!SRP:!PSK:!kRSA"
> 
> The "!kRSA" means RSA cannot be used for key exchange (i.e., RSA key
> transport), but can be used for digital signatures. MD5 is probably
> another algorithm that should be sunsetted at this point in time
> (though I am not aware of a HMAC/MD5 attack that can be carried out in
> TCP's 2MSL re-transmit time frame).
> 
> I use the same cipher_list on the servers under my control. I've never
> received a complaint from them. The cipher_list also helps get one of
> those A+ reports from the various SSL scanners.
> 
> Jeff
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Patch: Make url_file_name also convert remote path to local encoded

2017-11-12 Thread Tim Rühsen
On Donnerstag, 2. November 2017 21:09:46 CET YX Hao wrote:
> Dear Tim,
> 
> 
> 
> The 2nd patch is attached. Please take a review :)

Hmmm, is this a stand-alone patch that works without your patch #1 (Fix
printing...) ?

Please give at least one example (better more) to show what your patch fixes.
Even better: write a small python test in testenv which fails without your 
patch and succeeds with your patch. That would protect against regressions.

As I understand, the second patch is still in discussion with Eli. Since I do 
not have Windows, I can't help you here. Though from what I saw of the
discussion, you address a portability issue that likely should be solved 
within gnulib. Maybe you could (in parallel) send a mail to bug-gnu...@gnu.org 
with a link to your discussion with Eli. There might be some people with 
deeper knowledge.

With Best Regards, Tim


signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)

2017-11-03 Thread Tim Rühsen
On 11/03/2017 09:30 AM, Daniel Stenberg wrote:
> On Thu, 2 Nov 2017, Tim Rühsen wrote:
> 
>> How would you (or curl) handle
>>  Content-Type: application/x-tar
>>  Content-Encoding: gzip
> 
>> when downloading 'x.tar.gz' or 'x.tgz' ? Save the file compressed or
>> uncompressed ? And what if the file is (correctly) named 'x.tar' ?
> 
> Fortunately for me, curl doesn't make such decisions for the user so the
> question becomes moot - but it also means that it doesn't provide any
> guidance or help for the wget case. curl decompresses content-encoding
> if asked and it saves output in the file name the user asks for.

Delegating to the user is quite elegant :-)

>> I downloaded/tested thousands of web pages and they behave as if
>> 'Content-Encoding: gzip' is a compression for the transport.
>> Uncompressing it 'on-the-fly' and saving that uncompressed data was
>> the correct behavior.
> 
> Sure, because that's how HTTP clients and browsers have done for a long
> time now even if Content-Encoding: wasn't originally intended for it.
> The language in the spec still explains how it is not a transfer
> compression even if we can often pretend that it works that way.

Thanks for sharing that knowledge. That puts Jens' input into a
different light. We'll add a patch to work with both kinds of server
behavior.


With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)

2017-11-03 Thread Tim Rühsen
On 11/03/2017 06:37 AM, James Cloos wrote:
>>>>>> "TR" == Tim Rühsen <tim.rueh...@gmx.de> writes:
> 
> TR> I downloaded/tested thousands of web pages and they behave as if
> TR> 'Content-Encoding: gzip' is a compression for the transport.
> TR> Uncompressing it 'on-the-fly' and saving that uncompressed data was
> TR> the correct behavior.
> 
> Lots of servers have that misconfiguration; it was recommended in the
> past and apache defaulted to doing that when grabbing things like tar.gz.
> 
> The gui browsers had to learn to work around that misconfig.  wget also
> has to.
> 
> In short, do not uncompress if the destination name has a compression
> suffix.
> 
> Or, in that case, test whether the uncompressed data starts with gzip
> magic and complete one decompression if so, none if not so.
> 
> And the same for the other compression formats.

Thanks for this insight !

Looking at the Mozilla/Gecko sources shows that gzip Content-Encoding is
just cleared for Content-Types application/x-gzip, application/gzip and
application/x-gunzip. That makes it straightforward to go that way.
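
A rough sketch of that check (illustrative C only, neither wget nor Gecko
code):

#include <stdbool.h>
#include <strings.h>

/* Return true if the Content-Type already denotes a gzip archive, in
   which case a "Content-Encoding: gzip" header would be ignored. */
static bool
content_type_is_gzip (const char *content_type)
{
  return !strcasecmp (content_type, "application/x-gzip")
         || !strcasecmp (content_type, "application/gzip")
         || !strcasecmp (content_type, "application/x-gunzip");
}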

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)

2017-11-02 Thread Tim Rühsen
On Mittwoch, 1. November 2017 22:21:38 CET Daniel Stenberg wrote:
> On Wed, 1 Nov 2017, Tim Rühsen wrote:
> > Content-Encoding: gzip means that the data has been compressed for
> > transportation purposes only.
> 
> That's actually not what it means. There's transfer-encoding for that
> purpose, but that's not generally supported by clients.

I didn't want to over-complicate things. What I indeed didn't remember was 
that Transfer-Encoding allows 'gzip' (even in combination with chunked):
https://tools.ietf.org/html/rfc7230#section-3.3.1

> RFC7231 section 3.1.2.1 [*] says this:
> 
> Content coding values indicate an encoding transformation that has
> been or can be applied to a representation.
> 
> [*] = https://tools.ietf.org/html/rfc7231#section-3.1.2.1

"has been or can be" are to different things which also include "is/was not".
How would you (or curl) handle
  Content-Type: application/x-tar
  Content-Encoding: gzip
when downloading 'x.tar.gz' or 'x.tgz' ? Save the file compressed or 
uncompressed ? And what if the file is (correctly) named 'x.tar' ?

I downloaded/tested thousands of web pages and they behave as if
'Content-Encoding: gzip' is a compression for the transport. Uncompressing it
'on-the-fly' and saving that uncompressed data was the correct behavior.

Regards, Tim


signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)

2017-11-01 Thread Tim Rühsen
Hi Jens,

On Mittwoch, 1. November 2017 17:27:58 CET Jens Schleusener wrote:
> Hi,
> 
> the new "wget" release 1.19.2 has got a new feature:
> 
>   "gzip Content-Encoding decompression"
> 
> But that feature - at least for my self-compiled binary - leads to a
> problem if one downloads gzip-compressed tarballs from sites that send,
> e.g., an HTTP response header containing lines like
> 
>   Content-Type: application/x-tar
>   Content-Encoding: gzip

You are clearly describing broken server behavior.

> 
> In those cases wget now saves a downloaded gzip-compressed tarball
> decompressed (!), which probably breaks a lot of scripts.

Not sure why anyone relies on broken behavior. What if the broken server 
configuration becomes fixed ? Then your script breaks as well.

> Additionally the
> tarball is saved nevertheless under a filename with the "tar.gz" extension
> and not with the "tar" extension.

At least on *nix, the file extension says nothing about the content. That is
why we have the mime-type stated in Content-Type. 'x-tar' clearly is a
non-compressed tar file. Content-Encoding: gzip means that the data has been
compressed for transportation purposes only.

Anyways, whatever we do - it will be broken on some servers and on others not.

> Solutions/workarounds may be on affected servers the delivering of an
> alternative HTTP header like
> 
>   Content-Type: application/x-gzip
>   (or Content-Type: application/octet-stream)
> 
> or on the client side the use of the new "wget" option
> 
>   --compression=none
> 
> But maybe it would be better if for such cases wget would revert its
> default behaviour to the old one. Or is the described behaviour the
> expected one?

Correct server behavior here would be:
Content-Type: application/gzip
together with Content-Encoding: identity, which also may be omitted since it's 
the default.

A good explanation is here:
https://superuser.com/questions/901962/what-is-the-correct-mime-type-for-a-tar-gz-file


We can discuss a proposal for a work-around that handles both cases, like
if Content-Encoding == gzip and filename ends with .gz then don't uncompress.
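
As a minimal sketch of that rule (illustrative C, not an actual patch):

#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Keep the body compressed when the server says gzip but the local
   file name already ends in ".gz". */
static bool
keep_gzip_compressed (const char *content_encoding, const char *filename)
{
  size_t len = strlen (filename);

  return content_encoding
         && !strcasecmp (content_encoding, "gzip")
         && len > 3
         && !strcasecmp (filename + len - 3, ".gz");
}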

Caveat: this may break our --xattr feature, which saves the mime type with the 
file. And then we have to adjust the mime type as well - and that could be 
really tedious.

Regards, Tim


signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] Support for HTTP 308 (Permanent Redirect)

2017-11-03 Thread Tim Rühsen
On Sonntag, 29. Oktober 2017 00:35:54 CET Vasya Pupkin wrote:
> Hello,
> 
> Some websites started using HTTP 308 redirects
> (https://tools.ietf.org/html/rfc7538) and Wget fails to follow such
> redirects currently. I'm not sure if this can be considered a bug, though.
> Anyway, it would be nice if Wget supported it. 308 is the same as 301, but does
> not allow the HTTP method to change.

Thanks for the hint, an appropriate change is pushed to master now.

Regards, Tim

signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] Failing to compile without deprecated features from openssl

2017-12-04 Thread Tim Rühsen
On 12/04/2017 01:13 AM, Matthew Thode wrote:
> Yep, confirmed that this fixed a possible issue, also tested it with
> openssl-1.1.
> 
> We are currently using the attached patch.

Thanks, I already pushed your patch (with my small changes) to master
yesterday. If there is any issue with it, let me know.

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget 1.19.2 fails on Test-metalink-http.py and Test-metalink-http-quoted.py

2017-10-29 Thread Tim Rühsen
On Sonntag, 29. Oktober 2017 21:00:35 CET Arkadiusz Miśkiewicz wrote:
> On Sunday 29 of October 2017, Tim Rühsen wrote:
> > On Sonntag, 29. Oktober 2017 13:45:53 CET Arkadiusz Miśkiewicz wrote:
> > > Hi.
> > > 
> > > Test suite for wget fails here on Test-metalink-http.py and
> > > Test-metalink- http-quoted.py
> > > 
> > > test-suite.log attached.
> > 
> > Could you please also send us the file 'config.log' ? That shows your
> > configuration - I would like to reproduce that issue.
> 
> Attached.

I meant config.log, you attached config.h.

Regards, Tim

signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] wget 1.19.2 fails on Test-metalink-http.py and Test-metalink-http-quoted.py

2017-10-30 Thread Tim Rühsen
On 10/30/2017 12:49 PM, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of October 2017, Tim Rühsen wrote:
>> On 10/29/2017 09:39 PM, Arkadiusz Miśkiewicz wrote:
>>> On Sunday 29 of October 2017, Tim Rühsen wrote:
>>>> On Sonntag, 29. Oktober 2017 21:00:35 CET Arkadiusz Miśkiewicz wrote:
>>>>> On Sunday 29 of October 2017, Tim Rühsen wrote:
>>>>>> On Sonntag, 29. Oktober 2017 13:45:53 CET Arkadiusz Miśkiewicz wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> Test suite for wget fails here on Test-metalink-http.py and
>>>>>>> Test-metalink- http-quoted.py
>>>>>>>
>>>>>>> test-suite.log attached.
>>>>>>
>>>>>> Could you please also send us the file 'config.log' ? That shows your
>>>>>> configuration - I would like to reproduce that issue.
>>>>>
>>>>> Attached.
>>>>
>>>> I meant config.log, you attached config.h.
>>
>> Looks pretty identical to mine.
>>
>> That made me look a bit deeper into the original .log file, where you
>> can see that Python file-name structure dump. Well, this is that
>> additional file found in the temporary test directory. The test(s) expect
>> certain files and their correct contents after running... with those two
>> tests we see an additional (unexpected) file named '.gnupg/dirmngr.conf'.
>>
>> My guess is that this is created by the use of gpgme which calls gnupg
>> functions. And it has something to do with dirmngr configuration.
>> Please have a look at your configuration.
> 
> Thanks.
> 
> Added patch below (locally in my wget build) to avoid dependency on some 
> specific
> gnupg/dirmngr configuration. It fixes both tests for me.
> 
> --- wget-1.19.2/testenv/conf/expected_files.py.org  2017-10-30 12:36:46.911716601 +0100
> +++ wget-1.19.2/testenv/conf/expected_files.py       2017-10-30 12:41:03.358656484 +0100
> @@ -24,9 +24,9 @@ class ExpectedFiles:
>          snapshot = {}
>          for parent, dirs, files in os.walk('.'):
>              for name in files:
> -                # pubring.kbx will be created by libgpgme if $HOME doesn't contain the .gnupg directory.
> +                # pubring.kbx, dirmngr.conf, gpg.conf can be created by libgpgme if $HOME doesn't contain the .gnupg directory.
>                  # setting $HOME to CWD (in base_test.py) breaks two Metalink tests, so we skip this file here.
> -                if name == 'pubring.kbx':
> +                if name in [ 'pubring.kbx', 'dirmngr.conf', 'gpg.conf' ]:
>                      continue
> 
>                  f = {'content': ''}
> 

Great, thanks ! The changes are pushed.

Sorry that I didn't find/remember this immediately:
commit 5d4ada1b7b0b79f8053f3d6ffddda2e2c66d9dce
Author: Tim Rühsen <tim.rueh...@gmx.de>
Date:   Tue May 16 10:24:52 2017 +0200

Fix two Metalink tests if $HOME is changed

* conf/expected_files.py (gen_local_fs_snapshot): Skip processing
  of 'pubring.kbx'

:-)


With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Patch: Avoid unnecessary UTF-8 encoded fallback

2017-10-25 Thread Tim Rühsen
Thanks, applied.


With Best Regards, Tim



On 10/25/2017 12:56 PM, YX Hao wrote:
> Dear there,
> 
> Things are clear as the patch shows. ☺
> 
> ==
> diff --git a/src/retr.c b/src/retr.c
> index a27d58af..c1bc600e 100644
> --- a/src/retr.c
> +++ b/src/retr.c
> @@ -1098,11 +1098,16 @@ retrieve_url (struct url * orig_parsed, const char *origurl, char **file,
>        u = url_parse (origurl, NULL, iri, true);
>        if (u)
>          {
> -          DEBUGP (("[IRI fallbacking to non-utf8 for %s\n", quote (url)));
> -          xfree (url);
> -          url = xstrdup (u->url);
> -          iri_fallbacked = 1;
> -          goto redirected;
> +          if (strcmp(u->url, orig_parsed->url))
> +            {
> +              DEBUGP (("[IRI fallbacking to non-utf8 for %s\n", quote (url)));
> +              xfree (url);
> +              url = xstrdup (u->url);
> +              iri_fallbacked = 1;
> +              goto redirected;
> +            }
> +          else
> +            DEBUGP (("[Needn't fallback to non-utf8 for %s\n", quote (url)));
>          }
>        else
>          DEBUGP (("[Couldn't fallback to non-utf8 for %s\n", quote (url)));
> 
> 
> Regards,
> YX Hao
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] one strlen on loop

2017-10-25 Thread Tim Rühsen
On 10/25/2017 01:10 AM, Rodgger Bruno wrote:
> right?

Not quite

>> And there seems to be two buffer underflow issues in the old code.
>> Please consider fixing it as well:
>>
>>>  if (!c_strncasecmp((tok + (tok_len - 4)), ".DIR", 4))
>>
>>>  else if (!c_strncasecmp ((tok + (tok_len - 6)), ".DIR;1", 6))
>>
>> Should be like
>>
>>>  if ((tok_len >= 4) && !c_strncasecmp((tok + (tok_len - 4)),
>> ".DIR", 4))
>>
>>>  else if ((tok_len >= 6) && !c_strncasecmp ((tok + (tok_len - 6)),
>> ".DIR;1", 6))

Your new patch is

+  if (tok_len <= 4 && c_strncasecmp((tok + (tok_len - 4)), ".DIR", 4))

+  else if (tok_len <= 6 && c_strncasecmp ((tok + (tok_len - 6)), ".DIR;1", 6))

You want to check that tok_len is *great* enough, else you might get a
buffer underflow in c_strncasecmp. The logic now is the opposite of what
you want.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Self test results on OS X 10.9

2017-10-30 Thread Tim Rühsen
On Montag, 30. Oktober 2017 12:28:23 CET Jeffrey Walton wrote:
> Hi Everyone,
> 
> I'm building 1.19.2 from sources. Does anyone have an idea about the
> "Use of uninitialized value $addr in concatenation ..."?

Looks like a copy misuse of an uninitialized value :-)

E.g. in Test-https-pfs.px it should be
  warn "Failed to resolve $testhostname, using $srcdir/certs/wgethosts\n";
instead of
  warn "Failed to resolve $addr, using $srcdir/certs/wgethosts\n";

But this disguises the real issue, that the HOSTALIASES trick doesn't work for 
you. That's why you see SKIP for certain tests. We use it to circumvent 
hostname lookups from /etc/hosts, which we can't rely on. HOSTALIASES lets us 
use our own hosts file. If it doesn't work, we have to SKIP to prevent a FAIL 
in case you have colliding entries in /etc/hosts.
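
(For reference: a HOSTALIASES file is just lines of "alias canonical-name",
and the test suite points the resolver at its own file via the environment,
roughly HOSTALIASES=$srcdir/certs/wgethosts, so the test host names resolve
locally without touching /etc/hosts. The exact file contents are illustrative
here.)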

Thanks for your report.

Regards, Tim

> 
> Thanks,
> 
> Jeff

signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] Need help

2018-05-08 Thread Tim Rühsen
Hi Sameeran,

people only answer questions if they know the answer.


Your original email was quite confusing. At least I didn't understand
your problem and thus didn't answer. Try to be precise and if possible
add an example or your error output (the relevant parts).

With Best Regards, Tim



On 05/08/2018 05:52 AM, Sameeran Joshi wrote:
> Hi,I have sent emails on mailing list regarding some doubts,they aren't
> replied so generally how many days does it take for reply from
> maintainers,as I am newbie I thought no one is replying to mails,but one of
> my friend told to ask on mailing list may be the people are busy.
> Thanku
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Wget not ignoring the return value of a void function in 1.19.5

2018-05-08 Thread Tim Rühsen
On 05/08/2018 09:16 AM, Josef Moellers wrote:
> Hi,
> 
> While trying to upgrade to 1.19.5, we found a bug in wget (src/host.c)
> where the (non-existing) return value of a void function is assigned to
> a variable.
> 
> A patch is appended.

Thanks,

setting timer to NULL is not needed here.
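
(I.e. the call presumably just becomes a plain 'ptimer_destroy (timer);'
without the assignment - the same spot that later came up in the c-ares
build-failure thread.)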

I've amended and pushed the patch.

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Removing commented function

2018-05-05 Thread Tim Rühsen


On 05.05.2018 15:00, Sameeran Joshi wrote:
> 1.i have a doubt regarding the
> static void _urls_to_absolute(WGET_VECTOR *urls, WGET_IRI *base)
>
> function in
> wget2/libwget/html_url.c 
> file. As it's commented out, why are we keeping it there? Can't we remove it so
> the work of the preprocessor will be saved.
>
It can be removed because we don't need it any more.

Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] gcc: error: unrecognized command line option '-R'

2018-05-19 Thread Tim Rühsen
On 19.05.2018 20:53, Jeffrey Walton wrote:
> On Sat, May 19, 2018 at 12:27 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>> Hi Jeff,
>>
could you 'cd fuzz', then 'make -j1 V=1' and send us the output ?
>>
>> It should include the full gcc command line.
>>
>> Please attach your config.log.
>>
> Thanks Tim.
>
> $ cd wget-1.19.5
> $ make check V=1
> ...
>
> make[4]: Entering directory '/home/Build-Scripts/wget-1.19.5/src'
> make[4]: Leaving directory '/home/jwalton/Build-Scripts/wget-1.19.5/src'
> gcc   -Wno-unused-parameter -Wno-pedantic -I/usr/local/include
> -I/usr/local/include  -I/usr/local/include  -DNDEBUG -g2 -O2 -m64
> -march=native -fPIC  -L/usr/local/lib64 -m64 -Wl,-R,/usr/local/lib64
> -Wl,--enable-new-dtags -o wget_css_fuzzer wget_css_fuzzer.o main.o
> ../src/libunittest.a ../lib/libgnu.a   -L/usr/local/lib64 -liconv
> -R/usr/local/lib64  -lpthread   -ldl  -L/usr/local/lib64 -lpcre
> -lidn2 /usr/local/lib64/libssl.so /usr/local/lib64/libcrypto.so
> -Wl,-rpath -Wl,/usr/local/lib64 -ldl -L/usr/local/lib64 -lz
> -L/usr/local/lib64 -lpsl  -ldl -lpthread
> gcc: error: unrecognized command line option '-R'
> make[3]: *** [Makefile:1757: wget_css_fuzzer] Error 1
>
> Some of the command above looks a little unusual. For example:
>
> -Wl,-rpath -Wl,/usr/local/lib64
>
> Here are the variables I configure with:
>
> PKGCONFIG: /usr/local/lib64/pkgconfig
>  CPPFLAGS: -I/usr/local/include -DNDEBUG
>CFLAGS: -g2 -O2 -m64 -march=native -fPIC
>  CXXFLAGS: -g2 -O2 -m64 -march=native -fPIC
>   LDFLAGS: -L/usr/local/lib64 -m64 -Wl,-R,/usr/local/lib64
> -Wl,--enable-new-dtags
>LDLIBS: -ldl -lpthread
>

Could you please change $(LTLIBICONV) to $(LIBICONV) in
fuzz/Makefile.am, then autoreconf -fi and ./configure, ...

I think it's the wrong variable being used there.
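
(Background: gnulib defines both variables; the LTLIB* variant carries plain
'-R' rpath options, which only libtool understands, while the LIB* variant
uses forms like '-Wl,-rpath' that the compiler driver accepts - hence the
stray '-R' when LTLIBICONV ends up in a plain gcc link.)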

Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] gcc: error: unrecognized command line option '-R'

2018-05-19 Thread Tim Rühsen
Hi Jeff,


could you 'cd fuzz', then 'make -j1 V=1' and send us the output ?

It should include the full gcc command line.


Please attach your config.log.

Regards, Tim


On 19.05.2018 17:28, Jeffrey Walton wrote:
> Hi Everyone,
>
> This looks like a new issue with Wget 1.19.5:
>
> make
> ...
>   CC   libunittest_a-version.o
>   AR   libunittest.a
> gmake[3]: Leaving directory '/home/Build-Scripts/wget-1.19.5/src'
> gmake[2]: Leaving directory '/home/Build-Scripts/wget-1.19.5/src'
> Making check in doc
> ...
>
> gmake[4]: Leaving directory '/home/Build-Scripts/wget-1.19.5/src'
>   CCLD wget_css_fuzzer
> gcc: error: unrecognized command line option '-R'
> gmake[3]: *** [Makefile:1757: wget_css_fuzzer] Error 1
>
> The config.log is attached. It looks like there is a bad interaction
> with -Wl,-R,/usr/local/lib64, which is in my LDFLAGS. The complete
> LDFLAGS used is:
>
> -L/usr/local/lib64 -m64 -Wl,-R,/usr/local/lib64 -Wl,--enable-new-dtags
>
> Jeff




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] gcc: error: unrecognized command line option '-R'

2018-05-20 Thread Tim Rühsen
On 19.05.2018 23:44, Jeffrey Walton wrote:
> On Sat, May 19, 2018 at 5:21 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>> On 19.05.2018 20:53, Jeffrey Walton wrote:
>>> On Sat, May 19, 2018 at 12:27 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>>> ...
>>> make[4]: Entering directory '/home/Build-Scripts/wget-1.19.5/src'
>>> make[4]: Leaving directory '/home/jwalton/Build-Scripts/wget-1.19.5/src'
>>> gcc   -Wno-unused-parameter -Wno-pedantic -I/usr/local/include
>>> -I/usr/local/include  -I/usr/local/include  -DNDEBUG -g2 -O2 -m64
>>> -march=native -fPIC  -L/usr/local/lib64 -m64 -Wl,-R,/usr/local/lib64
>>> -Wl,--enable-new-dtags -o wget_css_fuzzer wget_css_fuzzer.o main.o
>>> ../src/libunittest.a ../lib/libgnu.a   -L/usr/local/lib64 -liconv
>>> -R/usr/local/lib64  -lpthread   -ldl  -L/usr/local/lib64 -lpcre
>>> -lidn2 /usr/local/lib64/libssl.so /usr/local/lib64/libcrypto.so
>>> -Wl,-rpath -Wl,/usr/local/lib64 -ldl -L/usr/local/lib64 -lz
>>> -L/usr/local/lib64 -lpsl  -ldl -lpthread
>>> gcc: error: unrecognized command line option '-R'
>>> make[3]: *** [Makefile:1757: wget_css_fuzzer] Error 1
>>>
>>> Some of the command above looks a little unusual. For example:
>>>
>>> -Wl,-rpath -Wl,/usr/local/lib64
>>>
>>> Here are the variables I configure with:
>>>
>>> PKGCONFIG: /usr/local/lib64/pkgconfig
>>>  CPPFLAGS: -I/usr/local/include -DNDEBUG
>>>CFLAGS: -g2 -O2 -m64 -march=native -fPIC
>>>  CXXFLAGS: -g2 -O2 -m64 -march=native -fPIC
>>>   LDFLAGS: -L/usr/local/lib64 -m64 -Wl,-R,/usr/local/lib64
>>> -Wl,--enable-new-dtags
>>>LDLIBS: -ldl -lpthread
>>>
>> Could you please change $(LTLIBICONV) to $(LIBICONV) in
>> fuzz/Makefile.am, then autoreconf -fi and ./configure, ...
>>
>> I think it's that wrong variable used there.
> Yes, that was it. Thanks.
Thanks for testing. I couldn't have figured it out without your config.log.

The change is pushed to master.

Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Bug-wget mailing list

2018-05-17 Thread Tim Rühsen
On 15.05.2018 13:32, Graeme wrote:
> I opened the mailing list archives.
> http://lists.gnu.org/archive/html/bug-wget/
>
> I searched for Windows, looking for information on the latest Wget for
> windows version.
>
> I get 1091 documents but they are in order of score.  I change the
> sort to by date and click search.
>
> In fact the heading of the results, despite having 1091 documents says
> there are 0 documents and 0 keywords.
>
> When I click search to try and put the documents in date order I get
> No documents and the reference (can't open the index)
>
> What am I doing wrong?

For any changes (e.g. the sort order), you have to go back to
http://lists.gnu.org/archive/html/bug-wget/.
At least here it works.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Wget for WIndows download

2018-05-17 Thread Tim Rühsen
On 15.05.2018 13:48, Graeme wrote:
> I am trying to find the up to date version of Wget for Windows.
>
> On https://www.gnu.org/software/wget/faq.html#download there are 4 links.
>
> Sourceforge and Bart Puype links seem to be dead links.
>
> Christopher Lewis' link times out.
>
> Jernej Simoncic's link works but the files don't seem to have version
> information in them. This is problematic as I need to be able to get
> version info to know when to update to newer versions. I also notice
> that if I try downloading the zip file from this repository I get an
> error 0x80004005 unspecified error when I try to unzip, but only for
> the wget.exe file.
>

We do not provide or maintain binary packages of Wget.
Best is to directly ask Jernej about the issue.

Thanks for pointing out the dead links, we'll update the pages as soon
as time allows it.

Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] query

2018-06-11 Thread Tim Rühsen
Hi,

On 06/11/2018 07:30 AM, Md. Qudratullah Phd2011, MGL wrote:
> Hi,
> I started downloading with wget but it stopped after some time (25%
> completion) and I restarted with -c option but it does not show that it
> resume download. Every time I restart with -c option, it starts from 0%.
> I have a 20GB file to be downloaded but due to problem in INTERNET
> connection it get disconnected. For last two days I have been running
> "wget" command with -c option but all in vain. Please help me.

It might be that the server doesn't support the range request header.
But to say exactly, please provide us with the output of

wget --version
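
(Background: resuming with -c relies on the HTTP Range request header - wget
asks for something like "Range: bytes=<bytes-already-on-disk>-" and a server
that supports it answers "206 Partial Content". A server that ignores the
header answers "200 OK" and the transfer starts over, which would match the
"always 0%" symptom.)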

Also (maybe more important), send us the debug output of your wget
command (add --debug to the command line). If you think there is too
much private data in there, send it directly to me via email.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Retry on host error

2018-06-12 Thread Tim Rühsen
Hi Elliot,

thanks for your contribution.

I'll take care of the integration / merge tomorrow.

We maintainers decided to accept your work without the FSF Copyright
Assignment from you (will be tagged as Copyright-paperwork-exempt).

Let us know if you plan to work on more and we'll send you the FSF
Assignment. It is not much work from your side.

Regards, Tim

On 11.06.2018 05:52, Elliot Chandler wrote:
> Dear Wget,
> 
> Based on the changes in commit
> d6d6a0c446f91a005d44ab72eb956c58ad2d, I have made a patch,
> attached, to add an option to retry on host errors.
> 
> It has been requested a few times (links at end of email), and just
> now it frustrated me, so I think it is a good option to have.
> 
> The patch does not affect the default behavior.
> 
> Thank you for your consideration!
> 
> —Elliot
> 
> 
> P. S.: Links to show value of patch, where this has been requested and
> impractical workarounds suggested instead:
> 
> https://stackoverflow.com/questions/30983511/wget-force-retry-until-there-is-a-connection
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=202956
> 
> https://ubuntuforums.org/showthread.php?t=1397043
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Retry on host error

2018-06-13 Thread Tim Rühsen
Congrats, your patch has been pushed, though I detected a small issue
right after that (fixed now).

And don't you think --tries should be considered ?

Regards, Tim

On 12.06.2018 20:53, Tim Rühsen wrote:
> Hi Elliot,
> 
> thanks for your contribution.
> 
> I'll take care of the integration / merge tomorrow.
> 
> We maintainers decided to accept your work without the FSF Copyright
> Assignment from you (will be tagged as Copyright-paperwork-exempt).
> 
> Let us know if you plan to work on more and we'll send you the FSF
> Assignment. It is not much work from your side.
> 
> Regards, Tim
> 
> On 11.06.2018 05:52, Elliot Chandler wrote:
>> Dear Wget,
>>
>> Based on the changes in commit
>> d6d6a0c446f91a005d44ab72eb956c58ad2d, I have made a patch,
>> attached, to add an option to retry on host errors.
>>
>> It has been requested a few times (links at end of email), and just
>> now it frustrated me, so I think it is a good option to have.
>>
>> The patch does not affect the default behavior.
>>
>> Thank you for your consideration!
>>
>> —Elliot
>>
>>
>> P. S.: Links to show value of patch, where this has been requested and
>> impractical workarounds suggested instead:
>>
>> https://stackoverflow.com/questions/30983511/wget-force-retry-until-there-is-a-connection
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=202956
>>
>> https://ubuntuforums.org/showthread.php?t=1397043
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] [unit-test] Crash on test_hsts_new_entry()

2018-05-29 Thread Tim Rühsen
Thanks for your report.

I just merged another branch into master; this issue seems to be fixed
in there. Please try again with the latest master.

Regards, Tim

On 05/22/2018 05:06 PM, Gisle Vanem wrote:
> I've built unit-test on Windows (clang-cl). But when running
> it, it crashes after the message:
>   RUNNING TEST test_hsts_new_entry...
> 
> Since 'opt.homedir' and therefore 'get_hsts_store_filename()'
> returns NULL. How is 'opt.homedir' supposed to be set?
> 
> If I add:
>   opt.homedir = home_dir();
> 
> to 'all_tests()', I do get the correct %HOME path (equals %APPDATA).
> But it seems 'opt.homedir' gets cleared afterwards somewhere.
> In test_cmd_spec_restrict_file_names() or test_path_simplify()?
> 
> So if I do this, all tests passes:
> 
> --- a/tests/unit-tests.c 2018-05-21 17:59:47
> +++ b/tests/unit-tests.c 2018-05-22 15:00:19
> @@ -43,11 +43,19 @@
>  static const char *
>  all_tests(void)
>  {
> +  opt.homedir = home_dir();
> +
>  #ifdef HAVE_METALINK
>    mu_run_test (test_find_key_value);
>    mu_run_test (test_find_key_values);
>    mu_run_test (test_has_key);
>  #endif
> +#ifdef HAVE_HSTS
> +  mu_run_test (test_hsts_new_entry);
> +  mu_run_test (test_hsts_url_rewrite_superdomain);
> +  mu_run_test (test_hsts_url_rewrite_congruent);
> +  mu_run_test (test_hsts_read_database);
> +#endif
>    mu_run_test (test_parse_content_disposition);
>    mu_run_test (test_parse_range_header);
>    mu_run_test (test_subdir_p);
> @@ -58,12 +66,6 @@
>    mu_run_test (test_append_uri_pathel);
>    mu_run_test (test_are_urls_equal);
>    mu_run_test (test_is_robots_txt_url);
> -#ifdef HAVE_HSTS
> -  mu_run_test (test_hsts_new_entry);
> -  mu_run_test (test_hsts_url_rewrite_superdomain);
> -  mu_run_test (test_hsts_url_rewrite_congruent);
> -  mu_run_test (test_hsts_read_database);
> -#endif
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Error when compiling wget 1.19.5 with c-ares 1.14.0

2018-06-04 Thread Tim Rühsen
On 06/04/2018 01:14 PM, Krzysztof Malinowski wrote:
> Hello,
> 
> I am trying to build wget 1.19.5 with c-ares 1.14.0 and the
> compilation fails with error:
> 
> make[3]: Entering directory '/dev/shm/akm022/wget-1.19.5/src'
>   CC   host.o
> host.c: In function 'wait_ares':
> host.c:735:11: error: void value not ignored as it ought to be
>  timer = ptimer_destroy (timer);
>^
> make[3]: *** [Makefile:1687: host.o] Error 1
> 
> wget was configured with:
> 
> ./configure --prefix=/proj/subcm/tools/Linux-x86_64
> LDFLAGS='-Wl,-rpath,\$ORIGIN:\$ORIGIN/../lib64:\$ORIGIN/../Tikanga/lib64
> -Wl,-z,origin' --with-ssl=openssl --with-cares
> 
> GCC used is 8.1.0.
> 
> It seems that the fix should be quite trivial, could we get this fixed
> in the source code? If I could be of any help for fixing that, please
> let me know.

Hi,

it was the first thing fixed after the 1.19.5 release. So if you git
clone, the problem is solved.

But instead you can simply apply this change:

@@ -732,7 +732,7 @@ wait_ares (ares_channel channel)
       ares_process (channel, &read_fds, &write_fds);
     }
   if (timer)
-    timer = ptimer_destroy (timer);
+    ptimer_destroy (timer);
 }
 
 static void

(Just remove 'timer = ' and that's it.)


Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Wget follows "button" links

2018-06-05 Thread Tim Rühsen
On 06/05/2018 11:53 AM, CryHard wrote:
> Hey there,
> 
> I've used the following:
> 
> wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) 
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" 
> --user=myuser --ask-password --no-check-certificate --recursive 
> --page-requisites --adjust-extension --span-hosts 
> --restrict-file-names=windows --domains wiki.com --no-parent wiki.com 
> --no-clobber --convert-links --wait=0 --quota=inf -P /home/W
> 
> To download a wiki. The problem is that this will follow "button" links, e.g 
> the links that allow a user to put a page on a watchlist for further 
> modifications. This has led to me watching hundreds of pages. Not only that, 
> but apparently it also follows the links that lead to reverting changes made 
> by others on a page.
> 
> Is there a way to avoid this behavior?

Hi,

that depends on how these "button links" are realized.

A button may be part of a HTML FORM tag/structure where the URL is the
value of the 'action' attribute. Wget doesn't download such URLs because
of the problem you describe.

A dynamic web page can realize "button links" by using simple links.
Wget doesn't know about hidden semantics and so downloads these URLs -
and maybe they trigger some changes in a database.
If this is your issue, you have to look into the HTML files and exclude
those URLs from being downloaded. Or you create a whitelist. Look at
options -A/-R and --accept-regex and --reject-regex.

> I'm using the following version:
> 
>> wget --version
> GNU Wget 1.12 built on linux-gnu.

Ok, you should update wget if possible. Latest version is 1.19.5.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Wget follows "button" links

2018-06-05 Thread Tim Rühsen
Hi,

> "Both --no-clobber and --convert-links were specified, only
--convert-links will be used."

Right, I missed that. The combination of both flags was buggy by design
(also in 1.12) and suffered from several flaws (not to say bugs).

The regex should be more like '.*/xpage=watch.*'. The exact syntax depends on
  --regex-type=TYPE   regex type (posix|pcre)
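
E.g. something like this (illustrative, untested against your wiki):

wget -r --regex-type=posix --reject-regex '/(delete|remove)/|xpage=watch' https://www.wiki.com/

That would keep wget away from the /delete/ and /remove/ paths and from the
"watch" links you mentioned.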

What else can you do... try wget2. It allows the combination of
--no-clobber and --convert-links. And if you find bugs, they can be fixed
(other than in wget 1.x, where we would have to redesign a whole lot of things).

See https://gitlab.com/gnuwget/wget2

If you don't like to build from git, you can download a pretty recent
tarball from https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.

Signature at https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.sig

Regards, Tim

On 06/05/2018 03:52 PM, CryHard wrote:
> Hey Tim,
> 
> Please see http://savannah.gnu.org/bugs/?31781 where it was implemented. Since 
> version 1.12.1.
> 
> On my personal mac I have 1.19.5, and when I run the command with both 
> arguments i get: 
> 
> "Both --no-clobber and --convert-links were specified, only --convert-links 
> will be used."
> 
> As a response. 
> 
> Anyway, I might make due without -nc if I can use the regex argument. Could 
> you give an example on how would that argument work in my case? Can I just 
> use www.mywiki.com/delete/* as an argument for example? or .*/xpage=watch.* ?
> 
> Thanks!
> 
> 
> ​Sent with ProtonMail Secure Email.​
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On June 5, 2018 2:40 PM, Tim Rühsen  wrote:
> 
>> Hi,
>>
>> in this case you could try it with -X / --exclude-directories.
>>
>> E.g. wget -X /delete,/remove
>>
>> That wouldn't help with "xpage=watch..." though.
>>
>> And I can't tell you if and how good -X works with wget 1.12.
>>
>> Why (or since when) doesn't --no-clobber plus --convert-links work any
>>
>> more ?
>>
>> Please feel free to open a bug report at
>>
>>> https://savannah.gnu.org/bugs/?func=additem&group=wget with a detailed
>>
>> description, please.
>>
>> Cause it works for me :-)
>>
>> Regards, Tim
>>
>> On 06/05/2018 03:11 PM, CryHard wrote:
>>
>>> Hey Tim,
>>>
>>> Thanks for the info. The wiki software we use (xwiki) appends something to 
>>> wiki pages URLs to express a certain behavior. For example, to "watch" a 
>>> page, the button once pressed redirects you to 
>>> "www.wiki.com/WIKI-PAGE-NAME?xpage=watch=adddocument"
>>>
>>> Where the only thing that changes is the "WIKI-PAGE-NAME" part.
>>>
>>> Also, for actions such as like "deleting" or "reverting" a wiki page, the 
>>> URL changes by adding /remove/ or /delete/ 'sub-folders" in the URL. these 
>>> are usually in the middle, before the actual page name. For example: 
>>> www.wiki.com/delete/WIKI-PAGE-NAME. So in this case the "offending URL" is 
>>> in the middle of the actual wiki page URL.
>>>
>>> What I would need to do is exclude from wget visiting any 
>>> www.wiki.com/delete or www.wiki.com/remove/ pages. I'd also need to exclude 
>>> links that end with "xpage=watch=adddocument" which triggers me to watch 
>>> that page.
>>>
>>> I am using v1.12 because the most recent versions have disabled 
>>> --no-clobber and --convert-links from working together. I need --no-clobber 
>>> because if the download stops, I need to be able to resume without 
>>> re-downloading all the files. And I need --convert-links because this needs 
>>> to work as a local copy.
>>>
>>> From my understanding the options you mention have been added after v1.12. 
>>> Is there any way to achieve this?
>>>
>>> BTW, -N (timestamps) doesn't work, as the server on which the wiki is 
>>> hosted doesn't seem to support this, hence wget keeps redownloading the 
>>> same files.
>>>
>>> Thanks a lot!
>>>
>>> ‐‐‐ Original Message ‐‐‐
>>>
>>> On June 5, 2018 1:57 PM, Tim Rühsen tim.rueh...@gmx.de wrote:
>>>
>>>> On 06/05/2018 11:53 AM, CryHard wrote:
>>>>
>>>>> Hey there,
>>>>>
>>>>> I've used the following:
>>>>>
>>>>> wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) 
>>>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 
>>>>> Safari/537.36" --user=myuser --ask-p

Re: [Bug-wget] Wget follows "button" links

2018-06-05 Thread Tim Rühsen
Hi,

in this case you could try it with -X / --exclude-directories.

E.g. wget -X /delete,/remove

That wouldn't help with "xpage=watch..." though.

And I can't tell you if and how good -X works with wget 1.12.

Why (or since when) doesn't --no-clobber plus --convert-links work any
more ?
Please feel free to open a bug report at
https://savannah.gnu.org/bugs/?func=additem&group=wget with a detailed
description, please.
Cause it works for me :-)

Regards, Tim

On 06/05/2018 03:11 PM, CryHard wrote:
> Hey Tim,
> 
> Thanks for the info. The wiki software we use (xwiki) appends something to 
> wiki pages URLs to express a certain behavior. For example, to "watch" a 
> page, the button once pressed redirects you to 
> "www.wiki.com/WIKI-PAGE-NAME?xpage=watch=adddocument"
> 
> Where the only thing that changes is the "WIKI-PAGE-NAME" part.
> 
> Also, for actions such as like "deleting" or "reverting" a wiki page, the URL 
> changes by adding /remove/ or /delete/ 'sub-folders" in the URL. these are 
> usually in the middle, before the actual page name. For example: 
> www.wiki.com/delete/WIKI-PAGE-NAME. So in this case the "offending URL" is in 
> the middle of the actual wiki page URL.
> 
> What I would need to do is exclude from wget visiting any www.wiki.com/delete 
> or www.wiki.com/remove/ pages. I'd also need to exclude links that end with 
> "xpage=watch=adddocument" which triggers me to watch that page.
> 
> I am using v1.12 because the most recent versions have disabled --no-clobber 
> and --convert-links from working together. I need --no-clobber because if the 
> download stops, I need to be able to resume without re-downloading all the 
> files. And I need --convert-links because this needs to work as a local copy. 
> 
> From my understanding the options you mention have been added after v1.12. Is 
> there any way to achieve this?
> 
> BTW, -N (timestamps) doesn't work, as the server on which the wiki is hosted 
> doesn't seem to support this, hence wget keeps redownloading the same files.
> 
> Thanks a lot!
> ‐‐‐ Original Message ‐‐‐
> 
> On June 5, 2018 1:57 PM, Tim Rühsen  wrote:
> 
>> On 06/05/2018 11:53 AM, CryHard wrote:
>>
>>> Hey there,
>>>
>>> I've used the following:
>>>
>>> wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) 
>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" 
>>> --user=myuser --ask-password --no-check-certificate --recursive 
>>> --page-requisites --adjust-extension --span-hosts 
>>> --restrict-file-names=windows --domains wiki.com --no-parent wiki.com 
>>> --no-clobber --convert-links --wait=0 --quota=inf -P /home/W
>>>
>>> To download a wiki. The problem is that this will follow "button" links, 
>>> e.g the links that allow a user to put a page on a watchlist for further 
>>> modifications. This has led to me watching hundreds of pages. Not only 
>>> that, but apparently it also follows the links that lead to reverting 
>>> changes made by others on a page.
>>>
>>> Is there a way to avoid this behavior?
>>
>> Hi,
>>
>> that depends on how these "button links" are realized.
>>
>> A button may be part of a HTML FORM tag/structure where the URL is the
>>
>> value of the 'action' attribute. Wget doesn't download such URLs because
>>
>> of the problem you describe.
>>
>> A dynamic web page can realize "button links" by using simple links.
>>
>> Wget doesn't know about hidden semantics and so downloads these URLs -
>>
>> and maybe they trigger some changes in a database.
>>
>> If this is your issue, you have to look into the HTML files and exclude
>>
>> those URLs from being downloaded. Or you create a whitelist. Look at
>>
>> options -A/-R and --accept-regex and --reject-regex.
>>
>>> I'm using the following version:
>>>
>>>> wget --version
>>>>
>>>> GNU Wget 1.12 built on linux-gnu.
>>
>> Ok, you should update wget if possible. Latest version is 1.19.5.
>>
>> Regards, Tim
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Retry on host error

2018-06-26 Thread Tim Rühsen
Hi Elliot,

On 06/25/2018 11:17 PM, Elliot Chandler wrote:
> Hello again,
> 
> Thank you for integrating the patch!

Thanks for contributing :-)

> Regarding licensing, I don't have immediate plans to contribute more
> patches to GNU (kind of swamped with things going on), but it's always a
> possibility. I'm happy to fill out a copyright grant to irrevocably assign
> ownership of my patches sent to GNU projects to FSF and/or GNU, which I
> assume is what that entails: just let me know.

No need for that right now. And then, such an assignment is only
required for GNU projects copyrighted by FSF (e.g. for GNU Wget).

> Sorry for overlooking that those earlier cases were supposed to fall
> through; that was sloppy of me.

NP

> I tested --retry-on-host-error with and without --tries 3. With --tries 3
> it stopped after 3 tries; without --tries 3 it kept going. Is there other
> support that is needed, or something that I'm overlooking?

No, it was my mistake.

Thanks for your feedback.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Why does -A not work?

2018-06-21 Thread Tim Rühsen
Just try

wget2 -nd -l2 -r -A "*little-nemo*s.jpeg"
'http://comicstriplibrary.org/search?search=little+nemo'

and you only get
little-nemo-19051015-s.jpeg
little-nemo-19051022-s.jpeg
little-nemo-19051029-s.jpeg
little-nemo-19051105-s.jpeg
little-nemo-19051112-s.jpeg
little-nemo-19051119-s.jpeg
little-nemo-19051126-s.jpeg
little-nemo-19051203-s.jpeg
little-nemo-19051210-s.jpeg
little-nemo-19051217-s.jpeg
little-nemo-19051224-s.jpeg
little-nemo-19051231-s.jpeg
little-nemo-19060107-s.jpeg
little-nemo-19060114-s.jpeg
little-nemo-19060121-s.jpeg
little-nemo-19060128-s.jpeg
little-nemo-19060204-s.jpeg
little-nemo-19060211-s.jpeg
little-nemo-19060218-s.jpeg
little-nemo-19060225-s.jpeg

Regards, Tim

On 06/20/2018 09:59 PM, Tim Rühsen wrote:
> On 20.06.2018 18:20, Nils Gerlach wrote:
>> It does not delete any html-file or anything else. Either it is accepted
>> and kept or it is saved forever.
>> With the tip about --accept and --acept-regex I can get wget to traverse
>> the links but it does not go deep
>> enough to get the *l.jpgs I tried to increase -l but to no avail. It seems
>> like it is going only 1 link deep.
>> And not deletes.
> 
> Yes, my failure. Looking at the code, the regex options are applied
> without taking --recursive or --level into account. They are dumb URL
> filters.
> 
> We are back at
> 
> wget -d -olog -r -Dcomicstriplibrary.org -A "*little-nemo*s.jpeg"
> 'http://comicstriplibrary.org/search?search=little+nemo'
> 
> that doesn't work as expected. Somehow it doesn't follow certain links
> so that little-nemo*s.jpeg files aren't found.
> 
> Interestingly, the same options with wget2 are finding + downloading
> those files. From a first glimpse: those files are linked from an RSS /
> Atom file. Those aren't supported by wget, but wget2 does parse them for
> URLs.
> 
> Want to give it a try ? https://gitlab.com/gnuwget/wget2
> 
> Regards, Tim
> 
>>
>> 2018-06-20 16:58 GMT+02:00 Tim Rühsen :
>>
>>> Hi Niels,
>>>
>>> please always answer to the mailing list (no problem if you CC me, but
>>> not needed).
>>>
>>> It was just an example for POSIX regexes - it's up to you to work out
>>> the details ;-) Or maybe there is a volunteer reading this.
>>>
>>> The implicitly downloaded HTML pages should be removed after parsing
>>> when you use --accept-regex. Except the explicitly 'starting' page from
>>> your command line.
>>>
>>> Regards, Tim
>>>
>>> On 06/20/2018 04:28 PM, Nils Gerlach wrote:
>>>> Hi Tim,
>>>>
>>>> I am sorry but your command does not work. It only downloads the
>>> thumbnails
>>>> from the first page
>>>> and follows none of the links. Open the link in a browser. Click on the
>>>> pictures to get a larger picture.
>>>> There is a link "high quality picture" the pictures behind those links
>>> are
>>>> the ones i want to download.
>>>> Regex being ".*little-nemo.*n\l.jpeg". And not only the first page but
>>> from
>>>> the other search result pages, too.
>>>> Can you work that one out? Does this work with wget? Best result would be
>>>> if the visited html-pages were
>>>> deleted by wget. But if they stay I can delete them afterwards. But
>>>> automatism would be better, that's why I am
>>>> trying to use wget ;)
>>>>
>>>> Thanks for the information on the filename and path, though.
>>>>
>>>> Greetings
>>>>
>>>> 2018-06-20 16:13 GMT+02:00 Tim Rühsen :
>>>>
>>>>> Hi Nils,
>>>>>
>>>>> On 06/20/2018 06:16 AM, Nils Gerlach wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> in #wget on freenode I was suggested to write this to you:
>>>>>> I tried using wget to get some images:
>>>>>> wget -nd -rH -Dcomicstriplibrary.org -A
>>>>>> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*"
>>>>> -p -e
>>>>>> robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
>>>>>> I wanted to download the images only but wget was not following any of
>>>>> the
>>>>>> links so I got that much more into -A. But it still does not follow the
>>>>>> links.
>>>>>> Page numbers of the search result contain "page" in the link, links to
>>>>> the
>>>>

Re: [Bug-wget] Why does -A not work?

2018-06-20 Thread Tim Rühsen
On 20.06.2018 18:20, Nils Gerlach wrote:
> It does not delete any HTML file or anything else. Either a file is accepted
> and kept, or it is saved anyway and stays forever.
> With the tip about --accept and --accept-regex I can get wget to traverse
> the links, but it does not go deep
> enough to get the *l.jpgs. I tried to increase -l, but to no avail. It seems
> like it is going only one link deep.
> And nothing gets deleted.

Yes, my mistake. Looking at the code, the regex options are applied
without taking --recursive or --level into account. They are dumb URL
filters.

We are back at

wget -d -olog -r -Dcomicstriplibrary.org -A "*little-nemo*s.jpeg"
'http://comicstriplibrary.org/search?search=little+nemo'

that doesn't work as expected. Somehow it doesn't follow certain links
so that little-nemo*s.jpeg files aren't found.

Interestingly, the same options with wget2 are finding + downloading
those files. At first glance: those files are linked from an RSS /
Atom file. Those aren't supported by wget, but wget2 does parse them for
URLs.

Want to give it a try ? https://gitlab.com/gnuwget/wget2

Regards, Tim

> 
> 2018-06-20 16:58 GMT+02:00 Tim Rühsen :
> 
>> Hi Niels,
>>
>> please always answer to the mailing list (no problem if you CC me, but
>> not needed).
>>
>> It was just an example for POSIX regexes - it's up to you to work out
>> the details ;-) Or maybe there is a volunteer reading this.
>>
>> The implicitly downloaded HTML pages should be removed after parsing
>> when you use --accept-regex. Except the explicitly 'starting' page from
>> your command line.
>>
>> Regards, Tim
>>
>> On 06/20/2018 04:28 PM, Nils Gerlach wrote:
>>> Hi Tim,
>>>
>>> I am sorry but your command does not work. It only downloads the
>> thumbnails
>>> from the first page
>>> and follows none of the links. Open the link in a browser. Click on the
>>> pictures to get a larger picture.
>>> There is a link "high quality picture" the pictures behind those links
>> are
>>> the ones i want to download.
>>> Regex being ".*little-nemo.*n\l.jpeg". And not only the first page but
>> from
>>> the other search result pages, too.
>>> Can you work that one out? Does this work with wget? Best result would be
>>> if the visited html-pages were
>>> deleted by wget. But if they stay I can delete them afterwards. But
>>> automatism would be better, that's why I am
>>> trying to use wget ;)
>>>
>>> Thanks for the information on the filename and path, though.
>>>
>>> Greetings
>>>
>>> 2018-06-20 16:13 GMT+02:00 Tim Rühsen :
>>>
>>>> Hi Nils,
>>>>
>>>> On 06/20/2018 06:16 AM, Nils Gerlach wrote:
>>>>> Hi there,
>>>>>
>>>>> in #wget on freenode I was suggested to write this to you:
>>>>> I tried using wget to get some images:
>>>>> wget -nd -rH -Dcomicstriplibrary.org -A
>>>>> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*"
>>>> -p -e
>>>>> robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
>>>>> I wanted to download the images only but wget was not following any of
>>>> the
>>>>> links so I got that much more into -A. But it still does not follow the
>>>>> links.
>>>>> Page numbers of the search result contain "page" in the link, links to
>>>> the
>>>>> big pictures i want wget to download contain "display". Both are given
>> in
>>>>> -A and are seen in the html-document wget gets. Neither is followed by
>>>> wget.
>>>>>
>>>>> Why does this not work at all? Website is public, anybody is free to
>>>> test.
>>>>> But this is not my website!
>>>>
>>>> -A / -R works only on the filename, not on the path. The docs (man page)
>>>> is not very explicit about it.
>>>>
>>>> Instead try --accept-regex / --reject-regex which acts on the complete
>>>> URL - but shell wildcard's won't work.
>>>>
>>>> For your example this means to replace '.' by '\.' and '*' by '.*'.
>>>>
>>>> To download those nemo jpegs:
>>>> wget -d -rH -Dcomicstriplibrary.org --accept-regex
>>>> ".*little-nemo.*n\.jpeg" -p -e robots=off
>>>> 'http://comicstriplibrary.org/search?search=little+nemo'
>>>> --regex-type=posix
>>>>
>>>> Regards, Tim
>>>>
>>>>
>>>
>>
>>



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Feature request: option to not download rejected files

2018-07-03 Thread Tim Rühsen
On 07/03/2018 12:48 PM, Zoe Blade wrote:
>> In Wget2 there is an extra option for this, --filter-urls.
> 
> Thank you Tim, this sounds like exactly what I was after!  (It's especially 
> important when you have wget logged in as a user, to be able to tell it not 
> to go to the logout page.)  Though if that feature could be ported to the 
> original wget, with its WARC support etc, that'd be useful.  I guess I'll 
> stick with my hacked version for now.

WARC for wget2 is on the list, maybe as an extra library project.

Thanks for your feedback - I wasn't aware of WARC users out there ;-)

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Newbie help

2018-05-02 Thread Tim Rühsen
On 02.05.2018 19:25, Sameeran Joshi wrote:
> Hi, I am a newbie to wget. I have tried some interesting stuff using wget by
> reading the documentation, and I am curious about how the internals of
> wget [url] work and what logic may be used. So I downloaded the source code,
> but I am confused about where I can find the very simplest source code of wget
> [url], i.e. the internal functions used for it. I think it would be the simplest
> exercise to start with as a newbie. Can anyone suggest where I can find
> the correct file? Also, I would like to make a small C program that works
> the way wget does.
> Thank you

The best is to start with Wget2. It comes with a library (libwget) that
makes it easy to download and parse HTML/CSS/XML files. You'll also find
some simple C examples in the examples/ directory.

Have a look at https://gitlab.com/gnuwget/wget2; there are instructions on
how to get the sources and how to build and install wget2 and libwget.

Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Add support to bind to a local port

2018-05-03 Thread Tim Rühsen
On 03.05.2018 19:16, Darshit Shah wrote:
> +  { "bindport",        &opt.bind_port,        cmd_number },
>    { "bodydata",        &opt.body_data,        cmd_string },

There are more tabs here.

I think that port binding does work with address 'ANY'.
So wouldn't it be possible to use --bind-port without --bind-address ?
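
A minimal sketch of the underlying idea (Python, not part of the patch;
"example.org" is just a placeholder target): binding only the local source
port while leaving the local address as the wildcard (INADDR_ANY) works,
which suggests --bind-port alone should be possible:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# "" means the wildcard address (INADDR_ANY); only the local source port is pinned.
s.bind(("", 40000))
s.connect(("example.org", 80))   # placeholder target host
print(s.getsockname())           # e.g. ('192.168.1.10', 40000)
s.close()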

Regards, Tim




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] cipher_list string when using OpenSSL

2017-10-20 Thread Tim Rühsen
On 10/19/2017 11:49 AM, Jeffrey Walton wrote:
> On Thu, Oct 19, 2017 at 5:35 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>> Hi Jeffrey,
>>
>> thanks for heads up !
>>
>> Does OpenSSL meanwhile have a PFS for their cipher list ?
>>
>> Currently it looks like that each and every client has to amend their
>> cipher list from time to time. Instead, this should be done in the
>> library. So that new versions automatically make the client code more
>> secure. GnuTLS does it.
>>
>>
>> That's one reason why we (wget developers) already discussed about
>> dropping OpenSSL support completely. The background is that the OpenSSL
>> code in Wget has no maintainer. We take (small) patches every now and
>> then but there is no expert here for review or active progress.
>>
>> Having your random seeding issue in mind, there seems to be even more
>> reasons to drop that OpenSSL code.
>>
>> If there is someone here who wants to maintain the OpenSSL code of Wget
>> - you are very welcome (Let us know) ! In the meantime I'll ask the
>> other maintainers about their opinion.
> 
> Ack, just decide what you want to do. I should not influence the
> project's processes or bikeshed.

That's the wrong attitude. It's a community-driven open source project,
and every opinion and every input counts!

We will keep OpenSSL code for now - Ander Juaristi is willing to
maintain that code :-)

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] one strlen on loop

2017-10-23 Thread Tim Rühsen
Hi Rodgger,

thanks for your contribution !

Could you please amend a few things !?


Not compilable here:
> if (tok_len < 12) && (strchr( tok, '-') != NULL))


If you are about to touch the code, please also add a space where it is
missing (between tok and +):
> *(tok+ (tok_len - 4)) = '\0'; /* Discard ".DIR". */


And there seem to be two buffer underflow issues in the old code.
Please consider fixing it as well:

>  if (!c_strncasecmp((tok + (tok_len - 4)), ".DIR", 4))

>  else if (!c_strncasecmp ((tok + (tok_len - 6)), ".DIR;1", 6))

Should be like

>  if ((tok_len >= 4) && !c_strncasecmp ((tok + (tok_len - 4)), ".DIR", 4))

>  else if ((tok_len >= 6) && !c_strncasecmp ((tok + (tok_len - 6)), ".DIR;1", 6))


Please amend the commit message to GNU style (One brief descriptive
line, empty line, listing all file/function + a more detailed
description). The sign-off is ok, but not needed.
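
For example, a commit message in that style could look like this (the file and
function names here are only illustrative, adjust them to the actual change):

  Fix repeated strlen() call when parsing VMS listings

  * src/ftp-ls.c (ftp_parse_vms_ls): Compute the token length once
  before the loop instead of calling strlen() on every iteration.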


With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] nettle-3.4.tar.gz isn't actually compressed

2018-01-10 Thread Tim Rühsen
Hi Thomas,

sorry, I lost focus on this issue while waiting for more examples.

Just pushed a commit which excludes .gz and .tgz files from automatic
decompression.
BTW, the only example we have so far is
http://www.lysator.liu.se/~nisse/archive/nettle-3.3.tar.gz. And that
server ignores 'Accept-Encoding: identity' in a way that it sends data
with Content-Type: x-gzip.
You'll see that with --compression=none.
What I want to say: that server seems to be misconfigured (nginx 1.6.2).


With Best Regards, Tim



On 01/08/2018 02:02 PM, Tomas Hozza wrote:
> Hi Tim.
> 
> I got a bug report in Fedora 27 for this issue 
> (https://bugzilla.redhat.com/show_bug.cgi?id=1532233). I checked the git 
> repository and I don't see any fix there yet. Do you have any ETA for the 
> fix? Unless you plan a new bugfix release soon, I would like to backport the 
> fix to our Fedora package.

I can't reproduce that issue

> 
> Thank you.
> 
> Regards,
> Tomas
> 
> On 20.11.2017 17:07, Tim Rühsen wrote:
>> There already has been a discussion about that (starting here:
>> http://lists.gnu.org/archive/html/bug-wget/2017-11/msg0.html).
>>
>> Looks like we didn't fix it correctly.
>>
>> Currently, to disable gzip compression at all: add
>>   compression = none
>> to ~/.wgetrc and/or /etc/wgetrc.
>>
>> We'll fix it the next days and make up a new release then.
>>
>> Interestingly, the change was introduced on 25th of July - and there was
>> no complaint until 1st of November. I guess we have a very limited base
>> of testers :-(
>>
>>
>> With Best Regards, Tim
>>
>>
>> On 11/20/2017 03:38 PM, baldu...@units.it wrote:
>>>> Maybe depends on version of wget? I probably used wget-1.18 (the version
>>>> in debian stable; I don't have access to the same system at the moment
>>>> so I'm not 100% sure). 
>>>
>>> Looks like the odd behavior is for 19.2 only; 19.1 behaves "normally":
>>>
>>> :1> wget --version|egrep built
>>> GNU Wget 1.19.1 built on linux-gnu.
>>>
>>> :2> wget http://www.lysator.liu.se/~nisse/archive/nettle-3.4.tar.gz
>>> --2017-11-20 15:17:26--  
>>> http://www.lysator.liu.se/~nisse/archive/nettle-3.4.ta\
>>> r.gz
>>> Resolving www.lysator.liu.se (www.lysator.liu.se)... 130.236.254.11, 
>>> 2001:6b0:1\
>>> 7:f0a0::b
>>> Connecting to www.lysator.liu.se (www.lysator.liu.se)|130.236.254.11|:80... 
>>> con\
>>> nected.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 1935069 (1.8M) [application/unix-tar]
>>> Saving to: 'nettle-3.4.tar.gz'
>>> [...]
>>> 2017-11-20 15:17:26 (4.25 MB/s) - 'nettle-3.4.tar.gz' saved 
>>> [1935069/1935069]
>>>
>>> :3> file 'nettle-3.4.tar.gz'
>>> nettle-3.4.tar.gz: gzip compressed data, last modified: Sun Nov 19 13:36:24 
>>> 201\
>>> 7, from Unix
>>>
>>>
>>> :5> wget --version |egrep built
>>> GNU Wget 1.19.2 built on linux-gnu.
>>>
>>> :6> rm -f nettle-3.4.tar.gz && wget 
>>> http://www.lysator.liu.se/~nisse/archiv\
>>> e/nettle-3.4.tar.gz
>>> --2017-11-20 15:18:14--  
>>> http://www.lysator.liu.se/~nisse/archive/nettle-3.4.ta\
>>> r.gz
>>> Resolving www.lysator.liu.se (www.lysator.liu.se)... 130.236.254.11, 
>>> 2001:6b0:1\
>>> 7:f0a0::b
>>> Connecting to www.lysator.liu.se (www.lysator.liu.se)|130.236.254.11|:80... 
>>> con\
>>> nected.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 1935069 (1.8M) [application/unix-tar]
>>> Saving to: 'nettle-3.4.tar.gz'
>>> [...]
>>> 2017-11-20 15:18:14 (4.11 MB/s) - 'nettle-3.4.tar.gz' saved [6348800]
>>>
>>> :7> file nettle-3.4.tar.gz
>>> nettle-3.4.tar.gz: POSIX tar archive (GNU)
>>>
>>> ciao
>>> gabriele
>>>
>>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Unexpected result with -H and -D

2018-01-17 Thread Tim Rühsen
Hi,

this is not PSL matching, so no libpsl is needed.

Just sufmatch() has to be fixed to do (sub)domain matching.

Attached is a fix.
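
The difference is easy to see with a quick sketch (Python, only to illustrate
the matching logic, not the actual wget code):

host = "www.werkenbijscapino.nl"
accepted = "scapino.nl"

print(host.endswith(accepted))                             # True  - plain suffix match (old, buggy behavior)
print(host == accepted or host.endswith("." + accepted))   # False - proper (sub)domain match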


With Best Regards, Tim



On 01/17/2018 03:01 PM, Darshit Shah wrote:
> Hi,
> 
> This is a bug in Wget, apparently a really old one! Seems like the bug has
> been around since at least 1997.
> 
> Looking at the source, the issue is that Wget does a very simple suffix
> matching on the actual domain and accepted domains list. This is obviously
> wrong as you have just found out.
> 
> I'm going to try and implement this correctly, but I'm currently a little 
> short
> on time, so if anyone else wants to pick it up, please feel free to. It's
> simple, use libpsl to get the proper domain name and match against that.
> 
> 
> Of course, this change will require libpsl to no longer be an optional
> dependency
> 
> * Friso van Vollenhoven  [180117 14:40]:
>> Hello all,
>>
>> I am trying to do a recursive download of a webpage and span multiple hosts
>> within the same domain, but not cross to other domains. The issue is that
>> the crawl does extend to other domains. My full command is this:
>>
>> wget \
>> --recursive \
>> --no-clobber \
>> --page-requisites \
>> --adjust-extension \
>> --span-hosts \
>> --domains=scapino.nl \
>> --no-parent \
>> --tries=2 \
>> --wait=1 \
>> --random-wait \
>> --waitretry=2 \
>> --header='User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2)
>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' \
>> https://www.scapino.nl/winkels/scapino-utrecht-510061
>>
>> From this combination of --span-hosts and --domains, I would expect to
>> download assets from cdn.scapino.nl and www.scapino.nl, but not other
>> domains. For some reason that I don't understand, wget also starts to do
>> what looks like a full crawl of the domain werkenbijscapino.nl, which is
>> referenced from the original page.
>>
>> Any thoughts or direction would be much appreciated.
>>
>> I am using wget 1.18 on Debian.
>>
>>
>> Best regards,
>> Friso
> 
From 1ad636baa63cfe029c84235986626c17e4ff33cb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tim=20R=C3=BChsen?= 
Date: Wed, 17 Jan 2018 15:50:48 +0100
Subject: [PATCH] * src/host.c (sufmatch): Fix to domain matching

---
 src/host.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/host.c b/src/host.c
index 2ddae328..d337cc7c 100644
--- a/src/host.c
+++ b/src/host.c
@@ -1017,18 +1017,25 @@ sufmatch (const char **list, const char *what)
   int i, j, k, lw;
 
   lw = strlen (what);
+
   for (i = 0; list[i]; i++)
 {
-  if (list[i][0] == '\0')
-continue;
+  j = strlen (list[i]);
+  if (lw < j)
+continue; /* what is no (sub)domain of list[i] */
 
-  for (j = strlen (list[i]), k = lw; j >= 0 && k >= 0; j--, k--)
+  for (k = lw; j >= 0 && k >= 0; j--, k--)
 if (c_tolower (list[i][j]) != c_tolower (what[k]))
   break;
-  /* The domain must be first to reach to beginning.  */
-  if (j == -1)
+
+  /* Domain or subdomain match
+   * k == -1: exact match
+   * k >= 0 && what[k] == '.': subdomain match
+   */
+  if (j == -1 && (k == -1 || what[k] == '.'))
 return true;
 }
+
   return false;
 }
 
-- 
2.15.1



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] nettle-3.4.tar.gz isn't actually compressed

2018-01-19 Thread Tim Rühsen
On 01/19/2018 02:23 AM, Thomas Deutschmann wrote:
> Hi,
> 
>> sorry, I lost focus on this issue while waiting for
>> more examples.
> 
> In Gentoo we hit this several times:
> 
> https://bugs.gentoo.org/640930
> https://bugs.gentoo.org/639752
> https://bugs.gentoo.org/636238
> https://bugs.gentoo.org/641686
> 
> Like outlined in the bugs, this is caused by invalid gzip/deflate
> configurations found in upstream severs. We have already contacted some
> projects with buggy servers and some of them of them acknowledged and
> have already fixed their configuration.

Thanks for the extra information.

A workaround for those servers is in master. The 1.19.3 release is
ready to be uploaded - we are just waiting for Darshit's upload
permission to ftp.gnu.org. If it's not there by Monday, I'll jump in.

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Testsuite summary for wget 1.19.4: # FAIL: 8

2018-01-25 Thread Tim Rühsen
Thanks for the report.

This happens because we try to print a numeric IP address:

warn "Failed to resolve $$addr, using $srcdir/certs/wgethosts\n";

But then this is printed only if 'localhost' can't be resolved to
127.0.0.1. What is it in your case ?


Could you apply this patch and see what that test prints out ?

diff --git a/tests/Test-https-pfs.px b/tests/Test-https-pfs.px
index 627bd678..d28eec69 100755
--- a/tests/Test-https-pfs.px
+++ b/tests/Test-https-pfs.px
@@ -47,7 +47,7 @@ unless ($addr)
 }
 unless (inet_ntoa($addr) =~ "127.0.0.1")
 {
-warn "Failed to resolve $$addr, using $srcdir/certs/wgethosts\n";
+warn "Unexpected IP for localhost: ".inet_ntoa($addr)."\n";
 exit 77;
 }



With Best Regards, Tim



On 01/25/2018 07:47 AM, David McInnis wrote:
> For ArchLinux.
> 
> -Dave



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget 1.19.2 fails on Test-metalink-http.py and Test-metalink-http-quoted.py

2018-01-31 Thread Tim Rühsen
Thanks,

slightly amended and pushed.


With Best Regards, Tim



On 01/30/2018 09:42 AM, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of October 2017, Tim Rühsen wrote:
>> On 10/30/2017 12:49 PM, Arkadiusz Miśkiewicz wrote:
>>> On Monday 30 of October 2017, Tim Rühsen wrote:
>>>> On 10/29/2017 09:39 PM, Arkadiusz Miśkiewicz wrote:
>>>>> On Sunday 29 of October 2017, Tim Rühsen wrote:
>>>>>> On Sonntag, 29. Oktober 2017 21:00:35 CET Arkadiusz Miśkiewicz wrote:
>>>>>>> On Sunday 29 of October 2017, Tim Rühsen wrote:
>>>>>>>> On Sonntag, 29. Oktober 2017 13:45:53 CET Arkadiusz Miśkiewicz wrote:
>>>>>>>>> Hi.
>>>>>>>>>
>>>>>>>>> Test suite for wget fails here on Test-metalink-http.py and
>>>>>>>>> Test-metalink- http-quoted.py
>>>>>>>>>
>>>>>>>>> test-suite.log attached.
>>>>>>>>
>>>>>>>> Could you please also send us the file 'config.log' ? That shows
>>>>>>>> your configuration - I would like to reproduce that issue.
>>>>>>>
>>>>>>> Attached.
>>>>>>
>>>>>> I meant config.log, you attached config.h.
>>>>
>>>> Looks pretty identical to mine.
>>>>
>>>> That made me looking a bit deeper into the original .log file, where you
>>>> can see that python file name structure dump. Well, this is that
>>>> additional file found in the temp. test directory. The test(s) expect
>>>> certain file and their correct contents after running... with those two
>>>> tests we see an additional (unexpected) file named
>>>> '.gnupg/dirmngr.conf'.
>>>>
>>>> My guess is that this is created by the use of gpgme which calls gnupg
>>>> functions. And it has something to do with dirmngr configuration.
>>>> Please have a look at your configuration.
>>>
>>> Thanks.
>>>
>>> Added patch below (locally in my wget build) to avoid dependency on some
>>> specific gnupg/dirmngr configuration. It fixes both tests for me.
>>>
>>> --- wget-1.19.2/testenv/conf/expected_files.py.org  2017-10-30 12:36:46.911716601 +0100
>>> +++ wget-1.19.2/testenv/conf/expected_files.py  2017-10-30 12:41:03.358656484 +0100
>>>
>>> @@ -24,9 +24,9 @@ class ExpectedFiles:
>>>  snapshot = {}
>>>  
>>>  for parent, dirs, files in os.walk('.'):
>>>  for name in files:
>>> -# pubring.kbx will be created by libgpgme if $HOME doesn't contain the .gnupg directory.
>>> +# pubring.kbx, dirmngr.conf, gpg.conf can be created by libgpgme if $HOME doesn't contain the .gnupg directory.
>>>  # setting $HOME to CWD (in base_test.py) breaks two Metalink tests, so we skip this file here.
>>> -if name == 'pubring.kbx':
>>> +if name in [ 'pubring.kbx', 'dirmngr.conf', 'gpg.conf' ]:
>>>  continue
>>>  
>>>  f = {'content': ''}
>>
>> Great, thanks ! The changes are pushed.
>>
>> Sorry that I didn't find/remember this immediately:
>> commit 5d4ada1b7b0b79f8053f3d6ffddda2e2c66d9dce
>> Author: Tim Rühsen <tim.rueh...@gmx.de>
>> Date:   Tue May 16 10:24:52 2017 +0200
>>
>> Fix two Metalink tests if $HOME is changed
>>
>> * conf/expected_files.py (gen_local_fs_snapshot): Skip processing
>>   of 'pubring.kbx'
>>
>> :-)
>>
>> With Best Regards, Tim
> 
> And one more file can be created by libgpgme:
> 
> --- wget-1.19.4/testenv/conf/expected_files.py.org  2018-01-30 09:36:46.359579482 +0100
> +++ wget-1.19.4/testenv/conf/expected_files.py  2018-01-30 09:37:01.350209794 +0100
> @@ -24,9 +24,9 @@ class ExpectedFiles:
>  snapshot = {}
>  for parent, dirs, files in os.walk('.'):
>  for name in files:
> -# pubring.kbx, dirmngr.conf, gpg.conf will be created by libgpgme if $HOME doesn't contain the .gnupg directory.
> +# pubring.gpg, pubring.kbx, dirmngr.conf, gpg.conf will be created by libgpgme if $HOME doesn't contain the .gnupg directory.
>  # setting $HOME to CWD (in base_test.py) breaks two Metalink tests, so we skip this file here.
> -if name in [ 'pubring.kbx', 'dirmngr.conf', 'gpg.conf' ]:
> +if name in [ 'pubring.gpg', 'pubring.kbx', 'dirmngr.conf', 'gpg.conf' ]:
>  continue
> 
>  f = {'content': ''}
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] no data downloaded

2018-02-05 Thread Tim Rühsen
On 02/05/2018 12:03 PM, jos vaessen wrote:
> Hello,
> 
> I am downloading the home webpage from my solar power converter using 
> WIFI and its IP address 160.190.0.1 with WGET. No problem besides that the 
> page is zipped as 7ZIP. After unzipping, it shows the content of 
> index.html like a copy/paste/save home.html version on mouse click using 
> Firefox.
> 
> BUT..
> 
> Without the device number, date, time, converted Watts, etc. So there is 
> no variable data in the downloaded HTML using WGET.
> 
> What I get is an empty HTML file, and the question is: what trigger should I use 
> to download this data too, like Firefox or IE show it on screen?

Either it's missing credentials (do you have to login with Firefox ?) or
cookies.

Some servers deliver content depending on the user agent (the User-Agent:
header field set by the client).

To see what Firefox set in the request headers for your box, sniff with
Wireshark (search the web for instructions).

But first try with something like this:
--user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101
Firefox/58.0"


With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] How to remove a header?

2018-02-09 Thread Tim Rühsen
Hi,


On 02/09/2018 12:40 AM, Peng Yu wrote:
> Hi,
> 
> wget sets some headers. Is there a way to remove some headers, e.g.,
> Accept-Encoding? Thanks.

No, you can change headers and add headers. But it looks like nobody
ever needed/asked for an option to remove headers.

With Best Regards, Tim



> 
> $ wget -qO- http://httpbin.org/get
> {
>   "args": {},
>   "headers": {
> "Accept": "*/*",
> "Accept-Encoding": "identity",
> "Connection": "close",
> "Host": "httpbin.org",
> "User-Agent": "Wget/1.16.3 (darwin13.4.0)"
>   },
>   "origin": "165.91.87.88",
>   "url": "http://httpbin.org/get"
> }
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] no data downloaded

2018-02-06 Thread Tim Rühsen
Hi Jos,

please always answer to the mailing list - just in case someone else is
thinking about the issue or has a similar issue.


You can also use Firefox's built-in developer tools to see what is going
on. I seldom use it, so I can't give you detailed instructions. Just
search the web for it.

With Best Regards, Tim



On 02/06/2018 09:55 AM, jos vaessen wrote:
> Hello Tim,
> 
> Thanks for the technical information. You're the first one who dives in 
> deeper.
> 
> Adding the switch --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:58.0) 
> Gecko/20100101 Firefox/58.0" does not help. The output is the same as 
> without it. Besides that, now the still-zipped output file 
> "index.html" shows the HTML lines, but no variable data is transmitted. 
> The only difference is that it shows up with an internet icon on screen, 
> whereas without that switch the file shows up as unreadable. Both respond the 
> same way to 7zip and show an empty webpage without colors or data. 
> I think I have to find a way to trigger the Java content too, which is 
> stored in the corresponding folder: dropdown.js and zepto.js (and 
> layout.css for the colors?).
> 
> Always something new to learn when I think I have seen it all, Jos
> 
> 
> Op 05-Feb-18 om 17:32 schreef Tim Rühsen:
>> Either it's missing credentials (do you have to login with Firefox ?) or
>> cookies.
>> Some servers deliver content depending on the user agent (UserAgent:
>> header field set by the client).
>> To see what Firefox set in the request headers for your box, sniff with
>> Wireshark (search the web for instructions).
>> But first try with something like this:
>> --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101
>> Firefox/58.0"
>> With Best Regards, Tim
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] "make check" fails in "Test-iri-disabled" on Mac OS X, after "./configure" and "make" of wget-1.19.4 complete

2018-02-20 Thread Tim Rühsen
The charset conversion has had an issue on OSX for a while (OSX changed
something). And we have no OSX developer, so there is nothing we can do.

In the past I had several contacts about charset issues on OSX; they found
configuration/installation issues after a while and then went silent
(problem solved for them).

Best thing is you send us the complete config.log and
tests/Test-iri-disabled.log.

You could also check if this might be a filesystem problem. HFS+ doesn't
always save files with the name you specify... it uses decomposed UTF-8
(AFAIR), while the Perl test might check for composed UTF-8. You won't
see the difference on the console, but a byte-per-byte comparison of the
filenames wouldn't work and the test might fail. Just a guess.
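
As a small illustration (a Python sketch, not part of the test suite) of why a
byte-per-byte comparison can fail even though both names look identical on screen:

import unicodedata

composed   = "p2_\u00e9\u00e9n.html"                   # é as one code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)    # é as 'e' + combining accent

print(composed == decomposed)                          # False
print(composed.encode("utf-8"))                        # b'p2_\xc3\xa9\xc3\xa9n.html'
print(decomposed.encode("utf-8"))                      # b'p2_e\xcc\x81e\xcc\x81n.html'
print(unicodedata.normalize("NFC", decomposed) == composed)   # True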


With Best Regards, Tim



On 02/20/2018 11:53 AM, qihcqr2...@tutanota.com wrote:
> 
> ### I did the following:
> 
> export PKG_CONFIG='/opt/pkg-config/bin/pkg-config'
> export PKG_CONFIG_PATH='/opt/pkg-config/lib/pkgconfig'
> 
> ./configure --with-ssl=openssl --with-openssl=yes 
> --with-libssl-prefix=/opt/openssl --with-zlib=/opt/zlib
> 
> make
> make check
> 
> 
> ### Actual Result:
> 
> "make check" printed the following failure.
> 
> FAIL: Test-iri-disabled.px
> 
> 
> "config.log" holds the following pieces of information indicating failures.
> 
> http://localhost:49307/p1_fran\347ais.html
> http://localhost:49307/p1_fran%E7ais.html
> Incomplete or invalid multibyte sequence encountered
> Failed to convert file name 'p1_français.html' (UTF-8) -> '?' (US-ASCII)
> 
> 
> http://localhost:49307/p2_\351\351n.html
> http://localhost:49307/p2_%E9%E9n.html
> Incomplete or invalid multibyte sequence encountered
> Failed to convert file name 'p2_één.html' (UTF-8) -> '?' (US-ASCII)
> 
> 
> http://localhost:49307/p3_\244\244\244.html
> http://localhost:49307/p3_%A4%A4%A4.html
> Incomplete or invalid multibyte sequence encountered
> Failed to convert file name 'p3_¤¤¤.html' (UTF-8) -> '?' (US-ASCII)
> 
> 
> There are some non-ASCII characters in the above file names. In case the 
> non-ASCII characters get corrupted through the Internet, I will describe them.
> 
> "p1_fran" on the left of "(UTF-8)" is followed by the "lowercase c with 
> cedilla", whose octal code is 0347 and hex code is E7 for Unicode codepoint 
> and for ISO-8859-1. It is a single-byte character for ISO-8859-1 (ISO 
> Latin-1). However, it is a double-byte character for UTF-8, and its hex code 
> is 0xC3A7 for UTF-8.
> 
> 
> "p2_" on the left of "(UTF-8)" is followed by two occurrences of the 
> "lowercase e with acute", whose octal code is 0351 and hex code is E9 for 
> Unicode codepoint and for ISO-8859-1. It is a single-byte character for 
> ISO-8859-1 (ISO Latin-1). However, it is a double-byte character for UTF-8, 
> and its hex code is 0xC3A9 for UTF-8.
> 
> 
> "p3_" on the left of "(UTF-8)" is followed by three occurrences of the 
> "generic currency sign", whose octal code is 0244 and hex code is A4 for 
> Unicode codepoint and for ISO-8859-1. It is a single-byte character for 
> ISO-8859-1 (ISO Latin-1). However, it is a double-byte character for UTF-8, 
> and its hex code is 0xC2A4 for UTF-8.
> 
> 
> ### Questions
> 
> While I would like this problem to be fixed, I also would like to know what 
> this test is trying to do. Is the test trying to convert "c cedilla" to plain 
> "c", and "e acute" to plain "e", and "generic currency sign" to the "dollar 
> sign"? Moreover, before converting from UTF-8 to US-ASCII, it has to convert 
> from the single-byte Unicode codepoint or ISO-8859-1 to the double-byte 
> UTF-8, does it not?
> 
> 
> ### Related
> 
> "make check" also fails in "Test-https-pfs" and "Test-https-tlsv1x" at the 
> same time as "Test-iri-disabled" under the same condition and the same 
> environment. Separately from this report, I am planning to report about 
> "Test-https-pfs" and "Test-https-tlsv1x" later.
> 
> In Bug 50223 (thread "bug-wget/2017-02/msg00010.html"), "make" failed to 
> compile/link on Mac OS X. Wget has made progress since then. Now, in my 
> situation, "make" completes to compile and link, but "make check" fails.
> 
> 
> ### Environment
> 
> wget-1.19.4
> Mac OS X
> Intel 64-bit
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] segment fault while downloading .tar.gz file

2018-02-23 Thread Tim Rühsen
Hi Jay,

it seems related to https://gitlab.com/gnuwget/wget2/merge_requests/353.

Maybe you'd like to drop in there...

In short: the server ignores 'Accept-Encoding: identity' and gives us a
'Content-Type: gzip'.

If you build wget2 with zlib (just install zlib1g-dev and run ./configure
again), the problem shouldn't occur and you can go on fixing issue 315.


With Best Regards, Tim



On 02/23/2018 03:13 PM, Jay Bhavsar wrote:
> So, I was trying to resolve this
>  issue.
> 
> Command I run:
> 
> ./wget2_noinstall http://www.lysator.liu.se/~nisse/archive/nettle-3.4.tar.gz
> 
> What I get:
> 
> [0] Downloading 'http://www.lysator.liu.se/~nisse/archive/nettle-3.4.tar.gz'
> ...
> Saving 'nettle-3.4.tar.gz.8'
> Segmentation fault (core dumped)
> 
> Investigating with gdb:
> 
> (gdb) bt
> #0  0x in ?? ()
> #1  0x7feecd9855b6 in wget_decompress (dc=0x7feec4036080,
> src=0x7feec4003310 "\037\213\b", srclen=10456)
> at decompressor.c:415
> #2  0x7feecd98e452 in wget_http_get_response_cb (conn=0x7feec40008c0)
> at http.c:1295
> #3  0x00419e06 in http_receive_response (conn=0x7feec40008c0) at
> wget.c:3518
> #4  0x004158f7 in downloader_thread (p=0x1bb5250) at wget.c:2125
> #5  0x7feecd31b6ba in start_thread (arg=0x7feecc646700) at
> pthread_create.c:333
> #6  0x7feecd05141d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> (gdb) bt full
> #0  0x in ?? ()
> No symbol table info available.
> #1  0x7feecd9855b6 in wget_decompress (dc=0x7feec4036080,
> src=0x7feec4003310 "\037\213\b", srclen=10456)
> at decompressor.c:415
> rc = 32750
> #2  0x7feecd98e452 in wget_http_get_response_cb (conn=0x7feec40008c0)
> at http.c:1295
> bufsize = 102400
> body_len = 10456
> body_size = 0
> nbytes = 10774
> nread = 10774
> buf = 0x7feec4003310 "\037\213\b"
> p = 0x7feec400344e
> "\n\242\305]\274\206o%\035\306v\344$\321ks\250-d\227\334\347\241\345\261^|\357\271\066k\273\066\367\005g\026\230\245\021\061\341\016\273\237\313U\027\340Kaal\240\331c\027\001hZ\221\033\370'\232\024{\344\241\300;;
> \362\264P#\255\260
> d%+\242\035\205\t\242`Fk\313\330͜yV\224-W\222\321R\314\063\376\027r\274\377\247p\314\356\203h\302\\\237\324my\036\367*\304\034\350\254\325v\246T\207\326\020\365I0\203\002'\240\v\225>\301\230\330=g\261\340\243ث\020\002"
> resp = 0x7feec4035ce0
> dc = 0x7feec4036080
> req = 0x7feec401c7f0
> #3  0x00419e06 in http_receive_response (conn=0x7feec40008c0) at
> wget.c:3518
> resp = 0x0
> context = 0x0
> #4  0x004158f7 in downloader_thread (p=0x1bb5250) at wget.c:2125
> downloader = 0x1bb5250
> resp = 0x0
> job = 0x1bb51c0
> host = 0x1bb5000
> pending = 1
> max_pending = 1
> locked = 0
> pause = 0
> action = ACTION_GET_RESPONSE
> #5  0x7feecd31b6ba in start_thread (arg=0x7feecc646700) at
> pthread_create.c:333
> ---Type  to continue, or q  to quit---
> __res = 
> pd = 0x7feecc646700
> now = 
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140663608076032,
> 5058422716409210776, 0, 140731387977695, 140663608076736,
> 0, -5048895408410564712, -5048892748985505896},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = 
> pagesize_m1 = 
> sp = 
> freesize = 
> __PRETTY_FUNCTION__ = "start_thread"
> #6  0x7feecd05141d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> No locals.
> 
> I haven't been able to reproduce this issue with any other URL. Also, wget2
> freezes with all HTTPS URLs. Has anyone seen something like this before? Is
> this a real issue or my system acting up? In any case, should I dig further
> and create an issue?
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Why does -A not work?

2018-06-20 Thread Tim Rühsen
Hi Nils,

On 06/20/2018 06:16 AM, Nils Gerlach wrote:
> Hi there,
> 
> in #wget on freenode I was suggested to write this to you:
> I tried using wget to get some images:
> wget -nd -rH -Dcomicstriplibrary.org -A
> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*" -p -e
> robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
> I wanted to download the images only but wget was not following any of the
> links so I got that much more into -A. But it still does not follow the
> links.
> Page numbers of the search result contain "page" in the link, links to the
> big pictures I want wget to download contain "display". Both are given in
> -A and are seen in the HTML document wget gets. Neither is followed by wget.
> 
> Why does this not work at all? Website is public, anybody is free to test.
> But this is not my website!

-A / -R works only on the filename, not on the path. The docs (man page)
are not very explicit about it.

Instead try --accept-regex / --reject-regex, which act on the complete
URL - but shell wildcards won't work.

For your example this means to replace '.' by '\.' and '*' by '.*'.

To download those nemo jpegs:
wget -d -rH -Dcomicstriplibrary.org --accept-regex
".*little-nemo.*n\.jpeg" -p -e robots=off
'http://comicstriplibrary.org/search?search=little+nemo' --regex-type=posix
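
As a side note (a Python sketch, nothing wget does internally): Python's fnmatch
module performs exactly this shell-glob-to-regex translation, which can help to
check a pattern before passing it to --accept-regex:

import fnmatch, re

regex = fnmatch.translate("*little-nemo*s.jpeg")   # glob -> regex
# regex is something like '(?s:.*little\\-nemo.*s\\.jpeg)\\Z'; the exact form varies by Python version
print(re.match(regex, "little-nemo-19051015-s.jpeg") is not None)   # True
print(re.match(regex, "little-nemo-19051015-l.jpeg") is not None)   # False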

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Why does -A not work?

2018-06-20 Thread Tim Rühsen
Hi Niels,

please always answer to the mailing list (no problem if you CC me, but
not needed).

It was just an example for POSIX regexes - it's up to you to work out
the details ;-) Or maybe there is a volunteer reading this.

The implicitly downloaded HTML pages should be removed after parsing
when you use --accept-regex. Except the explicitly 'starting' page from
your command line.

Regards, Tim

On 06/20/2018 04:28 PM, Nils Gerlach wrote:
> Hi Tim,
> 
> I am sorry but your command does not work. It only downloads the thumbnails
> from the first page
> and follows none of the links. Open the link in a browser. Click on the
> pictures to get a larger picture.
> There is a link "high quality picture"; the pictures behind those links are
> the ones I want to download.
> Regex being ".*little-nemo.*n\l.jpeg". And not only the first page but from
> the other search result pages, too.
> Can you work that one out? Does this work with wget? Best result would be
> if the visited html-pages were
> deleted by wget. But if they stay I can delete them afterwards. But
> automatism would be better, that's why I am
> trying to use wget ;)
> 
> Thanks for the information on the filename and path, though.
> 
> Greetings
> 
> 2018-06-20 16:13 GMT+02:00 Tim Rühsen :
> 
>> Hi Nils,
>>
>> On 06/20/2018 06:16 AM, Nils Gerlach wrote:
>>> Hi there,
>>>
>>> in #wget on freenode I was suggested to write this to you:
>>> I tried using wget to get some images:
>>> wget -nd -rH -Dcomicstriplibrary.org -A
>>> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*"
>> -p -e
>>> robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
>>> I wanted to download the images only but wget was not following any of
>> the
>>> links so I got that much more into -A. But it still does not follow the
>>> links.
>>> Page numbers of the search result contain "page" in the link, links to
>> the
>>> big pictures i want wget to download contain "display". Both are given in
>>> -A and are seen in the html-document wget gets. Neither is followed by
>> wget.
>>>
>>> Why does this not work at all? Website is public, anybody is free to
>> test.
>>> But this is not my website!
>>
>> -A / -R works only on the filename, not on the path. The docs (man page)
>> is not very explicit about it.
>>
>> Instead try --accept-regex / --reject-regex which acts on the complete
>> URL - but shell wildcard's won't work.
>>
>> For your example this means to replace '.' by '\.' and '*' by '.*'.
>>
>> To download those nemo jpegs:
>> wget -d -rH -Dcomicstriplibrary.org --accept-regex
>> ".*little-nemo.*n\.jpeg" -p -e robots=off
>> 'http://comicstriplibrary.org/search?search=little+nemo'
>> --regex-type=posix
>>
>> Regards, Tim
>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Deprecate TLS 1.0 and TLS 1.1

2018-06-19 Thread Tim Rühsen
On 06/19/2018 12:44 PM, Loganaden Velvindron wrote:
> Hi All,
> 
> As per:
> https://tools.ietf.org/html/draft-moriarty-tls-oldversions-diediedie-00
> 
> Attached is a tentative patch to disable TLS 1.0 and TLS 1.1 by
> default. No doubt that this will cause some discussions, I'm open to
> hearing all opinions on this.
> 

Good idea for the public internet.

IMO there are too many 'internal' devices / hardware that are not
up-to-date and impossible to update.

What about amending the patch so that we apply it only to public IP
addresses ?

And even then - we should not just 'fail' on older servers but tell the
user why wget fails and what to do about it. In the end, the user is
responsible and in control.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Any explanation for the '-nc' returned value?

2018-07-30 Thread Tim Rühsen
On 30.07.2018 16:44, Yuxi Hao wrote:
> Let's take an example in practice.
> When there is a bad network connection, I try wget with '-nc' directly first; 
> if it fails, then I'll try it with a proxy. If it says "File * already there; 
> not retrieving." and returns 1 as described (error occurred, failed), that 
> is so weird!

After the first try fails, you should explicitly move/remove the file
out of the way. That is not weird, it's a safety feature. It might save
you when you have a typo or when a URL is retrieved that you can't trust
yourself. You could easily overwrite files in your home directory, e.g.
.profile or .bashrc. That is easily exploited for Remote Code Execution (RCE).

So no way we "fix" this ;-)

> And '-N' does not always work as desired, because of "Last-modified header 
> missing". One example:
> wget -N 
> https://www.gnu.org/software/wget/manual/html_node/Download-Options.html

If the server doesn't support it, it simply won't work.
All you can do is not use -N, or ask the server's admin to support it.

Regards, Tim

> 
> Best Regards,
> YX Hao
> 
> -Original Message-
> From: Dale R. Worley 
> To: Tim Rühsen
> Cc: lifenjoine; bug-wget
> Subject: Re: [Bug-wget] Any explanation for the '-nc' returned value?
> 
> Tim Rühsen  writes:
>> -nd, even if -r or -p are in effect.)  When -nc is specified, this behavior
>> is suppressed, and Wget will refuse to download newer copies of file.
> 
> Though strictly speaking, this doesn't say that wget will then exit with 
> error code 1.
> 
> Dale
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Any explanation for the '-nc' returned value?

2018-07-29 Thread Tim Rühsen
Hi,

from the man pages (--no-clobber):


   When running Wget without -N, -nc, -r, or -p, downloading the same file
   in the same directory will result in the original copy of file being
   preserved and the second copy being named file.1.  If that file is
   downloaded yet again, the third copy will be named file.2, and so on.
   (This is also the behavior with -nd, even if -r or -p are in effect.)
   When -nc is specified, this behavior is suppressed, and Wget will refuse
   to download newer copies of file.  Therefore, "no-clobber" is actually a
   misnomer in this mode---it's not clobbering that's prevented (as the
   numeric suffixes were already preventing clobbering), but rather the
   multiple version saving that's prevented.


Regards, Tim

On 28.07.2018 15:52, Yuxi Hao wrote:
> Hi Guys,
> 
> The source code is:
> 
>   if (opt.noclobber && file_exists_p(opt.output_document, NULL))
>     {
>       /* Check if output file exists; if it does, exit. */
>       logprintf (LOG_VERBOSE,
>                  _("File %s already there; not retrieving.\n"),
>                  quote (opt.output_document));
>       exit (WGET_EXIT_GENERIC_ERROR);
>     }
> 
> No explanation on it:
> https://www.gnu.org/software/wget/manual/html_node/Exit-Status.html
> 
> I am confused about this. If it works as specified, why don't we return success?
> 
> Best Regards,
> YX Hao
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Async webcrawling

2018-07-31 Thread Tim Rühsen
On 31.07.2018 20:17, James Read wrote:
> Thanks,
> 
> as I understand it though there is only so much you can do with
> threading. For more scalable solutions you need to go with async
> programming techniques. See http://www.kegel.com/c10k.html for a summary
> of the problem. I want to do large scale webcrawling and am not sure if
> wget2 is up to the job.

Well, you'll be surprised how fast wget2 is. Especially with HTTP/2
spreading more and more, you can easily fill larger bandwidths with even
a few threads. Of course it also heavily depends on the server's
capabilities and the ping/RTT values you have.

Since you can control host spanning, you could also split your workload
onto several processes (or even hosts).

Are you going to crawl complete web sites or just a few files per site ?
The speed heavily depends on those (and more) details.

If it turns out that you really need a highly specialized crawler, it
might be best to use libwget's API. I did so for scanning the top 1M
Alexa sites a while ago and it worked out pretty well (took ~2h on a
500/50 Mbps cable connection). The source is in the examples/ directory.

Maybe you just start with a test.

I am personally pretty interested in tuning bottlenecks (CPU, memory,
bandwidth, ...), so let me know if there is something and I'll go for it.

You can also PM me with more details, if you don't like to post it in
public.

Regards, Tim

> 
> On Tue, Jul 31, 2018 at 6:22 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> 
> On 31.07.2018 18:39, James Read wrote:
> > Hi,
> >
> > how much work would it take to convert wget into a fully fledged
> > asynchronous webcrawler?
> >
> > I was thinking something like using select. Ideally, I want to be
> able to
> > supply wget with a list of starting point URLs and then for wget
> to crawl
> > the web from those starting points in an asynchronous fashion.
> >
> > James
> >
> 
> Just use wget2. It is already packaged in Debian sid.
> To build from git source, see https://gitlab.com/gnuwget/wget2.
> 
> To build from tarball (much easier), download from
> https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.
> 
> Regards, Tim
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Async webcrawling

2018-07-31 Thread Tim Rühsen
On 31.07.2018 18:39, James Read wrote:
> Hi,
> 
> how much work would it take to convert wget into a fully fledged
> asynchronous webcrawler?
> 
> I was thinking something like using select. Ideally, I want to be able to
> supply wget with a list of starting point URLs and then for wget to crawl
> the web from those starting points in an asynchronous fashion.
> 
> James
> 

Just use wget2. It is already packaged in Debian sid.
To build from git source, see https://gitlab.com/gnuwget/wget2.

To build from tarball (much easier), download from
https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Greek translation error

2018-08-05 Thread Tim Rühsen
On 05.08.2018 19:54, Anyparktos wrote:
> I noticed an error in the Greek translation of wget. It reads "Μήκος", which 
> is Greek for "length", instead of "Μέγεθος", which is "size". I attached a 
> relevant screenshot for clarity. It drives me crazy, please correct it!
> 

In C locale:

Length: 321432 (314K) [text/html]
Saving to: 'index.html.1'

So the translation is correct.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] [PATCH] Don't limit the test suite HTTPS server to TLSv1

2018-08-11 Thread Tim Rühsen
Good catch, thanks !

Regards, Tim

On 10.08.2018 14:51, Tomas Hozza wrote:
> In Fedora, we are implementing crypto policies, in order to enhance the
> security of user systems. This is done on the system level by global
> configuration. It may happen that due to the active policy, only
> TLSv1.2 or higher will be available in crypto libraries. While wget as
> a client will by default determine the minimal TLS version supported by
> both client and server, the HTTPS server implementation in testenv/
> hardcodes use of TLSv1. As a result all HTTPS related tests fail in
> case a more hardened crypto policy is set on the Fedora system.
> 
> This change removes the explicit TLS version setting and leaves the
> determination of the minimal supported TLS version on the server and
> client.
> 
> More information about Fedora change can be found here:
> https://fedoraproject.org/wiki/Changes/StrongCryptoSettings
> 
> Signed-off-by: Tomas Hozza 
> ---
>  testenv/server/http/http_server.py | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/testenv/server/http/http_server.py 
> b/testenv/server/http/http_server.py
> index 434666dd..6d8fc9e8 100644
> --- a/testenv/server/http/http_server.py
> +++ b/testenv/server/http/http_server.py
> @@ -49,7 +49,6 @@ class HTTPSServer(StoppableHTTPServer):
> 'server-key.pem'))
>  self.socket = ssl.wrap_socket(
>  sock=socket.socket(self.address_family, self.socket_type),
> -ssl_version=ssl.PROTOCOL_TLSv1,
>  certfile=CERTFILE,
>  keyfile=KEYFILE,
>  server_side=True
> 
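
If a minimum TLS version ever needs to be pinned explicitly instead, a sketch
with the modern ssl API would look like this (Python 3.7+; the certificate and
key file names here are illustrative):

import socket, ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2     # refuse anything older
ctx.load_cert_chain(certfile="server-cert.pem", keyfile="server-key.pem")

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
secure_sock = ctx.wrap_socket(sock, server_side=True)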



Re: [Bug-wget] wget wait_ares void value not ignored

2018-08-24 Thread Tim Rühsen
Thanks,

it has been removed on 8th of March 2018 (commit
7eff94e881b94d119ba22fc0c29edd65f4e6798b)

Regards, Tim

On 08/24/2018 09:39 AM, Endre Hagelund wrote:
> Hi
> 
> I've encountered a bug when compiling wget 1.19.5 from source with the
> following command:
> 
> ./configure --with-cares && make
> 
> Compile fails with the following error message:
> host.c: In function 'wait_ares'
> host.c:735:11: error: void value not ignored as it ought to be
> timer = ptimer_destroy (timer);
>^
> 
> Can this error be fixed by patching the line?
> ptimer_destroy (timer); timer = NULL;
> 
> Best Regards,
> Endre
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] [PATCH] Fixes for issues found by Coverity static analysis

2018-08-27 Thread Tim Rühsen
On 08/27/2018 11:01 AM, Tomas Hozza wrote:
> Hi Darshit.
> 
> On 25.08.2018 08:20, Darshit Shah wrote:
>> Hi Tomas,
>>
>> Thanks for running the scan and the patches you've made! I briefly glanced
>> through those and they seem fine. Of course, they will need to be slightly
>> modified to apply to the current git HEAD. I can do that in the coming days 
>> and
>> apply these patches.
> 
> These were based on the git HEAD at the time of sending. From what I checked 
> just now, that should be still the case. I'm working on 
> git://git.savannah.gnu.org/wget.git.
> 
>> I would like to ask you if there is a regular scan of Wget that you have set 
>> up
>> on Coverity. We used to run coverity scans regularly, but since the last year
>> or so, I haven't managed to get the coverity binaries to execute on my 
>> system.
>> So the scans stopped. If you have a scheduled run, I would like to be able to
>> see the results on Coverity so that we can keep fixing those issues.
> 
> This is Red Hat's internal instance of Coverity combined with other static 
> analyzers. Nevertheless I can share the full results with you if needed. 
> Please let me know if I should send it to mailing list or to you directly.

Also a big "thank you" from my side !

If you think there is no obvious security issue involved, just send to
the ML. Otherwise to Darshit and me please.

> 
>> P.S.: It seems like you haven't assigned your copyrights to the FSF for Wget.
>> Do you happen to know if your employer has assigned the copyrights on your
>> behalf? I couldn't find any mentions in the list I have locally. You will
>> shortly receive the assignment form in a separate email.
> 
> My knowledge is that Red Hat has agreement with FSF covering all its 
> employees. Since I'm a Red Hat employee and I'm sending these changes as part 
> of my job, I consider this to be implied. I have contributed to wget in the 
> past with the same rationale.

Sorry, that was my fault/doubt (asked Darshit and then was offline on
the weekend). Just found the entry in the FSF list of contributors. My
first grep was -i for 'redhat' - it actually is written 'Red Hat'.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] info wget: some improvement proposals on the documentation-content

2018-08-27 Thread Tim Rühsen
Thank you, Kalle !

We'll go through the docs soon and amend them.

On 08/26/2018 10:08 AM, kalle wrote:
> hello,
> here are my proposals:
> 
> chapter 2, part "download all the URLs specified": make it clearer what
> that exactly means in relation to URLs describing a directory. Is
> the whole fs beneath it downloaded, or only the file 'index.html' in it?
> 
> chapter 2.1: you mention that it would be a security risk to write out
> the password on the command line. But you don't mention here that a
> transfer over the internet of a non-encrypted URL containing the
> password would be risky too.
> 
> ch.2.1: why isn't https mentioned? It appears in ch 2.5, though...
> 
> ch. 2.3, option '-nc', part "local file will  be 'clobbered', or
> overwritten,": replace the last part with "be 'clobbered' (which means
> overwritten)". I suppose that , since 'clobbering' means something like
> 'destroying', you don't need to write "be 'clobbered' (which means
> overwritten or that the newly downloaded file is saved under another
> name than the local one)"
> 
> ch. 2.3, option '-nc', part "will refuse to download newer copies of
> 'FILE'": the usage of 'newer copies' is ambiguous here. It could be
> that the server-side file has been renewed, but probably it just
> means that it will not download the same file another time and give it
> another name."
> 
> ch. 2.3, option '-nc': if '--no-clobber' is a misnaming as is said in
> the part "actually a misnomer in this mode", why isn't it changed? One
> could keep '-nc' for compatibility reasons, and forge a new option
> name..., e.g. '-nn / --no-new'
> 
> 
> part 'of the character's ASCII value' -> add '(see ascii(7))' for reference.
> 
> ch. 2.4, replace the 'href="URL">'s with 'href="BASE-URL">'. There is
> one in '-i' and one in '-F'.
> 
> ch. 2.5, option '-c', part "really a valid prefix": I find the use of
> prefix not very understandable. I would rather write "Wget has no way of
> verifying that the local file is the beginning part of the remote file".
> 
> ch.2.5, '--bind-address': I don't understand the meaning of the word
> 'bind' in the formulation "bind to ADDRESS"
> 
> greetings, kalle
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Planning update to wget. Should I upstream it?

2018-08-23 Thread Tim Rühsen
Hi Richard,

On 08/22/2018 08:21 PM, Richard Thomas wrote:
> Hi, hope this is the correct way to do this.
> 
> I want to be able to download a webpage and all its prerequisites and
> turn it into a multipart/related single file. Now, this requires
> identifying and changing URLs which, as most members of this list are
> no-doubt aware is a thorny problem. Fortunately, wget already does this
> as part of its -p and -k options. Unfortunately, though it's amazingly
> useful, it's difficult to use the output for what I want.
> 
> So I am planning on adding a way to implement this functionality
> directly into wget. Either I'll rewrite the links and filenames so that
> it's easy to piece together a multipart/related file from what is spit
> out or I'll have wget generate the multipart/related file itself
> (probably the latter or maybe both).
> 
> I was just wondering if I should bother trying to feed this back into
> the project if there's any interest. Also, any suggestions on ways I can
> make this as useful as possible are welcome.

Feedback into the project is what it lives on :-)

Your goal sounds interesting, what do you need it for ?

We are currently developing wget2 and decided that we maintain wget 1.x
but new development should go into wget2 only.

Please see https://gitlab.com/gnuwget/wget2 for further information.

To jump in quickly, examine src/wget.c, function _convert_links(). You
could copy it and amend it to your needs. Then add a new option
(see src/options.c) and call the new function instead of _convert_links().

To contribute non-trivial work, you have to assign the copyright of your
code to the FSF. Here is the standard intro/howto :-)

-

We at GNU try to enforce software freedom through a Copyleft
license (GPL)[0]. However, to enforce the said license, someone needs to
take proactive action when violations are found. Hence, we assign the
copyrights of the code to the FSF allowing them to act against anyone
that violates the license of the code you have written.

We, the maintainers of GNU Wget hence hereby request that you assign
the copyrights of the previous contributions that you have made, and
any future contributions to the FSF.

Should you have any questions, please feel free to reply back to this
mail. We will be glad to answer them and help you out.

Once you are willing to sign the Copyright assignment documents, kindly
copy the text after the marker in this email, fill it out and send it to
ass...@gnu.org

[0]: https://www.gnu.org/licenses/why-assign.en.html
Please email the following information to ass...@gnu.org, and we
will send you the assignment form for your past and future changes.

Please use your full legal name (in ASCII characters) as the subject
line of the message.
--
REQUEST: SEND FORM FOR PAST AND FUTURE CHANGES

[What is the name of the program or package you're contributing to?]


[Did you copy any files or text written by someone else in these changes?
Even if that material is free software, we need to know about it.]


[Do you have an employer who might have a basis to claim to own
your changes?  Do you attend a school which might make such a claim?]


[For the copyright registration, what country are you a citizen of?]


[What year were you born?]


[Please write your email address here.]


[Please write your postal address here.]





[Which files have you changed so far, and which new files have you written
so far?]

With best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Running wget with -O and -q in the background yields a file wget-log

2018-08-29 Thread Tim Rühsen
On 8/28/18 10:13 AM, Tomas Korbar wrote:
> Hello,
> when the quiet option is provided and wget runs in the background, an empty
> log file is created. I think this file is superfluous because the quiet option
> prevents any log from being written into it. The problem is caused by the
> redirection of logs: wget only checks whether it is running in the background
> and then redirects logs to the default log file, and the log initialization
> creates the file, which stays empty. This can be fixed with the simple patch
> which I am attaching.
> Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1484411
> 

Thanks, pushed to master.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Deprecate TLS 1.0 and TLS 1.1

2018-07-15 Thread Tim Rühsen
On 14.07.2018 23:57, Jeffrey Walton wrote:
> On Tue, Jun 19, 2018 at 6:44 AM, Loganaden Velvindron  
> wrote:
>> ...
>> As per:
>> https://tools.ietf.org/html/draft-moriarty-tls-oldversions-diediedie-00
>>
>> Attached is a tentative patch to disable TLS 1.0 and TLS 1.1 by
>> default. No doubt that this will cause some discussions, I'm open to
>> hearing all opinions on this.
> 
> What will users do?
> 
> I'm guessing most will turn to --no-check-certificate or HTTP, which
> has the net effect of removing security, not improving it.
> 
> Stack Overflow is littered with the --no-check-certificate answer for
> questions ranging from "how do I use wget to download a file" to "how
> do I make my PHP work again".

This is to accept "broken / misused" certificates (lifetime exceeded,
wrong domain, etc.) - but maybe I am wrong. Could you explain what the
TLS version has to do with this ? AFAICS, if a server doesn't speak
TLS1.2, this option isn't of any use.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Failed 1.19.5 install on Solaris 11.3

2018-07-18 Thread Tim Rühsen


On 18.07.2018 14:58, Jeffrey Walton wrote:
> On Wed, Jul 18, 2018 at 7:14 AM, Tim Rühsen  wrote:
>> Maybe it's a bash/sh incompatibility. Anyways - what does 'make
>> install' do !? It basically copies the 'wget' executable into a
>> directory (e.g. /usr/local/bin/) that is listed in your PATH env variable.
>>
>> You can do that by hand. If you want the updated man file, copy wget.1
>> into your man1 directory (e.g. /usr/local/share/man/man1/).
> 
> I see what was happening. After unpacking this patch was applied:
> 
> sed -e 's|$(LTLIBICONV)|$(LIBICONV)|g' fuzz/Makefile.am >
> fuzz/Makefile.am.fixed
> mv fuzz/Makefile.am.fixed fuzz/Makefile.am
> 
> But it lacked this:
> 
> touch -t 19700101 fuzz/Makefile.am
> 
> So Autotools was trying to regenerate all of its shit. Autotools sucks
> so bad I cringe when I have to work with it. What a miserable set of
> programs.

This really isn't a fault of autotools :-)
You simply didn't have the things installed that you needed for amending
Makefile.am and rebuilding.

So what Darshit says would be an option: build a tarball from the latest
source (or with your changes) on *any* development machine, copy the
tarball over to the destination machine and simply './configure && make
&& sudo make install'. No autotools needed !
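
Roughly, assuming the usual bootstrap/dist steps of the wget git tree
('solarisbox' is just a placeholder for the destination machine):

$ git clone https://git.savannah.gnu.org/git/wget.git && cd wget
$ ./bootstrap && ./configure && make dist    # on the development machine
$ scp wget-*.tar.gz solarisbox:              # copy the tarball over
$ tar xf wget-*.tar.gz && cd wget-*/         # on the destination machine
$ ./configure && make && sudo make install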

Another option would be for us to make a new release 1.19.6.

And another one: Don't touch Makefile.am - instead amend Makefile.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget in a 'dynamic' pipe

2018-07-19 Thread Tim Rühsen


On 19.07.2018 17:24, Paul Wagner wrote:
> Dear wgetters,
> 
> apologies if this has been asked before.
> 
> I'm using wget to download DASH media files, i.e. a number of URLs in
> the form domain.com/path/segment_1.mp4, domain.com/path/segment_2.mp4,
> ..., which represent chunks of audio or video, and which are to be
> combined to form the whole programme.  I used to call individual
> instances of wget for each chunk and combine them, which was dead slow. 
> Now I tried
> 
>   { i=1; while [[ $i != 100 ]]; do echo
> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
> -i -
> 
> which works like a charm *as long as the 'generator process' is finite*,
> i.e. the loop is actually programmed as in the example.  The problem is
> that it would be much easier if I could let the loop run forever, let
> wget get whatever is there and then fail after the counter extends to a
> segment number not available anymore, which would in turn fail the whole
> pipe.  Turns out that
> 
>   { i=1; while true; do echo
> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
> -i -
> 
> hangs in the sense that the first process loops forever while wget
> doesn't even bother to start retrieving.  Am I right assuming that wget
> waits until the file specified by -i is actually fully written?  Is
> there any way to change this behaviour?
> 
> Any help appreciated.  (I'm using wget 1.19.1 under cygwin.)

Hi Paul,

Wget2 behaves like what you need. So you can run it with an endless loop
without wget2 hanging.
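
E.g. your example from above should work as-is with wget2 reading the
URLs as they arrive (a sketch, untested on CygWin; the URL pattern is
yours from above):

$ { i=1; while true; do echo "http://domain.com/path/segment_$((i++)).mp4"; done } | wget2 -O foo.mp4 -i -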

It should build under CygWin without problems, though my last test was a
while ago.

See https://gitlab.com/gnuwget/wget2

Latest tarball is
https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz

or latest git
git clone https://gitlab.com/gnuwget/wget2.git


Regards, Tim



signature.asc
Description: OpenPGP digital signature


[Bug-wget] TLS1.3 via GnuTLS

2018-07-16 Thread Tim Rühsen
FYI

GnuTLS 3.6.3 has been released today with TLS1.3 support (latest draft).

So if you rebuild/link wget or wget2 with the new GnuTLS version, you
can enable TLS1.3 via --ciphers="NORMAL:+VERS-TLS1.3"  (wget) resp.
--gnutls-options="NORMAL:+VERS-TLS1.3" (wget2).

Wget2 seems to get a 0RTT with --tls-resume on www.google.com.
I have a ping of 11.5ms and, according to the debug output of wget2, it
takes 13ms to load all 133 certificates from the local store (loading
all certs is a flaw in GnuTLS that I brought up there some years ago, but
there is no solution yet).

$time src/wget2_noinstall -d --gnutls-options="NORMAL:+VERS-TLS1.3"
--tls-resume https://www.google.com
...

real0m0,027s

That is 14ms left for creating the connection, sending the request and
getting the response on an 11.5ms RTT. The 2.5ms are overhead due to
initializing wget2, printing all the debug messages and saving the file.

Oh, I forgot to say: TCP Fast Open is enabled by default, and the timing
above is for a 'warm' connection.

Happy testing.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2 Feature Suggestion - Triston Line

2018-07-14 Thread Tim Rühsen
Hi Triston,

On 13.07.2018 18:52, Triston Line wrote:
> Hi Tim,
> 
> Excellent answer thank you very much for this info, "-N" or
> "--timestamping" sounds like a much better way to go, however if I'm
> converting links, using wget (1) I think I've read somewhere and noticed
> that two separate commands running in series wouldn't be able to
> continue due to the links from the previous session/command-instance?
> More clearly, I've read that the primary reason continuing from a fault
> is impossible is due to the fact that converting links to mirror isn't
> something that can be continued and the links are only valid for that
> session. Sounds silly to me because you're just formatting  tags
> from my understanding but there's probably a bit more to it. 

Well, the links/URLs in the converted file are adapted (made relative) to
your local directory structure. Depending on which of wget's directory
options are in use, you cannot always reconstruct the original URLs.

What we would need is some metadata for each file downloaded, e.g. the
original URL, the referrer URL, ...

We already have such data (see --xattr option) since a while - *if* your
filesystem supports it. So we *could* use this metadata if possible.

That would be a new feature to be implemented.
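
For illustration, on a filesystem with xattr support you can look at that
metadata with getfattr (just a sketch with a placeholder URL; the exact
attribute names may differ between versions):

$ wget2 --xattr https://example.com/index.html
$ getfattr -d index.html   # shows e.g. user.xdg.origin.url and user.xdg.referrer.url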

> I have used max-threads in the past and I've tried a suggestion for
> xargs on one of the stack exchange forums, so I do toy with those
> settings while testing out my friend's servers at UBC. Government on the
> other hand I might get in a bit of trouble if I'm loading them during
> working hours (Gosh knows I don't wanna come in at some ungodly hour
> (e.g. 3 am) with the network-services team to toy around with their
> stuff at different sites or perform intranet backups around different
> sites from my local). 
> 
> " The server then only sends payload/data if it has a newer version of
> that document, else it responds with 304 Not Modified." This is 400
> Bytes to respond with the last modification date of a file?

No, we send the GET request with the local file's timestamp. If the
server has a newer version, it sends it together with a 200 OK, else it
sends 304 Not Modified with an empty body.

Just give it a try. If you see that everything is re-downloaded, stop and
try again with '-N --no-if-modified-since'. This makes wget send a
HEAD request first - and depending on the timestamp info - wget
then issues a GET request thereafter (or nothing if the local
file is up-to-date).
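
I.e. something along these lines (example.com standing in for your site):

$ wget2 -r -N --no-if-modified-since https://example.com/portal/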

But even the HEAD method can fail if the server sends wrong timestamps.
I saw servers always sending the current date instead of the file's date
(e.g. true for dynamic / on-the-fly generated web pages).

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Failed 1.19.5 install on Solaris 11.3

2018-07-18 Thread Tim Rühsen
Maybe it's a bash/sh incompatibility. Anyways - what does 'make
install' do !? It basically copies the 'wget' executable into a
directory (e.g. /usr/local/bin/) that is listed in your PATH env variable.

You can do that by hand. If you want the updated man file, copy wget.1
into your man1 directory (e.g. /usr/local/share/man/man1/).

Regards, Tim

On 18.07.2018 12:58, Jeffrey Walton wrote:
> Hi Everyone,
> 
> I'm working from the Wget 1.19.5 tarball. 'make install' is failing on
> Solaris 11.3. Is there any way to avoid the automake version checks?
> As it stands I'm in a DoS situation because I need an updated Wget.
> 
> It has been my experience it is nearly impossible to update Autotools
> (I have never been able to do it on Linux or Solaris). I am stuck with
> the tools Oracle ships with Solaris.
> 
> Thanks in advance.
> 
> Jeff
> 
> ==
> 
> $ sudo make install
> ...
> 
> echo ' cd .. && /bin/sh
> /export/home/build/wget-1.19.5/build-aux/missing automake-1.15 --gnu
> fuzz/Makefile'; \
> CDPATH="${ZSH_VERSION+.}:" && cd .. && \
>   /bin/sh /export/home/build/wget-1.19.5/build-aux/missing
> automake-1.15 --gnu fuzz/Makefile
> make: Fatal error: Command failed for target `Makefile.in'
> Current working directory /export/home/build/wget-1.19.5/fuzz
> *** Error code 1
> The following command caused the error:
> fail=; \
> if (target_option=k; case ${target_option-} in  ?) ;;  *) echo
> "am__make_running_with_option: internal error: invalid"  "target
> option '${target_option-}' specified" >&2;  exit 1;;  esac;
> has_opt=no;  sane_makeflags=$MAKEFLAGS;  if {  if test -z '1'; then
> false;  elif test -n ''; then  true;  elif test -n '' && test -n '';
> then  true;  else  false;  fi;  }; then  sane_makeflags=$MFLAGS;  else
>  case $MAKEFLAGS in  *\\[\ \  ]*)  bs=\\;  sane_makeflags=`printf
> '%s\n' "$MAKEFLAGS"  | sed "s/$bs$bs[$bs $bs]*//g"`;;  esac;
> fi;  skip_next=no;  strip_trailopt ()  {  flg=`printf '%s\n' "$flg" |
> sed "s/$1.*$//"`;  };  for flg in $sane_makeflags; do  test $skip_next
> = yes && { skip_next=no; continue; };  case $flg in  *=*|--*)
> continue;;  -*I) strip_trailopt 'I'; skip_next=yes;;  -*I?*)
> strip_trailopt 'I';;  -*O) strip_trailopt 'O'; skip_next=yes;;  -*O?*)
> strip_trailopt 'O';;  -*l) strip_trailopt 'l'; skip_next=yes;;  -*l?*)
> strip_trailopt 'l';;  -[dEDm]) skip_next=yes;;  -[JT]) skip_next=yes;;
>  esac;  case $flg in  *$target_option*) has_opt=yes; break;;  esac;
> done;  test $has_opt = yes); then \
>   failcom='fail=yes'; \
> else \
>   failcom='exit 1'; \
> fi; \
> dot_seen=no; \
> target=`echo install-recursive | sed s/-recursive//`; \
> case "install-recursive" in \
>   distclean-* | maintainer-clean-*) list='lib src doc po util fuzz
> tests testenv' ;; \
>   *) list='lib src doc po util fuzz tests testenv' ;; \
> esac; \
> for subdir in $list; do \
>   echo "Making $target in $subdir"; \
>   if test "$subdir" = "."; then \
> dot_seen=yes; \
> local_target="$target-am"; \
>   else \
> local_target="$target"; \
>   fi; \
>   (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \
>   || eval $failcom; \
> done; \
> if test "$dot_seen" = "no"; then \
>   make  "$target-am" || exit 1; \
> fi; test -z "$fail"
> make: Fatal error: Command failed for target `install-recursive'
> Current working directory /export/home/build/wget-1.19.5
> *** Error code 1
> make: Fatal error: Command failed for target `install'
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] [PATCH] Add TLS 1.3 support for GnuTLS

2018-09-07 Thread Tim Rühsen
Pushed. Thank you, Tomas !

Regards, Tim


On 9/4/18 11:22 AM, Tomas Hozza wrote:
> Wget currently allows specifying "TLSv1_3" as the parameter for
> --secure-protocol option. However it is only implemented for OpenSSL
> and in case wget is compiled with GnuTLS, it causes wget to abort with:
> GnuTLS: unimplemented 'secure-protocol' option value 6
> 
> GnuTLS contains TLS 1.3 implementation since version 3.6.3 [1]. However
> currently it must be enabled explicitly in the application of it to be
> used. This will change after the draft is finalized. [2] However for
> the time being, I enabled it explicitly in case "TLSv1_3" is used with
> --secure-protocol.
> 
> I also fixed man page to contain "TLSv1_3" in all listings of available
> parameters for --secure-protocol
> 
> [1] https://lists.gnupg.org/pipermail/gnutls-devel/2018-July/008584.html
> [2] https://nikmav.blogspot.com/2018/05/gnutls-and-tls-13.html
> 
> Signed-off-by: Tomas Hozza 
> ---
>  doc/wget.texi |  6 +++---
>  src/gnutls.c  | 28 
>  2 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/wget.texi b/doc/wget.texi
> index 38b4a245..7ae19d8e 100644
> --- a/doc/wget.texi
> +++ b/doc/wget.texi
> @@ -1784,9 +1784,9 @@ If Wget is compiled without SSL support, none of these 
> options are available.
>  @cindex SSL protocol, choose
>  @item --secure-protocol=@var{protocol}
>  Choose the secure protocol to be used.  Legal values are @samp{auto},
> -@samp{SSLv2}, @samp{SSLv3}, @samp{TLSv1}, @samp{TLSv1_1}, @samp{TLSv1_2}
> -and @samp{PFS}.  If @samp{auto} is used, the SSL library is given the
> -liberty of choosing the appropriate protocol automatically, which is
> +@samp{SSLv2}, @samp{SSLv3}, @samp{TLSv1}, @samp{TLSv1_1}, @samp{TLSv1_2},
> +@samp{TLSv1_3} and @samp{PFS}.  If @samp{auto} is used, the SSL library is
> +given the liberty of choosing the appropriate protocol automatically, which 
> is
>  achieved by sending a TLSv1 greeting. This is the default.
>  
>  Specifying @samp{SSLv2}, @samp{SSLv3}, @samp{TLSv1}, @samp{TLSv1_1},
> diff --git a/src/gnutls.c b/src/gnutls.c
> index 07844c52..206d0b09 100644
> --- a/src/gnutls.c
> +++ b/src/gnutls.c
> @@ -565,6 +565,15 @@ set_prio_default (gnutls_session_t session)
>err = gnutls_priority_set_direct (session, 
> "NORMAL:-VERS-SSL3.0:-VERS-TLS1.0:-VERS-TLS1.1", NULL);
>break;
>  
> +case secure_protocol_tlsv1_3:
> +#if GNUTLS_VERSION_NUMBER >= 0x030603
> +  err = gnutls_priority_set_direct (session, 
> "NORMAL:-VERS-SSL3.0:+VERS-TLS1.3:-VERS-TLS1.0:-VERS-TLS1.1:-VERS-TLS1.2", 
> NULL);
> +  break;
> +#else
> +  logprintf (LOG_NOTQUIET, _("Your GnuTLS version is too old to support 
> TLS 1.3\n"));
> +  return -1;
> +#endif
> +
>  case secure_protocol_pfs:
>err = gnutls_priority_set_direct (session, "PFS:-VERS-SSL3.0", NULL);
>if (err != GNUTLS_E_SUCCESS)
> @@ -596,19 +605,38 @@ set_prio_default (gnutls_session_t session)
>allowed_protocols[0] = GNUTLS_TLS1_0;
>allowed_protocols[1] = GNUTLS_TLS1_1;
>allowed_protocols[2] = GNUTLS_TLS1_2;
> +#if GNUTLS_VERSION_NUMBER >= 0x030603
> +  allowed_protocols[3] = GNUTLS_TLS1_3;
> +#endif
>err = gnutls_protocol_set_priority (session, allowed_protocols);
>break;
>  
>  case secure_protocol_tlsv1_1:
>allowed_protocols[0] = GNUTLS_TLS1_1;
>allowed_protocols[1] = GNUTLS_TLS1_2;
> +#if GNUTLS_VERSION_NUMBER >= 0x030603
> +  allowed_protocols[2] = GNUTLS_TLS1_3;
> +#endif
>err = gnutls_protocol_set_priority (session, allowed_protocols);
>break;
>  
>  case secure_protocol_tlsv1_2:
>allowed_protocols[0] = GNUTLS_TLS1_2;
> +#if GNUTLS_VERSION_NUMBER >= 0x030603
> +  allowed_protocols[1] = GNUTLS_TLS1_3;
> +#endif
> +  err = gnutls_protocol_set_priority (session, allowed_protocols);
> +  break;
> +
> +case secure_protocol_tlsv1_3:
> +#if GNUTLS_VERSION_NUMBER >= 0x030603
> +  allowed_protocols[0] = GNUTLS_TLS1_3;
>err = gnutls_protocol_set_priority (session, allowed_protocols);
>break;
> +#else
> +  logprintf (LOG_NOTQUIET, _("Your GnuTLS version is too old to support 
> TLS 1.3\n"));
> +  return -1;
> +#endif
>  
>  default:
>logprintf (LOG_NOTQUIET, _("GnuTLS: unimplemented 'secure-protocol' 
> option value %d\n"), opt.secure_protocol);
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget in a 'dynamic' pipe

2018-09-11 Thread Tim Rühsen
On 9/11/18 5:34 AM, Dale R. Worley wrote:
> Paul Wagner  writes:
>> Now I tried
>>
>>{ i=1; while [[ $i != 100 ]]; do echo 
>> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4 
>> -i -
>>
>> which works like a charm *as long as the 'generator process' is finite*, 
>> i.e. the loop is actually programmed as in the example.  The problem is 
>> that it would be much easier if I could let the loop run forever, let 
>> wget get whatever is there and then fail after the counter extends to a 
>> segment number not available anymore, which would in turn fail the whole 
>> pipe.
> 
> Good God, this finally motivates me to learn about Bash coprocesses.
> 
> I think the answer is something like this:
> 
> coproc wget -O foo.mp4 -i -
> 
> i=1
> while true
> do
> rm -f foo.mp4
> echo "http://domain.com/path/segment_$((i++)).mp4" >&$wget[1]
> sleep 5
> # The only way to test for non-existence of the URL is whether the
> # output file exists.
> [[ ! -e foo.mp4 ]] && break
> # Do whatever you already do to wait for foo.mp4 to be completed and
> # then use it.
> done
> 
> # Close wget's input.
> exec $wget[1]<&-
> # Wait for it to finish.
> wait $wget_pid
> 
> Dale

Thanks for the pointer to coproc, never heard of it ;-) (That means I
never had a problem that needed coproc).

Anyways, copying the script as-is results in a file named '[1]' with bash
4.4.23.

Also, wget -i - waits to start downloading until stdin has been closed. How
can you circumvent that ?
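
FWIW, the stray file seems to come from the unquoted array expansion in
'>&$wget[1]' - a rough sketch of how those lines would have to look in
bash (same hypothetical URL pattern as in the example; this still doesn't
answer the buffering question):

  # coproc only gets a custom name with a compound command; a simple
  # command ends up as COPROC[0]/COPROC[1] with PID in COPROC_PID
  coproc WGET { wget -O foo.mp4 -i -; }
  fd=${WGET[1]}                              # write end = wget's stdin

  i=1
  while [[ $i != 100 ]]; do
    echo "http://domain.com/path/segment_$((i++)).mp4" >&"$fd"
  done

  exec {fd}>&-                               # close wget's stdin
  wait "$WGET_PID"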

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2 Feature Suggestion - Triston Line

2018-07-13 Thread Tim Rühsen
On 07/12/2018 08:12 PM, Triston Line wrote:
> If that's possible that would help immensely. I "review" sites for my
> friends at UBC and we look at geographic performance on their apache and
> nginx servers, the only problem is they encounter minor errors from time to
> time while recursively downloading (server-side errors nothing to do with
> wget) so the session ends.

Just forgot: Check out Wget2's --stats-site option. It gives you
statistical information about all pages downloaded, including parent
(linked from), status, size, compression, timing, encoding and a few
more. You can visualize with graphviz or put the data into a database
for easy analysis.

Example:
$ wget2 --stats-site=csv:site.csv -r -p https://www.google.com
$ cat site.csv
ID,ParentID,URL,Status,Link,Method,Size,SizeDecompressed,TransferTime,ResponseTime,Encoding,Verification
1,0,https://www.google.com/robots.txt,200,1,1,1842,6955,33,33,1,0
2,0,https://www.google.com,200,1,1,4637,10661,83,83,1,0
4,2,https://www.google.com/images/branding/product/ico/googleg_lodp.ico,200,1,1,1494,5430,32,31,1,0
5,2,https://www.google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png,200,1,1,5482,5482,36,36,0,0
3,2,https://www.google.com/images/nav_logo229.png,200,1,1,12263,12263,59,58,0,0

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] request to change retry default

2018-07-13 Thread Tim Rühsen
On 07/08/2018 02:59 AM, John Roman wrote:
> Greetings,
> I wish to discuss a formal change of the default retry for wget from 20
> to something more pragmatic such as two or three.
> 
> While I believe 20 retries may have been the correct default many years
> ago, it seems overkill for the modern "cloud based" internet, where most 
> sites are
> backed by one or more load balancers.  Geolocateable A records further
> reduce the necessity for retries by providing a second or third option
> for browsers to try.  To a lesser extent, GTM and GSLB technologies
> (however maligned they may be) are sufficient as well to
> properly handle failures for significant amounts of traffic.  BGP
> network technology for large hosting providers has also further reduced
> the need to perform several retries to a site.  Finally, for better or
> worse, environments such as Kubernetes and other container orchestration
> tools seem to afford sites an unlimited uptime should the marketing be
> trusted.

Solution: Just add 'tries = 3' to /etc/wgetrc or to ~/.wgetrc and never
care for it again.
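
E.g. a per-user config:

$ cat ~/.wgetrc
tries = 3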

But I do wonder a bit about your request... if 3 tries were always
enough to fetch a file safely, then it wouldn't matter whether tries is set
to 20, 20,000 or even unlimited. Is there something you might have
forgotten to write !?

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2 Feature Suggestion - Triston Line

2018-07-13 Thread Tim Rühsen
Hi,

On 07/12/2018 08:12 PM, Triston Line wrote:
> Hi Wget team,
> 
> I am but a lowly user and linux sysadmin, however, after noticing the wget2
> project I have wondered about a feature that could be added to the new
> version.
> 
> I approve of all the excellent new features already being added (especially
> the PFS, Shoutcast and scanning features), but has there been any
> consideration about continuing a "session" (Not a cookie session, a
> recursive session)? Perhaps retaining the last command in a backup/log file
> with the progress it last saved or if a script/command is interrupted and
> entered again in the same folder, wget will review the existing files
> before commencing the downloads and or link conversion depending on what
> stage of the "session" it was at.

-N/--timestamping nearly does what you need. If a page to download
already exists locally, wget2 (also newer versions of wget) adds the
If-Modified-Since HTTP header to the GET request. The server then only
sends payload/data if it has a newer version of that document, else it
responds with 304 Not Modified.

That is ~400 bytes per page, so just 400k bytes per 1000 pages.
Depending on the server's power and your bandwidth, you can increase the
number of parallel connections with --max-threads.
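
For example (example.com standing in for your site):

$ wget2 -r -p -N --max-threads=10 https://example.com/portal/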

> If that's possible that would help immensely. I "review" sites for my
> friends at UBC and we look at geographic performance on their apache and
> nginx servers, the only problem is they encounter minor errors from time to
> time while recursively downloading (server-side errors nothing to do with
> wget) so the session ends.

Some server errors, e.g. 404 or 5xx, will prevent wget from retrying
that page. Wget2 has just recently got --retry-on-http-status to
change this behavior (see the docs for an example, also --tries).
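
E.g. something along these lines (status-list syntax as described in the
docs; example.com is a placeholder):

$ wget2 --tries=3 --retry-on-http-status=503 -r -p https://example.com/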

> The other example I have is while updating my recursive downloads, we
> encounter power-failures during winter storms and from time to time very
> large recursions are interrupted and it feels bad downloading a web portal
> your team made together consisting of roughly 25,000 or so web pages and at
> the 10,000th page mark your wget session ends at like 3am. (Worse than
> stepping on lego I promise).

See above (-N). 10,000 pages would then mean about 4MB of extra download...
plus a few minutes. Let me know if you still think that this is a
problem. A 'do-not-download-local-files-again' option wouldn't be too
hard to implement. But the -N option is perfect for syncing - it just
downloads what has changed since the last time.

Caveat: Some servers don't support the If-Modified-Since header, which
is pretty stupid and normally just a server side configuration knob.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget 1.19.4 - possible double free in url.c

2018-03-01 Thread Tim Rühsen
On 03/01/2018 03:01 PM, Volkmar Klatt wrote:
> Dear wget maintainer,
> 
> 1) in wget 1.19.4 (and probably earlier)
> please check carefully
> 
> static char * convert_fname (char *fname)
> in
> src/url.c
> 
> I run a OpenBSD machine, x86, ABI=32
> and I guess there's a double free when using iconv,
> see patch attached.

Good catch, thanks !
Though it's not a double free, but a free on -1 if iconv_open() fails.
The solution is to move iconv_close() two lines up into the else case.

I took the opportunity and rearranged the code a bit, commit pushed.

> With this change, all my non-skipped tests pass,
> whereas with original url.c most tests fail with core dump:
> 
> wget(24305) in free(): error: bogus pointer (double free?)
> 0x
> -->
> 
> #0  0x1c187cb1 in kill () at :2
> #1  0x1c1b5ab6 in raise (s=6) at
> /usr/src/lib/libc/gen/raise.c:39
> #2  0x1c1b5a00 in abort () at
> /usr/src/lib/libc/stdlib/abort.c:53
> #3  0x1c1967f7 in wrterror (msg=0x3c119b56 "bogus pointer
> (double free?)", p=0x)
> at /usr/src/lib/libc/stdlib/malloc.c:281
> #4  0x1c197d09 in free (ptr=0x) at
> /usr/src/lib/libc/stdlib/malloc.c:1282
> #5  0x1c06d54d in libiconv_close ()
> #6  0x1c032334 in url_file_name ()
> #7  0x1c01facf in http_loop ()
> #8  0x1c02dd7e in retrieve_url ()
> #9  0x1c027068 in main ()
> 
> 2) The documentation might mention that
> strict firewall settings may also hinder the tests,
> e.g. when traffic from/to 127.0.0.1 is blocked.
> 
> Solution: Isolate the machine (no net)
> and temporalily disable the firewall, then 'make test'
> 
> Thanks,
> Volkmar Klatt

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] About GSoC project: Support QUIC Protocol

2018-03-09 Thread Tim Rühsen
On 03/09/2018 08:43 AM, Daniel Stenberg wrote:
> On Fri, 9 Mar 2018, Gisle Vanem wrote:
> 
>> I agree on ngtcp2. Foremost because it seems to have good support for
>> MSVC/Windows. My next contender would be MozQuic. Written in C++, but
>> with C interface. A bit of a bummer for Wget2 or libcurl?
> 
> I personally believe a lot in ngtcp2 much due to its origins and their
> track record in nghttp2 etc.
> 
> Two more contenders are "LiteSpeed QUIC client" and "quicly".
> 
> There's some data on all this gathered in the curl wiki page for QUIC:
> 
>  https://github.com/curl/curl/wiki/QUIC

Thanks for sharing your knowledge, Daniel.

ngtcp2 would be my choice since we already made good experiences with
nghttp2 from the same author.

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-06 Thread Tim Rühsen
Hi Jeffrey,


thanks for your feedback !


On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> how fast it's downloading a ton of files at once, it seems like it must've
> been only a small gain.
>
>
> I've come across a few annoyances however.
>
> 1. There doesn't seem to be any way to control the size of the download
> queue, which I dislike because I want to download a lot of large files at
> once and I wish it'd just focus on a few at a time, rather than over a
> dozen.
The number of parallel downloads ? --max-threads=n

> 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> pipe) error to be thrown', seems to be related to how certificate
> verification is handled upon resume, but I was worried at first that the
> WLS problems were rearing their ugly head again.
Likely the WSL issue is also affecting the TLS layer. TLS resume is
considered 'insecure',
thus we have it disabled by default. There still is TLS False Start
enabled by default.


> 3. --no-check-certificate causes significantly more errors about how the
> certificate issuer isn't trusted to be thrown (even though it's not
> supposed to be doing anything related to certificates).
Maybe a bit too verbose - these should be warnings, not errors.

> 4. --force-progress doesn't seem to do anything despite being recognized as
> a valid paramater, using it in conjunction with -nv is no longer beneficial.
You likely want to use --progress=bar. --force-progress is to enable the
progress bar even when redirecting (e.g. to a log file).
@Darshit, we should adjust the behavior to be the same as in Wget1.x.

> 5. The documentation is unclear as to how to disable things that are
> enabled by default. Am I to assume that --robots=off is equivalent to -e
> robots=off?

-e robots=off should still work. We also allow --robots=off or --no-robots.
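
I.e. these two should behave the same (placeholder URL):

$ wget2 -e robots=off -r https://example.com/
$ wget2 --no-robots -r https://example.com/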

> 6. The documentation doesn't document being able to use 'M' for chunk-size,
> e.g. --chunk-size=2M

The wget2 documentation has to be brushed up - one of the blockers for
the first release.

>
> 7. The documentation's instructions regarding --progress is all wrong.
I'll take a look the next days.

>
> 8. The http/https proxy options return as unknown options despite being in
> the documentation.
Yeah, the docs... see above. Also, proxy support is currently limited.


> Lastly I'd like someone to look at the command I've come up with and offer
> me critiques (and perhaps help me address some of the remarks above if
> possible).

No need for --continue.
Think about using TLS Session Resumption.
--domains is not needed in your example.

Did you build with http/2 and compression support ?

Regards, Tim
> #!/bin/bash
>
> wget2 \
>   `#WSL compatibility` \
>   --restrict-file-names=windows --no-tcp-fastopen \
>   \
>   `#No certificate checking` \
>   --no-check-certificate \
>   \
>   `#Scrape the whole site` \
>   --continue --mirror --adjust-extension \
>   \
>   `#Local viewing` \
>   --convert-links --backup-converted \
>   \
>   `#Efficient resuming` \
>   --tls-resume --tls-session-file=.\tls.session \
>   \
>   `#Chunk-based downloading` \
>   --chunk-size=2M \
>   \
>   `#Swiper no swiping` \
>   --robots=off --random-wait \
>   \
>   `#Target` \
>   --domains=example.com example.com
>





Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-07 Thread Tim Rühsen
On 07.04.2018 04:31, Jeffrey Fetterman wrote:
> > The number of parallel downloads ? --max-threads=n
>
> Okay, well, when I was running it earlier, I was noticing an entire
> directory of pdfs slowly getting larger every time I refreshed the
> directory, and there were something like 30 in there. It wasn't just
> five. I was very confused and I'm not sure what's going on there, and
> I really would like it to not do that.
>
It's unclear to me what exactly you mean. Maybe you have an example !?
>
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure', thus we have it disabled by default. There
> still is TLS False Start enabled by default.
>
> Are you implying TLS False Start will perform the same function as TLS
> Resume?
>

Both reduce RTT by 1, but they can't be combined.

>
> > You likely want to use --progress=bar. --force-progress is to enable
> the progress bar even when redirecting (e.g. to a log file). @Darshit,
> we shoudl adjust the behavior to be the same as in Wget1.x.
>
> That does work but it's very buggy. Only one shows at a time and it
> doesn't even always show the file that is downloading. Like it'll seem
> to be downloading a txt file when it's really downloading several
> larger files in the background.
>
>
> > Did you build with http/2 and compression support ?
>
> Yes, why?
>
Just to possibly increase download speed. HTTP/2 only works with TLS
though...
>
> P.S. I'm willing to help out with your documentation if you push some
> stuff that makes my life on WSL a little less painful, haha. I'd run
> this in a VM in an instant but I feel like that would be a bottleneck
> on what's supposed to be a high performance program. Speaking of high
> performance, just how much am I missing out on by not being able to
> take advantage of tcp fast open?
>
With a VM you can at least test whether a problem (e.g. progress bar) is
WSL related or not.

TFO reduces RTT by one (on 'hot' connections only). So only under
certain conditions, e.g. when closing and opening connections to the same
IP often.
It combines with TLS False Start, so that you can drop connection
latency from 3RTT to 1RTT. 0RTT is possible with TLS1.3, which is coming
soon (GnuTLS already supports the current draft 26 - but we didn't
test/implemented it yet).

> On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen <tim.rueh...@gmx.de
> <mailto:tim.rueh...@gmx.de>> wrote:
>
> Hi Jeffrey,
>
>
> thanks for your feedback !
>
>
> On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > Thanks to the fix that Tim posted on gitlab, I've got wget2
> running just
> > fine in WSL. Unfortunately it means I don't have TCP Fast Open,
> but given
> > how fast it's downloading a ton of files at once, it seems like
> it must've
> > been only a small gain.
> >
> >
> > I've come across a few annoyances however.
> >
> > 1. There doesn't seem to be any way to control the size of the
> download
> > queue, which I dislike because I want to download a lot of large
> files at
> > once and I wish it'd just focus on a few at a time, rather than
> over a
> > dozen.
> The number of parallel downloads ? --max-threads=n
>
> > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes
> (32: Broken
> > pipe) error to be thrown', seems to be related to how certificate
> > verification is handled upon resume, but I was worried at first
> that the
> > WLS problems were rearing their ugly head again.
> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure',
> thus we have it disabled by default. There still is TLS False Start
> enabled by default.
>
>
> > 3. --no-check-certificate causes significantly more errors about
> how the
> > certificate issuer isn't trusted to be thrown (even though it's not
> > supposed to be doing anything related to certificates).
> Maybe a bit too verbose - these should be warnings, not errors.
>
> > 4. --force-progress doesn't seem to do anything despite being
> recognized as
> > a valid paramater, using it in conjunction with -nv is no longer
> beneficial.
> You likely want to use --progress=bar. --force-progress is to
> enable the
> progress bar even when redirecting (e.g. to a log file).
> @Darshit, we shoudl adjust the behavior to be the same as in Wget1.x.
>
> > 5. The documentation is unclear as to how to disable things that are
> > enabled by default. Am I to assume that --robots=off is
> eq

Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-06 Thread Tim Rühsen
On 04/04/2018 01:32 PM, Jeffrey Fetterman wrote:
> How well does TeamViewer work on Linux? My laptop has been collecting dust,
> I can just leave it running for a couple days with a fresh install of
> Windows and a fresh install of WSL Debian (with apt-get update and upgrade
> already ran)

I made some tests yesterday without success.
--no-tcp-fastopen makes a small difference, write() sets errno to 32
(broken pipe).
Removing the gnulib wrapper code didn't make a difference, neither did
removal of SO_REUSEADDR.

Regards, Tim

> 
> On Wed, Apr 4, 2018 at 3:22 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> 
>> Hi Jeffrey,
>>
>> possibly I can get my hands on a fast Win10 desktop the coming
>> weekend... no promise but I'll try.
>>
>>
>> With Best Regards, Tim
>>
>>
>>
>> On 04/04/2018 09:54 AM, Tim Rühsen wrote:
>>> Hi Jeffrey,
>>>
>>> I can't tell you. Basically because the only WSL I can get my hands on
>>> is on my wife's laptop which is *very* slow. And it needs some analysis
>>> on that side, maybe with patches for gnulib. Send me a fast Win10
>>> machine and I analyse+fix the problem ;-)
>>>
>>>
>>> BTW, we are also not using SO_REUSEPORT. The links you provided assume
>>> that it's a problem in that area. All I can say is that Wget2 was
>>> definitely working on WSL just a few weeks ago.
>>>
>>>
>>> Another option for you is to install Debian/Ubuntu in a VM. Until the
>>> hickups with WSL have been solved one or another way.
>>>
>>>
>>> With Best Regards, Tim
>>>
>>>
>>> On 04/04/2018 09:01 AM, Jeffrey Fetterman wrote:
>>>> Tim, do you know when you'll be able to examine and come up with a
>>>> workaround for the issue? There are alternatives to wget2 but either
>>>> they're not high performance or they're not really cut out for site
>>>> scraping.
>>>>
>>>> On Mon, Apr 2, 2018 at 12:30 PM, Jeffrey Fetterman <
>> jfett...@mail.ccsf.edu>
>>>> wrote:
>>>>
>>>>> I can tell you the exact steps I took from nothing to a fresh install,
>> I
>>>>> have the commands copied.
>>>>>
>>>>> install Debian from Windows Store, set up username/password
>>>>>
>>>>> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>>>>> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>>>>> (this is a workaround for Valgrind and anything else that relies
>>>>> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>>>>>
>>>>> $ sudo apt-get update
>>>>> $ sudo apt-get upgrade
>>>>> $ sudo apt-get install autoconf autogen automake autopoint doxygen flex
>>>>> gettext git gperf lcov libtool lzip make pandoc python3.5 pkg-config
>>>>> texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
>>>>> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>>>>> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
>>>>> $ sudo update-alternatives --install /usr/bin/python python
>>>>> /usr/bin/python3.5 1
>>>>>
>>>>> then the commands outlined as per the documentation. config.log
>> attached.
>>>>>
>>>>> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de>
>> wrote:
>>>>>
>>>>>> Hi Jeffrey,
>>>>>>
>>>>>>
>>>>>> basically wget2 should work on WSL, I just tested it scarcely two
>> weeks
>>>>>> ago without issues.
>>>>>>
>>>>>>
>>>>>> I suspect it might have to do with your dependencies (e.g. did you
>>>>>> install libnghttp2-dev ?).
>>>>>>
>>>>>> To find out, please send your config.log. That allows me to see your
>>>>>> compiler, CFLAGS and the detected dependencies etc..
>>>>>>
>>>>>> I will try to reproduce the issue then.
>>>>>>
>>>>>>
>>>>>> Regards, Tim
>>>>>>
>>>>>>
>>>>>> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>>>>>>>  wget2 will not download any files, and I think there's some sort of
>>>>>> disk
>>>>>>> access issue.
>>>>>>>
>>>>>>> this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu
>> 16.04
>>>>>>> LTS had the same issue.)
>>>>>>>
>>>>>>> Here's the output of strace -o strace.txt -ff wget2
>>>>>> https://www.google.com
>>>>>>>
>>>>>>> https://pastebin.com/4MEL88qs
>>>>>>>
>>>>>>> wget2 -d https://www.google.com just hangs after the line
>>>>>> '02.103350.008
>>>>>>> ALPN offering http/1.1'
>>>>>>>
>>>>>>> ultimately I might have to submit a bug to WSL but I wouldn't know
>> what
>>>>>> to
>>>>>>> report, I don't know what's wrong. And it'd be great if there was a
>>>>>>> workaround
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] retry_connrefused?

2018-04-10 Thread Tim Rühsen


On 10.04.2018 20:37, Jeffrey Fetterman wrote:
>  with --tries=5 set, Failed to connect (111) will still instantly abort the
> operation.

As I wrote, not reproducible here (see my debug output). Please include
your debug output.

Regards, Tim

> On Tue, Apr 10, 2018 at 2:45 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>
>> On 04/10/2018 03:12 AM, Jeffrey Fetterman wrote:
>>> --retry_connrefused is mentioned in the documentation but it doesn't seem
>>> to be an option anymore. I can't find a replacement for it, either. My
>> VPN
>>> is being a bit fussy today and I keep having to restart my script because
>>> of 111 errors.
>>>
>> I assume wget2... use --tries. That value is currently also used for
>> connection failures.
>>
>> ...
>> [0] Downloading 'http://localhost' ...
>> 10.073423.223 cookie_create_request_header for host=localhost path=(null)
>> Failed to write 207 bytes (111: Connection refused)
>> 10.073423.223 host_increase_failure: localhost failures=1
>> ...
>>
>> Regards, Tim
>>
>>





Re: [Bug-wget] retry_connrefused?

2018-04-10 Thread Tim Rühsen
On 04/10/2018 03:12 AM, Jeffrey Fetterman wrote:
> --retry_connrefused is mentioned in the documentation but it doesn't seem
> to be an option anymore. I can't find a replacement for it, either. My VPN
> is being a bit fussy today and I keep having to restart my script because
> of 111 errors.
> 

I assume wget2... use --tries. That value is currently also used for
connection failures.

...
[0] Downloading 'http://localhost' ...
10.073423.223 cookie_create_request_header for host=localhost path=(null)
Failed to write 207 bytes (111: Connection refused)
10.073423.223 host_increase_failure: localhost failures=1
...

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] make.exe warnings

2018-04-06 Thread Tim Rühsen
On 04/06/2018 04:30 AM, Jeffrey Fetterman wrote:
> I've successfully built wget2 through msys2 as a Windows binary, and it
> appears to be working (granted I've not used it much yet), but I'm
> concerned about some of the warnings that occurred during compilation.
> 
> Unsurprisingly they seem to be socket-related.
> 
> https://spit.mixtape.moe/view/9f38bd83

These are warnings from gnulib code. The code itself looks good to me.
Our CFLAGS for building the gnulib code are maybe too strong, I'll see
if reducing verbosity is recommended here.

With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-07 Thread Tim Rühsen
WSL fix for TLS:


Search libwget/ssl_gnutls.c for EINPROGRESS and extend the code to also
check errno for 22 and 32.

There are just two places in _ssl_writev().


After these changes TLS works for me including --tls-resume.
But you still have to use --no-tcp-fastopen.
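
I.e. something like this should then work under WSL (placeholder URL):

$ wget2 --no-tcp-fastopen --tls-resume https://example.com/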

Regards, Tim


On 07.04.2018 04:31, Jeffrey Fetterman wrote:
> > The number of parallel downloads ? --max-threads=n
>
> Okay, well, when I was running it earlier, I was noticing an entire
> directory of pdfs slowly getting larger every time I refreshed the
> directory, and there were something like 30 in there. It wasn't just
> five. I was very confused and I'm not sure what's going on there, and
> I really would like it to not do that.
>
>
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure', thus we have it disabled by default. There
> still is TLS False Start enabled by default.
>
> Are you implying TLS False Start will perform the same function as TLS
> Resume?
>
>
> > You likely want to use --progress=bar. --force-progress is to enable
> the progress bar even when redirecting (e.g. to a log file). @Darshit,
> we shoudl adjust the behavior to be the same as in Wget1.x.
>
> That does work but it's very buggy. Only one shows at a time and it
> doesn't even always show the file that is downloading. Like it'll seem
> to be downloading a txt file when it's really downloading several
> larger files in the background.
>
>
> > Did you build with http/2 and compression support ?
>
> Yes, why?
>
>
> P.S. I'm willing to help out with your documentation if you push some
> stuff that makes my life on WSL a little less painful, haha. I'd run
> this in a VM in an instant but I feel like that would be a bottleneck
> on what's supposed to be a high performance program. Speaking of high
> performance, just how much am I missing out on by not being able to
> take advantage of tcp fast open?
>
>
> On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen <tim.rueh...@gmx.de
> <mailto:tim.rueh...@gmx.de>> wrote:
>
> Hi Jeffrey,
>
>
> thanks for your feedback !
>
>
> On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > Thanks to the fix that Tim posted on gitlab, I've got wget2
> running just
> > fine in WSL. Unfortunately it means I don't have TCP Fast Open,
> but given
> > how fast it's downloading a ton of files at once, it seems like
> it must've
> > been only a small gain.
> >
> >
> > I've come across a few annoyances however.
> >
> > 1. There doesn't seem to be any way to control the size of the
> download
> > queue, which I dislike because I want to download a lot of large
> files at
> > once and I wish it'd just focus on a few at a time, rather than
> over a
> > dozen.
> The number of parallel downloads ? --max-threads=n
>
> > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes
> (32: Broken
> > pipe) error to be thrown', seems to be related to how certificate
> > verification is handled upon resume, but I was worried at first
> that the
> > WLS problems were rearing their ugly head again.
> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure',
> thus we have it disabled by default. There still is TLS False Start
> enabled by default.
>
>
> > 3. --no-check-certificate causes significantly more errors about
> how the
> > certificate issuer isn't trusted to be thrown (even though it's not
> > supposed to be doing anything related to certificates).
> Maybe a bit too verbose - these should be warnings, not errors.
>
> > 4. --force-progress doesn't seem to do anything despite being
> recognized as
> > a valid paramater, using it in conjunction with -nv is no longer
> beneficial.
> You likely want to use --progress=bar. --force-progress is to
> enable the
> progress bar even when redirecting (e.g. to a log file).
> @Darshit, we shoudl adjust the behavior to be the same as in Wget1.x.
>
> > 5. The documentation is unclear as to how to disable things that are
> > enabled by default. Am I to assume that --robots=off is
> equivalent to -e
> > robots=off?
>
> -e robots=off should still work. We also allow --robots=off or
> --no-robots.
>
> > 6. The documentation doesn't document being able to use 'M' for
> chunk-size,
> > e.g. --chunk-size=2M
>
> The wget2 documentation has to be brushed up - one of the blockers for
> the first re

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-09 Thread Tim Rühsen
>>>>> Both reduce RTT by 1, but they can't be combined.
>>>>
>>>> I was using TLS Resume because, well, for a 300+GB download it just
>>> seemed
>>>> to make sense, so it wouldn't have to check over 100GB of files before
>>>> getting back to where I left off.
>>>>
>>>>> You use TLS Resume, but you don't explicitly need to specify a file.
>>> By
>>>> default it will use ~/.wget-session.
>>>>
>>>> I figure a 300GB+ transfer should have its own session file just in
>>> case I
>>>> do something smaller between resumes that might overwrite .wget-session,
>>>> plus you've got to remember I'm on WSL and I'd rather have relevant
>>> files
>>>> kept within my normal folders rather than my WSL filesystem.
>>>>
>>> I'm not sure if you've understood TLS Session Resume correctly. TLS
>>> Session
>>> Resume is not going to resume your download session from where it left
>>> off. Due
>>> to the way HTTP works, Wget will still have to scan all your existing
>>> files and
>>> send HEAD requests for each of them when resuming. This is just a
>>> limitation of
>>> HTTP and there's nothing anybody can do about it.
>>>
>>> TLS Session Resume will simply reduce 1 RTT when starting a new TLS
>>> Session. It
>>> simply matters for the TLS handshake and nothing else. It doesn't resume
>>> the
>>> Wget session at all. Also, the ~/.wget-session file simply stores the TLS
>>> Session information for each TLS Session. So you can use it for multiple
>>> sessions. It is just a cache.
>>>> On Sat, Apr 7, 2018 at 3:04 AM, Darshit Shah <dar...@gmail.com> wrote:
>>>>
>>>>> Hi Jefferey,
>>>>>
>>>>> Thanks a lot for your feedback. This is what helps us improve.
>>>>>
>>>>> * Tim Rühsen <tim.rueh...@gmx.de> [180407 00:01]:
>>>>>>
>>>>>> On 06.04.2018 23:30, Jeffrey Fetterman wrote:
>>>>>>> Thanks to the fix that Tim posted on gitlab, I've got wget2
>>> running
>>>>> just
>>>>>>> fine in WSL. Unfortunately it means I don't have TCP Fast Open,
>>> but
>>>>> given
>>>>>>> how fast it's downloading a ton of files at once, it seems like it
>>>>> must've
>>>>>>> been only a small gain.
>>>>>>>
>>>>> TCP Fast Open will not save you a lot in your particular scenario. It
>>>>> simply
>>>>> saves one round trip when opening a new connection. So, if you're
>>> using
>>>>> Wget2
>>>>> to download a lot of files, you are probably only opening ~5
>>> connections
>>>>> at the
>>>>> beginning and reusing them all. It depends on your RTT to the server,
>>> but
>>>>> 1 RTT
>>>>> when downloading several megabytes is already an insignificant amount
>>> of
>>>>> time.
>>>>>
>>>>>>>
>>>>>>> I've come across a few annoyances however.
>>>>>>>
>>>>>>> 1. There doesn't seem to be any way to control the size of the
>>> download
>>>>>>> queue, which I dislike because I want to download a lot of large
>>> files
>>>>> at
>>>>>>> once and I wish it'd just focus on a few at a time, rather than
>>> over a
>>>>>>> dozen.
>>>>>> The number of parallel downloads ? --max-threads=n
>>>>>
>>>>> I don't think he meant --max-threads. Given how he is using HTTP/2,
>>>>> there's a
>>>>> chance what he's seeing is HTTP Stream Multiplexing. There is also,
>>>>> `--http2-request-window` which you can try.
>>>>>>
>>>>>>> 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32:
>>>>> Broken
>>>>>>> pipe) error to be thrown', seems to be related to how certificate
>>>>>>> verification is handled upon resume, but I was worried at first
>>> that
>>>>> the
>>>>>>> WLS problems were rearing their ugly head again.
>>>>>> Likely the WSL issue is also affecting the TLS layer. TLS resume is
>>>>>> considered 'insecure',
>>>>>> thus we have it disabled by default

Re: [Bug-wget] --http2=off causes Segmentation fault but ./configure --without-libnghttp2 does not

2018-04-09 Thread Tim Rühsen
On 04/09/2018 01:04 PM, Jeffrey Fetterman wrote:
> So I wanted to see how scraping a large site compared with multiplexing
> off. I used the -http2=off parameter, but I got a segfault.

Not reproducible here. Could you give me the whole command line ?

> So I decided I'd configure wget2 without the http2 library and just try the
> same command again (without -http2=off since it wasn't compiled with it
> anyway) and it worked just fine.
> 
> (Also.. it does seem like wget2 is faster without http2, for the site full
> of large pdfs I'm scraping anyway.)

I also had the impression that http/2 at least sometimes is slower, but
didn't make exact measurements. There are many pitfalls on the server
side that an admin has to deal with.
If you know a good site / command line for benchmarking, please let me know.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] --http2=off causes Segmentation fault but ./configure --without-libnghttp2 does not

2018-04-09 Thread Tim Rühsen
On 04/09/2018 01:18 PM, Jeffrey Fetterman wrote:
> God damnit, I just got it to happen with ./configure --without-libnghttp2
> 
> Now I'm not sure what is triggering it.

If you can trigger it in a Linux VM:
Install valgrind, build the code with -g and use wget2_noinstall instead
of wget2.

e.g.
valgrind src/wget2_noinstall ...

It should spill out a backtrace with line numbers. Post that here.

> 
> On Mon, Apr 9, 2018 at 6:04 AM, Jeffrey Fetterman 
> wrote:
> 
>> So I wanted to see how scraping a large site compared with multiplexing
>> off. I used the -http2=off parameter, but I got a segfault.
>>
>> So I decided I'd configure wget2 without the http2 library and just try
>> the same command again (without -http2=off since it wasn't compiled with it
>> anyway) and it worked just fine.
>>
>> (Also.. it does seem like wget2 is faster without http2, for the site full
>> of large pdfs I'm scraping anyway.)
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2: exclude-directories, in documentation but not functional

2018-04-22 Thread Tim Rühsen


On 22.04.2018 08:00, Jeffrey Fetterman wrote:
> So there's a directory in a site I've been using wget2 on that has a bunch
> of files I don't need, but I can't figure out how to filter it out.
>
> --exclude-directories is in the documentation but it says it's an unknown
> option.
>
> Was it replaced by a different option? How do I filter out a certain
> directory?
>

There was a recent discussion incl. alternatives:
https://gitlab.com/gnuwget/wget2/issues/365

And -X is work in progress (branch tmp-exclude-directories).

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] GSOC Project Availability

2018-03-27 Thread Tim Rühsen
Hi Eric,

IMO the deadline for GSOC student applications is today 18:00 CEST, so
you have to hurry. Between 27th March and 23rd April the organizations
review and decide for/against proposals.

Neither the http/2 test suite nor the WARC library has been addressed by an
application so far (but that may change before the deadline, so no promise).

With Best Regards, Tim


On 03/26/2018 11:34 PM, Eric Ngo wrote:
> To whom this may concern,
> 
> My name is Eric Ngo and I am a computer science major at San Francisco
> State University. I was looking for open-source projects to contribute to
> in GSOC 2018 and came across the GNU Project. I was looking at the list of
> ideas and came across this page(
> https://gitlab.com/gnuwget/wget2/wikis/GSoC-2018:-List-of-Projects). I am
> interested in contributing to the HTTP/2 Test Suite and/or the WARC Library
> and Integration projects. Were these ideas already implemented, or can I
> still contribute?
> 
> Sincerely,
> Eric Ngo.
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Tim Rühsen
Hi Jeffrey,


basically wget2 should work on WSL, I just tested it scarcely two weeks
ago without issues.


I suspect it might have to do with your dependencies (e.g. did you
install libnghttp2-dev ?).

To find out, please send your config.log. That allows me to see your
compiler, CFLAGS and the detected dependencies etc..

I will try to reproduce the issue then.


Regards, Tim


On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>  wget2 will not download any files, and I think there's some sort of disk
> access issue.
>
> this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu 16.04
> LTS had the same issue.)
>
> Here's the output of strace -o strace.txt -ff wget2 https://www.google.com
>
> https://pastebin.com/4MEL88qs
>
> wget2 -d https://www.google.com just hangs after the line '02.103350.008
> ALPN offering http/1.1'
>
> ultimately I might have to submit a bug to WSL but I wouldn't know what to
> report, I don't know what's wrong. And it'd be great if there was a
> workaround




signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Tim Rühsen
Hi Jeffrey,


back then I installed Ubuntu via WSL. A fresh build of Wget2 took
~30mins on my wife's laptop. Time-wasting.

But I can reproduce a hang with HTTPS and (repeating) errors with HTTP.


This might be an issue with Windows Sockets... maybe someone has a
faster machine to do some testing !?


Regards, Tim

On 02.04.2018 19:30, Jeffrey Fetterman wrote:
> I can tell you the exact steps I took from nothing to a fresh install,
> I have the commands copied.
>
> install Debian from Windows Store, set up username/password
>
> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
> (this is a workaround for Valgrind and anything else that relies
> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>
> $ sudo apt-get update
> $ sudo apt-get upgrade
> $ sudo apt-get install autoconf autogen automake autopoint doxygen
> flex gettext git gperf lcov libtool lzip make pandoc python3.5
> pkg-config texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
> $ sudo update-alternatives --install /usr/bin/python python
> /usr/bin/python3.5 1
>
> then the commands outlined as per the documentation. config.log attached.
>
> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de
> <mailto:tim.rueh...@gmx.de>> wrote:
>
> Hi Jeffrey,
>
>
> basically wget2 should work on WSL, I just tested it scarcely two
> weeks
> ago without issues.
>
>
> I suspect it might have to do with your dependencies (e.g. did you
> install libnghttp2-dev ?).
>
> To find out, please send your config.log. That allows me to see your
> compiler, CFLAGS and the detected dependencies etc..
>
> I will try to reproduce the issue then.
>
>
> Regards, Tim
>
>
> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
> >  wget2 will not download any files, and I think there's some
> sort of disk
> > access issue.
> >
> > this is on Windows Subsystem for Linux Debian 9.3 Stretch.
> (Ubuntu 16.04
> > LTS had the same issue.)
> >
> > Here's the output of strace -o strace.txt -ff wget2
> https://www.google.com
> >
> > https://pastebin.com/4MEL88qs
> >
> > wget2 -d https://www.google.com just hangs after the line
> '02.103350.008
> > ALPN offering http/1.1'
> >
> > ultimately I might have to submit a bug to WSL but I wouldn't
> know what to
> > report, I don't know what's wrong. And it'd be great if there was a
> > workaround
>
>
>





Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-04 Thread Tim Rühsen
Hi Jeffrey,

Possibly I can get my hands on a fast Win10 desktop this coming
weekend... no promise, but I'll try.


With Best Regards, Tim



On 04/04/2018 09:54 AM, Tim Rühsen wrote:
> Hi Jeffrey,
> 
> I can't tell you. Basically because the only WSL I can get my hands on
> is on my wife's laptop which is *very* slow. And it needs some analysis
> on that side, maybe with patches for gnulib. Send me a fast Win10
> machine and I analyse+fix the problem ;-)
> 
> 
> BTW, we are also not using SO_REUSEPORT. The links you provided assume
> that it's a problem in that area. All I can say is that Wget2 was
> definitely working on WSL just a few weeks ago.
> 
> 
> Another option for you is to install Debian/Ubuntu in a VM until the
> hiccups with WSL have been solved one way or another.
> 
> 
> With Best Regards, Tim
> 
> 
> On 04/04/2018 09:01 AM, Jeffrey Fetterman wrote:
>> Tim, do you know when you'll be able to examine and come up with a
>> workaround for the issue? There are alternatives to wget2 but either
>> they're not high performance or they're not really cut out for site
>> scraping.
>>
>> On Mon, Apr 2, 2018 at 12:30 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
>> wrote:
>>
>>> I can tell you the exact steps I took from nothing to a fresh install, I
>>> have the commands copied.
>>>
>>> install Debian from Windows Store, set up username/password
>>>
>>> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>>> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>>> (this is a workaround for Valgrind and anything else that relies
>>> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>>>
>>> $ sudo apt-get update
>>> $ sudo apt-get upgrade
>>> $ sudo apt-get install autoconf autogen automake autopoint doxygen flex
>>> gettext git gperf lcov libtool lzip make pandoc python3.5 pkg-config
>>> texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
>>> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>>> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
>>> $ sudo update-alternatives --install /usr/bin/python python
>>> /usr/bin/python3.5 1
>>>
>>> then the commands outlined as per the documentation. config.log attached.
>>>
>>> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>>>
>>>> Hi Jeffrey,
>>>>
>>>>
>>>> basically wget2 should work on WSL, I just tested it scarcely two weeks
>>>> ago without issues.
>>>>
>>>>
>>>> I suspect it might have to do with your dependencies (e.g. did you
>>>> install libnghttp2-dev ?).
>>>>
>>>> To find out, please send your config.log. That allows me to see your
>>>> compiler, CFLAGS and the detected dependencies etc..
>>>>
>>>> I will try to reproduce the issue then.
>>>>
>>>>
>>>> Regards, Tim
>>>>
>>>>
>>>> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>>>>>  wget2 will not download any files, and I think there's some sort of
>>>> disk
>>>>> access issue.
>>>>>
>>>>> this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu 16.04
>>>>> LTS had the same issue.)
>>>>>
>>>>> Here's the output of strace -o strace.txt -ff wget2
>>>> https://www.google.com
>>>>>
>>>>> https://pastebin.com/4MEL88qs
>>>>>
>>>>> wget2 -d https://www.google.com just hangs after the line
>>>> '02.103350.008
>>>>> ALPN offering http/1.1'
>>>>>
>>>>> ultimately I might have to submit a bug to WSL but I wouldn't know what
>>>> to
>>>>> report, I don't know what's wrong. And it'd be great if there was a
>>>>> workaround
>>>>
>>>>
>>>>
>>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-04 Thread Tim Rühsen
Hi Jeffrey,

I can't tell you. Basically because the only WSL I can get my hands on
is on my wife's laptop which is *very* slow. And it needs some analysis
on that side, maybe with patches for gnulib. Send me a fast Win10
machine and I analyse+fix the problem ;-)


BTW, we are also not using SO_REUSEPORT. The links you provided assume
that it's a problem in that area. All I can say is that Wget2 was
definitely working on WSL just a few weeks ago.


Another option for you is to install Debian/Ubuntu in a VM until the
hiccups with WSL have been solved one way or another.


With Best Regards, Tim


On 04/04/2018 09:01 AM, Jeffrey Fetterman wrote:
> Tim, do you know when you'll be able to examine and come up with a
> workaround for the issue? There are alternatives to wget2 but either
> they're not high performance or they're not really cut out for site
> scraping.
> 
> On Mon, Apr 2, 2018 at 12:30 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
> wrote:
> 
>> I can tell you the exact steps I took from nothing to a fresh install, I
>> have the commands copied.
>>
>> install Debian from Windows Store, set up username/password
>>
>> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>> (this is a workaround for Valgrind and anything else that relies
>> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>>
>> $ sudo apt-get update
>> $ sudo apt-get upgrade
>> $ sudo apt-get install autoconf autogen automake autopoint doxygen flex
>> gettext git gperf lcov libtool lzip make pandoc python3.5 pkg-config
>> texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
>> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
>> $ sudo update-alternatives --install /usr/bin/python python
>> /usr/bin/python3.5 1
>>
>> then the commands outlined as per the documentation. config.log attached.
>>
>> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>>
>>> Hi Jeffrey,
>>>
>>>
>>> basically wget2 should work on WSL, I just tested it scarcely two weeks
>>> ago without issues.
>>>
>>>
>>> I suspect it might have to do with your dependencies (e.g. did you
>>> install libnghttp2-dev ?).
>>>
>>> To find out, please send your config.log. That allows me to see your
>>> compiler, CFLAGS and the detected dependencies etc..
>>>
>>> I will try to reproduce the issue then.
>>>
>>>
>>> Regards, Tim
>>>
>>>
>>> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>>>>  wget2 will not download any files, and I think there's some sort of
>>> disk
>>>> access issue.
>>>>
>>>> this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu 16.04
>>>> LTS had the same issue.)
>>>>
>>>> Here's the output of strace -o strace.txt -ff wget2
>>> https://www.google.com
>>>>
>>>> https://pastebin.com/4MEL88qs
>>>>
>>>> wget2 -d https://www.google.com just hangs after the line
>>> '02.103350.008
>>>> ALPN offering http/1.1'
>>>>
>>>> ultimately I might have to submit a bug to WSL but I wouldn't know what
>>> to
>>>> report, I don't know what's wrong. And it'd be great if there was a
>>>> workaround
>>>
>>>
>>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] GSoC'18: DNS over HTTPS.

2018-03-22 Thread Tim Rühsen
On 03/22/2018 02:01 PM, Aniketh Gireesh wrote:
> Further, In my opinion, I think it would be better as a different
> library/directory. I think that would be a better refactoring method as
> well as it would be easier to work on the codebase at a later point in
> time. Further, as far as my understanding goes, libwget is a library
> handling HTTP, helping in creating an HTTP request. It seems better to have
> something different to handle DNS and other things regarding that. It would
> feel like all cluttered up inside libwget.
> 
> If this is not the way we want it in Wget2, just let me know. I will change
> the proposal as well as the plans for implementation :)

Since your code will likely use functions from libwget and the other way
round, we should place it in libwget/. But if it makes your development
easier during GSOC, feel free to put it into a separate directory.

For the future we have a split of libwget into several libraries in
mind, but it currently has low priority. We may some day have
libwget-common, libwget-doh, libwget-warc, ...

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] GSoC'18: DNS over HTTPS.

2018-03-22 Thread Tim Rühsen
On 03/21/2018 07:11 PM, Aniketh Gireesh wrote:
> Hello everyone,
> 
> I'm Aniketh Girish, a computer science undergraduate from India. Just
> giving a small introduction about myself, I am quite comfortable with open
> source contribution, and I did GSoC last year with KDE, working on Krita, a
> libre graphics application.
> 
> I was interested in a project inside Wget2 called DNS-over-HTTPS and I have
> prepared a proposal for the same[1].
> 
> Review from the community before submission is really vital and I wish if I
> could get a review on this so that I could improve on the proposal much
> more and gain better clarity about the project as well as learn about my
> mistakes.
> 
> Look forward towards your suggestions and comments :)

Hi Aniketh,

your proposal is very well written, with a good amount of detail. I enjoyed
reading it, good work!

Daniel Stenberg already mentioned a few points and I guess he is
currently *the* expert for DoH client implementation.

There is not much to add from my side, just a few remarks.

This link is absolutely not relevant for Wget2:
  2. http://wget.addictivecode.org/OptionsHowto.html
We should write up something similar for Wget2 to give contributors an
easy start.

You mention two paths for the sources, wget2/libdns and wget2/dnslib.
Is that accidental, or did I miss something?
You could also put the files into libwget/ - and skip building these
files in the Makefile.am using a 'conditional' from configure.ac.

So you also need a configure flag (e.g. --enable-dns-over-https
[default: on]).
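
For illustration, a minimal sketch of such a conditional (the file and
variable names here are placeholders, not a final layout):

  # configure.ac
  AC_ARG_ENABLE([dns-over-https],
    [AS_HELP_STRING([--enable-dns-over-https], [build DNS-over-HTTPS support (default: yes)])],
    [], [enable_dns_over_https=yes])
  AM_CONDITIONAL([WITH_DOH], [test "x$enable_dns_over_https" = "xyes"])

  # libwget/Makefile.am
  if WITH_DOH
  libwget_la_SOURCES += dns_over_https.c
  endif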

Maybe you could change the (protocol-independent) '--dns-resolver' to a more
specific '--doh-resolver'.


With Best Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Feature request: option to not download rejected files

2018-06-29 Thread Tim Rühsen
On 06/29/2018 03:20 PM, Zoe Blade wrote:
> For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era
> patch for 1.19.1 (one of the few versions I managed to get to compile on OS X;
> I'm on a Mac, and not the best programmer):
> 
> recur.c:578
> -  if (blacklist_contains (blacklist, url))
> +  if (blacklist_contains (blacklist, url) || !acceptable (url))
> 
> It's not ideal, but it seems to solve the problem as a temporary fix.  
> Hopefully it might help someone else who needs this functionality.

Hi Zoë,

we recently had a discussion (20.6.2018 "Why does -A not work") where I
confirmed that --reject-regex works like a filter for detected URLs.

BTW, the OP wanted --reject-regex to download and parse HTML (and delete
it afterwards if it matches the rejected regex) - so the opposite of your
request.

In Wget2 there is an extra option for this, --filter-urls. Maybe
--filter-mime-type is also worth a look.
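
For example (untested, URL made up, just to illustrate the idea):

  # rejected URLs are filtered out before a download is even started:
  wget2 -r -R '*.iso,*.zip' --filter-urls https://example.com/downloads/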

It would be best if you could provide a small example/reproducer. It can
also be a hand-crafted HTML file.

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] A C developer

2018-10-05 Thread Tim Rühsen
Hi Tom,

On 10/4/18 11:13 AM, Tom Mounet wrote:
> Hello,
> 
> I saw your call for a C developer on the Savannah home page. I'm a student
> in IT with a good basis in networking (so TCP/IP), and I've been coding
> in C for the last 2 years. It's not my main language, but I love it and
> I would love to learn more about it.
> 
> I am really motivated ! I would very much like to work with you on this
> project.

That sounds good, of course you are welcome :-)

The current development mainly takes place on GNU Wget2; the collaboration
site is https://gitlab.com/gnuwget/wget2.

Wget2 is written in C99 while Wget1.x is in C89.

First steps would be to

- make sure you have a pretty recent development machine (linux or bsd
seems best)
- create a Gitlab.com account
- clone the repo
- build and test wget2 (a rough sketch of these steps is below)
- check the issues, ask questions
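
Roughly like this (just a sketch, assuming the usual gnulib-based
bootstrap - see the README in the repository for the authoritative steps):

  git clone https://gitlab.com/gnuwget/wget2.git
  cd wget2
  ./bootstrap       # fetches gnulib and generates ./configure
  ./configure
  make
  make check        # runs the test suite
  ./src/wget2 --version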

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] no post-handshake auth under gnutls

2018-10-08 Thread Tim Rühsen
Thanks, Nikos.

Slightly amended and pushed.
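
For the archives, roughly what the client side looks like with GnuTLS >= 3.6.3
(a sketch from memory, not the exact code that was pushed; error handling
omitted):

  /* Enable TLS 1.3 post-handshake authentication on the client. */
  gnutls_session_t session;
  char buf[4096];
  ssize_t n;

  gnutls_init(&session, GNUTLS_CLIENT | GNUTLS_POST_HANDSHAKE_AUTH);
  /* ... set credentials, priorities, SNI and run the handshake as usual ... */

  /* While reading, the server may now request authentication. */
  n = gnutls_record_recv(session, buf, sizeof buf);
  if (n == GNUTLS_E_REAUTH_REQUEST)
    gnutls_reauth(session, 0);  /* perform the post-handshake authentication */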

Regards, Tim

On 10/8/18 10:47 AM, Nikos Mavrogiannopoulos wrote:
> Hi,
>  It seems that wget does not enable/use post-handshake authentication
> with gnutls when running under TLS1.3.
> 
> The enabling of TLS1.3, although transparent for most use cases, is not
> for the use case where the server allows a client to connect without
> certificate but requests authentication later after the location of
> access is known. Under TLS1.2 this was working via a re-handshake, but
> under TLS1.3 a client must enable and perform post-handshake
> authentication instead.
> 
> A quick and dirty patch to demonstrate how to enable it is attached.
> If you wait until gnutls 3.6.5, there may be a simpler way to enable
> it:
> https://gitlab.com/gnutls/gnutls/merge_requests/766
> 
> 
> More info at:
> https://nikmav.blogspot.com/2018/05/gnutls-and-tls-13.html
> 
> regards,
> Nikos
> 



signature.asc
Description: OpenPGP digital signature


<    1   2   3   4   5   6   7   8   9   >