Re: [Bug-wget] Problem downloading with RIGHT SINGLE QUOTATION MARK (U+2019) in filename

2019-10-11 Thread Tim Rühsen
On 11.10.19 11:07, Eli Zaretskii wrote:
>> From: Cameron Tacklind 
>> Date: Thu, 10 Oct 2019 20:31:02 -0700
>>
>> The error is pretty clearly an encoding conversion issue: UTF-8 bytes are
>> assumed to be CP1252 and then converted (again) into UTF-8, which produces
>> a wrong URL.
> 
> I think you need to tell Wget that the page encoding is UTF-8, by
> using the --remote-encoding switch.  Did you try that?
> 

Cameron's html file contains a 'meta' tag with attribute
'charset=utf-8'. So wget should detect it and convert the URL correctly.

And I can confirm that wget is working properly here. My version is
1.20.3 and I am working on Linux.

I put this file onto my local apache web server and named it quote.html:

<html>
<head><meta charset="utf-8"><title>RIGHT SINGLE QUOTE TEST</title></head>
<body>
<a href="%E2%80%99">test</a>
</body>
</html>
My command line is
  wget -d -r http://localhost/quote.html

Output is
...
Decided to load it.
URI encoding = »utf-8«
Enqueuing http://localhost/%E2%80%99 at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing »http://localhost/%E2%80%99« with »utf-8«
Dequeuing http://localhost/%E2%80%99 at depth 1
Queue count 0, maxcount 1.
Converted file name 'localhost/’' (UTF-8) -> 'localhost/’' (UTF-8)
--2019-10-11 18:06:21--  http://localhost/%E2%80%99
...
---request begin---
GET /%E2%80%99 HTTP/1.1
Referer: http://localhost/quote.html
User-Agent: Wget/1.20.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: localhost
Connection: Keep-Alive

---request end---
...


@Cameron: Your wget version seems OK, so I am a bit clueless right now...

Could you give me the output of 'wget --version' ?
Could you test in the same way as I did above to see if that is
reproducible for you or not ?

Regards, Tim



signature.asc
Description: OpenPGP digital signature


Re: [Bug-wget] Problem downloading with RIGHT SINGLE QUOTATION MARK (U+2019) in filename

2019-10-11 Thread Eli Zaretskii
> From: Cameron Tacklind 
> Date: Thu, 10 Oct 2019 20:31:02 -0700
> 
> The error is pretty clearly an encoding conversion issue: UTF-8 bytes are
> assumed to be CP1252 and then converted (again) into UTF-8, which produces
> a wrong URL.

I think you need to tell Wget that the page encoding is UTF-8, by
using the --remote-encoding switch.  Did you try that?



[Bug-wget] Problem downloading with RIGHT SINGLE QUOTATION MARK (U+2019) in filename

2019-10-10 Thread Cameron Tacklind
Hello,

I think I've found a bug with wget.

I originally came across this problem when recursively downloading folders
that were presented by nginx's fancy-index module. Sometimes a filename
would include a "’" [RIGHT SINGLE QUOTATION MARK (U+2019)] and wget would
always get a 404 error when downloading the file.

Downloading this simple html file (simplified output of nginx fancy-index)
shows the error:

<html>
<head><meta charset="utf-8"><title>RIGHT SINGLE QUOTE TEST</title></head>
<body>
<a href="%E2%80%99">test</a>
</body>
</html>

Full command line (Windows cmd.exe)
wget -d --no-verbose --tries 0 --continue --show-progress --wait 0.1
  --waitretry 5 -e robots=off --rejected-log=rejected.log --recursive
  --level inf --reject "index.html*,jpg,png,zip" --no-parent
  --no-host-directories --auth-no-challenge --user xxx --password xxx
  -P output_dir https://mydomain.com/test/

Debug Output:
DEBUG output created by Wget 1.20.3 on mingw32.

Reading HSTS entries from C:\ProgramData\chocolatey\lib\Wget\tools/.wget-hsts
URI encoding = 'CP1252'
iconv UTF-8 -> CP1252
iconv outlen=60 inlen=30
converted 'https://mydomain.com/test/' (CP1252) -> 'https://mydomain.com/test/' (UTF-8)
URI encoding = 'CP1252'
Enqueuing https://mydomain.com/test/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing 'https://mydomain.com/test/' with 'CP1252'
Dequeuing https://mydomain.com/test/ at depth 0
Queue count 0, maxcount 1.
iconv UTF-8 -> CP1252
iconv outlen=60 inlen=30
converted 'https://mydomain.com/test/' (CP1252) -> 'https://mydomain.com/test/' (UTF-8)
Converted file name 'test/index.html' (UTF-8) -> 'test/index.html' (CP1252)
Auth-without-challenge set, sending Basic credentials.
seconds 0.00, Caching mydomain.com => my.ip.add.ress
seconds 0.00, Created socket 4.
Releasing 0x00b3bf60 (new refcount 1).
Initiating SSL handshake.
seconds 900.00, Winsock error: 0
Handshake successful; connected socket 4 to SSL handle 0x00b52260
certificate:
  subject: CN=mydomain.com
  issuer:  CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US
X509 certificate successfully verified and matches host mydomain.com

---request begin---
GET /test/ HTTP/1.1
User-Agent: Wget/1.20.3 (mingw32)
Accept: */*
Accept-Encoding: identity
Authorization: Basic 
Host: mydomain.com
Connection: Keep-Alive

---request end---
seconds 900.00, Winsock error: 0

---response begin---
HTTP/1.1 200 OK
Server: nginx/1.14.1
Date: Fri, 11 Oct 2019 02:17:57 GMT
Content-Type: text/html
Content-Length: 185
Last-Modified: Fri, 11 Oct 2019 02:17:52 GMT
Connection: keep-alive
Keep-Alive: timeout=20
ETag: "5d9fe650-b9"
Accept-Ranges: bytes

---response end---
Registered socket 4 for persistent reuse.
seconds 900.00, Winsock error: 0

 0K   100%  282K=0.001s
2019-10-10 19:17:01 URL:https://mydomain.com/test/ [185/185] -> "E:/test/poops/test/index.html.tmp" [1]
Loaded E:/test/poops/test/index.html.tmp (size 185).
URI encoding = 'CP1252'
E:/test/poops/test/index.html.tmp: merge('https://mydomain.com/test/', '%E2%80%99') -> https://mydomain.com/test/%E2%80%99
iconv UTF-8 -> CP1252
iconv outlen=66 inlen=33
converted 'https://mydomain.com/test/%E2%80%99' (CP1252) -> 'https://mydomain.com/test/’' (UTF-8)
appending 'https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2' to urlpos.
URI content encoding = 'utf-8'
no-follow in E:/test/poops/test/index.html.tmp: 0
Deciding whether to enqueue "https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2".
Decided to load it.
URI encoding = 'utf-8'
Enqueuing https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2 at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing 'https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2' with 'utf-8'
Removing file due to recursive rejection criteria in recursive_retrieve():
Dequeuing https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2 at depth 1
Queue count 0, maxcount 1.
Converted file name 'test/’' (UTF-8) -> 'test/’' (CP1252)
Auth-without-challenge set, sending Basic credentials.
Reusing fd 4.

---request begin---
GET /test/%C3%A2%E2%82%AC%E2%84%A2 HTTP/1.1
Referer: https://mydomain.com/test/
User-Agent: Wget/1.20.3 (mingw32)
Accept: */*
Accept-Encoding: identity
Authorization: Basic 
Host: mydomain.com
Connection: Keep-Alive

---request end---
seconds 900.00, Winsock error: 0

---response begin---
HTTP/1.1 404 Not Found
Server: nginx/1.14.1
Date: Fri, 11 Oct 2019 02:17:58 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Keep-Alive: timeout=20

---response end---
Skipping 169 bytes of body: [seconds 900.00, Winsock error: 0
404 Not Found
404 Not Found
nginx/1.14.1
] done.
https://mydomain.com/test/%C3%A2%E2%82%AC%E2%84%A2:
2019-10-10 19:17:02 ERROR 404: Not Found.
FINISHED --2019-10-10 19:17:02--
Total wall clock time: 1.6s
Downloaded: 1 files, 185 in 0.001s (282 KB/s)

The error is pretty clearly an encoding conversion issue: UTF-8 bytes are
assumed to be CP1252 and then converted (again) into UTF-8, which produces a
wrong URL. This is nicely described at the end of this page:
http://www.an
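
To reproduce the bad conversion outside of wget, here is a minimal standalone
C sketch (using POSIX iconv, which wget also uses; this is not wget's actual
code). It misreads the three UTF-8 bytes of U+2019 as CP1252 and re-encodes
them as UTF-8, yielding exactly the %C3%A2%E2%82%AC%E2%84%A2 from the log
above:

/* Minimal sketch, not wget code: misread U+2019's UTF-8 bytes as CP1252
   and re-encode them as UTF-8, as the debug log above shows happening. */
#include <stdio.h>
#include <iconv.h>

int main (void)
{
  char in[] = "\xE2\x80\x99";   /* U+2019 encoded as UTF-8 */
  char out[16] = { 0 };
  char *inp = in, *outp = out;
  size_t inleft = 3, outleft = sizeof out;

  /* Treat the input as CP1252: 0xE2 0x80 0x99 = 'â', '€', '™'. */
  iconv_t cd = iconv_open ("UTF-8", "CP1252");
  if (cd == (iconv_t) -1
      || iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
    {
      perror ("iconv");
      return 1;
    }
  iconv_close (cd);

  for (char *p = out; p < outp; p++)
    printf ("%%%02X", (unsigned char) *p);
  putchar ('\n');   /* prints %C3%A2%E2%82%AC%E2%84%A2 */
  return 0;
}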

Re: [Bug-wget] problem installing wget 1.20.1 on GNU/Linux 4.4.0-134-generic

2019-03-15 Thread Mauricio Zambrano Bigiarini
Thank you very much Tim for your prompt reply.


Kind regards,

Mauricio

=
"Mistakes are always forgivable, if one has
the courage to admit them" (Bruce Lee)
=
Linux user #454569 -- Linux Mint user

On Fri, 15 Mar 2019 at 19:22, Tim Rühsen  wrote:
>
> That was a problem with the perl https daemon not supporting IPv6.
>
> Here on Debian unstable, we just got a patch that fixes it.
>
> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667738
>
> Regards, Tim
>
>
> On 15.03.19 20:41, Mauricio Zambrano Bigiarini wrote:
> > I'm reporting the results of make check on Linux Mint 18.3
> >
> > Thanks in advance for any comment on this.
> >
> > Kind regards
> >
> > Mauricio Zambrano-Bigiarini, PhD
> > [...]

Re: [Bug-wget] problem installing wget 1.20.1 on GNU/Linux 4.4.0-134-generic

2019-03-15 Thread Tim Rühsen
That was a problem with the perl https daemon not supporting IPv6.

Here on Debian unstable, we just got a patch that fixes it.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667738

Regards, Tim


On 15.03.19 20:41, Mauricio Zambrano Bigiarini wrote:
> I'm reporting the results of make check on Linux Mint 18.3
> 
> Thanks in advance for any comment on this.
> 
> Kind regards
> 
> Mauricio Zambrano-Bigiarini, PhD
> [...]

[Bug-wget] problem installing wget 1.20.1 on GNU/Linux 4.4.0-134-generic

2019-03-15 Thread Mauricio Zambrano Bigiarini
I'm reporting the results of make check on Linux Mint 18.3

Thanks in advance for any comment on this.

Kind regards

Mauricio Zambrano-Bigiarini, PhD

=
Department of Civil Engineering
Faculty of Engineering and Sciences
Universidad de La Frontera, Temuco, Chile
http://hzambran.github.io/
=
mailto : mauricio.zambr...@ufrontera.cl
work-phone : +56 45 259 2812
=
"Mistakes are always forgivable, if one has
the courage to admit them" (Bruce Lee)
=
Linux user #454569 -- Linux Mint user

uname -a

4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux


./configure --sysconfdir=/etc  --with-ssl=openssl

configure: configuring for GNU Wget 1.20.1
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking whether make supports nested variables... (cached) yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... -std=gnu11
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc -std=gnu11... gcc3
checking how to run the C preprocessor... gcc -std=gnu11 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking whether _XOPEN_SOURCE should be defined... no
checking for Minix Amsterdam compiler... no
checking for ar... ar
checking for ranlib... ranlib
checking for _LARGEFILE_SOURCE value needed for large files... no
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking for a Python interpreter with version >= 3.0... python3
checking for python3... /usr/bin/python3
checking for python3 version... 3.4
checking for python3 platform... linux
checking for python3 script directory... ${prefix}/lib/python3.4/site-packages
checking for python3 extension module directory...
${exec_prefix}/lib/python3.4/site-packages
checking whether NLS is requested... yes
checking for msgfmt... /usr/bin/msgfmt
checking for gmsgfmt... /usr/bin/msgfmt
checking for xgettext... /usr/bin/xgettext
checking for msgmerge... /usr/bin/msgmerge
checking for ld used by gcc -std=gnu11... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for shared library run path origin... done
checking 32-bit host C ABI... no
checking for the common suffixes of directories in the library search
path... lib,lib
checking for CFPreferencesCopyAppValue... no
checking for CFLocaleCopyCurrent... no
checking for GNU gettext in libc... yes
checking whether to use NLS... yes
checking where the gettext function comes from... libc
checking for ranlib... (cached) ranlib
checking for flex... flex
checking lex output file root... lex.yy
checking lex library... -lfl
checking whether yytext is a pointer... yes
checking for an ANSI C-conforming const... yes
checking for inline... inline
checking for working volatile... yes
checking for ANSI C header files... (cached) yes
checking for special C compiler options needed for large files... (cached) no
checking for _FILE_OFFSET_BITS value needed for large files... (cached) no
checking size of off_t... 8
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking for unistd.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking termios.h usability... yes
checking termios.h presence... yes
checking for termios.h... yes
checking sys/ioctl.h usability... yes
checking sys/ioctl.h presence... yes
checking for sys/ioctl.h... yes
checking sys/select.h usability... yes
checking sys/select.h presence... yes
checking for sys/select.h... yes
checking utime.h usability... yes
checking utime.h presence... yes
checkin

Re: [Bug-wget] problem downloading

2015-05-10 Thread Luke Bryan
Of course it would do that.  I knew that. (ha ha)

Your reply solved the problem.  Now it works perfectly.  I'll be able to
write a bash script and get all the stock quotes I need.  Thanks!


On Sun, May 10, 2015 at 7:02 PM, Hubert Tarasiuk 
wrote:

> Are you enclosing this URL in quotes? Otherwise, the shell will treat it as
> $ wget -A csv -O /tmp/myfile.csv http://finance.yahoo.com/d/quotes.csv?s=XOM+GE+MSFT &
> $ f=spn
> I.e. it runs wget in the background and assigns "spn" to the variable $f.
>
> On Sun, May 10, 2015 at 10:57 PM, Luke Bryan 
> wrote:
>
>> I've reviewed the --help page and the manual.
>>
>> I tried the following with several different options and I can't get it to
>> work.  What am I doing wrong?
>>
>>
>>
>> wget -A csv -O /tmp/myfile.csv
>> http://finance.yahoo.com/d/quotes.csv?s=XOM+GE+MSFT&f=spn
>>
>
>


Re: [Bug-wget] problem downloading

2015-05-10 Thread Hubert Tarasiuk
Are you enclosing this URL in quotes? Otherwise, the shell will treat it as
$ wget -A csv -O /tmp/myfile.csv http://finance.yahoo.com/d/quotes.csv?s=XOM+GE+MSFT &
$ f=spn
I.e. it runs wget in the background and assigns "spn" to the variable $f.
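
In other words, single-quote the URL so the shell never sees the '&' (or the
'?'), e.g.:

$ wget -A csv -O /tmp/myfile.csv 'http://finance.yahoo.com/d/quotes.csv?s=XOM+GE+MSFT&f=spn'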

On Sun, May 10, 2015 at 10:57 PM, Luke Bryan  wrote:

> I've reviewed the --help page and the manual.
>
> I tried the following with several different options and I can't get it to
> work.  What am I doing wrong?
>
>
>
> wget -A csv -O /tmp/myfile.csv
> http://finance.yahoo.com/d/quotes.csv?s=XOM+GE+MSFT&f=spn
>


[Bug-wget] problem downloading

2015-05-10 Thread Luke Bryan
I've reviewed the --help page and the manual.

I tried the following with several different options and I can't get it to
work.  What am I doing wrong?



wget -A csv -O /tmp/myfile.csv
http://finance.yahoo.com/d/quotes.csv?s=XOM+GE+MSFT&f=spn


Re: [Bug-wget] Problem with headers getting put into the file

2015-01-08 Thread Tim Ruehsen
On Tuesday 06 January 2015 05:37:49 Nuzhna Pomoshch wrote:
> Hi,
> 
> I have been using wget for many years, and have just recently begun to
> encounter a strange problem.
> 
> I typically do "wget -S http:/path.to/some.file -O local.filename", which
> has always worked fine in the past.
> 
> On some sites now, the headers are getting put into the beginning of the
> output file.
> 
> A typical set of those headers (from the saved file) is:
> 
> HTTP/1.1 200 OK
> Server: nginx
> Date: Thui, 01 Jan 2015 00:00:00 GMT
> Content-Type: application/force-download
> Content-Length: 1073741824
> Last-Modified: Thu, 25 Dec 2014 00:00:00 GMT
> Connection: keep-alive
> Content-Disposition: attachment; filename="some.file"
> ETag: "-"
> Accept-Ranges: bytes
> 
> I am wondering if the mime type "application/force-download" isn't causing
> the problem.
> 
> This is unpleasant at best (although I can usually remove the headers with
> dd). The big problem comes when the download is interrupted, and I try to
> resume it. When that happens, the partial range requested doesn't match
> what is on the disk (it is off by the size of the headers at the beginning
> of the file), and the file (most often a large media file) gets corrupted
> (bytes are missing from the middle).
> 
> Has anyone encountered this before and does anyone have any thoughts on how
> to resolve this?

It looks like the web server is misconfigured and sends an additional HTTP
header block at the start of the response body.
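
Until the server is fixed, a minimal sketch of the dd-style cleanup (not part
of Wget; it assumes the stray header block ends with the usual blank line and
fits into the first 8 KiB of the file):

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
  char buf[8192];
  if (argc < 2)
    return 1;
  FILE *f = fopen (argv[1], "rb");
  if (!f)
    {
      perror ("fopen");
      return 1;
    }
  size_t n = fread (buf, 1, sizeof buf, f);
  char *body = NULL;
  /* Find the blank line that terminates the bogus header block. */
  for (size_t i = 0; i + 3 < n; i++)
    if (memcmp (buf + i, "\r\n\r\n", 4) == 0)
      {
        body = buf + i + 4;
        break;
      }
  if (!body)
    {
      fprintf (stderr, "no header block found\n");
      return 1;
    }
  /* Copy everything after the headers to stdout. */
  fwrite (body, 1, n - (size_t) (body - buf), stdout);
  while ((n = fread (buf, 1, sizeof buf, f)) > 0)
    fwrite (buf, 1, n, stdout);
  fclose (f);
  return 0;
}

Usage would be e.g. ./stripheaders some.file > some.file.clean.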

BTW, application/force-download seems to be a hack (that some browsers
support); see
http://stackoverflow.com/questions/10615797/utility-of-http-header-content-type-application-force-download-for-mobile

If you give me your real URL, I can investigate patching Wget to support
this hacky mime-type.

Tim

signature.asc
Description: This is a digitally signed message part.


[Bug-wget] Problem with headers getting put into the file

2015-01-06 Thread Nuzhna Pomoshch
Hi,

I have been using wget for many years, and have just recently begun to 
encounter a strange problem.

I typically do "wget -S http:/path.to/some.file -O local.filename", which has 
always worked fine in the past.

On some sites now, the headers are getting put into the beginning of the output 
file.

A typical set of those headers (from the saved file) is:

HTTP/1.1 200 OK
Server: nginx
Date: Thui, 01 Jan 2015 00:00:00 GMT
Content-Type: application/force-download
Content-Length: 1073741824
Last-Modified: Thu, 25 Dec 2014 00:00:00 GMT
Connection: keep-alive
Content-Disposition: attachment; filename="some.file"
ETag: "-"
Accept-Ranges: bytes

I am wondering if the mime type "application/force-download" isn't causing the 
problem.

This is unpleasant at best (although I can usually remove the headers with dd). 
The big problem
comes when the download is interrupted, and I try to resume it. When that 
happens, the partial
range requested doesn't match what is on the disk (it is off by the size of the 
headers at the
beginning of the file), and the file (most often a large media file) gets 
corrupted (bytes are missing
from the middle).

Has anyone encountered this before and does anyone have any thoughts on how to 
resolve this?



Re: [Bug-wget] Problem with -include directory

2014-12-17 Thread Tim Ruehsen
Hi Richard,

On Friday 05 December 2014 13:37:24 Richard Tan wrote:
> I'm having a problem with the -I flag (include directories) and I was
> wondering if somebody could help me out.
> 
> I am attempting to download two subdirectories from a website. I told wget
> to download from this address:
> 
> www.abcde.com/site/eng/weeklyChecklist/issues.html
> 
> With the list
> 
> "/2012, /2011 www.abcde.com/site/eng/weeklyChecklist"

I guess, your list looks like
"/2012, /2011, www.abcde.com/site/eng/weeklyChecklist" !?

> However, when I run wget, it reject the first file (issues.html), which
> seems to indicate that wget isn't reading the root directory properly.

I cannot reproduce the described behavior. The first file (the one you
request on the command line) is always downloaded, no matter what the
argument to -I looks like.

It would be nice to have the output of wget --version, the complete command 
line and the output when you add --debug.

Maybe your site is redirecting or whatever... we need more facts to reproduce.

Tim


signature.asc
Description: This is a digitally signed message part.


[Bug-wget] Problem with -include directory

2014-12-06 Thread Richard Tan
I'm having a problem with the -I flag (include directories) and I was
wondering if somebody could help me out.

I am attempting to download two subdirectories from a website. I told wget
to download from this address:

www.abcde.com/site/eng/weeklyChecklist/issues.html

With the list

"/2012, /2011 www.abcde.com/site/eng/weeklyChecklist"

However, when I run wget, it rejects the first file (issues.html), which
seems to indicate that wget isn't reading the root directory properly.

Thanks for any help.

Best,
Richard Tan


-- 
Richard Tan
Metadata Analyst
University of California, Berkeley
(510) 643-2041


Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-10-02 Thread Ángel González

On 24/09/13 10:38, Tim Ruehsen wrote:
> Just for completeness: these guessing steps, called the "encoding sniffing
> algorithm", are described in 12.2.2.2.
> But only "In some cases, it might be impractical to unambiguously determine
> the encoding before parsing the document.".
Yes, it allows to start parsing with one encoding, then abort and change to a
different one.

> I found this iso-8859-1 / windows-1252 issue mentioned on the Wikipedia
> 'windows-1252' page, but couldn't find it on the HTML Living Standard pages.
> Could you give me a pointer, please ?
It's at the beginning of html parsing, it lists several encodings given by
the page and the encoding you should use to parse them, saying it is a
willful violation.

> What do you think, how can we address the iso / windows encoding issue
> (should we ?) ? As I understood, it is only valid for HTML5...
It's just a matter of comparing the input encoding with a well-known list and
replacing it.
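
For illustration, such a comparison could look like this (a toy sketch with
an abbreviated alias list, not the full WHATWG table; strcasecmp is POSIX,
Windows would use _stricmp):

#include <strings.h>

static const char *
effective_charset (const char *label)
{
  /* Labels that the HTML Living Standard treats as windows-1252
     (abbreviated list, for illustration only). */
  static const char *aliases[] =
    { "iso-8859-1", "latin1", "us-ascii", "ascii", NULL };
  for (const char **p = aliases; *p; p++)
    if (strcasecmp (label, *p) == 0)
      return "windows-1252";
  return label;
}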



> Is there a practical need for the sniffing algorithm ?
If we want to deal with the "ÅÄÖ links" properly, we should do encoding
detection.

> Do you know any real web sites / pages where the encoding is ambiguous ?
I consider those web sites broken. But I don't have numbers.





Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-24 Thread Tim Ruehsen
On Monday 23 September 2013 23:32:39 Ángel González wrote:
> On 17/09/13 09:49, Tim Ruehsen wrote:
> > On Tuesday 17 September 2013 00:17:21 Ángel González wrote:
> >>> [1] http://nikitathespider.com/articles/EncodingDivination.html
> >> 
> >> Note that these steps are outdated now (that was written at most at
> >> 2008).
> > 
> > Outdated by exactly what ? RFC3986 is of 2005 and does not contradict to
> > [1]. See my explanation above.
> 
> By the HTML Living Standard (formerly known as HTML5)
> http://www.whatwg.org/specs/web-apps/current-work/multipage/
> 
> The Content-type header is sometimes overriden, ISO-8859-1 now means
> windows-1252,
> there are some well-defined guessing steps when there's such need...

Just for completeness: these guessing steps, called the "encoding sniffing
algorithm", are described in 12.2.2.2.
But only "In some cases, it might be impractical to unambiguously determine 
the encoding before parsing the document.".

I found this iso-8859-1 / windows-1252 issue mentioned on the Wikipedia  
'windows-1252' page, but couldn't find it on the HTML Living Standard pages.
Could you give me a pointer, please ?

What do you think, how can we address the iso / windows encoding issue (should 
we ?) ? As I understood, it is only valid for HTML5...

Is there a practical need for the sniffing algorithm ?
Do you know any real web sites / pages where the encoding is ambiguous ?

Tim




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-23 Thread Ángel González

On 17/09/13 09:49, Tim Ruehsen wrote:
> On Tuesday 17 September 2013 00:17:21 Ángel González wrote:
>>> [1] http://nikitathespider.com/articles/EncodingDivination.html
>> Note that these steps are outdated now (that was written at most at 2008).
> Outdated by exactly what ? RFC3986 is of 2005 and does not contradict to
> [1]. See my explanation above.
By the HTML Living Standard (formerly known as HTML5)
http://www.whatwg.org/specs/web-apps/current-work/multipage/

The Content-type header is sometimes overridden, ISO-8859-1 now means
windows-1252, and there are some well-defined guessing steps when there's
such need...




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-17 Thread Tim Ruehsen
On Tuesday 17 September 2013 00:17:21 Ángel González wrote:
> On 16/09/13 12:50, Tim Ruehsen wrote:
> > Just to have it mentioned:
> > Your download (wget -r http://bmit.se/wget) succeeds, but it shouldn't !
> > IMHO, Wget has a bug here and just because of this bug your test case
> > succeeds.
> > 
> > Why ?
> > Your wget/index.html holds the UTF-8 encoded URL 'teståäöÅÄÖ', but neither
> > the server header (Content-Type: text/html) nor the document itself (META
> > http- equiv ...) defines the charset. That means the charset encoding of
> > index.html should be ISO-8859-1. See [1].
> > Wget should have taken the URL 'teståäöÅÄÖ' as ISO-8859-1 and convert it
> > into UTF-8, which would fail to download.
> > 
> > Conclusion
> > 1. Be prepared that Wget will change its behaviour sooner or later (make
> > sure, you specify / deliver the charset encoding of your documents).
> > 2. Wget will/does have problems with ISO-8859-1 text/html pages if the
> > charset is not  specified AND special chars are used.
> > 
> > Someone proving me wrong ?
> 
> I think that in the past, if the document was in iso-8859-1, imho
> it would be legal to give the server the url *encoded in iso-8859-1*,
> thus resulting in the same %-encoded url.

Just to make it clear, we are talking about two different things.
1. What is the encoding of a URL found in a downloaded document ?
[1] makes clear what the steps are.

2. What is the encoding of the URL provided in the GET request ?
RFC2616 and RFC3986 have the same opinion:
a. convert the URL into UTF-8
b. percent-encode the characters that are not in the 'unreserved' set.
(see RFC3986 2.5, last paragraph)
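
As an illustration of step b (a toy sketch; it percent-encodes the data
octets of a single component, so delimiters like '/' are not special-cased
here):

#include <stdio.h>
#include <ctype.h>

static void
percent_encode (const unsigned char *s)
{
  for (; *s; s++)
    {
      if (isalnum (*s) || *s == '-' || *s == '.' || *s == '_' || *s == '~')
        putchar (*s);   /* RFC 3986 'unreserved' */
      else
        printf ("%%%02X", *s);
    }
  putchar ('\n');
}

int main (void)
{
  /* U+2019 as UTF-8; prints %E2%80%99 */
  percent_encode ((const unsigned char *) "\xE2\x80\x99");
  return 0;
}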

To do 2.a. one has to know the 'original' character set.
When we are in recursive mode (as in the bmit.se example) we have to use [1] 
to determine the 'original' charset before we can generate a GET request.

The reason why Wget works with so many sites is that most sites are either
ASCII or ISO-8859-1. And sites with non-ASCII domain names very often use
only ASCII characters in their URL path/query/fragment. So no problem here.

I know nothing about how servers interpret the GET URL. They might have some 
guessing, e.g. using ISO-8859-1 if decoding from UTF-8 fails.
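
Such guessing could start with a simple well-formedness check (a toy sketch;
it omits the overlong-sequence and surrogate checks a real validator needs):

#include <stdbool.h>

static bool
is_valid_utf8 (const unsigned char *s)
{
  while (*s)
    {
      int len;
      if (*s < 0x80) len = 1;
      else if ((*s & 0xE0) == 0xC0) len = 2;
      else if ((*s & 0xF0) == 0xE0) len = 3;
      else if ((*s & 0xF8) == 0xF0) len = 4;
      else return false;
      for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)  /* also catches a premature NUL */
          return false;
      s += len;
    }
  return true;
}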

> > [1] http://nikitathespider.com/articles/EncodingDivination.html
> Note that these steps are outdated now (that was written at most at 2008).

Outdated by exactly what ? RFC3986 is of 2005 and does not contradict to [1].
See my explanation above.

> 
> On 16/09/13 16:29, Tony Lewis wrote:
> > Neither Firefox nor Internet Explorer can navigate that link. Both
> > fail trying to retrieve teståäöÅÄÖ.
> 
> That's strange. I can browse it on Firefox 23. Perhaps its guessing is
> better.

In Firefox 23.0.1 (Debian SID), running in a UTF-8 environment:
When I enter bmit.se/wget the third link is displayed wrong.
This is as expected since, following [1], the page should be ISO-8859-1, but
it includes a UTF-8 encoded URL. These UTF-8 characters are then converted
from ISO-8859-1 to UTF-8 and thus display wrong.
If you are in an ISO-8859-1 or similar environment (some Windows encodings
are very similar), the link would display correctly, but this is just a
lucky effect (the same goes for Wget here).

To have it looking correct everywhere, bmit.se/wget should either provide the 
charset UTF-8 (in the response header or in a META tag) or should have 
ISO-8859-1 characters in the page.

[1] http://nikitathespider.com/articles/EncodingDivination.html
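
For example, either an HTTP header

  Content-Type: text/html; charset=utf-8

or an HTML declaration

  <meta charset="utf-8">

would make the encoding explicit (generic examples, not bmit.se's actual
configuration).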





Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-16 Thread Ángel González

On 16/09/13 12:50, Tim Ruehsen wrote:
> Just to have it mentioned:
> Your download (wget -r http://bmit.se/wget) succeeds, but it shouldn't !
> IMHO, Wget has a bug here and just because of this bug your test case
> succeeds.
>
> Why ?
> Your wget/index.html holds the UTF-8 encoded URL 'teståäöÅÄÖ', but neither
> the server header (Content-Type: text/html) nor the document itself (META
> http-equiv ...) defines the charset. That means the charset encoding of
> index.html should be ISO-8859-1. See [1].
> Wget should have taken the URL 'teståäöÅÄÖ' as ISO-8859-1 and convert it
> into UTF-8, which would fail to download.
>
> Conclusion
> 1. Be prepared that Wget will change its behaviour sooner or later (make
> sure you specify / deliver the charset encoding of your documents).
> 2. Wget will/does have problems with ISO-8859-1 text/html pages if the
> charset is not specified AND special chars are used.
>
> Someone proving me wrong ?
I think that in the past, if the document was in iso-8859-1, imho it would
be legal to give the server the url *encoded in iso-8859-1*, thus resulting
in the same %-encoded url. However, rfc3986 & rfc3987 already set that they
shall be in utf-8.

> [1] http://nikitathespider.com/articles/EncodingDivination.html
Note that these steps are outdated now (that was written at most at 2008).

On 16/09/13 16:29, Tony Lewis wrote:
> Neither Firefox nor Internet Explorer can navigate that link. Both
> fail trying to retrieve teståäöÅÄÖ.
That's strange. I can browse it on Firefox 23. Perhaps its guessing is
better.





Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-16 Thread Tony Lewis
Tim Ruehsen wrote:

> Wget should have taken the URL 'teståäöÅÄÖ' as ISO-8859-1 and convert it
> into UTF-8, which would fail to download.

Neither Firefox nor Internet Explorer can navigate that link. Both fail
trying to retrieve teståäöÅÄÖ.

I concur with Tim that this behavior of wget is accidental and should not be
relied on. Perhaps it would be useful to have an option to wget to specify
the encoding when a broken server fails to do so: --encoding=utf-8. If such
an option were added, it seems to me that it should not override explicit
encoding from the server (although the encoding option might allow the user
to override even that: --encoding=utf-16,force).

Tony




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-16 Thread Tim Ruehsen
> > I switched my environment to UTF-8 now and it seems to work:
> On my main-machine too, didn't have access to that one yesterday-evening.

Just to have it mentioned:
Your download (wget -r http://bmit.se/wget) succeeds, but it shouldn't !
IMHO, Wget has a bug here and just because of this bug your test case 
succeeds.

Why ?
Your wget/index.html holds the UTF-8 encoded URL 'teståäöÅÄÖ', but neither the 
server header (Content-Type: text/html) nor the document itself (META http-
equiv ...) defines the charset. That means the charset encoding of index.html 
should be ISO-8859-1. See [1].
Wget should have taken the URL 'teståäöÅÄÖ' as ISO-8859-1 and convert it into 
UTF-8, which would fail to download.

Conclusion
1. Be prepared that Wget will change its behaviour sooner or later (make
sure you specify / deliver the charset encoding of your documents).
2. Wget will/does have problems with ISO-8859-1 text/html pages if the
charset is not specified AND special chars are used.

Someone proving me wrong ?

[1] http://nikitathespider.com/articles/EncodingDivination.html




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-15 Thread Bykov Aleksey

Greetings

Thanks for correcting.
Sorry for unclean code and troubling.

> - Make wget recognise utf-8 urls and accept them without nocontrol when
> the filesystem encoding is utf-8.

Are you sure? A UTF-8 name can contain a colon (I remember seeing such
files). And at least on Windows the colon is still a restricted character.
I think it is possible to use the current --restrict-file-names logic, just
adding a conversion to wide characters (and vice versa), checking only
symbols with a code below 256, and in a couple of places replacing the type
"char" with "wchar_t". I need to check. Sorry, it will take some time.


> What happens if the filename has more than 1024 characters?

The filename just gets cropped. Now buffer_size is determined by
MultiByteToWideChar. I am not sure that it still needs multiplying by
sizeof(wchar_t).

> Big bug. The sixth argument is the space available for w_filename *in
> characters*, not bytes.
> Why bother allocating memory, if you are using a fixed size? Another
> option would be to use alloca()
>
> I guess rename() would also need a wrapper.

Thanks.


> This code should be on mswindows.c

I just forgot about mswindows.*. Yes, it is a much more suitable place.


> What makes w_fopen() different so it is on utils.h instead of the .c?

Sorry, I don't know. I have too little experience to understand it.
Can you please take a look and say what I am doing wrong?
I remember (believe?) that NAME.h must contain the function declaration and
NAME.c the function body, and that other files can access the function body
only if the declaration exists in a (directly or indirectly included)
NAME.h. But with that structure my code does not work.
Now all functions in utils.h (except w_fopen()) work in other files without
a declaration, and w_fopen works only when its body is in utils.h. In the
attachment are diffs for the working and non-working variants (sorry, they
are based on utils.* because in mswindows.h it did not work at all; it is
just appending code to the tail).

--
Best regards, Alex

On Sun, 15 Sep 2013 03:54:07 +0300, Ángel González 
wrote:


[...]

non_work.diff
Description: Binary data
gcc  -static -fno-unwind-tables -fno-asynchronous-unwind-tables -DPCRE_STATIC 
-DLARGEFILES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_PC_NAME_MAX=255 -Os 
-s -msse2  -static -s -L/usr/local/winiconv/lib -o wget.exe cmpt.o connect.o 
convert.o cookies.o ftp.o css_.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o 
html-parse.o html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o 
recur.o res.o retr.o spider.o url.o warc.o utils.o exits.o build_info.o iri.o 
version.o ftp-opie.o mswindows.o openssl.o http-ntlm.o ../lib/libgnu.a 
/usr/local/winiconv/lib/libiconv.a /usr/local/winiconv/lib/libintl.a 
-L/usr/local/winiconv/lib /usr/local/winiconv/lib/libiconv.a 
/usr/local/ssl3/lib/libssl.a /usr/local/ssl3/lib/libcrypto.a -lz -lz -lz  
-lws2_32 -lgdi32 -lidn -lpcre
convert.o:convert.c:(.text+0xc56): undefined reference to `w_fopen'
cookies.o:cookies.c:(.text+0x109c): unde

Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-14 Thread Ángel González

On 15/09/13 00:59, Bykov Aleksey wrote:

Greetings

Great thanks for pushing in correct direction.

With attached patch Wget in Windows can work with UTF-8 names. But - 
also only with "--restrict-file-names=nocontrol"...

I think there are two issues:
- Make wget recognise utf-8 urls and accept them without nocontrol when 
the filesystem encoding is utf-8.

- Correctly store the filenames in Windows.

I would have started with the first one, and then treat Windows as utf-8 
enabled fs, which is what this patch does. Also, isn't there any library 
doing already this?



diff --git a/src/utils.c b/src/utils.c
index 2ec9601..6307c88 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -2544,3 +2544,42 @@ test_dir_matches_p()

  #endif /* TESTING */

+#ifdef WINDOWS
+/* For UTF-8 in Windows support. Replacement standart fopen() utime() stat() 
lstat() mkdir() with wide character
+analogs route. w_fopen() declared in utils.h, w_utime(), w_stat() and w_mkdir 
- in utils.c */


This code should be on mswindows.c
What makes w_fopen() different so it is on utils.h instead of the .c?

Commenting on just one function, as they all follow the same template:


+int
+w_stat (const char *filename, struct_stat *buffer )
+{
+  wchar_t *w_filename;
+  int buffer_size = 1024; /* I cant push it to work with strlen() */

What happens if the filename has more than 1024 characters?

+  w_filename = malloc (buffer_size);
+  MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);

Using CP_UTF8 instead of 65001 would be preferable IMHO.

Big bug. The sixth argument is the space available for w_filename *in 
characters*, not bytes.
I would multiply buffer_size by sizeof(wchar_t) in the malloc (although 
you could instead divide here, too).



+  int res = _wstati64 (w_filename, buffer);

It would be better to declare res at the beginning of the function.


+  free (w_filename);
+  return res;
+}
Why bother allocating memory, if you are using a fixed size? Another
option would be to use alloca()



I guess rename() would also need a wrapper.
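
Putting the comments above together, a corrected wrapper might look roughly
like this (an untested sketch, assuming wget's usual includes and the
Windows headers; struct_stat is wget's existing typedef):

int
w_stat (const char *filename, struct_stat *buffer)
{
  wchar_t *w_filename;
  int res;
  /* Required length in wide characters, including the terminating NUL. */
  int len = MultiByteToWideChar (CP_UTF8, 0, filename, -1, NULL, 0);
  if (len == 0)
    return -1;
  w_filename = malloc (len * sizeof (wchar_t));
  if (!w_filename)
    return -1;
  MultiByteToWideChar (CP_UTF8, 0, filename, -1, w_filename, len);
  res = _wstati64 (w_filename, buffer);
  free (w_filename);
  return res;
}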





Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-14 Thread Bykov Aleksey

Greetings

Great thanks for pushing in correct direction.

With the attached patch, Wget on Windows can work with UTF-8 names. But
again, only with "--restrict-file-names=nocontrol"...


Windows needs a conversion for all work with wide chars.
MultiByteToWideChar() was chosen because it allows forcing the input
encoding. And after the conversion, characters can be checked separately
for restrictions. As a variant, the restricted symbol is replaced and the
whole string converted back to UTF-8 with WideCharToMultiByte().
Is it possible on UNIX to use mbstowcs()/wcstombs() with setlocale(LC_ALL,
"UTF-8") for the same purpose? Or does some better way exist to convert a
byte string to a wide string during character quoting?


--
Best regars, Alex

On Fri, 13 Sep 2013 16:13:10 +0300, Tim Ruehsen  wrote:


[...]

win_utf-8.diff
Description: Binary data


Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-13 Thread Tim Ruehsen
On Friday 13 September 2013 12:43:53 Bykov Aleksey wrote:
> Greetings
> Yes, You show correct cyrillic filename.
> Sorry, I'm not aggree that this bug is ready to close.
> Your method is mentioned in it.
> This bug about filenames in non UTF-8 locales.
> 
> Main qoute:
> > If you are using a unix-like OS where the filesystem interface uses
> > utf-8, there is a workaround of using --restrict-file-names=nocontrol
> > (which is still too big, as that would allow problematic control
> > characters %01 or %09).
> > If you are using Windows, --restrict-file-names=nocontrol still gives
> > garbage (the utf-8 characters are treated as if they were in latin1).

Thanks for pointing this out. I missed it.

> I'm tried to solve this bug by adding new options
> --local-filesystem-encoding
> http://lists.gnu.org/archive/html/bug-wget/2013-05/msg00102.html
> but patch was (rejected?)/(frozen?)/(lack of demand?).

It seems there has been no discussion about it. I interpret that as a
possible lack of interest - but I am not sure.

But a quick net search reveals that NTFS is using UTF-16 (UNICODE) while
fopen() demands ASCII !?
[1] suggests to feed UTF-8 strings to CreateFile() or wfopen() when built with 
UNICODE. For a non-UNICODE build use CreateFileW() or wfopen().

So maybe your patch used the wrong approach.
You should try to use the above mentioned functions for WINDOWS builds.
If that works, the patch will be just a few lines...
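
For instance, a w_fopen() along these lines (an untested sketch with a fixed
MAX_PATH buffer, just to show the shape):

#include <stdio.h>
#include <windows.h>

FILE *
w_fopen (const char *filename, const char *mode)
{
  wchar_t w_filename[MAX_PATH], w_mode[8];
  /* Convert the UTF-8 name and mode to UTF-16 and use the wide CRT call. */
  if (!MultiByteToWideChar (CP_UTF8, 0, filename, -1, w_filename, MAX_PATH)
      || !MultiByteToWideChar (CP_UTF8, 0, mode, -1, w_mode, 8))
    return NULL;
  return _wfopen (w_filename, w_mode);
}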

> Sorry, I don't know how Björn Mattsson swith it Windows Vista (x64)
> filesystem to UTF-8.
> In Russian locales Windows 98, XP (x86), Vista (x86) use filesystem
> encoding CP866.

Wasn't there something like international language support even for Windows
98 ? Together with perhaps some new fonts, that should do it... but hey, I
have been out of the Windows business for 12 years now and I never
regretted it.


[1] http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as
[2] http://en.wikipedia.org/wiki/Filename




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-13 Thread Bykov Aleksey

Greetings
Yes, you show a correct cyrillic filename.
Sorry, I do not agree that this bug is ready to be closed.
Your method is mentioned in it.
This bug is about filenames in non-UTF-8 locales.
Main quote:

If you are using a unix-like OS where the filesystem interface uses
utf-8, there is a workaround of using --restrict-file-names=nocontrol
(which is still too big, as that would allow problematic control
characters %01 or %09).
If you are using Windows, --restrict-file-names=nocontrol still gives
garbage (the utf-8 characters are treated as if they were in latin1).


I tried to solve this bug by adding a new option,
--local-filesystem-encoding:
http://lists.gnu.org/archive/html/bug-wget/2013-05/msg00102.html
but the patch was (rejected?)/(frozen?)/(lack of demand?).

Sorry, I don't know how Björn Mattsson switched his Windows Vista (x64)
filesystem to UTF-8.
In Russian locales, Windows 98, XP (x86) and Vista (x86) use the filesystem
encoding CP866.

--
Best regards, Alex



On Fri, 13 Sep 2013 10:50:10 +0300, Tim Ruehsen  wrote:


[...]




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-13 Thread Björn Mattsson

On 2013-09-13 09:42, Tim Ruehsen wrote:

On Thursday 12 September 2013 21:34:01 Björn Mattsson wrote:

On 2013-09-12 21:21, Tim Rühsen wrote:

On Thursday, 12 September 2013, 12:59:00, Björn Mattsson wrote:

Run into a bug in wget last week.
Done some digging but can't solve it by my self.

If i tries to wget a file containing capital ÅÄÖ they gets coverted
wrongly, and åäö works fine.

I uses wget -m to backup one of my webb-sites to another machine. Have
worked like a cahrm for the last 4-5 years but a couple of week ago one
of teh files came down wrong. Thought it was a college that had uploaded
something wrong but after some digging it's wget that converts wrongly.

I have UTF-8 as charset on my machine.

If you want to test/see the problem

wget -m http://bmit.se/wget

Just use
wget --restrict-file-names=nocontrol -m http://bmit.se/wget

Still the same problem. åäö OK but ÅÄÖ gets wrong.

I switched my environment to UTF-8 now and it seems to work:


On my main-machine too; I didn't have access to that one yesterday evening.

Thanx for the help.

--
Best regards
Björn Mattsson
Network engineer
IT Department
Blekinge Institute of Technology
 
bjorn.matts...@bth.se

Office: +46 (0)455-385163
IT Helpdesk: +46 (0)455-385100




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-13 Thread Tim Ruehsen
> Wasn't that problem always there?
> Looks like bug 37564 [1], you can work around it with
> --restrict-file-names=nocontrol
> You may find some more information in the list archives.
> 
> 1- https://savannah.gnu.org/bugs/index.php?37564

Please excuse me for my confusion. In my first tests I didn't have a proper
utf-8 environment. Then I had one, but didn't use --restrict-file-names=nocontrol.

Now, having a proper utf-8 env AND using nocontrol, everything looks perfect.
The bug should be closed.

$ wget --restrict-file-names=nocontrol 
'http://upload.wikimedia.org/wikipedia/commons/a/ad/%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG'

--2013-09-13 09:43:01--  http://upload.wikimedia.org/wikipedia/commons/a/ad/%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG
Resolving upload.wikimedia.org (upload.wikimedia.org)... 91.198.174.234, 
2620:0:862:ed1a::b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|
91.198.174.234|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6937886 (6.6M) [image/jpeg]
Saving to: ‘Памятник_затопленным_кораблям_в_Севастополе.JPG’

100%[===================================================>] 6,937,886   1.27MB/s   in 7.2s

2013-09-13 09:43:09 (936 KB/s) - ‘Памятник_затопленным_кораблям_в_Севастополе.JPG’ saved [6937886/6937886]

$ ls -la
total 119316
drwxr-xr-x  3 oms users 20480 13-09-13 09:43:01 .
drwxr-xr-x 17 oms users  4096 09-09-13 10:36:19 ..
-rw-r--r--  1 oms users   6937886 01-09-12 10:41:38 Памятник_затопленным_кораблям_в_Севастополе.JPG

I hope my email program does it right and you can see the Cyrillic filename.

Tim




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-13 Thread Tim Ruehsen
On Thursday 12 September 2013 21:34:01 Björn Mattsson wrote:
> On 2013-09-12 21:21, Tim Rühsen wrote:
> > On Thursday, 12 September 2013 12:59:00, Björn Mattsson wrote:
> >> Ran into a bug in wget last week.
> >> Done some digging, but I can't solve it by myself.
> >> 
> >> If I try to wget a file containing capital ÅÄÖ they get converted
> >> wrongly, while åäö works fine.
> >> 
> >> I use wget -m to back up one of my web sites to another machine. It has
> >> worked like a charm for the last 4-5 years, but a couple of weeks ago one
> >> of the files came down wrong. I thought it was a colleague that had uploaded
> >> something wrong, but after some digging it's wget that converts wrongly.
> >> 
> >> I have UTF-8 as charset on my machine.
> >> 
> >> If you want to test/see the problem
> >> 
> >> wget -m http://bmit.se/wget
> > 
> > Just use
> > wget --restrict-file-names=nocontrol -m http://bmit.se/wget
> 
> Still the same problem. åäö is OK but ÅÄÖ comes out wrong.

I switched my environment to UTF-8 now and it seems to work:
$ wget --restrict-file-names=nocontrol -m http://bmit.se/wget
...
--2013-09-13 09:37:29--  http://bmit.se/wget/test%C3%A5%C3%A4%C3%B6%C3%85%C3%84%C3%96
Reusing existing connection to bmit.se:80.
HTTP request sent, awaiting response... 200 OK
Length: 0
Saving to: ‘bmit.se/wget/teståäöÅÄÖ’
2013-09-13 09:37:29 (0.00 B/s) - ‘bmit.se/wget/teståäöÅÄÖ’ saved [0/0]

$ ls -la bmit.se/wget/
total 12
drwxr-xr-x 2 oms users 4096 13-09-13 09:37:29 .
drwxr-xr-x 3 oms users 4096 13-09-13 09:37:29 ..
-rw-r--r-- 1 oms users  120 11-09-13 17:24:38 index.html
-rw-r--r-- 1 oms users    0 11-09-13 17:20:53 test
-rw-r--r-- 1 oms users    0 11-09-13 17:21:01 teståäöÅÄÖ

$ wget --version
GNU Wget 1.14 built on linux-gnu.

+digest +https +ipv6 +iri +large-file +nls -ntlm +opie +ssl/gnutls 


Please check (and maybe post) wget --version.
And check your environment:
$ set|egrep 'LANG|LC_'
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_ALL=en_US.UTF-8


Regards, Tim




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-12 Thread Ángel González

Tim Rühsen wrote:

On Thursday 12 September 2013 12:59:00 Björn Mattsson wrote:

Ran into a bug in wget last week.
Done some digging, but I can't solve it by myself.

If I try to wget a file containing capital ÅÄÖ they get converted
wrongly, while åäö works fine.

I use wget -m to back up one of my web sites to another machine. It has
worked like a charm for the last 4-5 years, but a couple of weeks ago one
of the files came down wrong. I thought it was a colleague that had uploaded
something wrong, but after some digging it's wget that converts wrongly.

I have UTF-8 as charset on my machine.

If you want to test/see the problem

wget -m http://bmit.se/wget

(...)

Sorry, forget my previous answer.
Meanwhile I could make some tests in a utf-8 env, and yes, Wget 1.14 (the Debian
package as well as current git) has the problem you described.

I am not sure if we can change it without breaking backward compatibility!?

Tim

Wasn't that problem always there?
Looks like bug 37564 [1], you can work around it with 
--restrict-file-names=nocontrol

You may find some more information in the list archives.

1- https://savannah.gnu.org/bugs/index.php?37564




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-12 Thread Tim Rühsen
On Thursday, 12 September 2013 17:37:17, Tim Ruehsen wrote:
> On Thursday 12 September 2013 12:59:00 Björn Mattsson wrote:
> > Ran into a bug in wget last week.
> > Done some digging, but I can't solve it by myself.
> > 
> > If I try to wget a file containing capital ÅÄÖ they get converted
> > wrongly, while åäö works fine.
> > 
> > I use wget -m to back up one of my web sites to another machine. It has
> > worked like a charm for the last 4-5 years, but a couple of weeks ago one
> > of the files came down wrong. I thought it was a colleague that had uploaded
> > something wrong, but after some digging it's wget that converts wrongly.
> > 
> > I have UTF-8 as charset on my machine.
> > 
> > If you want to test/see the problem
> > 
> > wget -m http://bmit.se/wget
> 
> A request to http://bmit.se/wget/ returns a text/html document without
> specifying the charset (AFAIR, the default is iso-8859-1).
> Either your server has to tag the response as utf-8 (Content-Type:
> text/html; charset=utf-8) or you have to specify utf-8 in your document
> header.
> 
> Or you specify --remote-encoding=utf-8 when calling wget.
> 
> Could you give it a try, maybe with -d to see what is going on.

Sorry, forget my previous answer.
Meanwhile I could make some tests in a utf-8 env, and yes, Wget 1.14 (the Debian 
package as well as current git) has the problem you described.

I am not sure if we can change it without breaking backward compatibility!?

Tim


signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-12 Thread Tim Rühsen
On Thursday, 12 September 2013 12:59:00, Björn Mattsson wrote:
> Ran into a bug in wget last week.
> Done some digging, but I can't solve it by myself.
> 
> If I try to wget a file containing capital ÅÄÖ they get converted
> wrongly, while åäö works fine.
> 
> I use wget -m to back up one of my web sites to another machine. It has
> worked like a charm for the last 4-5 years, but a couple of weeks ago one
> of the files came down wrong. I thought it was a colleague that had uploaded
> something wrong, but after some digging it's wget that converts wrongly.
> 
> I have UTF-8 as charset on my machine.
> 
> If you want to test/see the problem
> 
> wget -m http://bmit.se/wget

Just use 
wget --restrict-file-names=nocontrol -m http://bmit.se/wget

Tim


signature.asc
Description: This is a digitally signed message part.


Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-12 Thread Björn Mattsson

On 2013-09-12 17:37, Tim Ruehsen wrote:

On Thursday 12 September 2013 12:59:00 Björn Mattsson wrote:

Ran into a bug in wget last week.
Done some digging, but I can't solve it by myself.

If I try to wget a file containing capital ÅÄÖ they get converted
wrongly, while åäö works fine.

I use wget -m to back up one of my web sites to another machine. It has
worked like a charm for the last 4-5 years, but a couple of weeks ago one
of the files came down wrong. I thought it was a colleague that had uploaded
something wrong, but after some digging it's wget that converts wrongly.

I have UTF-8 as charset on my machine.

If you want to test/see the problem

wget -m http://bmit.se/wget

A request to http://bmit.se/wget/ returns a text/html document without
specifying the charset (AFAIR, the default is iso-8859-1).
Either your server has to tag the response as utf-8 (Content-Type: text/html;
charset=utf-8) or you have to specify utf-8 in your document header.

Ok.
But why is åäö working but not ÅÄÖ (the same letters, but capital)?


Or you specify --remote-encoding=utf-8 when calling wget.

Didn't work


Could you give it a try, maybe with -d to see what is going on.

Done and attached the log-file.


Tim


// Björn
Script started on Thu 12 Sep 2013 09:16:23 PM CEST
bmt@ronneby:/tmp$ wget -m -d http://bmit.se/wget/
DEBUG output created by Wget 1.12 on linux-gnu.

Enqueuing http://bmit.se/wget/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing "http://bmit.se/wget/"; with None
Dequeuing http://bmit.se/wget/ at depth 0
Queue count 0, maxcount 1.
--2013-09-12 21:16:39--  http://bmit.se/wget/
Resolving bmit.se... 31.209.29.190
Caching bmit.se => 31.209.29.190
Connecting to bmit.se|31.209.29.190|:80... connected.
Created socket 3.
Releasing 0x09089948 (new refcount 1).

---request begin---
GET /wget/ HTTP/1.0

User-Agent: Wget/1.12 (linux-gnu)

Accept: */*

Host: bmit.se

Connection: Keep-Alive



---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK

Date: Thu, 12 Sep 2013 19:16:39 GMT

Server: Apache/2.2.22 (Debian)

Last-Modified: Wed, 11 Sep 2013 15:24:38 GMT

ETag: "ac004-78-4e61d3830b980"

Accept-Ranges: bytes

Content-Length: 120

Vary: Accept-Encoding

Keep-Alive: timeout=5, max=100

Connection: Keep-Alive

Content-Type: text/html



---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: 120 [text/html]
Saving to: "bmit.se/wget/index.html"


100%[======================================>] 120         --.-K/s   in 0s

2013-09-12 21:16:40 (2.64 MB/s) - "bmit.se/wget/index.html" saved [120/120]

Loaded bmit.se/wget/index.html (size 120).
bmit.se/wget/index.html: merge("http://bmit.se/wget/";, "index.html") -> 
http://bmit.se/wget/index.html
appending "http://bmit.se/wget/index.html"; to urlpos.
bmit.se/wget/index.html: merge("http://bmit.se/wget/";, "test") -> 
http://bmit.se/wget/test
appending "http://bmit.se/wget/test"; to urlpos.
bmit.se/wget/index.html: merge("http://bmit.se/wget/";, 
"teståäöÃ\205Ã\204Ã\226") -> http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226
appending "http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226"; to urlpos.
no-follow in bmit.se/wget/index.html: 0
Deciding whether to enqueue "http://bmit.se/wget/index.html".
Loading robots.txt; please ignore errors.
--2013-09-12 21:16:40--  http://bmit.se/robots.txt
Reusing existing connection to bmit.se:80.
Reusing fd 3.

---request begin---
GET /robots.txt HTTP/1.0

User-Agent: Wget/1.12 (linux-gnu)

Accept: */*

Host: bmit.se

Connection: Keep-Alive



---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 404 Not Found

Date: Thu, 12 Sep 2013 19:16:40 GMT

Server: Apache/2.2.22 (Debian)

Vary: Accept-Encoding

Content-Length: 281

Keep-Alive: timeout=5, max=99

Connection: Keep-Alive

Content-Type: text/html; charset=iso-8859-1



---response end---
404 Not Found
Skipping 281 bytes of body: [

404 Not Found

Not Found
The requested URL /robots.txt was not found on this server.

Apache/2.2.22 (Debian) Server at bmit.se Port 80

] done.
2013-09-12 21:16:40 ERROR 404: Not Found.

Decided to load it.
Enqueuing http://bmit.se/wget/index.html at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing "http://bmit.se/wget/index.html"; with None
Deciding whether to enqueue "http://bmit.se/wget/test";.
Decided to load it.
Enqueuing http://bmit.se/wget/test at depth 1
Queue count 2, maxcount 2.
[IRI Enqueuing "http://bmit.se/wget/test"; with None
Deciding whether to enqueue "http://bmit.se/wget/teståäöÅÄÖ";.
Decided to load it.
Enqueuing http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226 at depth 1
Queue count 3, maxcount 3.
[IRI Enqueuing "http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226"; wi

[Bug-wget] Problem with ÅÄÖ and wget

2013-09-12 Thread Björn Mattsson

Ran into a bug in wget last week.
Done some digging, but I can't solve it by myself.

If I try to wget a file containing capital ÅÄÖ they get converted
wrongly, while åäö works fine.

I use wget -m to back up one of my web sites to another machine. It has
worked like a charm for the last 4-5 years, but a couple of weeks ago one
of the files came down wrong. I thought it was a colleague that had uploaded
something wrong, but after some digging it's wget that converts wrongly.


I have UTF-8 as charset on my machine.

If you want to test/see the problem

wget -m http://bmit.se/wget

--
Best regards
Björn Mattsson
Network engineer
IT Department
Blekinge Institute of Technology
 
bjorn.matts...@bth.se

Office: +46 (0)455-385163
IT Helpdesk: +46 (0)455-385100




Re: [Bug-wget] Problem with ÅÄÖ and wget

2013-09-12 Thread Tim Ruehsen
On Thursday 12 September 2013 12:59:00 Björn Mattsson wrote:
> Ran into a bug in wget last week.
> Done some digging, but I can't solve it by myself.
> 
> If I try to wget a file containing capital ÅÄÖ they get converted
> wrongly, while åäö works fine.
> 
> I use wget -m to back up one of my web sites to another machine. It has
> worked like a charm for the last 4-5 years, but a couple of weeks ago one
> of the files came down wrong. I thought it was a colleague that had uploaded
> something wrong, but after some digging it's wget that converts wrongly.
> 
> I have UTF-8 as charset on my machine.
> 
> If you want to test/see the problem
> 
> wget -m http://bmit.se/wget

A request to http://bmit.se/wget/ returns a text/html document without 
specifying the charset (AFAIR, the default is iso-8859-1).
Either your server has to tag the response as utf-8 (Content-Type: text/html; 
charset=utf-8) or you have to specify utf-8 in your document header.

Or you specify --remote-encoding=utf-8 when calling wget.

Could you give it a try, maybe with -d to see what is going on.

Tim
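
For completeness, a sketch of the server-side tagging suggested above,
assuming Apache (which the debug log elsewhere in this thread shows the
site runs); the core AddDefaultCharset directive tags every text response:

  # in the Apache server or vhost configuration
  AddDefaultCharset UTF-8

Alternatively, the client-side override already mentioned:

  wget --remote-encoding=utf-8 -m http://bmit.se/wget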




Re: [Bug-wget] problem

2013-07-23 Thread Darshit Shah
As the error states, there is something wrong with your wgetrc file.

Did you touch it recently?
Please send us the contents of your wgetrc file, so we can point out the
error.

On Tue, Jul 23, 2013 at 10:29 PM, jordie9  wrote:

> This is what I get trying to use wget:
>
>  wget -d
> wget: Syntax error in /etc/wgetrc at line 83.
> Parsing system wgetrc file failed.  Please check
> '/etc/wgetrc',
>
> Please, any help welcome. I am a n00b. lol
>
> Ade
>
>


-- 
Thanking You,
Darshit Shah
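
For reference, a minimal sketch of a valid wgetrc, with option names from
the wget manual and purely illustrative values; anything that is not a
comment or an 'option = value' pair triggers exactly the parser error
quoted above:

  # /etc/wgetrc -- '#' starts a comment
  tries = 3
  continue = on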


[Bug-wget] problem

2013-07-23 Thread jordie9

This is what I get trying to use wget:

 wget -d
wget: Syntax error in /etc/wgetrc at line 83.
Parsing system wgetrc file failed.  Please check
'/etc/wgetrc',

Please, any help welcome. I am a n00b. lol

Ade



[Bug-wget] Problem to get Header of a File

2012-09-07 Thread Clément Péron
Hello,

I would like to get the content size of a file before downloading it, so I
use the spider mode:

wget -4 --spider http://ftp5.gwdg.de/pub/tdf/libreoffice/stable/3.6.1/win/x86/LibO_3.6.1_Win_x86_install_multi.msi

Spider mode enabled. Check if remote file exists.
--2012-09-07 11:20:22--  http://ftp5.gwdg.de/pub/tdf/libreoffice/stable/3.6.1/win/x86/LibO_3.6.1_Win_x86_install_multi.msi
Resolving ftp5.gwdg.de... 134.76.12.5
Connecting to ftp5.gwdg.de|134.76.12.5|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

But no data :/

wget -4 -d http://ftp5.gwdg.de/pub/tdf/libreoffice/stable/3.6.1/win/x86/LibO_3.6.1_Win_x86_install_multi.msi
DEBUG output created by Wget 1.12 on linux-gnu.

--2012-09-07 11:21:12--  http://ftp5.gwdg.de/pub/tdf/libreoffice/stable/3.6.1/win/x86/LibO_3.6.1_Win_x86_install_multi.msi
Resolving ftp5.gwdg.de... 134.76.12.5
Caching ftp5.gwdg.de => 134.76.12.5
Connecting to ftp5.gwdg.de|134.76.12.5|:80... connected.
Created socket 3.
Releasing 0x01349320 (new refcount 1).

---request begin---
GET /pub/tdf/libreoffice/stable/3.6.1/win/x86/LibO_3.6.1_Win_x86_install_multi.msi HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: ftp5.gwdg.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Content-Length: 209506304
Content-Type: text/plain
ETag: "40354bfb-c7cd000-4c80231b0c200"
Server: Apache/2.2.12 (Linux/SUSE)
Expires: Fri, 07 Sep 2012 05:04:41 GMT
Last-Modified: Fri, 24 Aug 2012 12:34:16 GMT
Connection: keep-alive
Date: Fri, 07 Sep 2012 04:20:28 GMT

---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: 209506304 (200M) [text/plain]
Saving to: ‘LibO_3.6.1_Win_x86_install_multi.msi’

 0% [                                          ] 13,273      14.9K/s


I don't know if it's a bug in the spider mode. If not, do you have another
solution?

Clement
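
For reference, a common way to inspect just the headers is to combine
--spider with -S (--server-response) and filter the output; a sketch,
assuming a server that answers the request properly (unlike the mirror
above, which returned no data):

  wget -4 --spider -S http://ftp5.gwdg.de/pub/tdf/libreoffice/stable/3.6.1/win/x86/LibO_3.6.1_Win_x86_install_multi.msi 2>&1 | grep -i 'Content-Length'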


Re: [Bug-wget] Problem loging to a web page with wget

2012-07-06 Thread Micah Cowan
Yes, --post-data has been around for quite some time, so you should be
fine, at least as far as form-based data submission is concerned.

-mjc

On 07/06/2012 02:17 AM, Gargiulo Antonio (EURIS) wrote:
> Now I’ve another question for you.
> 
> On our environment machine, we can only upgrade to the wget 1.9.1 version.
> 
> Looking at this link,
> http://sourceforge.net/project/shownotes.php?release_id=196169
> 
> it seems that the 1.9.1 wget version already has the --post-data option.
> 
>  
> 
> Could you please confirm it?



Re: [Bug-wget] Problem loging to a web page with wget

2012-07-06 Thread Gargiulo Antonio (EURIS)
Morning Micah,

Many thanks for your reply!

You're right, I was trying to run wget against a web form.

Anyway, I've installed the latest version 1.13.4 and tried it on my local
machine and it worked perfectly.

Now I've another question for you.
On our environment machine, we can only upgrade to the wget 1.9.1 version.
Looking at this link,
http://sourceforge.net/project/shownotes.php?release_id=196169
it seems that the 1.9.1 wget version already has the --post-data option.

Could you please confirm it?

Many thanks again
Antonio





-Original Message-
From: Micah Cowan [mailto:mi...@cowan.name]
Sent: 05 July 2012 19:55
To: Gargiulo Antonio (EURIS)
Cc: BUG-WGET@gnu.org
Subject: Re: [Bug-wget] Problem loging to a web page with wget



On 07/05/2012 06:21 AM, Gargiulo Antonio (EURIS) wrote:
> I'm working with wget but I have a problem trying to authenticate to a
> login page.

I'm not 100% sure I understood your original message, but it sounds to
me like you're trying to use --http-user and --http-passwd to log into a
page that uses form-based authentication (where you type your username
and password into fields on the page), and not HTTP authentication
(where a dialog box usually pops up asking you for that information).

--http-user and --http-passwd are only meant for HTTP authentication.
For forms-based authentication, see:
http://wget.addictivecode.org/FrequentlyAskedQuestions#password-protected

Hope that helps.

-mjc


 


Re: [Bug-wget] Problem loging to a web page with wget

2012-07-05 Thread Micah Cowan
On 07/05/2012 06:21 AM, Gargiulo Antonio (EURIS) wrote:
> I'm working with wget but I have a problem trying to authenticate to a
> login page.

I'm not 100% sure I understood your original message, but it sounds to
me like you're trying to use --http-user and --http-passwd to log into a
page that uses form-based authentication (where you type your username
and password into fields on the page), and not HTTP authentication
(where a dialog box usually pops up asking you for that information).

--http-user and --http-passwd are only meant for HTTP authentication.
For forms-based authentication, see:
http://wget.addictivecode.org/FrequentlyAskedQuestions#password-protected

Hope that helps.

-mjc
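
For illustration, a form-based login with wget usually looks like the
following sketch; the field names 'username' and 'password' are
assumptions and must be read from the login page's form markup:

  wget --save-cookies=cookies.txt --keep-session-cookies \
       --post-data='username=MYUSER&password=MYPASSWORD' \
       http://MYSERVER/MYLOGIN.aspx
  wget --load-cookies=cookies.txt http://MYSERVER/protected-page.aspx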



[Bug-wget] Problem loging to a web page with wget

2012-07-05 Thread Gargiulo Antonio (EURIS)
Hi All,

I'm working with wget but I have a problem trying to authenticate to a
login page.

I'm using the following command:

wget --http-user=MYUSER --http-passwd=MYPASSWORD
--save-cookies=cookies.txt http://MYSERVER/MYLOGIN.aspx

I can't log in, and trying to investigate I discovered that the login page
does NOT have the input tag for the user, which is the following:

[HTML form snippet stripped by the list archive; only the field label
"Email" survives]

Is there a way to fix this problem?

Any suggestion will be very much appreciated.
Many thanks.
Regards




 


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-21 Thread Ángel González
On 21/03/12 07:12, Ray Satiro wrote:
> Yes, that is the way it used to be with the structs. As far as bypassing 
> stat/fstat, that's probably not the best way but it works.
Good point. Below the stat define there should have been a #define fstat
_fstati64.
fstat is only used at two points in wget: in main.c, for checking whether
what we opened was a regular file (no problem there with the original one),
and in wget_read_file if HAVE_MMAP (which is not directly available on
Windows, but could be on Cygwin), where it does use st_size.




Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Ray Satiro
> From: Ángel González 
> To: Ray Satiro ; bug-wget 
> Cc: 
> Sent: Tuesday, March 20, 2012 3:27 PM
> Subject: Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version
> 

[...]

> The bug is on line 3058 of http.c
> hstat.restval = st.st_size;
> 
> st_size is a 32 bit off_t, being sign-extended to a 64 bit wgint.
> I don't think it can be fixed at that point. I would replace the stat with
> stati64.
> 
> In fact, wget code seems designed to do that, see the struct_stat
> comment in sysdep.h
> I was able to build a wget without the issue by adding this to mswindows.h
> 
>>  --- src/mswindows.h    2011-08-13 08:43:43 +
>>  +++ src/mswindows.h    2012-03-20 19:20:01 +
>>  @@ -102,6 +102,11 @@
>>   # define fstat(f, b) fstat_alias (f, b)
>>   #endif
>>   
>>  +#define struct_stat struct _stati64
>>  +#define struct_fstat struct _stati64
>>  +#undef stat
>>  +#define stat _stati64
>>  +
>>   #define PATH_SEPARATOR '\\'
>>   
>>   /* Additional declarations needed for IPv6: */
>> 
> 
> This bypasses gnulib stat, though.
>

Yes, that is the way it used to be with the structs. As far as bypassing 
stat/fstat, that's probably not the best way but it works.




Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Ángel González
On 20/03/12 08:00, Ray Satiro wrote:
> Actually it looks like there is a problem with some later versions.
>
> ---request begin---
> GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.1
> Range: bytes=-2147483648-
> User-Agent: Wget/1.13.1 (mingw32)
> Accept: */*
> Host: mirrors.kernel.org
> Connection: Keep-Alive
>
> ---request end---
>
> The 1.11.4 version I have from gnuwin32 looks fine though
>
> ---request begin---
> GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.0
> Range: bytes=2147483648-
> User-Agent: Wget/1.11.4
> Accept: */*
> Host: mirrors.kernel.org
> Connection: Keep-Alive
>
> ---request end---
>
>
> If you want the latest version, the maintainer of mypaint compiled Wget/1.13.4 
> (mingw32).
>
> go here
> http://opensourcepack.blogspot.com/2010/05/wget-112-for-windows.html
> click on wget-1.13.4
>
> So it looks like there was (and probably still is) a problem with the fstat 
> replacement. I don't see anything submitted.
Confirmed. It is still present in trunk.
The only big OS affected is probably 32-bit Windows (maybe some non-
__USE_LARGEFILE64 systems are, too).

The bug is on line 3058 of http.c
 hstat.restval = st.st_size;

st_size is a 32 bit off_t, being sign-extended to a 64 bit wgint.
I don't think it can be fixed at that point. I would replace the stat with
stati64.

In fact, wget code seems designed to do that, see the struct_stat
comment in sysdep.h
I was able to build a wget without the issue by adding this to mswindows.h

> --- src/mswindows.h    2011-08-13 08:43:43 +
> +++ src/mswindows.h    2012-03-20 19:20:01 +
> @@ -102,6 +102,11 @@
>  # define fstat(f, b) fstat_alias (f, b)
>  #endif
>  
> +#define struct_stat struct _stati64
> +#define struct_fstat struct _stati64
> +#undef stat
> +#define stat _stati64
> +
>  #define PATH_SEPARATOR '\\'
>  
>  /* Additional declarations needed for IPv6: */
>

This bypasses gnulib stat, though.
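
To make the sign extension concrete, here is a small stand-alone C
demonstration (illustration only, not wget source); it reproduces the
bogus negative offset quoted from the 1.13.x request above:

  /* sign_extend_demo.c -- illustration only, not wget source */
  #include <stdio.h>
  #include <stdint.h>

  int main (void)
  {
    int32_t st_size = INT32_MIN;  /* what a 2 GiB size wraps to in a 32-bit off_t */
    int64_t restval = st_size;    /* sign-extended, like hstat.restval = st.st_size */
    printf ("Range: bytes=%lld-\n", (long long) restval);
    return 0;                     /* prints: Range: bytes=-2147483648- */
  }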






Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Ángel González
On 19/03/12 21:10, Micah Cowan wrote:
> On 03/19/2012 01:06 PM, Henrik Holst wrote:
>> Considering that the failing file in question is 3.5GiB it's probably a
>> signed 32-bit problem with the size and/or range in either wget or the
>> server. Would be interesting to see the range requests done by your version
>> of wget around tje.signed 32 limit.
> A very pertinent point, that I'm surprised hadn't occurred to anyone
> else yet.
>
> Wget has had >2GB support since at least 1.10.x, but I think it may need
> to be built in - it's entirely possible that copy of wget was built
> without it (somehow).
>
> It's also conceivable that 32-bit file problems persist in the Windows
> code, that is fixed in the Unix code. Or it could be problems with the
> system libraries for Cygwin or MinGW (whichever was used to build it).
I have used wget with large files without problems. I'm sure recent
versions are able to download big files without issues. On early versions
of Windows wget the progress bar went nuts with big files, but I think
they still downloaded correctly.





Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Micah Cowan
On 03/20/2012 07:44 AM, Micah Cowan wrote:
> On 03/20/2012 02:01 AM, Paul Wratt wrote:
>> so let me re-iterate to others on the list:
>> It is possible for wget to get a true response to 206, but fail to
>> "seek to partial start", instead starting from 0. If the file is of unknown
>> length it may be appended to the end of the current file
> 
> I'm having a little trouble understanding exactly what you're saying.
> 
> Is it that, in response to wget's request for ranged content, some
> servers send back a 206, ranged response, but for a range of "0-",
> instead of what wget requested? I did not know this. I had thought that
> would be illegal, but I can't find language in the spec to that effect.
> 
> I'm not sure, but I think wget may assume that 206 responses match the
> range that was requested, without checking. If that's the case, then its
> clearly broken behavior, given what you say above.

So, I checked, and at least _looking_ at the source code, it does look
like wget validates 206 responses, in that it makes sure the start
position is the same one it asked for, or else is zero. Anything else,
and wget closes the connection to the server.

Wget never ever seeks in the file (with the exception that fseeko may be
used to determine the existing file's length, and the file is then
closed afterwards). It opens files in append mode, and skips content
from the server until we've received up to the current file size (which
is a bit unsafe if the content is dynamically-generated, but it's been
that way for a long while now...).

It'd be worth finding out if the servers in question are sending
erroneous Content-Range information (happens from time to time), or if
there's a flaw somewhere in wget's verification of that information.

-mjc
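
A minimal sketch of that append-and-skip strategy (assumed logic for
illustration; wget's actual code is more involved):

  #include <stdio.h>

  /* Resume by appending: never seek in the local file; instead discard
     'skip' bytes (what is already on disk) from the server stream. */
  static int append_resume (const char *path, FILE *server, long long skip)
  {
    FILE *out = fopen (path, "ab");   /* append mode */
    if (!out)
      return -1;
    int c;
    while ((c = getc (server)) != EOF)
      {
        if (skip > 0)
          skip--;                     /* content we already have */
        else
          putc (c, out);
      }
    fclose (out);
    return 0;
  }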



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Paul Wratt
Yeah, I think you have got it, though I'm not saying it's wget's fault.

I will try and find a download so you can verify the issue; I have had
at least 2 in recent months.

Yeah, I do realise the problem at the moment is with a known-length file -
the examples I will provide are also of known length.

However, it may take some time to track down the downloads - I just
lost a 1TB drive with URL references and everything I have ever
downloaded, so it may take some time. I have had one within the last 30
days though - I might be able to find it through browser history.

Note that I never did debug output for any of these servers - I just
observed responses to normal wget use, and did various tests to confirm it
was reproducible and not a user or filesystem issue.

On Wed, Mar 21, 2012 at 3:44 AM, Micah Cowan  wrote:
> On 03/20/2012 02:01 AM, Paul Wratt wrote:
>> so let me re-iterate to others on the list:
>> It is possible for wget to get a true response to 206, but fail to
>> "seek to partial start", instead starting from 0. If the file is of unknown
>> length it may be appended to the end of the current file
>
> I'm having a little trouble understanding exactly what you're saying.
>
> Is it that, in response to wget's request for ranged content, some
> servers send back a 206, ranged response, but for a range of "0-",
> instead of what wget requested? I did not know this. I had thought that
> would be illegal, but I can't find language in the spec to that effect.
>
> I'm not sure, but I think wget may assume that 206 responses match the
> range that was requested, without checking. If that's the case, then its
> clearly broken behavior, given what you say above.
>
> Still, none of this is likely to be DJ's problem, since he's doing this
> for a known-length file on a heavily-used server.
>
> -mjc



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Micah Cowan
On 03/20/2012 02:01 AM, Paul Wratt wrote:
> so let me re-iterate to others on the list:
> It is possible for wget to get a true response to 206, but fail to
> "seek to partial start", instead starting from 0. If the file is of unknown
> length it may be appended to the end of the current file

I'm having a little trouble understanding exactly what you're saying.

Is it that, in response to wget's request for ranged content, some
servers send back a 206, ranged response, but for a range of "0-",
instead of what wget requested? I did not know this. I had thought that
would be illegal, but I can't find language in the spec to that effect.

I'm not sure, but I think wget may assume that 206 responses match the
range that was requested, without checking. If that's the case, then its
clearly broken behavior, given what you say above.

Still, none of this is likely to be DJ's problem, since he's doing this
for a known-length file on a heavily-used server.

-mjc



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Jochen Roderburg

Quoting Micah Cowan:


On 03/20/2012 12:00 AM, Ray Satiro wrote:


Actually it looks like there is a problem with some later versions.

---request begin---
GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.1
Range: bytes=-2147483648-
User-Agent: Wget/1.13.1 (mingw32)
Accept: */*
Host: mirrors.kernel.org
Connection: Keep-Alive

---request end---

The 1.11.4 version I have from gnuwin32 looks fine though

---request begin---
GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.0
Range: bytes=2147483648-
User-Agent: Wget/1.11.4
Accept: */*
Host: mirrors.kernel.org
Connection: Keep-Alive

---request end---


Are you trying to illustrate something here? Because I don't see any
difference whatsoever between the request headers in the two different
Wget versions, other than of course the Wget version numbers, and also
the HTTP version numbers.

-mjc


Here is the difference:   ;-)

Range: bytes=-2147483648-
Range: bytes=2147483648-

(The offset has wrapped negative in the newer build, which is not a valid
byte-range request.)


Regards,
J.Roderburg








Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Micah Cowan
On 03/20/2012 12:00 AM, Ray Satiro wrote:

> Actually it looks like there is a problem with some later versions.
> 
> ---request begin---
> GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.1
> Range: bytes=-2147483648-
> User-Agent: Wget/1.13.1 (mingw32)
> Accept: */*
> Host: mirrors.kernel.org
> Connection: Keep-Alive
> 
> ---request end---
> 
> The 1.11.4 version I have from gnuwin32 looks fine though
> 
> ---request begin---
> GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.0
> Range: bytes=2147483648-
> User-Agent: Wget/1.11.4
> Accept: */*
> Host: mirrors.kernel.org
> Connection: Keep-Alive
> 
> ---request end---

Are you trying to illustrate something here? Because I don't see any
difference whatsoever between the request headers in the two different
Wget versions, other than of course the Wget version numbers, and also
the HTTP version numbers.

-mjc



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Paul Wratt
I have seen strange results on ftp (usually) with files around key
values (128/256/512 bytes) and possibly continuing that pattern.

paul

On Tue, Mar 20, 2012 at 11:22 AM, Henrik Holst
 wrote:
> Well I think that we can rule out the server because it seems to do this
> the correct way.
>
> I created an "empty" file just the size of which a signed 32-bit integer
> would have troubles with:
>
> henrik@ubuntu:~$ truncate --size 2147483648 Fedora-16-i386-DVD.iso
>
> I then turned on capture in Wireshark and told wget to do a resume:
>
> henrik@ubuntu:~$ wget -c "
> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso
> "
>
> Now looking at the HTTP request in Wireshark I can see that my version of
> Wget sends the correct range in order to resume the download:
>
> GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.0
> Range: bytes=2147483648-
> User-Agent: Wget/1.12 (linux-gnu)
> Accept: */*
> Host: mirrors.kernel.org
> Connection: Keep-Alive
>
> And I can also see that the server does respond in a correct manner:
>
> HTTP/1.1 206 Partial Content
> Date: Mon, 19 Mar 2012 22:14:09 GMT
> Server: Apache/2.2.22 (Fedora)
> Last-Modified: Thu, 03 Nov 2011 03:18:38 GMT
> ETag: "276805c0-e2e0b000-4b0cc0b679f80"
> Accept-Ranges: bytes
> Content-Length: 1658892288
> Content-Range: bytes 2147483648-3806375935/3806375936
> Keep-Alive: timeout=5, max=1000
> Connection: Keep-Alive
> Content-Type: application/x-iso9660-image
>
> That of course only takes us halfway to solving the problem, since we also
> must ensure that wget fseeks to the correct position and that the server
> sends from the correct position (another fseek), but that I will not try
> tonight since the complete download of that file will take 6h for me and I
> have no time for that at the moment :(
>
> Of course seeing a capture of the above using the 32-bit windows version
> that JD uses would be quite interesting.
>
> /HH
>
> 2012/3/19 Micah Cowan 
>
>> On 03/19/2012 01:13 PM, JD wrote:
>> > I am sorry -
>> > Range requests??
>> > How can I see that when I run wget -c  
>> > You're asking for info I am at a loss as to how to obtain.
>>
>> Sorry, I was slipping into potential technical explanations. You don't
>> need to know what ranged requests are.
>>
>> As long as you follow the steps I outlined earlier (checking the point
>> where the corruption happens, and running wget with the --debug flag on
>> (so it gets as much information about what's going on as possible), we
>> should be able to help you figure out what's going on.
>>
>> But again, first try a couple different builds of wget if you can, so we
>> can eliminate the possibility that you just got your hands on a bad build.
>>
>> -mjc
>>
>>



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Paul Wratt
On Tue, Mar 20, 2012 at 7:53 AM, Micah Cowan  wrote:
> On 03/19/2012 09:19 AM, Anthony Bryan wrote:
>
>> using the latest version of wget is always better than not, but you
>> can't fix this download with wget. (by that, I mean if you've kept the
>> error file & not deleted it).
>> wget will be fine if no errors occur, but the larger the download the
>> more likely you are to run into an error (probably). & you have
>> already gotten errors at least twice.
>
> I don't see why a working wget environment could not solve his problem,
> that is what -c is for, after all.
>
> The problem seems to be that he lacks a working wget (that, or it could
> conceivably be a server issue). Fixing his wget should fix his problem.
>
> -mjc
>

OK, the key to -c being successful is some data after the 206 in:
HTTP/1.1 206 Partial Content

OK - but I recently had (an iso on) a server that, although it reported
success, actually restarted the download from 0 again. The difference
here was that the actual file was also zeroed too.

I do have a super "shitty" net connection - so I can confirm that the
only time I get -c issues is with the above instance, and (only) one
other time when the file was of unknown length and it would restart by
appending to the end of the current file.

Note: I explicitly use v1.12, except for 6 months last year when I used
v1.13 because it was upgraded in the distro.

Even after the rewrite, 1.13 reacts the same as 1.12 - I will not use an
older version.

So let me re-iterate to others on the list:
It is possible for wget to get a true response to 206, but fail to
"seek to partial start", instead starting from 0. If the file is of
unknown length it may be appended to the end of the current file.

It is server specific - it does not change per download time or after a
successful download (1 session with no interruptions) - it is also
common with large files on a PHP download redirect with a slow
connection (the connection times out after 2+ hours) - use of PHP
redirects does not guarantee reproduction of the issue - if the file is
of unknown size it is reproducible on some servers (206=true but
partial=0 start).



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-20 Thread Ray Satiro
> From: JD 
> To: bug-wget@gnu.org
> Cc: 
> Sent: Sunday, March 18, 2012 6:24 PM
> Subject: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version
> 
> When using wget with the -c option, it does recover and resume the download
> after network failures. However, after it finishes the download (in my case
> downloading
> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and it is
> completely different to the value stored in the file of CHECKSUMS on the
> same
> page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> 
> I downloaded this iso at least twice, with the same result - the sha256sum
> performed on the file does not match the one at the above URL, and nor
> does it match the result of sha256sum performed on the previous downloads
> of the iso file.

Actually it looks like there is a problem with some later versions.

---request begin---
GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.1
Range: bytes=-2147483648-
User-Agent: Wget/1.13.1 (mingw32)
Accept: */*
Host: mirrors.kernel.org
Connection: Keep-Alive

---request end---

The 1.11.4 version I have from gnuwin32 looks fine though

---request begin---
GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.0
Range: bytes=2147483648-
User-Agent: Wget/1.11.4
Accept: */*
Host: mirrors.kernel.org
Connection: Keep-Alive

---request end---


If you want the latest version, the maintainer of mypaint compiled Wget/1.13.4 
(mingw32).

go here
http://opensourcepack.blogspot.com/2010/05/wget-112-for-windows.html
click on wget-1.13.4

So it looks like there was (and probably still is) a problem with the fstat 
replacement. I don't see anything submitted.



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Henrik Holst
Well I think that we can rule out the server because it seems to do this
the correct way.

I created an "empty" file just the size of which a signed 32-bit integer
would have troubles with:

henrik@ubuntu:~$ truncate --size 2147483648 Fedora-16-i386-DVD.iso

I then turned on capture in Wireshark and told wget to do a resume:

henrik@ubuntu:~$ wget -c "
http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso
"

Now looking at the HTTP request in Wireshark I can see that my version of
Wget sends the correct range in order to resume the download:

GET /fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso HTTP/1.0
Range: bytes=2147483648-
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: mirrors.kernel.org
Connection: Keep-Alive

And I can also see that the server does respond in a correct manner:

HTTP/1.1 206 Partial Content
Date: Mon, 19 Mar 2012 22:14:09 GMT
Server: Apache/2.2.22 (Fedora)
Last-Modified: Thu, 03 Nov 2011 03:18:38 GMT
ETag: "276805c0-e2e0b000-4b0cc0b679f80"
Accept-Ranges: bytes
Content-Length: 1658892288
Content-Range: bytes 2147483648-3806375935/3806375936
Keep-Alive: timeout=5, max=1000
Connection: Keep-Alive
Content-Type: application/x-iso9660-image

That of course only takes us halfway to solving the problem, since we also
must ensure that wget fseeks to the correct position and that the server
sends from the correct position (another fseek), but that I will not try
tonight since the complete download of that file will take 6h for me and I
have no time for that at the moment :(

Of course seeing a capture of the above using the 32-bit windows version
that JD uses would be quite interesting.

/HH

2012/3/19 Micah Cowan 

> On 03/19/2012 01:13 PM, JD wrote:
> > I am sorry -
> > Range requests??
> > How can I see that when I run wget -c  
> > You're asking for info I am at a loss as to how to obtain.
>
> Sorry, I was slipping into potential technical explanations. You don't
> need to know what ranged requests are.
>
> As long as you follow the steps I outlined earlier (checking the point
> where the corruption happens, and running wget with the --debug flag on
> (so it gets as much information about what's going on as possible), we
> should be able to help you figure out what's going on.
>
> But again, first try a couple different builds of wget if you can, so we
> can eliminate the possibility that you just got your hands on a bad build.
>
> -mjc
>
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Micah Cowan
On 03/19/2012 01:13 PM, JD wrote:
> I am sorry -
> Range requests??
> How can I see that when I run wget -c  
> You're asking for info I am at a loss as to how to obtain.

Sorry, I was slipping into potential technical explanations. You don't
need to know what ranged requests are.

As long as you follow the steps I outlined earlier (checking the point
where the corruption happens, and running wget with the --debug flag on
(so it gets as much information about what's going on as possible), we
should be able to help you figure out what's going on.

But again, first try a couple different builds of wget if you can, so we
can eliminate the possibility that you just got your hands on a bad build.

-mjc



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
I am sorry -
Range requests??
How can I see that when I run wget -c  
You're asking for info I am at a loss as to how to obtain.


On Mon, Mar 19, 2012 at 2:06 PM, Henrik Holst
wrote:

> Considering that the failing file in question is 3.5GiB it's probably a
> signed 32-bit problem with the size and/or range in either wget or the
> server. Would be interesting to see the range requests done by your version
> of wget around the signed 32-bit limit.
>
> /hh
> Den 19 mar 2012 20:21 skrev "JD" :
>
>> Thank you, Bryan. But as I already stated in a previous msg,
>> I have no build (compilation) env on my Win XP :(
>>
>>
>> On Mon, Mar 19, 2012 at 12:53 PM, Anthony Bryan > >wrote:
>>
>> > here's the link for aria2 - you use 'aria2c -M metalinkfile'
>> >
>> > that will guarantee an error free download
>> >
>> > http://aria2.sourceforge.net/
>> >
>> > On Mon, Mar 19, 2012 at 2:50 PM, JD  wrote:
>> > > Sorry! That link led me nowhere...
>> > > So I still need latest wget compiled for windows 32.
>> > >
>> > >
>> > >
>> > > On Mon, Mar 19, 2012 at 12:45 PM, JD  wrote:
>> > >
>> > >> gnu does not distribute windows binaries.
>> > >> So, I will resort to downloading it from from
>> > >>
>> > >>
>> >
>> http://code.google.com/p/mingw-and-ndk/downloads/detail?name=wget-1.13.4-static-mingw.7z
>> > >>
>> > >>
>> > >> On Mon, Mar 19, 2012 at 9:33 AM, Micah Cowan 
>> wrote:
>> > >>
>> > >>> On 03/18/2012 03:24 PM, JD wrote:
>> > >>> > When using wget with the -c option, it does recover and resume the
>> > >>> download
>> > >>> > after network failures. However, after it finishes the download
>> (in
>> > my
>> > >>> case
>> > >>> > downloading
>> > >>> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO
>> > and
>> > >>> it is
>> > >>> > completely different to the value stored in the file of CHECKSUMS
>> on
>> > the
>> > >>> > same
>> > >>> > page URL -
>> > >>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>> > >>> >
>> > >>> > I downloaded this iso at least twice, with the same result - the
>> > >>> sha256sum
>> > >>> > performed on the file does not match the one at the above URL, and
>> > nor
>> > >>> > does it match the result of sha256sum performed on the previous
>> > >>> downloads
>> > >>> > of the iso file.
>> > >>> >
>> > >>> > So, something is not right with wget!!
>> > >>>
>> > >>> As others have said, using a newer version is probably a good idea.
>> > >>>
>> > >>> However, it's probably also worth asking where you got your wget
>> from,
>> > >>> since we don't really provide official binaries for Wget. Perhaps it
>> > has
>> > >>> a special case...
>> > >>>
>> > >>> It's also conceivable that it could be the server's issue, and isn't
>> > >>> doing HTTP ranged requests correctly. Whether because of wget, or
>> > >>> because of the server, the constantly varying sha256 sums are a clue
>> > >>> that it's not happening correctly (assuming, of course, that all
>> files
>> > >>> are completely downloaded).
>> > >>>
>> > >>> With a partially-downloaded iso, I'd say, make a note of exactly how
>> > >>> many bytes are in the partial download, and take a look at what the
>> > tail
>> > >>> end looks like. Then, when you continue the download, take a look at
>> > >>> that same spot, and see what you find. If HTTP headers suddenly
>> appear
>> > >>> there, or you see what appears to be the beginning of the file at
>> the
>> > >>> continuation point in the file, those are big clues. Also save a
>> copy
>> > of
>> > >>> the original partial download, so you can continue it again and see
>> if
>> > >>> you get different results, or if they're reproducible for the
>> > same-sized
>> > >>> partial download being continued.
>> > >>>
>> > >>> And add the --debug flag to wget to get as much information about
>> > what's
>> > >>> going on as possible. If you manage to find out what's happening,
>> you
>> > >>> may need these logs to know whether to blame wget, or kernel.org.
>> > >>>
>> > >>> Hope that helps,
>> > >>> -mjc
>> > >>>
>> > >>
>> > >>
>> >
>> >
>> >
>> > --
>> > (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>> >   )) Easier, More Reliable, Self Healing Downloads
>> >
>>
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
Thank you kindly.

Best regards,

JD

On Mon, Mar 19, 2012 at 1:35 PM, Anthony Bryan wrote:

> don't worry, you don't need them. just get the windows version...
>
>
> http://sourceforge.net/projects/aria2/files/stable/aria2-1.14.2/aria2-1.14.2-mingw32msvc-build1.zip/download
>
> On Mon, Mar 19, 2012 at 3:20 PM, JD  wrote:
> > Thank you, Bryan. But as I already stated in a previous msg,
> > I have no build (compilation) env on my Win XP :(
> >
> >
> > On Mon, Mar 19, 2012 at 12:53 PM, Anthony Bryan  >wrote:
> >
> >> here's the link for aria2 - you use 'aria2c -M metalinkfile'
> >>
> >> that will guarantee an error free download
> >>
> >> http://aria2.sourceforge.net/
> >>
> >> On Mon, Mar 19, 2012 at 2:50 PM, JD  wrote:
> >> > Sorry! That link led me nowhere...
> >> > So I still need latest wget compiled for windows 32.
> >> >
> >> >
> >> >
> >> > On Mon, Mar 19, 2012 at 12:45 PM, JD  wrote:
> >> >
> >> >> gnu does not distribute windows binaries.
> >> >> So, I will resort to downloading it from from
> >> >>
> >> >>
> >>
> http://code.google.com/p/mingw-and-ndk/downloads/detail?name=wget-1.13.4-static-mingw.7z
> >> >>
> >> >>
> >> >> On Mon, Mar 19, 2012 at 9:33 AM, Micah Cowan 
> wrote:
> >> >>
> >> >>> On 03/18/2012 03:24 PM, JD wrote:
> >> >>> > When using wget with the -c option, it does recover and resume the
> >> >>> download
> >> >>> > after network failures. However, after it finishes the download
> (in
> >> my
> >> >>> case
> >> >>> > downloading
> >> >>> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO
> >> and
> >> >>> it is
> >> >>> > completely different to the value stored in the file of CHECKSUMS
> on
> >> the
> >> >>> > same
> >> >>> > page URL -
> >> >>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >> >>> >
> >> >>> > I downloaded this iso at least twice, with the same result - the
> >> >>> sha256sum
> >> >>> > performed on the file does not match the one at the above URL, and
> >> nor
> >> >>> > does it match the result of sha256sum performed on the previous
> >> >>> downloads
> >> >>> > of the iso file.
> >> >>> >
> >> >>> > So, something is not right with wget!!
> >> >>>
> >> >>> As others have said, using a newer version is probably a good idea.
> >> >>>
> >> >>> However, it's probably also worth asking where you got your wget
> from,
> >> >>> since we don't really provide official binaries for Wget. Perhaps it
> >> has
> >> >>> a special case...
> >> >>>
> >> >>> It's also conceivable that it could be the server's issue, and isn't
> >> >>> doing HTTP ranged requests correctly. Whether because of wget, or
> >> >>> because of the server, the constantly varying sha256 sums are a clue
> >> >>> that it's not happening correctly (assuming, of course, that all
> files
> >> >>> are completely downloaded).
> >> >>>
> >> >>> With a partially-downloaded iso, I'd say, make a note of exactly how
> >> >>> many bytes are in the partial download, and take a look at what the
> >> tail
> >> >>> end looks like. Then, when you continue the download, take a look at
> >> >>> that same spot, and see what you find. If HTTP headers suddenly
> appear
> >> >>> there, or you see what appears to be the beginning of the file at
> the
> >> >>> continuation point in the file, those are big clues. Also save a
> copy
> >> of
> >> >>> the original partial download, so you can continue it again and see
> if
> >> >>> you get different results, or if they're reproducible for the
> >> same-sized
> >> >>> partial download being continued.
> >> >>>
> >> >>> And add the --debug flag to wget to get as much information about
> >> what's
> >> >>> going on as possible. If you manage to find out what's happening,
> you
> >> >>> may need these logs to know whether to blame wget, or kernel.org.
> >> >>>
> >> >>> Hope that helps,
> >> >>> -mjc
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
> >>   )) Easier, More Reliable, Self Healing Downloads
> >>
>
>
>
> --
> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>   )) Easier, More Reliable, Self Healing Downloads
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Micah Cowan
On 03/19/2012 01:06 PM, Henrik Holst wrote:
> Considering that the failing file in question is 3.5GiB it's probably a
> signed 32-bit problem with the size and/or range in either wget or the
> server. Would be interesting to see the range requests done by your version
> of wget around the signed 32-bit limit.

A very pertinent point, that I'm surprised hadn't occurred to anyone
else yet.

Wget has had >2GB support since at least 1.10.x, but I think it may need
to be built in - it's entirely possible that copy of wget was built
without it (somehow).

It's also conceivable that 32-bit file problems persist in the Windows
code, that is fixed in the Unix code. Or it could be problems with the
system libraries for Cygwin or MinGW (whichever was used to build it).

And it could also be the server failing to handle ranged requests
reaching into that domain, though that would be a strange problem indeed
for kernel.org to have...

-mjc



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Henrik Holst
Considering that the failing file in question is 3.5GiB it's probably a
signed 32-bit problem with the size and/or range in either wget or the
server. Would be interesting to see the range requests done by your version
of wget around the signed 32-bit limit.

/hh
Den 19 mar 2012 20:21 skrev "JD" :

> Thank you, Bryan. But as I already stated in a previous msg,
> I have no build (compilation) env on my Win XP :(
>
>
> On Mon, Mar 19, 2012 at 12:53 PM, Anthony Bryan  >wrote:
>
> > here's the link for aria2 - you use 'aria2c -M metalinkfile'
> >
> > that will guarantee an error free download
> >
> > http://aria2.sourceforge.net/
> >
> > On Mon, Mar 19, 2012 at 2:50 PM, JD  wrote:
> > > Sorry! That link led me nowhere...
> > > So I still need latest wget compiled for windows 32.
> > >
> > >
> > >
> > > On Mon, Mar 19, 2012 at 12:45 PM, JD  wrote:
> > >
> > >> gnu does not distribute windows binaries.
> > >> So, I will resort to downloading it from from
> > >>
> > >>
> >
> http://code.google.com/p/mingw-and-ndk/downloads/detail?name=wget-1.13.4-static-mingw.7z
> > >>
> > >>
> > >> On Mon, Mar 19, 2012 at 9:33 AM, Micah Cowan 
> wrote:
> > >>
> > >>> On 03/18/2012 03:24 PM, JD wrote:
> > >>> > When using wget with the -c option, it does recover and resume the
> > >>> download
> > >>> > after network failures. However, after it finishes the download (in
> > my
> > >>> case
> > >>> > downloading
> > >>> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO
> > and
> > >>> it is
> > >>> > completely different to the value stored in the file of CHECKSUMS
> on
> > the
> > >>> > same
> > >>> > page URL -
> > >>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> > >>> >
> > >>> > I downloaded this iso at least twice, with the same result - the
> > >>> sha256sum
> > >>> > performed on the file does not match the one at the above URL, and
> > nor
> > >>> > does it match the result of sha256sum performed on the previous
> > >>> downloads
> > >>> > of the iso file.
> > >>> >
> > >>> > So, something is not right with wget!!
> > >>>
> > >>> As others have said, using a newer version is probably a good idea.
> > >>>
> > >>> However, it's probably also worth asking where you got your wget
> from,
> > >>> since we don't really provide official binaries for Wget. Perhaps it
> > has
> > >>> a special case...
> > >>>
> > >>> It's also conceivable that it could be the server's issue, and isn't
> > >>> doing HTTP ranged requests correctly. Whether because of wget, or
> > >>> because of the server, the constantly varying sha256 sums are a clue
> > >>> that it's not happening correctly (assuming, of course, that all
> files
> > >>> are completely downloaded).
> > >>>
> > >>> With a partially-downloaded iso, I'd say, make a note of exactly how
> > >>> many bytes are in the partial download, and take a look at what the
> > tail
> > >>> end looks like. Then, when you continue the download, take a look at
> > >>> that same spot, and see what you find. If HTTP headers suddenly
> appear
> > >>> there, or you see what appears to be the beginning of the file at the
> > >>> continuation point in the file, those are big clues. Also save a copy
> > of
> > >>> the original partial download, so you can continue it again and see
> if
> > >>> you get different results, or if they're reproducible for the
> > same-sized
> > >>> partial download being continued.
> > >>>
> > >>> And add the --debug flag to wget to get as much information about
> > what's
> > >>> going on as possible. If you manage to find out what's happening, you
> > >>> may need these logs to know whether to blame wget, or kernel.org.
> > >>>
> > >>> Hope that helps,
> > >>> -mjc
> > >>>
> > >>
> > >>
> >
> >
> >
> > --
> > (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
> >   )) Easier, More Reliable, Self Healing Downloads
> >
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
Thank you, Brian. But as I already stated in a previous msg,
I have no build (compilation) environment on my Win XP machine :(


On Mon, Mar 19, 2012 at 12:53 PM, Anthony Bryan wrote:

> here's the link for aria2 - you use 'aria2c -M metalinkfile'
>
> that will guarantee an error free download
>
> http://aria2.sourceforge.net/
>
> On Mon, Mar 19, 2012 at 2:50 PM, JD  wrote:
> > Sorry! That link led me nowhere...
> > So I still need latest wget compiled for windows 32.
> >
> >
> >
> > On Mon, Mar 19, 2012 at 12:45 PM, JD  wrote:
> >
> >> gnu does not distribute windows binaries.
> >> So, I will resort to downloading it from from
> >>
> >>
> http://code.google.com/p/mingw-and-ndk/downloads/detail?name=wget-1.13.4-static-mingw.7z
> >>
> >>
> >> On Mon, Mar 19, 2012 at 9:33 AM, Micah Cowan  wrote:
> >>
> >>> On 03/18/2012 03:24 PM, JD wrote:
> >>> > When using wget with the -c option, it does recover and resume the
> >>> download
> >>> > after network failures. However, after it finishes the download (in
> my
> >>> case
> >>> > downloading
> >>> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO
> and
> >>> it is
> >>> > completely different to the value stored in the file of CHECKSUMS on
> the
> >>> > same
> >>> > page URL -
> >>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >>> >
> >>> > I downloaded this iso at least twice, with the same result - the
> >>> sha256sum
> >>> > performed on the file does not match the one at the above URL, and
> nor
> >>> > does it match the result of sha256sum performed on the previous
> >>> downloads
> >>> > of the iso file.
> >>> >
> >>> > So, something is not right with wget!!
> >>>
> >>> As others have said, using a newer version is probably a good idea.
> >>>
> >>> However, it's probably also worth asking where you got your wget from,
> >>> since we don't really provide official binaries for Wget. Perhaps it
> has
> >>> a special case...
> >>>
> >>> It's also conceivable that it could be the server's issue, and isn't
> >>> doing HTTP ranged requests correctly. Whether because of wget, or
> >>> because of the server, the constantly varying sha256 sums are a clue
> >>> that it's not happening correctly (assuming, of course, that all files
> >>> are completely downloaded).
> >>>
> >>> With a partially-downloaded iso, I'd say, make a note of exactly how
> >>> many bytes are in the partial download, and take a look at what the
> tail
> >>> end looks like. Then, when you continue the download, take a look at
> >>> that same spot, and see what you find. If HTTP headers suddenly appear
> >>> there, or you see what appears to be the beginning of the file at the
> >>> continuation point in the file, those are big clues. Also save a copy
> of
> >>> the original partial download, so you can continue it again and see if
> >>> you get different results, or if they're reproducible for the
> same-sized
> >>> partial download being continued.
> >>>
> >>> And add the --debug flag to wget to get as much information about
> what's
> >>> going on as possible. If you manage to find out what's happening, you
> >>> may need these logs to know whether to blame wget, or kernel.org.
> >>>
> >>> Hope that helps,
> >>> -mjc
> >>>
> >>
> >>
>
>
>
> --
> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>   )) Easier, More Reliable, Self Healing Downloads
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Micah Cowan
Unfortunately, the best information I have on binary packages for
Windows is at

http://wget.addictivecode.org/FrequentlyAskedQuestions#download

Which doesn't give you anything newer than wget 1.11.4.

However, it could still be a problem with your particular wget 1.11.4,
so perhaps try a couple of different wgets from more than one source, to
see if one works better for you than another.

If you get the same problem on more than one wget, then try the steps I
mentioned earlier to try to track down how exactly the corrupting is
taking place, and who's responsible.

Thanks!
-mjc

On 03/19/2012 12:09 PM, JD wrote:
> I honestly do not recall where I downloaded it from.
> Also, I do not have build tools or a build environment on my Win XP laptop.
> 
> 
> On Mon, Mar 19, 2012 at 12:32 PM, Micah Cowan  wrote:
> 
>> Binary packages aren't provided on the GNU web site (for Windows, nor
>> Unixen). Did you download the Wget sources and build them yourself - and
>> if so, what did you use? Cygwin? Msys?
>>
>> -mjc
>>
>> On 03/19/2012 11:29 AM, JD wrote:
>>> The Fedora Distribution does not list MD5 sums. Only sha256 sums.
>>>
>>> Also, I had downloaded my version directly from the gnu web site.
>>> But I will look for more recent versions there.
>>>
>>> On Mon, Mar 19, 2012 at 1:49 AM, Paul Wratt 
>> wrote:
>>>
 you should be using at least the last known stable version 1.12, but
 that is still at least 5 years old
 1.13 versions are from within the last 12+ months

 but I have a feeling that the sha256sum you are using is not right,
 verify against the md5 (maybe google for it)

 Paul

 On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
> When using wget with the -c option, it does recover and resume the
 download
> after network failures. However, after it finishes the download (in my
 case
> downloading
> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
 it is
> completely different to the value stored in the file of CHECKSUMS on
>> the
> same
> page URL -
>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>
> I downloaded this iso at least twice, with the same result - the
 sha256sum
> performed on the file does not match the one at the above URL, and nor
> does it match the result of sha256sum performed on the previous
>> downloads
> of the iso file.
>
> So, something is not right with wget!!

>>
>>




Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
I honestly do not recall where I downloaded it from.
Also, I do not have build tools or a build environment on my Win XP laptop.


On Mon, Mar 19, 2012 at 12:32 PM, Micah Cowan  wrote:

> Binary packages aren't provided on the GNU web site (for Windows, nor
> Unixen). Did you download the Wget sources and build them yourself - and
> if so, what did you use? Cygwin? Msys?
>
> -mjc
>
> On 03/19/2012 11:29 AM, JD wrote:
> > The Fedora Distribution does not list MD5 sums. Only sha256 sums.
> >
> > Also, I had downloaded my version directly from the gnu web site.
> > But I will look for more recent versions there.
> >
> > On Mon, Mar 19, 2012 at 1:49 AM, Paul Wratt 
> wrote:
> >
> >> you should be using at least the last known stable version 1.12, but
> >> that is still at least 5 years old
> >> 1.13 versions are from within the last 12+ months
> >>
> >> but I have a feeling that the sha256sum you are using is not right,
> >> verify against the md5 (maybe google for it)
> >>
> >> Paul
> >>
> >> On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
> >>> When using wget with the -c option, it does recover and resume the
> >> download
> >>> after network failures. However, after it finishes the download (in my
> >> case
> >>> downloading
> >>> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
> >> it is
> >>> completely different to the value stored in the file of CHECKSUMS on
> the
> >>> same
> >>> page URL -
> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >>>
> >>> I downloaded this iso at least twice, with the same result - the
> >> sha256sum
> >>> performed on the file does not match the one at the above URL, and nor
> >>> does it match the result of sha256sum performed on the previous
> downloads
> >>> of the iso file.
> >>>
> >>> So, something is not right with wget!!
> >>
>
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
I appreciate your info.
But it still does not answer the question of why my older version of wget
craps out when network disconnects are frequent, and thus manual restarts
of wget are also frequent.
Also, where in tar-nation do I get the latest Windows 32-bit binary of wget?


On Mon, Mar 19, 2012 at 10:19 AM, Anthony Bryan wrote:

> On Sun, Mar 18, 2012 at 6:24 PM, JD  wrote:
> > When using wget with the -c option, it does recover and resume the
> download
> > after network failures. However, after it finishes the download (in my
> case
> > downloading
> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
> it is
> > completely different to the value stored in the file of CHECKSUMS on the
> > same
> > page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >
> > I downloaded this iso at least twice, with the same result - the
> sha256sum
> > performed on the file does not match the one at the above URL, and nor
> > does it match the result of sha256sum performed on the previous downloads
> > of the iso file.
> >
> > So, something is not right with wget!!
>
> JD, errors can pop up in a number of places during the download process.
>
> there are 3 things you can do to fix the download: rsync, bittorrent,
> or metalink.
>
> using the latest version of wget is always better than not, but you
> can't fix this download with wget. (by that, I mean if you've kept the
> error file & not deleted it).
> wget will be fine if no errors occur, but the larger the download the
> more likely you are to run into an error (probably). & you have
> already gotten errors at least twice.
>
> many Linux distributions use metalink or bittorrent for these large
> ISO downloads to correct errors & for other features like mirror
> usage.
>
> I would suggest using aria2 & the attached metalink file. aria2 is a
> command line downloader like wget.
>
> Fedora already provides metalinks for all their files, but
> unfortunately they don't include all the repair information, just the
> sha256sum to detect errors.
>
>
> http://mirrors.fedoraproject.org/metalink?path=pub/fedora/linux/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso
>
> --
> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>   )) Easier, More Reliable, Self Healing Downloads
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Micah Cowan
On 03/19/2012 09:19 AM, Anthony Bryan wrote:

> using the latest version of wget is always better than not, but you
> can't fix this download with wget. (by that, I mean if you've kept the
> error file & not deleted it).
> wget will be fine if no errors occur, but the larger the download the
> more likely you are to run into an error (probably). & you have
> already gotten errors at least twice.

I don't see why a working wget environment could not solve his problem;
that is what -c is for, after all.

The problem seems to be that he lacks a working wget (that, or it could
conceivably be a server issue). Fixing his wget should fix his problem.

-mjc



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
Sorry! That link led me nowhere...
So I still need latest wget compiled for windows 32.



On Mon, Mar 19, 2012 at 12:45 PM, JD  wrote:

> gnu does not distribute windows binaries.
> So, I will resort to downloading it from from
>
> http://code.google.com/p/mingw-and-ndk/downloads/detail?name=wget-1.13.4-static-mingw.7z
>
>
> On Mon, Mar 19, 2012 at 9:33 AM, Micah Cowan  wrote:
>
>> On 03/18/2012 03:24 PM, JD wrote:
>> > When using wget with the -c option, it does recover and resume the
>> download
>> > after network failures. However, after it finishes the download (in my
>> case
>> > downloading
>> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
>> it is
>> > completely different to the value stored in the file of CHECKSUMS on the
>> > same
>> > page URL -
>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>> >
>> > I downloaded this iso at least twice, with the same result - the
>> sha256sum
>> > performed on the file does not match the one at the above URL, and nor
>> > does it match the result of sha256sum performed on the previous
>> downloads
>> > of the iso file.
>> >
>> > So, something is not right with wget!!
>>
>> As others have said, using a newer version is probably a good idea.
>>
>> However, it's probably also worth asking where you got your wget from,
>> since we don't really provide official binaries for Wget. Perhaps it has
>> a special case...
>>
>> It's also conceivable that it could be the server's issue, and isn't
>> doing HTTP ranged requests correctly. Whether because of wget, or
>> because of the server, the constantly varying sha256 sums are a clue
>> that it's not happening correctly (assuming, of course, that all files
>> are completely downloaded).
>>
>> With a partially-downloaded iso, I'd say, make a note of exactly how
>> many bytes are in the partial download, and take a look at what the tail
>> end looks like. Then, when you continue the download, take a look at
>> that same spot, and see what you find. If HTTP headers suddenly appear
>> there, or you see what appears to be the beginning of the file at the
>> continuation point in the file, those are big clues. Also save a copy of
>> the original partial download, so you can continue it again and see if
>> you get different results, or if they're reproducible for the same-sized
>> partial download being continued.
>>
>> And add the --debug flag to wget to get as much information about what's
>> going on as possible. If you manage to find out what's happening, you
>> may need these logs to know whether to blame wget, or kernel.org.
>>
>> Hope that helps,
>> -mjc
>>
>
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
gnu does not distribute windows binaries.
So, I will resort to downloading it from from
http://code.google.com/p/mingw-and-ndk/downloads/detail?name=wget-1.13.4-static-mingw.7z


On Mon, Mar 19, 2012 at 9:33 AM, Micah Cowan  wrote:

> On 03/18/2012 03:24 PM, JD wrote:
> > When using wget with the -c option, it does recover and resume the
> download
> > after network failures. However, after it finishes the download (in my
> case
> > downloading
> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
> it is
> > completely different to the value stored in the file of CHECKSUMS on the
> > same
> > page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >
> > I downloaded this iso at least twice, with the same result - the
> sha256sum
> > performed on the file does not match the one at the above URL, and nor
> > does it match the result of sha256sum performed on the previous downloads
> > of the iso file.
> >
> > So, something is not right with wget!!
>
> As others have said, using a newer version is probably a good idea.
>
> However, it's probably also worth asking where you got your wget from,
> since we don't really provide official binaries for Wget. Perhaps it has
> a special case...
>
> It's also conceivable that it could be the server's issue, and isn't
> doing HTTP ranged requests correctly. Whether because of wget, or
> because of the server, the constantly varying sha256 sums are a clue
> that it's not happening correctly (assuming, of course, that all files
> are completely downloaded).
>
> With a partially-downloaded iso, I'd say, make a note of exactly how
> many bytes are in the partial download, and take a look at what the tail
> end looks like. Then, when you continue the download, take a look at
> that same spot, and see what you find. If HTTP headers suddenly appear
> there, or you see what appears to be the beginning of the file at the
> continuation point in the file, those are big clues. Also save a copy of
> the original partial download, so you can continue it again and see if
> you get different results, or if they're reproducible for the same-sized
> partial download being continued.
>
> And add the --debug flag to wget to get as much information about what's
> going on as possible. If you manage to find out what's happening, you
> may need these logs to know whether to blame wget, or kernel.org.
>
> Hope that helps,
> -mjc
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Micah Cowan
Binary packages aren't provided on the GNU web site (for Windows, nor
Unixen). Did you download the Wget sources and build them yourself - and
if so, what did you use? Cygwin? Msys?

-mjc

On 03/19/2012 11:29 AM, JD wrote:
> The Fedora Distribution does not list MD5 sums. Only sha256 sums.
> 
> Also, I had downloaded my version directly from the gnu web site.
> But I will look for more recent versions there.
> 
> On Mon, Mar 19, 2012 at 1:49 AM, Paul Wratt  wrote:
> 
>> you should be using at least the last known stable version 1.12, but
>> that is still at least 5 years old
>> 1.13 versions are from within the last 12+ months
>>
>> but I have a feeling that the sha256sum you are using is not right,
>> verify against the md5 (maybe google for it)
>>
>> Paul
>>
>> On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
>>> When using wget with the -c option, it does recover and resume the
>> download
>>> after network failures. However, after it finishes the download (in my
>> case
>>> downloading
>>> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
>> it is
>>> completely different to the value stored in the file of CHECKSUMS on the
>>> same
>>> page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>>>
>>> I downloaded this iso at least twice, with the same result - the
>> sha256sum
>>> performed on the file does not match the one at the above URL, and nor
>>> does it match the result of sha256sum performed on the previous downloads
>>> of the iso file.
>>>
>>> So, something is not right with wget!!
>>




Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
Thank you so much for doing that.
All 3 downloads have different MD5 sums :(

Also, my response to Paul's feedback is this:

Did you deliberately disconnect the network, so that wget exits,
and reconnect the network, and restart wget with the -c option?

In my case network disconnects are very frequent because
I use hotspots, cafés, etc. So, I have to frequently restart
wget.



On Mon, Mar 19, 2012 at 4:49 AM, Henrik Holst
wrote:

> It's strange, though, that he gets a different sha256 for each download,
> if the problem were in sha256sum.
>
> I downloaded the file with Wget v1.12 on Ubuntu and got the same
> sha256sum as the one from the CHECKSUM file. So I also calculated the md5 so
> that JD can check that as well:
>
> *henrik@anonymous:~$* sha256sum Fedora-16-i386-DVD.iso
> af7f172962ab47748914edb7c4d30565d23b4cf21f3bc4b7e3cd770b384d9a75
> Fedora-16-i386-DVD.iso
> *henrik@anonymous:~$* md5sum Fedora-16-i386-DVD.iso
> 0d64ab6b1b800827a9c83d95395b3da0  Fedora-16-i386-DVD.iso
>
> /HH
>
> 2012/3/19 Paul Wratt 
>
>> you should be using at least the last know stable version 1.12 but
>> that is still at least 5 years old
>> 1.13 versions are from within the last 12+months
>>
>> but I have a feeling that the sha256sum you are using is not right,
>> verify against the md5 (maybe google for it)
>>
>> Paul
>>
>> On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
>> > When using wget with the -c option, it does recover and resume the
>> download
>> > after network failures. However, after it finishes the download (in my
>> case
>> > downloading
>> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
>> it is
>> > completely different to the value stored in the file of CHECKSUMS on the
>> > same
>> > page URL -
>> http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>> >
>> > I downloaded this iso at least twice, with the same result - the
>> sha256sum
>> > performed on the file does not match the one at the above URL, and nor
>> > does it match the result of sha256sum performed on the previous
>> downloads
>> > of the iso file.
>> >
>> > So, something is not right with wget!!
>>
>>
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread JD
The Fedora Distribution does not list MD5 sums. Only sha256 sums.

Also, I had downloaded my version directly from the gnu web site.
But I will look for more recent versions there.

On Mon, Mar 19, 2012 at 1:49 AM, Paul Wratt  wrote:

> you should be using at least the last known stable version 1.12, but
> that is still at least 5 years old
> 1.13 versions are from within the last 12+ months
>
> but I have a feeling that the sha256sum you are using is not right,
> verify against the md5 (maybe google for it)
>
> Paul
>
> On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
> > When using wget with the -c option, it does recover and resume the
> download
> > after network failures. However, after it finishes the download (in my
> case
> > downloading
> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
> it is
> > completely different to the value stored in the file of CHECKSUMS on the
> > same
> > page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >
> > I downloaded this iso at least twice, with the same result - the
> sha256sum
> > performed on the file does not match the one at the above URL, and nor
> > does it match the result of sha256sum performed on the previous downloads
> > of the iso file.
> >
> > So, something is not right with wget!!
>


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Anthony Bryan
On Sun, Mar 18, 2012 at 6:24 PM, JD  wrote:
> When using wget with the -c option, it does recover and resume the download
> after network failures. However, after it finishes the download (in my case
> downloading
> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and it is
> completely different to the value stored in the file of CHECKSUMS on the
> same
> page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>
> I downloaded this iso at least twice, with the same result - the sha256sum
> performed on the file does not match the one at the above URL, and nor
> does it match the result of sha256sum performed on the previous downloads
> of the iso file.
>
> So, something is not right with wget!!

JD, errors can pop up in a number of places during the download process.

there are 3 things you can do to fix the download: rsync, bittorrent,
or metalink.

using the latest version of wget is always better than not, but you
can't fix this download with wget. (by that, I mean if you've kept the
error file & not deleted it).
wget will be fine if no errors occur, but the larger the download the
more likely you are to run into an error (probably). & you have
already gotten errors at least twice.

many Linux distributions use metalink or bittorrent for these large
ISO downloads to correct errors & for other features like mirror
usage.

I would suggest using aria2 & the attached metalink file. aria2 is a
command line downloader like wget.

Fedora already provides metalinks for all their files, but
unfortunately they don't include all the repair information, just the
sha256sum to detect errors.

http://mirrors.fedoraproject.org/metalink?path=pub/fedora/linux/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


Fedora-16-i386-DVD.iso.meta4
Description: Binary data
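
As a rough illustration of the suggestion above (a sketch; it assumes aria2
is installed and the attached .meta4 file was saved next to the partial ISO):

  # aria2 reads the mirror list and checksums from the metalink file;
  # with only a whole-file sha256 it can detect a bad download, and with
  # piece hashes it could repair just the damaged ranges.
  aria2c -M Fedora-16-i386-DVD.iso.meta4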


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Micah Cowan
On 03/18/2012 03:24 PM, JD wrote:
> When using wget with the -c option, it does recover and resume the download
> after network failures. However, after it finishes the download (in my case
> downloading
> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and it is
> completely different to the value stored in the file of CHECKSUMS on the
> same
> page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> 
> I downloaded this iso at least twice, with the same result - the sha256sum
> performed on the file does not match the one at the above URL, and nor
> does it match the result of sha256sum performed on the previous downloads
> of the iso file.
> 
> So, something is not right with wget!!

As others have said, using a newer version is probably a good idea.

However, it's probably also worth asking where you got your wget from,
since we don't really provide official binaries for Wget. Perhaps it has
a special case...

It's also conceivable that it could be the server's issue, and isn't
doing HTTP ranged requests correctly. Whether because of wget, or
because of the server, the constantly varying sha256 sums are a clue
that it's not happening correctly (assuming, of course, that all files
are completely downloaded).

With a partially-downloaded iso, I'd say, make a note of exactly how
many bytes are in the partial download, and take a look at what the tail
end looks like. Then, when you continue the download, take a look at
that same spot, and see what you find. If HTTP headers suddenly appear
there, or you see what appears to be the beginning of the file at the
continuation point in the file, those are big clues. Also save a copy of
the original partial download, so you can continue it again and see if
you get different results, or if they're reproducible for the same-sized
partial download being continued.

And add the --debug flag to wget to get as much information about what's
going on as possible. If you manage to find out what's happening, you
may need these logs to know whether to blame wget, or kernel.org.

Hope that helps,
-mjc
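
A concrete way to follow that advice on a Unix-like system (a sketch; the
filename and the 64-byte window are only examples):

  SZ=$(stat -c %s Fedora-16-i386-DVD.iso)   # bytes in the partial file
  cp Fedora-16-i386-DVD.iso partial.bak     # keep a copy for re-tests
  tail -c 64 Fedora-16-i386-DVD.iso | xxd   # inspect the current tail
  wget -c --debug -o wget.log http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-DVD.iso
  # after resuming, look at the bytes around the old end point:
  tail -c +$((SZ - 63)) Fedora-16-i386-DVD.iso | head -c 128 | xxd

If HTTP headers or a fresh file header show up at the old end point, that
pins the blame on the ranged request rather than on the original download.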



Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Henrik Holst
It's strange, though, that he gets a different sha256 for each download,
if the problem were in sha256sum.

I downloaded the file with Wget v1.12 on Ubuntu and got the same
sha256sum as the one from the CHECKSUM file. So I also calculated the md5 so
that JD can check that as well:

*henrik@anonymous:~$* sha256sum Fedora-16-i386-DVD.iso
af7f172962ab47748914edb7c4d30565d23b4cf21f3bc4b7e3cd770b384d9a75
Fedora-16-i386-DVD.iso
*henrik@anonymous:~$* md5sum Fedora-16-i386-DVD.iso
0d64ab6b1b800827a9c83d95395b3da0  Fedora-16-i386-DVD.iso

/HH

2012/3/19 Paul Wratt 

> you should be using at least the last known stable version 1.12, but
> that is still at least 5 years old
> 1.13 versions are from within the last 12+ months
>
> but I have a feeling that the sha256sum you are using is not right,
> verify against the md5 (maybe google for it)
>
> Paul
>
> On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
> > When using wget with the -c option, it does recover and resume the
> download
> > after network failures. However, after it finishes the download (in my
> case
> > downloading
> > Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and
> it is
> > completely different to the value stored in the file of CHECKSUMS on the
> > same
> > page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
> >
> > I downloaded this iso at least twice, with the same result - the
> sha256sum
> > performed on the file does not match the one at the above URL, and nor
> > does it match the result of sha256sum performed on the previous downloads
> > of the iso file.
> >
> > So, something is not right with wget!!
>
>
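
For completeness, the CHECKSUM file from the same mirror directory can also
be verified mechanically (a sketch; the exact CHECKSUM filename may differ,
and sha256sum will warn about the PGP wrapper lines and about listed ISOs
that were not downloaded):

  wget http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/Fedora-16-i386-CHECKSUM
  sha256sum -c Fedora-16-i386-CHECKSUM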


Re: [Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-19 Thread Paul Wratt
you should be using at least the last known stable version 1.12, but
that is still at least 5 years old
1.13 versions are from within the last 12+ months

but I have a feeling that the sha256sum you are using is not right,
verify against the md5 (maybe google for it)

Paul

On Mon, Mar 19, 2012 at 11:24 AM, JD  wrote:
> When using wget with the -c option, it does recover and resume the download
> after network failures. However, after it finishes the download (in my case
> downloading
> Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and it is
> completely different to the value stored in the file of CHECKSUMS on the
> same
> page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/
>
> I downloaded this iso at least twice, with the same result - the sha256sum
> performed on the file does not match the one at the above URL, and nor
> does it match the result of sha256sum performed on the previous downloads
> of the iso file.
>
> So, something is not right with wget!!



[Bug-wget] Problem using GNU Wget 1.11.4 Windows version

2012-03-18 Thread JD
When using wget with the -c option, it does recover and resume the download
after network failures. However, after it finishes the download (in my case
downloading
Fedora-16-i386-DVD.iso), I run the sha256sum on the downloaded ISO and it is
completely different to the value stored in the file of CHECKSUMS on the
same
page URL - http://mirrors.kernel.org/fedora/releases/16/Fedora/i386/iso/

I downloaded this iso at least twice, with the same result - the sha256sum
performed on the file does not match the one at the above URL, and nor
does it match the result of sha256sum performed on the previous downloads
of the iso file.

So, something is not right with wget!!


Re: [Bug-wget] problem with --continue and already completed ftp downloads

2011-11-29 Thread Paul Wratt
I have had this problem in the past. I should be more intelligent
about the use of "--continue".

According to your output, it could either compare the REST offset against
the size in the 213 reply, or at least notify (not stop) on the
"504 Reply marker must be 0" response.

A note here for Eike regarding the use of --continue with *.ZIP: it's
redundant unless wget last quit halfway through a download. wget has
"auto-continue" on by default and will retry X times (maybe 4/5 or 20,
can't remember offhand), which can also be changed/set on the command line.

Paul


On Wed, Nov 30, 2011 at 1:58 AM, Eike Kohnert  wrote:
> Hi all,
>
> I use wget to download a complete directory from an FTP server and want to
> download new files every day.
>
> I use a command like
>
> wget --continue ftp://someserver/somedirectory/*.ZIP
>
> It seems like wget tries to resume files, even if they are already finished.
> I sniffed the generated network traffic and found something like this:
>
> SIZE .ZIP
> 213 872953
> PASV
> 227 Entering Passive Mode ().
> REST 872953
> 504 Reply marker must be 0.
> RETR .ZIP
>
> I think wget should not do the REST command because the retrieved size
> equals the size of the already downloaded file. So the resuming download
> would download only the remaining 0 bytes which does not make much sense. On
> servers which do not support resume this will cause the whole file being
> downloaded again.
>
> I used wget version 1.12
>
> Best regards
> Eike
>



[Bug-wget] problem with --continue and already completed ftp downloads

2011-11-29 Thread Eike Kohnert

Hi all,

I use wget to download a complete directory from an FTP server and want 
to download new files every day.


I use a command like

wget --continue ftp://someserver/somedirectory/*.ZIP

It seems like wget tries to resume files, even if they are already 
finished. I sniffed the generated network traffic and found something 
like this:


SIZE .ZIP
213 872953
PASV
227 Entering Passive Mode ().
REST 872953
504 Reply marker must be 0.
RETR .ZIP

I think wget should not do the REST command because the retrieved size 
equals the size of the already downloaded file. So the resuming download 
would download only the remaining 0 bytes which does not make much 
sense. On servers which do not support resume this will cause the whole 
file being downloaded again.


I used wget version 1.12

Best regards
Eike
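
One possible workaround for the daily-sync use case (a sketch; it assumes
the FTP server reports file timestamps in its listings) is to let wget's
timestamping decide what to fetch instead of --continue:

  # -N skips files whose remote timestamp and size match the local copy,
  # so already-completed ZIPs are not touched again.
  wget -N 'ftp://someserver/somedirectory/*.ZIP'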



Re: [Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-20 Thread Jochen Roderburg

Quoting Jochen Roderburg:


Quoting Ángel González:


Jochen Roderburg wrote:

This looks like the same issue I described recently here:

wget makes a HEAD request first, and the reply-headers do not  
contain a Content-Disposition header.
The Content-Disposition header comes then on the subsequent GET  
request, but wget seems to ignore it there.


Regards,
Jochen Roderburg
Confirmed. Running wget --timestamp -S --content-disposition
http://example.com and giving the Content-Disposition header just
on GET gives the above result.


Precisely, that is the combination of options which triggers the
effect in recent wget 1.13.x versions, because --timestamp=on forces
the HEAD requests.
Older versions with Content-Disposition support (like the 1.11.4
which the OP reported) *always* made HEAD requests with
--content-disposition=on alone, and so always had this error.


Regards,
Jochen Roderburg



Hi Mark,

Just an additional remark on what these findings mean for your
original question: you *can't* avoid the problem with your wget
version 1.11.4, but you usually *can* avoid it with newer 1.13.x
versions.


Regards,
Jochen Roderburg
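
In other words, with 1.13.x the header is honored as long as nothing forces
the extra HEAD request (a sketch; the URL is a placeholder):

  wget -S --content-disposition http://example.com/download
  # adding --timestamping here would reintroduce the HEAD request, and
  # with it the problem described above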






Re: [Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-19 Thread markk
Hi,

Ángel González wrote:
>> The second issue is that the --content-disposition option doesn't seem
>> to work, at least not for the https URL I tried:
>>
>> $ wget -S --content-disposition
>> "https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32";
>
> Those urls have expired, but I made a quick test script (with http), and
> it worked.

I neglected to mention that I munged the URL in the example in case it
contained info related to my login ID. If you want to test downloading
files from that site specifically:
 Log into http://connect.microsoft.com
 Go to
https://connect.microsoft.com/site148/Downloads/DownloadDetails.aspx?DownloadID=21028
Click one of the download links at the bottom. As your browser is
downloading the file, right-click the downloading file and choose copy
link location, then use that URL with wget. (That procedure works in
Firefox.)

-- Mark





Re: [Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-19 Thread Jochen Roderburg

Quoting Ángel González:


Jochen Roderburg wrote:

This looks like the same issue I described recently here:

wget makes a HEAD request first, and the reply-headers do not  
contain a Content-Disposition header.
The Content-Disposition header comes then on the subsequent GET  
request, but wget seems to ignore it there.


Regards,
Jochen Roderburg
Confirmed. Running wget --timestamp -S --content-disposition
http://example.com and giving the Content-Disposition header just on
GET gives the above result.


Precisely, that is the combination of options which triggers the
effect in recent wget 1.13.x versions, because --timestamp=on forces
the HEAD requests.
Older versions with Content-Disposition support (like the 1.11.4 which
the OP reported) *always* made HEAD requests with
--content-disposition=on alone, and so always had this error.


Regards,
Jochen Roderburg





Re: [Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-18 Thread Ángel González

Jochen Roderburg wrote:

This looks like the same issue I described recently here:

wget makes a HEAD request first, and the reply-headers do not contain 
a Content-Disposition header.
The Content-Disposition header comes then on the subsequent GET 
request, but wget seems to ignore it there.


Regards,
Jochen Roderburg
Confirmed. Running wget --timestamp -S --content-disposition
http://example.com and giving the Content-Disposition header just on
GET gives the above result.







Re: [Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-18 Thread Jochen Roderburg

Quoting ma...@clara.co.uk:


The second issue is that the --content-disposition option doesn't seem to
work, at least not for the https URL I tried:

$ wget -S --content-disposition
"https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32";
--2011-10-16 21:05:06--
https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32
Resolving fp-pr1.ds.microsoft.com... 65.54.120.201
Connecting to fp-pr1.ds.microsoft.com|65.54.120.201|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Length: 342
  Content-Type: text/html
  Server: Microsoft-IIS/7.5
  ServerVersion: 5, 0, 0, 42
  FTMException: 22073
  FTMExceptionText: ule: CFileTransferIsapi.cpp, Line#: 232  Debug
Text: General Application Error Context Text:
CFileTransferIsapi::Run Command:CMD is invalid
  FTMClass:
  FTMClassText:
  X-Powered-By: ASP.NET
  Date: Sun, 16 Oct 2011 20:05:07 GMT
  Connection: keep-alive
Length: 342 [text/html]
--2011-10-16 21:05:07--
https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32
Reusing existing connection to fp-pr1.ds.microsoft.com:443.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Length: 86751232
  Content-Type: application/octet-stream
  Expires: -11
  Server: Microsoft-IIS/7.5
  ServerVersion: 5, 0, 0, 42
  SPTransferStatus:
  Content-Disposition: inline; filename="winddk.rtm.iso"
  X-Powered-By: ASP.NET
  Date: Sun, 16 Oct 2011 20:05:08 GMT
  Connection: keep-alive
Length: 86751232 (83M) [application/octet-stream]
Saving to:
`FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32'


-- Mark


This looks like the same issue I described recently here:

wget makes a HEAD request first, and the reply-headers do not contain  
a Content-Disposition header.
The Content-Disposition header comes then on the subsequent GET  
request, but wget seems to ignore it there.


Regards,
Jochen Roderburg





Re: [Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-18 Thread Ángel González

On 16/10/11 22:33, markk wrote:

Hi,

I'm writing to report a couple of issues with wget (version 1.11.4, so
apologies if either has been fixed recently).

The first issue is that wget doesn't seem to reuse the HTTP or FTP
connection when multiple URLs (to the same site) are given on the command
line, e.g.
$ wget ftp://site.example.com/path1/file1.bin
ftp://site.example.com/path2/file2.bin

It is reused here (1.13.4) with http, but not with ftp.



The second issue is that the --content-disposition option doesn't seem to
work, at least not for the https URL I tried:

$ wget -S --content-disposition
"https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32";


Those urls have expired, but I made a quick test script (with http), and 
it worked.





[Bug-wget] Problem with --content-disposition, HTTP/FTP connection not reused

2011-10-17 Thread markk
Hi,

I'm writing to report a couple of issues with wget (version 1.11.4, so
apologies if either has been fixed recently).

The first issue is that wget doesn't seem to reuse the HTTP or FTP
connection when multiple URLs (to the same site) are given on the command
line, e.g.
$ wget ftp://site.example.com/path1/file1.bin
ftp://site.example.com/path2/file2.bin

The second issue is that the --content-disposition option doesn't seem to
work, at least not for the https URL I tried:

$ wget -S --content-disposition
"https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32";
--2011-10-16 21:05:06-- 
https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32
Resolving fp-pr1.ds.microsoft.com... 65.54.120.201
Connecting to fp-pr1.ds.microsoft.com|65.54.120.201|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Length: 342
  Content-Type: text/html
  Server: Microsoft-IIS/7.5
  ServerVersion: 5, 0, 0, 42
  FTMException: 22073
  FTMExceptionText: ule: CFileTransferIsapi.cpp, Line#: 232  Debug
Text: General Application Error Context Text:
CFileTransferIsapi::Run Command:CMD is invalid
  FTMClass:
  FTMClassText:
  X-Powered-By: ASP.NET
  Date: Sun, 16 Oct 2011 20:05:07 GMT
  Connection: keep-alive
Length: 342 [text/html]
--2011-10-16 21:05:07-- 
https://fp-pr1.ds.microsoft.com/TransferFile/FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32
Reusing existing connection to fp-pr1.ds.microsoft.com:443.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Length: 86751232
  Content-Type: application/octet-stream
  Expires: -11
  Server: Microsoft-IIS/7.5
  ServerVersion: 5, 0, 0, 42
  SPTransferStatus:
  Content-Disposition: inline; filename="winddk.rtm.iso"
  X-Powered-By: ASP.NET
  Date: Sun, 16 Oct 2011 20:05:08 GMT
  Connection: keep-alive
Length: 86751232 (83M) [application/octet-stream]
Saving to:
`FileTransfer.dll?Cmd=1&MN=1234567890&Dir=1&Mode=0&Off=0&TS=-1ACD-4945-BCD6-DDAFE738ECB3&CVN=5,0,0,32'


-- Mark





Re: [Bug-wget] Problem with wget ,

2011-10-07 Thread Ángel González

Juda Barnes wrote:

Dear Wget Bug alias ,

I am facing a problem with Wget,

Looks like I am getting a strange character ג
at the end of some lines (see capture below).

Also, when wget finishes downloading a file there is no indication of the
average speed in KB/sec.

I suspect it is related to the terminal somehow.

[cid:image001.png@01CC8458.BC008180]



This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer
viruses.

I had trouble getting your screenshot: it was embedded as base64, but
didn't show as an attachment.
You don't show which wget version you are using, but in those places
there are quotes, so it's either
` ' or “” (smart quotes), for which PuTTY doesn't seem to show an
appropriate glyph.
Anyway, the lack of proper appearance of those quotation marks doesn't
affect wget functionality.





[Bug-wget] Problem with IPv4/IPv6 DNS resolution

2011-07-14 Thread Nelson A. de Oliveira
Hi!

I am seeing a wget behavior that, in my opinion, shouldn't happen.
While trying to resolve a domain that has both IPv4 and IPv6 addresses,
wget fails to resolve it.
For example:

=
$ wget -d git.wifi.pps.jussieu.fr
DEBUG output created by Wget 1.12 on linux-gnu.

--2011-07-14 10:36:33--  http://git.wifi.pps.jussieu.fr/
Resolving git.wifi.pps.jussieu.fr... failed: Name or service not known.
wget: unable to resolve host address `git.wifi.pps.jussieu.fr'
=

Saying to prefer IPv4 also doesn't work:

=
$ wget -d --prefer-family=IPv4 git.wifi.pps.jussieu.fr
Setting --prefer-family (preferfamily) to IPv4
DEBUG output created by Wget 1.12 on linux-gnu.

--2011-07-14 10:07:47--  http://git.wifi.pps.jussieu.fr/
Resolving git.wifi.pps.jussieu.fr... failed: Name or service not known.
wget: unable to resolve host address `git.wifi.pps.jussieu.fr'
=

But forcing to use IPv4 works:

=
$ wget -d -4 git.wifi.pps.jussieu.fr
Setting --inet4-only (inet4only) to 1
DEBUG output created by Wget 1.12 on linux-gnu.

--2011-07-14 10:11:01--  http://git.wifi.pps.jussieu.fr/
Resolving git.wifi.pps.jussieu.fr... 91.121.16.100
Caching git.wifi.pps.jussieu.fr => 91.121.16.100
Connecting to git.wifi.pps.jussieu.fr|91.121.16.100|:80... connected.
(...)
=

It's possible to see that the record has both IPv4 and IPv6 addresses:

=
$ host git.wifi.pps.jussieu.fr
git.wifi.pps.jussieu.fr is an alias for coloquinte.cristau.org.
coloquinte.cristau.org has address 91.121.16.100
coloquinte.cristau.org has IPv6 address 2001:41d0:1:6364::1
=

It's strange that everything else works here: GET, all the browsers,
ping and everything else. Only wget fails to connect to them (not only
with this example domain, but with a lot of domains that have both
IPv4 and IPv6 records).

In /etc/wgetrc I have only "passive_ftp = on"

wget -V gives:

=
GNU Wget 1.12 built on linux-gnu.

+digest +ipv6 +nls +ntlm +opie +md5/openssl +https -gnutls +openssl
-iri

Wgetrc:
/etc/wgetrc (system)
Locale: /usr/share/locale
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../lib -g -O2 -DNO_SSLv2
-D_FILE_OFFSET_BITS=64 -O2 -g -Wall
Link: gcc -g -O2 -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -O2 -g -Wall
/usr/lib/libssl.so /usr/lib/libcrypto.so -ldl -lrt ftp-opie.o
openssl.o http-ntlm.o gen-md5.o ../lib/libgnu.a
=

wget is Debian's package 1.12-3.1 if it helps.

Thank you!

Best regards,
Nelson



Re: [Bug-wget] Problem mirroring site with two domain names

2011-05-20 Thread Ángel González
Chris Dorsey wrote:
> I am trying to mirror a web site that has two domain names, let's call them 
> www.abc.com and www.abcdef.com. Both URLs get to the same site. If I browse 
> the site in IE I can see some hyperlinks point to http://www.abc.com/... and 
> some point to http://www.abcdef.com.
>
> I am using this command line:
>
> wget.exe -r -l inf -w 10 --random-wait -E -k -K -N -H -D abcdef.com,abc.com 
> -o wgetlog.txt http://abc.com/
>
> What I get is two directories named www.abc.com/ and www.abcdef.com/ with 
> almost identical contents. The content has effectively been downloaded twice.
>
> What I want to do is make a single mirror copy of www.abc.com, with all the 
> references to www.abcdef.com treated as references to www.abc.com when the 
> links are converted in the local copy (-k).
>
> Any ideas?
>
>
> Chris Dorsey

Precreate the folders, with abcdef.com being a symlink to abc.com.
The links are not converted, but you will only download things (almost)
once: when wget goes to check the second domain, it will find that
it already has every file downloaded from the other domain.
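
A minimal sketch of that setup on a Unix-like system, using the names from
the example (the directories are pre-created so the symlink is in place
before wget starts):

  mkdir www.abc.com
  ln -s www.abc.com www.abcdef.com
  wget -r -l inf -w 10 --random-wait -E -k -K -N -H -D abcdef.com,abc.com http://abc.com/

Files fetched under either hostname then land in the same tree.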




[Bug-wget] Problem mirroring site with two domain names

2011-05-18 Thread Chris Dorsey
I am trying to mirror a web site that has two domain names, let's call them 
www.abc.com and www.abcdef.com. Both URLs get to the same site. If I browse the 
site in IE I can see some hyperlinks point to http://www.abc.com/... and some 
point to http://www.abcdef.com.

I am using this command line:

wget.exe -r -l inf -w 10 --random-wait -E -k -K -N -H -D abcdef.com,abc.com -o 
wgetlog.txt http://abc.com/

What I get is two directories named www.abc.com/ and www.abcdef.com/ with 
almost identical contents. The content has effectively been downloaded twice.

What I want to do is make a single mirror copy of www.abc.com, with all the 
references to www.abcdef.com treated as references to www.abc.com when the 
links are converted in the local copy (-k).

Any ideas?


Chris Dorsey



Re: [Bug-wget] Problem with WGET

2011-04-10 Thread Tobias Senz
Hi!

On 10.04.2011 03:06, James K Lewis wrote:
> I have utilized wget for several years to ftp data to my computer to force
> ocean circulation models.  I recently installed the CA Security software,
> replacing Norton 360.  The software that executes the calls to wget has now
> started running somewhat unusually.  I begin the software, it in turn begins
> making a number of calls to wget to download files via ftp.  The strange
> part is that in some cases (not always), wget downloads and stores the files
> but doesn't stop executing (I can still see it in the Windows Task Manager,
> but using no CPU time).  My software in which the calls to wget are embedded
> then hangs.  When I highlight wget in the Task Manager and click on "End
> Process", the software then continues to execute.
> 
> I have run tests in which I executed the software several times within 5
> minutes, and the wget hang-ups DO NOT occur when ftp'ing the same files.  It
> appears to hangup at random calls to wget.
> 
> Is there a way from the inputs in the command line to wget to force itself
> to end?  Something like waiting for 20 s and then forcing wget to end?

Just a couple of points that might or might not be any help.
I've seen this exact behaviour in the past: software forking another
executable, and the forked executable just no longer doing anything. That
was neither with wget nor with the CA Internet Security Suite, but with the
old Comodo "Firewall" in combination with some other software. (It was the
old Comodo which was also still compatible with Windows 2000. They
discontinued that "Firewall" line because ... well ... it didn't work:
it created more problems than it solved.)
It just halted the task as you describe, no user interaction whatsoever.
No pop-up, no nothing.

So you might want to look into taking it up with CA, as it might not be a
problem with wget or your oceanography software at all. Or checking the
settings of your security suite relating to code injection / dll
execution, "advanced code security" or whatever else they might call it.
The security suite might inject code into running processes to detect
malware (spyware), which interferes with pretty much anything run on that
computer.
(Or use a security suite that actually works as intended. I guess that
goes without saying ;) )

If you have a wgetrc or .wgetrc configuration file (depending how and
where you got the compiled wget the locations will differ - I'm using
Cygwin so I couldn't tell you where it is on other binaries) you could
try to add

debug = on
logfile = c:\wgetlog.txt

which might give some indication what the last instance of wget was able
to do.
I'm not sure how to specify "append" for log file in the configuration
file, so the log file gets written over on every instance of wget.

On command line append would be
--debug -a c:\wgetlog.txt

If you need a "kill after 20 seconds" I don't think wget itself can
provide that. I guess you could write a wrapper batch script that forks
wget, waits for a while whether it does or does not return and then
kills it.

I'm not sure which if any Windows OS versions come with the NT
"kill.exe". At the very least there is a version in this "Debugging
Tools for Windows" package
http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

If you need any newer or 64 bit version:
http://www.microsoft.com/whdc/devtools/debugging/default.mspx

I usually just unpack (not install) that .msi with 7-zip and only use
the "KillEXE" file renamed as kill.exe . So no idea what option(s) would
be appropriate when installing the .msi. (Custom install with only
"Tools" selected? Change install folder to something WITHOUT spaces or
brackets?)

To kill any and all instances of wget.exe the command line would look like
kill.exe -f wget.exe
(Might be a good idea to give the complete path for kill.exe.)

Waiting for a specified time in plain Windows batch scripting is usually
done with using "ping". (No this isn't a joke. There isn't any
"sleep.exe" unless using Cygwin. :) )
Something like
%SYSTEMROOT%\system32\ping.exe -n 20 127.0.0.1
for approx. 20 seconds.

Or write a wrapper in another programming language.

Oh, it might even be the case that the security suite prevents you from
killing the (wget) executable, as was the case with my problems with
Comodo. From your description, if killing via Task Manager works, I
guess kill.exe also should work with "-f".

Regards, Tobias.
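
Putting those pieces together, a wrapper along the lines described above
might look like this (a sketch in Windows batch; it assumes kill.exe is on
the PATH and the URL is a placeholder):

  @echo off
  rem launch wget in the background, wait roughly 20 seconds, then
  rem forcibly end any wget.exe still running
  start "" /b wget.exe -c http://example.com/data/model_forcing.nc
  %SYSTEMROOT%\system32\ping.exe -n 20 127.0.0.1 >nul
  kill.exe -f wget.exe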



[Bug-wget] Problem with WGET

2011-04-09 Thread James K Lewis
I have utilized wget for several years to ftp data to my computer to force
ocean circulation models.  I recently installed the CA Security software,
replacing Norton 360.  The software that executes the calls to wget has now
started running somewhat unusually.  I begin the software, it in turn begins
making a number of calls to wget to download files via ftp.  The strange
part is that in some cases (not always), wget downloads and stores the files
but doesn't stop executing (I can still see it in the Windows Task Manager,
but using no CPU time).  My software in which the calls to wget are embedded
then hangs.  When I highlight wget in the Task Manager and click on "End
Process", the software then continues to execute.

I have run tests in which I executed the software several times within 5
minutes, and the wget hang-ups DO NOT occur when ftp'ing the same files.  It
appears to hangup at random calls to wget.

Is there a way from the inputs in the command line to wget to force itself
to end?  Something like waiting for 20 s and then forcing wget to end?


James Lewis

Dr. James K Lewis
Senior Scientist, Oceanography
Scientific Solutions, Inc.
4875 Kikala Road
Kalaheo, HI 96741
808-651-7740
www.hawaii-ocean.com




Re: [Bug-wget] Problem with encoded spaces in CSS @import

2010-09-14 Thread Giuseppe Scrivano
Manuel Reinhardt  writes:

> When downloading an html document with
>
> wget -E -k -p http://...
>
> I noticed that sometimes some of the Stylesheets are not found when
> opening the local copy. This happens when the html uses a CSS @import
> statement that includes a URL containing spaces encoded as %20. While
> converting to a local URL, wget changes the spaces to ", which does
> not work there.

Thanks for your report!  I am going to push this patch, which disables HTML
quoting in CSS files.


Cheers,
Giuseppe



=== modified file 'src/convert.c'
--- src/convert.c   2010-08-20 01:11:07 +
+++ src/convert.c   2010-09-14 09:55:31 +
@@ -203,7 +203,7 @@
 static const char *replace_attr (const char *, int, FILE *, const char *);
 static const char *replace_attr_refresh_hack (const char *, int, FILE *,
   const char *, int);
-static char *local_quote_string (const char *);
+static char *local_quote_string (const char *, bool);
 static char *construct_relative (const char *, const char *);
 
 /* Change the links in one file.  LINKS is a list of links in the
@@ -301,7 +301,8 @@
   /* Convert absolute URL to relative. */
   {
 char *newname = construct_relative (file, link->local_name);
-char *quoted_newname = local_quote_string (newname);
+char *quoted_newname = local_quote_string (newname,
+   link->link_css_p);
 
 if (link->link_css_p)
   p = replace_plain (p, link->size, fp, quoted_newname);
@@ -325,7 +326,7 @@
 char *quoted_newlink = html_quote_string (newlink);
 
 if (link->link_css_p)
-  p = replace_plain (p, link->size, fp, quoted_newlink);
+  p = replace_plain (p, link->size, fp, newlink);
 else if (!link->link_refresh_p)
   p = replace_attr (p, link->size, fp, quoted_newlink);
 else
@@ -612,14 +613,14 @@
because those characters have special meanings in URLs.  */
 
 static char *
-local_quote_string (const char *file)
+local_quote_string (const char *file, bool no_html_quote)
 {
   const char *from;
   char *newname, *to;
 
   char *any = strpbrk (file, "?#%;");
   if (!any)
-return html_quote_string (file);
+return no_html_quote ? strdup (file) : html_quote_string (file);
 
   /* Allocate space assuming the worst-case scenario, each character
  having to be quoted.  */
@@ -656,7 +657,7 @@
   }
   *to = '\0';
 
-  return html_quote_string (newname);
+  return no_html_quote ? strdup (newname) : html_quote_string (newname);
 }
 
 /* Book-keeping code for dl_file_url_map, dl_url_file_map,
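
For anyone who wants to try the change before it lands, the diff applies at
the top of the wget source tree (a sketch; the patch filename is
hypothetical):

  patch -p0 < disable-css-html-quoting.diff
  make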



[Bug-wget] Problem with encoded spaces in CSS @import

2010-09-10 Thread Manuel Reinhardt
Hi all,

When downloading an html document with

wget -E -k -p http://...

I noticed that sometimes some of the Stylesheets are not found when
opening the local copy. This happens when the html uses a CSS @import
statement that includes a URL containing spaces encoded as %20. While
converting to a local URL, wget changes the spaces to ", which does
not work there.

Tested with wget 1.12 and Firefox 3.6.9 on Ubuntu Linux.

Cheers,

Manuel Reinhardt


-- 
"If the machine produces tranquility it's right. If it disturbs you it's
wrong until either the machine or your mind is changed."

Manuel Reinhardt
SYSLAB.COM GmbH, Landwehrstrasse 60-62, 80336 Munich, Germany
http://www.syslab.com



Re: [Bug-wget] Problem: files and directories with the same name

2010-07-30 Thread Keisial
 Islon Scherer wrote:
> Hi, I'm using wget to recursively download content from a bunch of sites.
> The command line is "wget -x -r -l1 [url]"
> I have a problem with one url:
> http://olhardigital.uol.com.br/ultimas_noticias/1
> If I execute wget with my parameters in this url it gives me lots of 
> "No such file or directory"
> for every file inside the
> 'olhardigital.uol.com.br/produtos/digital_news/' directory
> because 'olhardigital.uol.com.br/produtos/digital_news' is a file too
> (html) saved by wget
> previously so it can't create the 'digital_news' directory in the file
> system.
> I can't remove the directory sctructure (-x option) because I have to
> know the url of the downloaded
> files for further processing.
> Is there a way to circunvent the file/dir with the same name problem?
> Or a way to
> retrieve the original url of the file without using the directory
> structure?
>
> Reproduce the problem executing: wget -x -r -l1
> http://olhardigital.uol.com.br/ultimas_noticias/1
>
> Regards.
> Islon Scherer

Adding the -E option seems to sidestep the problem; see the sketch below.
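This works because -E (--html-extension, later renamed
--adjust-extension) makes wget append ".html" to files served as
text/html, so the page is stored as
'olhardigital.uol.com.br/produtos/digital_news.html' and the bare name
'digital_news' stays free for the directory. Assuming the site still
behaves as described:

  wget -E -x -r -l1 http://olhardigital.uol.com.br/ultimas_noticias/1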




[Bug-wget] Problem: files and directories with the same name

2010-07-30 Thread Islon Scherer

Hi, I'm using wget to recursively download content from a bunch of sites.
The command line is "wget -x -r -l1 [url]"
I have a problem with one URL:
http://olhardigital.uol.com.br/ultimas_noticias/1
If I execute wget with my parameters on this URL, it gives me lots of
"No such file or directory" errors for every file inside the
'olhardigital.uol.com.br/produtos/digital_news/' directory,
because 'olhardigital.uol.com.br/produtos/digital_news' is itself a file
(HTML) saved by wget previously, so it can't create the 'digital_news'
directory in the file system.
I can't remove the directory structure (the -x option) because I have to
know the URL of the downloaded files for further processing.
Is there a way to circumvent the file/directory-with-the-same-name
problem? Or a way to retrieve the original URL of the file without
using the directory structure?

Reproduce the problem by executing: wget -x -r -l1
http://olhardigital.uol.com.br/ultimas_noticias/1

Regards.
Islon Scherer



Re: [Bug-wget] Problem with Twitter Basic Authentication

2010-07-14 Thread Michelle Konzack
Hello Roland Mösl,

On 2010-07-14 10:21:06, you wrote:
> I use Perl and make the requests to Twitter.com with WGET. 

If you are already using Perl, why not use its native "curl" package?

Thanks, Greetings and nice Day/Evening
Michelle Konzack

-- 
# Debian GNU/Linux Consultant ##
   Development of Intranet and Embedded Systems with Debian GNU/Linux

itsyst...@tdnet France EURL   itsyst...@tdnet UG (limited liability)
Owner Michelle Konzack        Owner Michelle Konzack

Apt. 917 (homeoffice)
50, rue de Soultz             Kinzigstraße 17
67100 Strasbourg/France       77694 Kehl/Germany
Tel: +33-6-61925193 mobil     Tel: +49-177-9351947 mobil
Tel: +33-9-52705884 fix

Jabber linux4miche...@jabber.ccc.de
ICQ#328449886

Linux-User #280138 with the Linux Counter, http://counter.li.org/


signature.pgp
Description: Digital signature


[Bug-wget] Problem with Twitter Basic Authentication

2010-07-14 Thread Roland Mösl
I purchased the TweetAdder software in February.

But this software has so many time-consuming bugs and so few features
that I feel forced to write my own Twitter software.

I use Perl and make the requests to Twitter.com with WGET.

Getting lists works fine, but now I want to send follow and unfollow
commands to Twitter, which require authentication.

First I tried the following call of WGET 

"C:/Programme/GnuWin32/bin/wget.exe" --append-output=c:/twitter/twitter.log 
--server-response --save-headers --http-user=MYUSERNAME 
--http-password=MYPASSWORD --auth-no-challenge 
http://api.twitter.com/1/friendships/create.xml?user_id=27016367 
--output-document=c:/twitter/_follow_sent.txt 


After this always failed with a 401 error, I tried to put a Basic
authentication line directly into --header:

"C:/Programme/GnuWin32/bin/wget.exe" --append-output=c:/twitter/twitter.log 
--server-response --save-headers --header "Authorization: Basic BASE64 encoded 
username=password" 
http://api.twitter.com/1/friendships/create.xml?user_id=27016367 
--output-document=c:/twitter/_follow_sent.txt 

Same 401 error.

No idea. 
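For reference, RFC 2617 builds Basic credentials from "username:password"
joined by a colon (not "username=password"), so a hand-built header would
look roughly like this (a sketch for a shell with coreutils' base64;
MYUSERNAME and MYPASSWORD are the placeholders from above):

  # base64-encode "user:pass" and send it in the Authorization header.
  auth=$(printf '%s:%s' MYUSERNAME MYPASSWORD | base64)
  wget --server-response \
       --header="Authorization: Basic $auth" \
       "http://api.twitter.com/1/friendships/create.xml?user_id=27016367" \
       --output-document=_follow_sent.txt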

best regards

Roland Mösl




Re: [Bug-wget] Problem with option setting.

2010-06-28 Thread Giuseppe Scrivano
vivi  writes:

> Hi all, in an attempt to add a config option, this error occurred when
> running "./wget --config=/some/place/wgetrc google.com"

How have you changed "struct cmdline_option option_data" in main.c?  I
get the error you have reported if the `data' member is different from
"config": that string is looked up by name in the commands[] table in
init.c, and when the lookup fails, comind stays -1, which trips the
assertion in setval_internal.

Cheers,
Giuseppe



[Bug-wget] Problem with option setting.

2010-06-27 Thread vivi
Hi all, in an attempt to add a config option, this error occurred when
running "./wget --config=/some/place/wgetrc google.com":

wget: init.c:723: setval_internal: Assertion `0 <= comind && ((size_t)
comind) < (sizeof (commands) / sizeof ((commands)[0]))' failed.
Aborted

I understand that this usually occurs when the commands are not
alphabetically sorted; however, this list seems to be alphabetically
sorted, and the error does not occur when using other options.


 133   { "cache",&opt.allow_cache,   cmd_boolean },
 134 #ifdef HAVE_SSL
 135   { "cadirectory",  &opt.ca_directory,  cmd_directory },
 136   { "certificate",  &opt.cert_file, cmd_file },
 137   { "certificatetype",  &opt.cert_type, cmd_cert_type },
 138   { "checkcertificate", &opt.check_cert,cmd_boolean },
 139 #endif
 140
 141   { "config",   &opt.choose_config, cmd_file },
 142   { "connecttimeout",   &opt.connect_timeout,   cmd_time },
 143   { "contentdisposition", &opt.content_disposition, cmd_boolean },
 144   { "continue", &opt.always_rest,   cmd_boolean },
 145   { "convertlinks", &opt.convert_links, cmd_boolean },
 146   { "cookies",  &opt.cookies,   cmd_boolean },

I set a breakpoint just before, changed comind from -1 to 100, and that
seemed to appease wget; it continued.
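In case it helps with reproducing this, the breakpoint setup was roughly
(assuming a debug build of wget and gdb):

  gdb --args ./wget --config=/some/place/wgetrc google.com
  (gdb) break setval_internal
  (gdb) run
  (gdb) print comind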

Any help is greatly appreciated.

Reza



Re: [Bug-wget] Problem downloading pages

2010-05-31 Thread Giuseppe Scrivano
It can't be done with a single call to wget; you need a script.  This
shell function can help you get the desired PDF file.

function download_article
{
    # Poll the page until the server has finished rendering the PDF;
    # the finished page contains a POST form.  Session cookies are
    # saved and reloaded so the server can track the rendering job.
    until fgrep "POST" $1.html; do
        wget -O $1.html --keep-session-cookies \
            --save-cookies=cookies.$1 --load-cookies=cookies.$1 \
            "http://archivio.lastampa.it/LaStampaArchivio/servlet/CreaPdf?ID=$1"
        sleep 2s
    done

    # Replay the request as an (empty) POST with the same cookies to
    # receive the actual PDF.
    wget --post-data="" -O $1.pdf --keep-session-cookies \
        --save-cookies=cookies.$1 --load-cookies=cookies.$1 \
        "http://archivio.lastampa.it/LaStampaArchivio/servlet/CreaPdf?ID=$1"

    # Remove the intermediate page and the cookie jar.
    rm $1.html cookies.$1
}

# Call the function
download_article 1050435


Cheers,
Giuseppe



"Non scrivetemi"  writes:

> Hi,
> could you please tell me how I can download these pages with wget?
>
> http://archivio.lastampa.it/LaStampa...Pdf?ID=1050435
> http://archivio.lastampa.it/LaStampa...Pdf?ID=1050435
> http://archivio.lastampa.it/LaStampa...Pdf?ID=1129534
> .
> .
> .
>
> If I try to download them I get a "PDF creation in progress" page, not
> the real PDF!



[Bug-wget] Problem downloading pages

2010-05-31 Thread Non scrivetemi
Hi,
could you please tell me how I can download these pages with wget?

http://archivio.lastampa.it/LaStampa...Pdf?ID=1050435
http://archivio.lastampa.it/LaStampa...Pdf?ID=1050435
http://archivio.lastampa.it/LaStampa...Pdf?ID=1129534
.
.
.

If I try to download them I get a "PDF creation in progress" page, not the
real PDF!



RE: [Bug-wget] Problem with --post-data

2009-12-10 Thread Tony Lewis
The error is being reported by your command-line processor (bash), not by
wget: the expanded --post-data argument exceeds the operating system's
limit on the size of a program's argument list, so wget never even starts.

Another alternative is to put the POST data into a file and then use
--post-file. (The POST file needs to include the 'data=' prefix as well
as the contents of infile.txt.) A sketch follows below.
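For example (reusing the names from the original message; "myServer"
stands for the real URL, and postbody.txt is just a scratch file name):

  # Build the POST body in a file so it never passes through the
  # shell's argument list.
  { printf 'data='; cat infile.txt; } > postbody.txt
  wget --post-file=postbody.txt "myServer" -O resultfile.txt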

Tony
-Original Message-
From: bug-wget-bounces+wget=exelana@gnu.org
[mailto:bug-wget-bounces+wget=exelana@gnu.org] On Behalf Of Liza
Al-Shikhley
Sent: Thursday, December 10, 2009 8:43 AM
To: bug-wget@gnu.org
Subject: [Bug-wget] Problem with --post-data

Dear all,

I'm trying to download the content of a file with this command line:

wget --post-data="data=`more infile.txt`" "myServer" -O resultfile.txt

It works if infile.txt is a small file: I retrieve the content of
infile.txt in resultfile.txt.
But if infile.txt is a much bigger file, I get this message:

-bash: /opt/local/bin/wget: Argument list too long.

How can I fix this problem?

Thank you very much for your help.

Best regards,

Liza




