Re: [Bug-wget] limit download size -- 201901233

2019-01-23 Thread Yousong Zhou
On Thu, 24 Jan 2019 at 12:11,  wrote:
>
> - Yousong Zhou  wrote :
> > On Thu, 24 Jan 2019 at 02:32, Tim Rühsen  wrote:
> > >
> > > On 23.01.19 03:47, c...@free.fr wrote:
> > > > Hi,
> > > >   according to
> > > > $ wget --help
> > > >   I should send reports and suggestions to this address, so I hope I'm
> > > > doing the right thing here.
> > > >
> > > >    the version of my distribution, given by the above command, is "GNU
> > > > Wget 1.18"
> > > >
> > > >    and I don't seem to see an option to limit the retrieval to a
> > > > certain amount of data or a range.
> > > >    Is it possible?
> > > >
> > > > thanks in advance and happy new year,
> > > >
> > > > Zui
> > > > 201901233
> > > >
> > >
> > > You could set the Range HTTP header - many servers support it.
> > >
> > > Like
> > >
> > > wget --header "Range: bytes=0-1" https://www.example.com/filename
> > >
> > > Regards, Tim
> > >
> >
> > At least for wget 1.19.1, it will ignore a 206 "Partial Content" response
> > unless we make it think it is continuing a previous partial download.
> > Specifying the Range header is not a reliable option in this regard:
> >
> > echo -n aaa >b
> > wget -c -O b --header "Range: 3-1000" URL
> >
> > yousong
> Thank you both for your input...
>   and, as yousong wrote, the Range header is not handled correctly by wget 
> (boring parts removed):
> $ wget --header "Range: bytes=500-1000" https://free.fr
>   --2019-01-24 02:22:25--  https://server.dom/
>   Resolving server.dom (server.dom)... 
>   Connecting to server.dom (server.dom)...   connected.
>   HTTP request sent, awaiting response... 206 Partial Content
>   Retrying.
>
>   --2019-01-24 02:22:26--  (try: 2)  https://server.dom/
>   Connecting to server.dom (server.dom)...   connected.
>   HTTP request sent, awaiting response... 206 Partial Content
>   Retrying.
>
>   <...loop of retries...>
>
>   but curl is not exempt from problems either (in both cases the whole
> thing is downloaded):
>   $ curl https://ddg.gg > a
>     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                    Dload  Upload   Total   Spent    Left  Speed
>   100   178  100   178    0     0    310      0 --:--:-- --:--:-- --:--:--   370
>   $ curl --header "Range: bytes=10-40" https://ddg.gg > a
>     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                    Dload  Upload   Total   Spent    Left  Speed
>   100   178  100   178    0     0    314      0 --:--:-- --:--:-- --:--:--   376
>

curl has --range specifically for this.
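For reference, a minimal sketch of `--range`. The https URL is a commented placeholder; over HTTP the option only works when the server honors Range (look for "206 Partial Content"). The `file://` form lets you try it locally, since curl applies byte ranges to the FILE protocol as well.

```shell
# Byte ranges are inclusive: 2-5 selects the bytes at offsets 2,3,4,5.
printf 'abcdefghij' > /tmp/range-demo.txt
curl -s --range 2-5 "file:///tmp/range-demo.txt"    # prints: cdef
# Against an HTTP server that supports Range (placeholder URL):
# curl -s --range 500-1000 "https://example.com/big.bin" -o piece.bin
```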

>   as for using "| head -c (end-start)" as you apply in mget, doesn't it 
> actually generate more traffic
>   than the expected (end-start) number of bytes?
>   (I mean, since the download systematically runs to the end, if I am 
> correct)
>
> zui
> 201901244

When head quits, wget (writing to stdout) receives SIGPIPE and is
expected to quit as well.  Buffering in wget may cause some excess
traffic to be transferred on the wire, but the amount should be
negligible.
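The SIGPIPE mechanism described above can be observed locally without any network; here `seq` stands in for a wget download writing to stdout:

```shell
# head exits after emitting N bytes; the writer's next write into the
# closed pipe raises SIGPIPE, so it stops early -- only roughly a
# pipe-buffer's worth of excess data is ever produced.
seq 1 10000000 | head -c 1000 | wc -c    # wc reports 1000
```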

yousong



Re: [Bug-wget] limit download size -- 201901233

2019-01-23 Thread Yousong Zhou
On Thu, 24 Jan 2019 at 02:32, Tim Rühsen  wrote:
>
> On 23.01.19 03:47, c...@free.fr wrote:
> > Hi,
> >   according to
> > $ wget --help
> >   I should send reports and suggestions to this address, so I hope I'm 
> > doing the right thing here.
> >
> >    the version of my distribution, given by the above command, is "GNU Wget 
> > 1.18"
> >
> >    and I don't seem to see an option to limit the retrieval to a certain 
> > amount of data or a range.
> >    Is it possible?
> >
> > thanks in advance and happy new year,
> >
> > Zui
> > 201901233
> >
>
> You could set the Range HTTP header - many servers support it.
>
> Like
>
> wget --header "Range: bytes=0-1" https://www.example.com/filename
>
> Regards, Tim
>

At least for wget 1.19.1, it will ignore a 206 "Partial Content" response
unless we make it think it is continuing a previous partial download.
Specifying the Range header is not a reliable option in this regard:

echo -n aaa >b
wget -c -O b --header "Range: 3-1000" URL

yousong



Re: [Bug-wget] limit download size -- 201901233

2019-01-22 Thread Yousong Zhou
On Wed, 23 Jan 2019 at 12:06,  wrote:
>
> Hi,
>   according to
> $ wget --help
>   I should send reports and suggestions to this address, so I hope I'm doing 
> the right thing here.
>
>    the version of my distribution, given by the above command, is "GNU Wget 
> 1.18"
>
>    and I don't seem to see an option to limit the retrieval to a certain 
> amount of data or a range.
>    Is it possible?
>
> thanks in advance and happy new year,
>
> Zui
> 201901233
>

Wget has an option "--start-pos ZERO-based-offset".  I use "head -c N"
to limit the download size [1].

 [1] mget for downloading pieces of remote file in parallel,
https://github.com/yousong/dconf/blob/master/data/_usr.env/bin/mget#L43
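A sketch of that pattern: fetch LEN bytes starting at zero-based offset START. The `$URL` is a placeholder (and `--start-pos` needs a server that supports Range); the last line reproduces the same pipeline shape on local data, with `tail -c +K` playing the role of `--start-pos`.

```shell
START=100; LEN=50
# The real thing (placeholder URL, server must honor Range):
# wget -qO- --start-pos="$START" "$URL" | head -c "$LEN" > piece.bin
# Same pipeline shape on local data (tail -c +K is 1-based):
seq 1 100000 | tail -c +"$((START + 1))" | head -c "$LEN" | wc -c   # 50
```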

yousong



[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-10 Thread Yousong Zhou
Follow-up Comment #7, bug #50260 (project wget):

Yes, using LIB instead of LTLIB worked.  And indeed, according to
the gnulib manual, the LTLIB was intended for linking with libtool.

https://www.gnu.org/software/gnulib/manual/html_node/Searching-for-Libraries.html

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-09 Thread Yousong Zhou
Follow-up Comment #5, bug #50260 (project wget):

Digging further, this was probably caused by the commit adding LTLIBICONV to
LDADD for wget


commit d4f97dc9afd149afe1f7b16a84eebb4bab1f044a
Author: Tim Rühsen 
Date:   Sat Jun 11 22:38:42 2016 +0200

Add libraries to LDADD for wget

* src/Makefile.am: Add $(GETADDRINFO_LIB) $(HOSTENT_LIB) $(INET_NTOP_LIB)
 $(LIBSOCKET) $(LIB_CLOCK_GETTIME) $(LIB_CRYPTO) $(LIB_SELECT)
 $(LTLIBICONV) $(LTLIBINTL) $(LTLIBTHREAD) $(SERVENT_LIB) to LDADD






[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-09 Thread Yousong Zhou
Follow-up Comment #4, bug #50260 (project wget):

The error happened with wget-1.19 vanilla tarball.

The -R option probably came from m4/lib-link.m4 (search for "-R$found_dir" in
that file) and, according to the comment there, it is intended for libtool.

That macro in m4/lib-link.m4 is pulled in by the AM_ICONV macro in
configure.ac

Below is the relevant snippet from the generated src/Makefile after a
configure run


LTLIBICONV='-L/home/yousong/.usr/lib -liconv -R/home/yousong/.usr/lib'


Note that I have wget's dependency libraries, including libiconv-1.14, in a
non-standard location ~/.usr/

This should be reproducible with the following steps

 - clone the build-scripts git repo
 - remove the do_patch func in build-wget.sh
 - remove the PKG_AUTOCONF_FIXUP=1 in build-wget.sh
 - "make wget/install/test" will download, compile, and install into
test_dir/ without polluting system-level settings.





[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-08 Thread Yousong Zhou
Follow-up Comment #2, bug #50260 (project wget):

Patching configure.ac to use libtool with AC_PROG_LIBTOOL fixed the link issue
for me, though I had to also patch the m4/po.m4 file to workaround the gettext
version check.

The patch hunks are available at
https://github.com/yousong/build-scripts/blob/683ff79878665fd1a422c4f9a8fddf2fd30be718/build-wget.sh.
 Feel free to pick them if they suit your needs ;)





[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-08 Thread Yousong Zhou
Follow-up Comment #1, bug #50260 (project wget):

Sorry, 1.17.1 was intended as a comparison instead of 1.18.





[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-08 Thread Yousong Zhou
URL:
  

 Summary: Link failed caused by bad linker option -R
 Project: GNU Wget
Submitted by: yousong
Submitted on: Thu 09 Feb 2017 03:14:16 AM UTC
Category: Build/Install
Severity: 3 - Normal
Priority: 5 - Normal
  Status: None
 Privacy: Public
 Assigned to: None
 Originator Name: 
Originator Email: 
 Open/Closed: Open
 Discussion Lock: Any
 Release: 1.19
Operating System: GNU/Linux
 Reproducibility: Every Time
   Fixed Release: None
 Planned Release: None
      Regression: Yes
   Work Required: None
  Patch Included: None

___

Details:

With 1.19


gcc  -I/home/yousong/.usr/include   -I/home/yousong/.usr/include  
-DHAVE_LIBSSL -I/home/yousong/.usr/include   -DNDEBUG -isystem
/home/yousong/.usr/include  -L/home/yousong/.usr/lib
-Wl,-rpath,/home/yousong/.usr/lib -L/home/yousong/.usr/lib64
-Wl,-rpath,/home/yousong/.usr/lib64 -o wget connect.o convert.o cookies.o
ftp.o css_.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o hsts.o html-parse.o
html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o
res.o retr.o spider.o url.o warc.o xattr.o utils.o exits.o build_info.o  
version.o ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a -lrt
-L/home/yousong/.usr/lib -liconv -R/home/yousong/.usr/lib
-L/home/yousong/.usr/lib -lpcre   -luuid -L/home/yousong/.usr/lib -lssl
-lcrypto   -L/home/yousong/.usr/lib -lz
gcc: error: unrecognized option '-R'


With 1.18


gcc  -I/home/yousong/.usr/include   -I/home/yousong/.usr/include  
-DHAVE_LIBSSL -I/home/yousong/.usr/include   -DNDEBUG -isystem
/home/yousong/.usr/include  -L/home/yousong/.usr/lib
-Wl,-rpath,/home/yousong/.usr/lib -L/home/yousong/.usr/lib64
-Wl,-rpath,/home/yousong/.usr/lib64 -o wget connect.o convert.o cookies.o
ftp.o css_.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o hsts.o html-parse.o
html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o
res.o retr.o spider.o url.o warc.o utils.o exits.o build_info.o   version.o
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
/home/yousong/.usr/lib/libiconv.so -Wl,-rpath -Wl,/home/yousong/.usr/lib 
-L/home/yousong/.usr/lib -lpcre   -luuid -L/home/yousong/.usr/lib -lssl
-lcrypto   -L/home/yousong/.usr/lib -lz-lrt


The problem is that I build my own copies of wget and its dependencies and
install them into a non-standard location.  The build system of wget
incorrectly assumed that libtool was being used and specified the library
location with -R, which gcc did not understand, so it errored out.

The script I am using is available at
https://github.com/yousong/build-scripts/blob/master/build-wget.sh








Re: [Bug-wget] Multi segment download

2015-09-09 Thread Yousong Zhou
On 9 September 2015 at 11:20, Hubert Tarasiuk  wrote:
> On Sat, Aug 29, 2015 at 12:50 AM, Darshit Shah  wrote:
>> Thanking You,
>> Darshit Shah
>> Sent from mobile device. Please excuse my brevity
>> On 29-Aug-2015 1:13 pm, "Tim Rühsen"  wrote:
>>>
>>> Hi,
>>>
>>> normally it makes much more sense when having several download mirrors and
>>> checksums for each chunk. The perfect technique for such is called
>> 'Metalink'
>>> (more on www.metalinker.org).
>>> Wget has it in branch 'master'. A GSOC project of Hubert Tarasiuk.
>>>
>> Sometimes the evil ISPs enforce a per connection bandwidth limit. In such a
>> case, multi segment downloads from a single server do make sense.
>>
>> Since metalink already has support for downloading a file over multiple
>> connections, it should not be too difficult to reuse the code for use
>> outside of metalink.
> The current Metalink impl in Wget will not download from multiple
> mirrors simultaneously since Wget itself is single-threaded.
> Adding optional (POSIX) threads support to Wget (especially for the
> Metalinks) could be perhaps worth discussion.
> For now the solution might be to start multiple Wget instances using
> the --start-pos option and somehow limit the length of download (I am
> not sure if Wget currently has an option to do that).
>

As mentioned in the discussion when we were about to introduce the
--start-pos option, the length of a download can be limited with other
utilities such as dd.  This keeps wget itself simple.
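A minimal local sketch of the dd approach (the input string and offsets are made up for illustration):

```shell
# dd can both skip an offset and cap the length; bs=1 keeps byte accuracy
# (slow for large transfers, but exact).  Read 4 bytes starting at offset 2:
printf 'abcdefghij' | dd bs=1 skip=2 count=4 2>/dev/null   # prints: cdef
```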

Well, I just made a proof-of-concept shell script that starts multiple
wget processes to download an HTTP file [1].

[1] Concurrent WGET with --start-pos option.
https://gist.github.com/yousong/48266375afb68f9fb85f
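The idea behind the gist can be sketched entirely locally: each background `dd` below stands in for one `wget --start-pos` process fetching a byte range, and the pieces are concatenated in order afterwards (file names are illustrative, not from the gist):

```shell
# Split a "remote" file into 4 ranges fetched concurrently, then reassemble.
seq 1 2000 > whole.txt
size=$(wc -c < whole.txt)
seg=$(( (size + 3) / 4 ))                 # segment length, rounded up
for i in 0 1 2 3; do
  # each worker models one `wget --start-pos=$((i*seg)) ... | head -c $seg`
  dd if=whole.txt of="part.$i" bs=1 skip=$((i * seg)) count="$seg" 2>/dev/null &
done
wait                                      # all "downloads" finished
cat part.0 part.1 part.2 part.3 > rebuilt.txt
cmp -s whole.txt rebuilt.txt && echo OK   # prints: OK
```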

Cheers,

yousong



Re: [Bug-wget] Multi segment download

2015-09-08 Thread Yousong Zhou
On 29 August 2015 at 04:04, Abhilash Mhaisne  wrote:
> Hey all. I am new to this mailing list.
> As far as I've used wget, it downloads a specified file as a single segment.
> Can we modify this so that wget downloads a file by dividing it into
> multiple
> segments and then combining them all at the receiver host? Just like some
> proprietary download
> managers do? If work on such a feature is going on, I'd like to be a part
> of it.
>

I guess this can be scripted with --start-pos option of wget?

yousong



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-04-04 Thread Yousong Zhou
Hi Alexander,


On Apr 5, 2015 3:57 AM, Alexander Kurakin kuraga...@mail.ru wrote:

  Good day!

 So when can the patch be applied?

It has been in the git repository since shortly after the release of
v1.16.3 (it is not yet in a release version).

cheers,
   yousong



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-03-08 Thread Yousong Zhou
Hi,

On 31 January 2015 at 10:13, Yousong Zhou yszhou4t...@gmail.com wrote:
 Ah sorry, my fault / misunderstanding.
 Since the patch changes Wget behaviour I would apply it after the next 
 bugfix
 release.
 A test case would be perfect. Please consider creating a python test case 
 (see
 directory testenv). We will move all test cases from perl to python by the
 time.


 Well, there they are, with a few fixes for other issues I encountered
 when preparing for this.

 patches look fine to me, could you please ensure to write the commit
 message using the ChangeLog format?

 When it is just one line log, you can just use the format:

 * blah/file (function): Describe what changed.

 Otherwise use the format:

 one short line to describe the change

 * blah/file1 (foo): Describe what changed here.
 * blah/file2 (bar): And here.

 More about the ChangeLog style here:

 https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs


 Changes are made to

  - Follow GNU Changelog style commit message.
  - Update the 2nd patch so that no color codes will be printed when
 the stdout is not a tty-like device.


I noticed that wget v1.16.2 has been released.  Is it okay for this
series to be reviewed again and applied?  FYI, the attachments can be
found at link [1]

 [1] Re: [Bug-wget] Issue with --content-on-error and --convert-links,
https://lists.gnu.org/archive/html/bug-wget/2015-01/msg00073.html

Regards.

   yousong



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-01-30 Thread Yousong Zhou
On 31 January 2015 at 07:53, Giuseppe Scrivano gscriv...@gnu.org wrote:
 Yousong Zhou yszhou4t...@gmail.com writes:

 On 29 January 2015 at 21:26, Tim Ruehsen tim.rueh...@gmx.de wrote:
 Hi Yousong,

  this patch seems to be incomplete. Do you have a complete patch (e.g. 
  +new
  option, + docs) or are you going to work on it ?

 That patch was only intended as an ephemeral one, to see if it could solve
 the issue reported by Joe at the time.  But checking it again, I now
 think the patch actually does the right thing.  The reason is that
 since those --content-on-error pages are downloaded, links within
 those pages should be converted as specified by --convert-links.
 There is no need for a new option for this and the current doc is just
 fine.  But I will try adding a test case for this.

 Ah sorry, my fault / misunderstanding.
 Since the patch changes Wget behaviour I would apply it after the next 
 bugfix
 release.
 A test case would be perfect. Please consider creating a python test case 
 (see
 directory testenv). We will move all test cases from perl to python by the
 time.


 Well, there they are, with a few fixes for other issues I encountered
 when preparing for this.

 patches look fine to me, could you please ensure to write the commit
 message using the ChangeLog format?

 When it is just one line log, you can just use the format:

 * blah/file (function): Describe what changed.

 Otherwise use the format:

 one short line to describe the change

 * blah/file1 (foo): Describe what changed here.
 * blah/file2 (bar): And here.

 More about the ChangeLog style here:

 https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs


Changes are made to

 - Follow GNU Changelog style commit message.
 - Update the 2nd patch so that no color codes will be printed when
the stdout is not a tty-like device.


yousong

 Thanks,
 Giuseppe


0001-testenv-typo-and-style-fix.patch
Description: Binary data


0002-testenv-improve-color-output-a-bit.patch
Description: Binary data


0003-testenv-fix-http_server.py-with-Response-and-Authent.patch
Description: Binary data


0004-testenv-add-test-case-Test-convert-links-content-on-.patch
Description: Binary data


0005-Fix-content-on-error-option-handling.patch
Description: Binary data


Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-01-30 Thread Yousong Zhou
On 29 January 2015 at 21:26, Tim Ruehsen tim.rueh...@gmx.de wrote:
 Hi Yousong,

  this patch seems to be incomplete. Do you have a complete patch (e.g. +new
  option, + docs) or are you going to work on it ?

 That patch was only intended as an ephemeral one, to see if it could solve
 the issue reported by Joe at the time.  But checking it again, I now
 think the patch actually does the right thing.  The reason is that
 since those --content-on-error pages are downloaded, links within
 those pages should be converted as specified by --convert-links.
 There is no need for a new option for this and the current doc is just
 fine.  But I will try adding a test case for this.

 Ah sorry, my fault / misunderstanding.
 Since the patch changes Wget behaviour I would apply it after the next bugfix
 release.
 A test case would be perfect. Please consider creating a python test case (see
 directory testenv). We will move all test cases from perl to python by the
 time.


Well, there they are, with a few fixes for other issues I encountered
when preparing for this.

 Tim


0001-testenv-typo-and-style-fix.patch
Description: Binary data


0002-testenv-allow-color-printer-for-Darwin-platform.patch
Description: Binary data


0003-testenv-fix-http_server.py-with-Response-and-Authent.patch
Description: Binary data


0004-testenv-add-test-case-Test-convert-links-content-on-.patch
Description: Binary data


0005-Fix-content-on-error-option-handling.patch
Description: Binary data


Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-01-29 Thread Yousong Zhou
Hi Tim

On 27 January 2015 at 17:48, Tim Ruehsen tim.rueh...@gmx.de wrote:
 Hi Yousong,

 this patch seems to be incomplete. Do you have a complete patch (e.g. +new
 option, + docs) or are you going to work on it ?


That patch was only intended as an ephemeral one, to see if it could solve
the issue reported by Joe at the time.  But checking it again, I now
think the patch actually does the right thing.  The reason is that
since those --content-on-error pages are downloaded, links within
those pages should be converted as specified by --convert-links.
There is no need for a new option for this and the current doc is just
fine.  But I will try adding a test case for this.

Regards

yousong


 Tim

 On Thursday 16 October 2014 15:24:48 Yousong Zhou wrote:
 On 13 October 2014 10:25, Joe Hoyle joeho...@gmail.com wrote:
  Hi All,
 
 
  I’m having issues using --convert-links” in conjunction with
  --content-on-error”. Though --content-on-error” is forcing wget to
  download the pages, the links to that “errored” page is not update in
  other pages that link to it.
 
 
  This seems to be hinted at in the man page:
 
 
  Because of this, local browsing works reliably: if a linked file was
  downloaded, the link will refer to its local name; if it was not
  downloaded, the link will refer to its full Internet address rather than
  presenting a broken link. The fact that the former links are converted to
  relative links ensures that you can move the downloaded hierarchy to
  another directory.”
 
 
  However, it would seem in the case of using —content-on-error it should
  ignore this rule and do all the link substation anyhow.
 
 
  If anyone knows if this *should* work then I’d be eager to hear it, or any
  other way I can get any 404 pages downloaded and also linked to in the
  wget mirror.
 Currently, wget considers pages with a 404 status code not RETROKF
 (retrieval was OK), even though the 404 page itself is actually downloaded
 successfully when the `--content-on-error` option is enabled.  This
 behaviour is mostly acceptable, I guess.  But you can try the attached
 patch for the moment.  The other option would be to serve the 404 page by
 setting it up manually with your web server.

 Regards.

yousong



Re: [Bug-wget] Need wget feature defending against evil ISP's HTTP 302 HIJACK

2014-12-25 Thread Yousong Zhou
On 24 December 2014 at 16:48, Dawei Tong sec...@yahoo.com wrote:
 Hello wget developers: I live in China and have a China TieTong
 Telecommunications DSL connection.  This ISP's servers continuously send
 HTTP 302 redirects with junk/AD links that corrupt the files I download.  I
 found this by analyzing the corrupted files: I compared 2 corrupted files
 from the same source and found junk data inserted into the normal files.
 The test file is a World of Tanks game installer; I downloaded it twice
 and both copies are corrupted.

There is not much wget can do in this situation.  Redirected or not, as
long as it is a valid HTTP response, wget will fetch it.  Wget cannot
sense that the payload is AD junk and then tell the ISP to stop hampering
the stream.

 Here is my test result:
 cmp -b -l b1_WoT.0.9.4_cn_setup.944980-2.bin b2_WoT.0.9.4_cn_setup.944980-2.bin
  456582373 261 M-1  110 H
  456582374  44 $124 T

It's binary data.  I suspect your ISP inserted some HTML elements
into it.  Is it possible that this diff was caused by the website
serving different content for your two requests?

Regards


   yousong



Re: [Bug-wget] Need wget feature defending against evil ISP's HTTP 302 HIJACK

2014-12-25 Thread Yousong Zhou
On 26 December 2014 at 11:40, Yousong Zhou yszhou4t...@gmail.com wrote:
 Here is my test result:cmp -b -l b1_WoT.0.9.4_cn_setup.944980-2.bin 
 b2_WoT.0.9.4_cn_setup.944980-2.bin
  456582373 261 M-1  110 H
  456582374  44 $124 T

 It's binary data.  I suspect your ISP has inserted any HTML elements
 in it.  Was it possible that this diff was caused by the website
 serving different content for your requests?

It looks like the ISP was trying to do some caching for you and an HTTP
header has slid into the downloaded file.  I guess you may have to
call your ISP for support...

HTTP/1.1 302 Found
Location:
http://122.72.5.170:9090/data3/1/5/8e/8/ab687c973b490c7e8f4cd285e2188e51/static.flv.uuzuonline.com/20141125141149_57133flv

HTTP/1.1 302 Found
Location:
http://122.72.5.162:909/data5/3/f/85/4/aac8eb450df593348c5adc6da4b485f3/xmp.down.sandai.net/XMPCore_4.9.16.2258.cab



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Yousong Zhou
On 21 October 2014 16:17, Pär Karlsson feino...@gmail.com wrote:
 Yes, you are right, of course. Looking through the original implementation
 again, it seems watertight. clang probably complains about the
 uninitialized values above argcount in saved_lengths[], which are never
 reached.

 The precomputed strlen values saved are likely only an optimization
 attempt, I suppose.

Yes.  Grepping through the code shows that currently no invocation of
concat_strings() has more than 5 arguments.


 Still, it seems wasteful to set up two complete loops with va_arg, and
 considering what this function actually does, I wonder whether s(n)printf
 should be used instead of this function? :-)

I think concat_strings() is tighter and more readable than multiple
strlen() + malloc() + snprintf() calls.

Regards.

   yousong



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
Hi, Pär.  I got a few comments inline.

On 21 October 2014 05:47, Pär Karlsson feino...@gmail.com wrote:
 Whoops, I realised I failed on the GNU coding standards, please disregard
 the last one; the patch below should be better.

 My apologies :-/

 /Pär

 diff --git a/src/ChangeLog b/src/ChangeLog
 index d5aeca0..87abd85 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-20 Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): got rid of double loop, cleaned up
 potential
 +   memory corruption if concat_strings was called with more than five
 args
 +
  2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

 * url.c (url_parse): little code cleanup
 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..5f359e0 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,42 +356,36 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
char *ret, *p;

const char *next_str;
 -  int total_length = 0;
 -  size_t argcount;
 +  size_t len;
 +  size_t total_length = 0;
 +  size_t charsize = sizeof (char);

I am not sure here.  Do we always assume sizeof(char) to be 1 for
platforms supported by wget?

 +  size_t chunksize = 64;
 +  size_t bufsize = 64;
 +
 +  p = ret = xmalloc (charsize * bufsize);

/* Calculate the length of and allocate the resulting string. */

 -  argcount = 0;
va_start (args, str0);
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
  {
 -  int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  len = strlen (next_str);
 +  if (len == 0)
 +continue;
total_length += len;
 -}
 -  va_end (args);
 -  p = ret = xmalloc (total_length + 1);
 -
 -  /* Copy the strings into the allocated space. */
 -
 -  argcount = 0;
 -  va_start (args, str0);
 -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
 -{
 -  int len;
 -  if (argcount < countof (saved_lengths))
 -len = saved_lengths[argcount++];
 -  else
 -len = strlen (next_str);
 +  if (total_length > bufsize)
 +  {
 +bufsize += chunksize;

Should be `bufsize = total_length` ?

 +ret = xrealloc (ret, charsize * bufsize);
 +  }
memcpy (p, next_str, len);

Xrealloc may return a new block different from p, so memcpy(p, ...)
may not be what you want.

p += len;
  }
va_end (args);
 +  ret = xrealloc (ret, charsize * total_length + 1);
*p = '\0';

Malloc takes time.  How about counting total_length in one loop and
doing the copy in another?

Regards.

yousong


return ret;




Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
On 21 October 2014 10:02, Yousong Zhou yszhou4t...@gmail.com wrote:
 Hi, Pär.  I got a few comments inline.

 On 21 October 2014 05:47, Pär Karlsson feino...@gmail.com wrote:
 Whoops, I realised I failed on the GNU coding standards, please disregard
 the last one; the patch below should be better.

 My apologies :-/

 /Pär

 diff --git a/src/ChangeLog b/src/ChangeLog
 index d5aeca0..87abd85 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-20 Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): got rid of double loop, cleaned up
 potential
 +   memory corruption if concat_strings was called with more than five
 args
 +
  2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

 * url.c (url_parse): little code cleanup
 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..5f359e0 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,42 +356,36 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
char *ret, *p;

const char *next_str;
 -  int total_length = 0;
 -  size_t argcount;
 +  size_t len;
 +  size_t total_length = 0;
 +  size_t charsize = sizeof (char);

 I am not sure here.  Do we always assume sizeof(char) to be 1 for
 platforms supported by wget?

 +  size_t chunksize = 64;
 +  size_t bufsize = 64;
 +
 +  p = ret = xmalloc (charsize * bufsize);

/* Calculate the length of and allocate the resulting string. */

 -  argcount = 0;
va_start (args, str0);
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
  {
 -  int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  len = strlen (next_str);
 +  if (len == 0)
 +continue;
total_length += len;
 -}
 -  va_end (args);
 -  p = ret = xmalloc (total_length + 1);
 -
 -  /* Copy the strings into the allocated space. */
 -
 -  argcount = 0;
 -  va_start (args, str0);
 -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
 -{
 -  int len;
 -  if (argcount < countof (saved_lengths))
 -len = saved_lengths[argcount++];
 -  else
 -len = strlen (next_str);
 +  if (total_length > bufsize)
 +  {
 +bufsize += chunksize;

 Should be `bufsize = total_length` ?

 +ret = xrealloc (ret, charsize * bufsize);
 +  }
memcpy (p, next_str, len);

 Xrealloc may return a new block different from p, so memcpy(p, ...)
 may not be what you want.

p += len;
  }
va_end (args);
 +  ret = xrealloc (ret, charsize * total_length + 1);
*p = '\0';

 Malloc takes time.  How about counting total_length in one loop and
 doing the copy in another?

I mean, we can skip the strlen part and just do strcpy in the second
loop, since we already know we have enough space in the destination
buffer for all those null-terminated arguments.

 yousong



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
Hi, Pär

On 17 October 2014 03:50, Pär Karlsson feino...@gmail.com wrote:
 Hi, I found a potential gotcha when playing with clang's code analysis tool.

 The concat_strings function silently stopped counting string lengths when
 given more than 5 arguments. clang warned about potential garbage values in
 the saved_lengths array, so I redid it with this approach.

After taking a closer look, I guess the old implementation is fine.
saved_lengths[] is used as a buffer for the lengths of the first 5
arguments, and there is a bounds check against its length.  Maybe it's a
false positive from the clang tool?

Sorry for the noise...

Regards.

yousong


 All tests working ok with this patch.

 This is my first patch to this list, by the way. I'd be happy to help out
 more in the future.

 Best regards,

 /Pär Karlsson, Sweden

 

 commit 2d855670e0e1fbe578506b376cdd40b0e465d3ef
 Author: Pär Karlsson feino...@gmail.com
 Date:   Thu Oct 16 21:41:36 2014 +0200

 Updated ChangeLog

 diff --git a/src/ChangeLog b/src/ChangeLog
 index 1c4e2d5..1e39475 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-16  Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): fixed arbitrary limit of 5 arguments to
 +   function
 +
  2014-05-03  Tim Ruehsen  tim.rueh...@gmx.de

 * retr.c (retrieve_url): fixed memory leak

 commit 1fa9ff274dcb6e5a2dbbbc7d3fe2f139059c47f1
 Author: Pär Karlsson feino...@gmail.com
 Date:   Wed Oct 15 00:00:31 2014 +0200

 Generalized concat_strings argument length

   The concat_strings function seemed arbitrary to only accept a maximum
   of 5 arguments (the others were silently ignored).

   Also it had a potential garbage read of the values in the array.
   Updated with xmalloc/xrealloc/free

 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..93c9ddc 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,7 +356,8 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
 +  size_t psize = sizeof(int);
 +  int *saved_lengths = xmalloc (psize);
char *ret, *p;

const char *next_str;
 @@ -370,8 +371,8 @@ concat_strings (const char *str0, ...)
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
  {
int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  saved_lengths[argcount++] = len;
 +  xrealloc(saved_lengths, psize * argcount);
total_length += len;
  }
va_end (args);
 @@ -393,7 +394,7 @@ concat_strings (const char *str0, ...)
  }
va_end (args);
*p = '\0';
 -
 +  free(saved_lengths);
return ret;
  }
  ^L



Re: [Bug-wget] wget

2014-10-18 Thread Yousong Zhou
Hi, Bryan

Am 18.10.2014 21:43 schrieb Bryan Baas bb...@weycogroup.com:

 Hi,

 I was wondering about the command output of wget.  I used a Java Runtime
 exec and, although the wget process ended with a 0 completion code, the
 results appeared in the error stream and not the output stream.

 As a further test, I executed the same command at the command line and
 redirected output to a file using the ">" operator.  Upon completion the
 file was empty, but the results scrolled down the screen.  This had me
 thinking that the wget command itself is directing its regular output to
 stderr instead of stdout.

Yes, that is expected.  It is possible to set the output file to stdout
with `-O -`, in which case you would not want wget's own output and the
file content mangled together.


 The results of the wget command, from what I could tell, weren't error
 conditions but regular output from a successful execution.


I think it is a convention that debug, informational, and verbose output
of unix programs be written to stderr.  However, users always have the
choice of redirecting stderr to whatever file descriptor they prefer.
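A small C illustration of the point (a hypothetical helper, unrelated to wget's sources): a pipe opened with popen() captures only the child's stdout, so a program that logs to stderr, as wget does, looks silent on the pipe unless stderr is merged into stdout with `2>&1`. Java's Runtime.exec separates the two streams in the same way, which is why the output appeared on the error stream there.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run a shell command and read its first output line, with the command's
   stderr merged into the pipe.  Without the "2>&1" the buffer would stay
   empty for commands that write only to stderr. */
static int
capture_with_merged_stderr (const char *cmd, char *buf, size_t buflen)
{
  char merged[512];

  /* Group the command and send its stderr into the pipe as well. */
  snprintf (merged, sizeof merged, "{ %s; } 2>&1", cmd);

  FILE *fp = popen (merged, "r");
  if (!fp)
    return -1;
  if (!fgets (buf, (int) buflen, fp))
    buf[0] = '\0';
  pclose (fp);
  return 0;
}
```

The same trick applies on the command line: `wget ... 2> log` keeps wget's progress output, while `wget -O - ... > file` keeps the downloaded content, and the two never mix.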

regards.

yousong

 Your feedback would be appreciated.

 regards,


 --
 Bryan Baas
 Weyco IT
 x1808
 414 241 0499 (cell)



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2014-10-16 Thread Yousong Zhou
On 13 October 2014 10:25, Joe Hoyle joeho...@gmail.com wrote:
 Hi All,


 I'm having issues using "--convert-links" in conjunction with
 "--content-on-error".  Though "--content-on-error" is forcing wget to download
 the pages, the links to an "errored" page are not updated in other pages that
 link to it.


 This seems to be hinted at in the man page:


 "Because of this, local browsing works reliably: if a linked file was
 downloaded, the link will refer to its local name; if it was not downloaded,
 the link will refer to its full Internet address rather than presenting a
 broken link. The fact that the former links are converted to relative links
 ensures that you can move the downloaded hierarchy to another directory."


 However, it would seem in the case of using --content-on-error it should
 ignore this rule and do all the link substitution anyhow.


 If anyone knows if this *should* work then I’d be eager to hear it, or any 
 other way I can get any 404 pages downloaded and also linked to in the wget 
 mirror.


Currently, wget considers pages with a 404 status code as not RETROKF
(retrieval was OK), even though the 404 page itself is actually downloaded
successfully when the `--content-on-error` option is enabled.  This
behaviour is mostly acceptable, I guess.  But you can try the attached
patch for the moment.  The other option would be serving the 404 page by
manually setting it up with your web server.

Regards.

   yousong


0001-Let-convert-links-work-with-content-on-error.patch
Description: Binary data


Re: [Bug-wget] wget-bug

2014-09-13 Thread Yousong Zhou
On Sep 13, 2014 9:39 PM, Nyilas MISY dr.dab...@gmail.com wrote:

 hello :-)

 shortly (this is just an example!!) ::

 [user@host ~]$ wget -r -c -P ~/Downloads/

http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe

 when the filename(s) contains ( ), wget doesn't download
 it/them.  How can I fix this bug??


Did the shell complain that it couldn't find the command 4.5.1.1769?

How about surrounding the URL with quotes?  In this case, double or
single quotes should both work.

yousong

 I'm on Fedora 20, 32bit, MATE desktop environment..

 have a nice day and week :-)

 Nyilas MISY



Re: [Bug-wget] wget-bug

2014-09-13 Thread Yousong Zhou
On Sep 13, 2014 10:15 PM, Nyilas MISY dr.dab...@gmail.com wrote:

 [user@host ~]$ wget -r -c -P /home/user/Downloads/

http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe
 bash: syntax error near unexpected token `('

bash emitted the error message, not wget.  Quote the URL part and it
should work.

 [user@host ~]$

 2014-09-13 16:04 GMT+02:00 Yousong Zhou yszhou4t...@gmail.com:
 
  On Sep 13, 2014 9:39 PM, Nyilas MISY dr.dab...@gmail.com wrote:
 
  hello :-)
 
  shortly (this is just an example!!) ::
 
  [user@host ~]$ wget -r -c -P ~/Downloads/
 
 
http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe
 
  when the filename(s) contains ( ), then the wget doesn't downloads
  it/them how can fix this bug??
 
 
  did the shell complained that it couldn't find the command 4.5.1.1769?
 
  how about try surrounding the URL with quotes.  in this case, double or
  single quotes should both work.
 
  yousong
 
  I'm on Fedora 20, 32bit, MATE desktop environment..
 
  have a nice day and week :-)
 
  Nyilas MISY
 


Re: [Bug-wget] Wget export URL list

2014-09-03 Thread Yousong Zhou
On 3 September 2014 22:26, Adrian - adrianTNT.com
adrian...@adriantnt.com wrote:
 Hello.
 Can anyone tell me how to do this with wget ?
 I want it to spider a given website and return the list of full urls in
 that website.
 Any ideas?

This can be done by

 - grepping through the stderr output of wget, e.g. something like
   `wget -r --spider http://example.com 2>&1 | grep -o 'http[s]*://[^ ]*' | sort -u`
   (an illustrative command; adjust the pattern to your needs)
 - patching wget for your specific need.  It should be easy.

   yousong



Re: [Bug-wget] grab complete download link

2014-07-20 Thread Yousong Zhou
Hi,

On 21 July 2014 09:38, bas smit baspys...@gmail.com wrote:
 Dear Darshit Shah
 Thanks for your response.

 I tried with the following command:
 subprocess.call([wget,'--user',user,'--password',passw,'-P',download_dir,'--page-requisites',url,'-o',logfile,\
 '--no-check-certificate'])


The URL you provided requires login to access.  But I guess recursive
download is what you want.  Try the options `--recursive --level=1`, or
`-r -l 1` for the short equivalent.

 However, still unsuccessful to download the required file.

 I also obtained the following in the log file:

 WARNING: Certificate verification error: unable to get local issuer
 certificate


 I hope you can help me.

 Bas


 WARNING: Certificate verification error: unable to get local issuer
 certificate


 On Thu, Jul 17, 2014 at 9:34 PM, Darshit Shah dar...@gmail.com wrote:

 You want to use the --page-requisites option

 On Thu, Jul 17, 2014 at 2:22 PM, bas smit baspys...@gmail.com wrote:
  I am looking for command line option to use the same functionality as the
  Download All with Free Download Manager does. It grabs the complete
  download links though only partial links are shown in the source html.  I
  tried the following code, but but could not figure out which particular
  parameter is necessary for that. The url provided below is the only known
  one.
 
  import subprocess
 
  user, passw = 'user', 'passw'
 
  url = '
 http://earthexplorer.usgs.gov/download/3120/LM10300301974324GDS05/STANDARD/BulkDownload
 '
 
  wget = "C:\\Users\\bas\\Downloads\\wget-1.10.2.exe"
  subprocess.call([wget, '--user', user, '--password', passw, url])



 --
 Thanking You,
 Darshit Shah




Re: [Bug-wget] [PATCH] wget hangs on HTTP 204

2014-04-22 Thread Yousong Zhou
On 22 April 2014 21:02, Tim Ruehsen tim.rueh...@gmx.de wrote:
 Attached is a patch including a new test case.

 Guiseppe, I made it for a clone of Darshit's clone of Wget. Not sure if it
 fits into master.

Hi, Giuseppe.  Just noticed that my previous test cases for
--start-pos were not recorded in the tests/Makefile.am file.  Can you
kindly pick them up there?


   yousong



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-04-05 Thread Yousong Zhou
On 5 April 2014 17:28, Jure Grabnar grabna...@gmail.com wrote:
 Hi,


 
  That's true. type is currently only used to filter out types which
  Wget
  doesn't support.
  Do you think parsing it (type) is irrelevant?

 IMHO, if it will not be used in the near future, then better document
 or remove it.


 I tried removing elect_resources() (essentially removing type attribute)
 and it mostly works.
 It fails when bittorrent url resource has top priority. In this case it
 HTTP downloads what looks to me like a tracker info.
 Since checksum differs from original file (extracted from metalink file)
 download fails.

 I also merged two metalink files (header from the first file and resources
 from the second file) and Wget crashes. I found out there are some issues
 with temporary files.

 I do believe checking types is more fail-safe since these issues do not
 occur there. At least bittorrent resources have to be eliminated
 beforehand or make Wget somehow aware of them.

Yes, I agree that the parsing is needed for filtering out schemes like
ed2k and bittorrent.

yousong



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-04-01 Thread Yousong Zhou
On 1 April 2014 15:48, Jure Grabnar grabna...@gmail.com wrote:
 Hi,

 I debugged the code before writing the 1st patch and found out that if the type
 attribute is not present in v3.0, libmetalink completely ignores it (the URL is
 not present in resources!).
 If you write type attribute in v4.0, libmetalink ignores it (only type,
 URL is still present in resources!). So you have to find out protocol type
 from URL in v4.0.

But the type attribute is currently not used by wget.  I cannot find
any reference to it outside metalink.c.  Anyway, IIUC, types like
torrent, ed2k, etc. are not in the realm of wget.

yousong

 This was the main purpose of the 1st patch.


 On 1 April 2014 03:20, Yousong Zhou yszhou4t...@gmail.com wrote:

 Hi, Jure.

 On 1 April 2014 03:46, Jure Grabnar grabna...@gmail.com wrote:
  Hello,
 
  thanks for your feedback! I corrected the first patch.

 Then the 1st one is fine with me.

 I am not fluent with Metalink and libmetalink on how it handles the
 type attribute.  In version 4.0 of the standard, there is no type
 attribute for metalink:url element, only metalink:metaurl has it.
 Though not explicitly using a word like must in version 3.0 of the
 spec, looks like type attribute is a required one there (See 4.1.2.4
 of the 3.0 spec).


 I thought so too, but if you take a look at 4.1.2.5 section of the v3.0
 spec, the last example shows that type attribute can be omitted.


 If that is the case, then the metalink file is not a
 standard-compliant one if that attribute is missing.  Maybe later
 people want a way to ignore those non-compliant metalink:url element.
 But let that be another story when the need actually came up.  :)


 Then libmetalink should be tweaked a bit, to allow non-compliant url
 elements, because currently it just ignores them (v3.0).
 Although, to be honest, they could just switch to v4.0, where type is
 optional and properly parsed by libmetalink. :)

 Regards,

 Jure Grabnar





Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-04-01 Thread Yousong Zhou
Hi,

On 1 April 2014 23:02, Jure Grabnar grabna...@gmail.com wrote:

 On 1 April 2014 10:39, Yousong Zhou yszhou4t...@gmail.com wrote:

 On 1 April 2014 15:48, Jure Grabnar grabna...@gmail.com wrote:
  Hi,
 
  I debugged code before writing the 1st patch and found out that if
  type
  attribute is not present in v3.0, libmetalink completly ignores it (URL
  is
  not present in resources!).
  If you write type attribute in v4.0, libmetalink ignores it (only
  type,
  URL is still present in resources!). So you have to find out protocol
  type
  from URL in v4.0.

 But the type attribute is currently not used by wget.  I cannot find
 any reference to it outside metalink.c.  Anyway, IIUC, types like
 torrent, ed2k, etc. are not in the realm of wget.


I just checked 4.1.2.5 of the metalink 3.0 spec.  It says that when the type
attribute is missing, users can derive whether it is for BitTorrent by
examining the suffix of the URL.  That's bad.  URL stands for Uniform
Resource Locator; it doesn't have to end with a specific name that
indicates its type.  I may say that libmetalink does the right thing by
ignoring those metalink:url elements.


 That's true. type is currently only used to filter out types which Wget
 doesn't support.
 Do you think parsing it (type) is irrelevant?

IMHO, if it will not be used in the near future, then better document
or remove it.


 Regards,

 Jure Grabnar




Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-31 Thread Yousong Zhou
Hi, Jure.

On 1 April 2014 03:46, Jure Grabnar grabna...@gmail.com wrote:
 Hello,

 thanks for your feedback! I corrected the first patch.

Then the 1st one is fine with me.

I am not fluent with Metalink and libmetalink on how it handles the
type attribute.  In version 4.0 of the standard, there is no type
attribute for metalink:url element, only metalink:metaurl has it.
Though not explicitly using a word like must in version 3.0 of the
spec, looks like type attribute is a required one there (See 4.1.2.4
of the 3.0 spec). If that is the case, then the metalink file is not a
standard-compliant one if that attribute is missing.  Maybe later
people want a way to ignore those non-compliant metalink:url element.
But let that be another story when the need actually came up.  :)


   yousong



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-30 Thread Yousong Zhou
Hi,

On 28 March 2014 20:33, Jure Grabnar grabna...@gmail.com wrote:
 Hi,

 Thank you Yousong. I've listened to your advice and changed type of
 resource-type to
 enum url_scheme. Now it looks much cleaner.

Using enum is a step forward.

 @@ -134,7 +135,20 @@ parse_metalink(char *input_file)
    ++(file->num_of_res);

    resource->url = xstrdup ((*resources)->url);
 -  resource->type = ((*resources)->type ? xstrdup ((*resources)->type) : NULL);
 +
 +  if ((*resources)->type)
 +    {
 +      /* Append "://" to resource type so url_scheme() recognizes type */
 +      char *temp_url = malloc (strlen ((*resources)->type) + 4);
 +      sprintf (temp_url, "%s://", (*resources)->type);
 +
 +      resource->type = url_scheme (temp_url);
 +
 +      free (temp_url);
 +    }

This is a little hacky.  Adding a utility function like
url_scheme_str_to_enum() will be better.
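Something along these lines, say (the helper name and the enum here are illustrative; wget's actual url_scheme enum lives in url.h and has more members): map the scheme string directly to the enum instead of building a fake "scheme://" URL just to reuse url_scheme().

```c
#include <string.h>
#include <strings.h>

/* Illustrative scheme enum; wget's real enum url_scheme is larger. */
enum url_scheme { SCHEME_HTTP, SCHEME_FTP, SCHEME_INVALID };

/* Hypothetical utility: translate a scheme string such as "http" or
   "ftp" straight to the enum, case-insensitively. */
static enum url_scheme
url_scheme_str_to_enum (const char *scheme)
{
  if (scheme == NULL)
    return SCHEME_INVALID;
  if (strcasecmp (scheme, "http") == 0)
    return SCHEME_HTTP;
  if (strcasecmp (scheme, "ftp") == 0)
    return SCHEME_FTP;
  return SCHEME_INVALID;
}
```

This also avoids the malloc/sprintf/free round trip for every resource.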

 +  else
 +    resource->type = url_scheme (resource->url);
 +
    resource->location = ((*resources)->location ? xstrdup ((*resources)->location) : NULL);
    resource->preference = (*resources)->preference;
    resource->maxconnections = (*resources)->maxconnections;
 @@ -143,7 +157,7 @@ parse_metalink(char *input_file)
    (file->resources) = resource;
  }

 -  for (checksums = (*files)->checksums; *checksums; ++checksums)
 +  for (checksums = (*files)->checksums; checksums && *checksums; ++checksums)

Good catch.  Should do the same NULL check for (*files)->resources.

  {
mlink_checksum *checksum = malloc (sizeof(mlink_checksum));



...

 @@ -215,19 +229,25 @@ elect_resources (mlink *mlink)

    while (res_next = res->next)
      {
 -      if (strcmp(res_next->type, "ftp") && strcmp(res_next->type, "http"))
 +      if (schemes_are_similar_p (res_next->type, SCHEME_INVALID))
          {
            res->next = res_next->next;
            free(res_next);
 +
 +          --(file->num_of_res);
          }
        else
          res = res_next;
      }
    res = file->resources;
 -  if (strcmp(res->type, "ftp") && strcmp(res->type, "http"))
 +  if (schemes_are_similar_p (res->type, SCHEME_INVALID))
      {
        file->resources = res->next;

If I am right, this will set it to NULL if file->num_of_res is 1.

 -      free(res);
 +      free (res);
 +
 +      --(file->num_of_res);
 +      if (!file->num_of_res)
 +        file->resources = NULL;

So explicitly setting it to NULL is not needed.

      }
  }
  }
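As a side note, the whole remove-and-count dance above can be written without a separate case for the head node by walking a pointer-to-pointer; a generic sketch (hypothetical types, not the metalink structs):

```c
#include <stdlib.h>

/* Illustrative list node; "keep" stands in for the scheme check. */
struct res { int keep; struct res *next; };

/* Unlink and free every node that fails the predicate, returning the
   remaining count.  Because *pp is rewired in place, deleting the head
   needs no special handling, and an empty result leaves *head == NULL
   naturally. */
static int
filter_resources (struct res **head)
{
  int count = 0;
  struct res **pp = head;
  while (*pp)
    {
      if (!(*pp)->keep)
        {
          struct res *dead = *pp;
          *pp = dead->next;   /* unlink; works for head and interior nodes */
          free (dead);
        }
      else
        {
          count++;
          pp = &(*pp)->next;
        }
    }
  return count;
}
```

With this shape, the explicit `file->resources = NULL` assignment discussed above falls out for free.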



 I also added check for whenever there's no resources available to download a
 file.

 Second patch remains unchanged.

 Regards,


 Jure Grabnar



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-22 Thread Yousong Zhou
Hi, Jure.

On 22 March 2014 18:02, Jure Grabnar grabna...@gmail.com wrote:
 Hi,

 thank you for your feedback, Darshit, Yousong!

 I reverted magic number back to its original state ('tmp2'), because it
 should
 be there (I overlooked that 'tmp' variable is changed in the very next
 statement).

 Duplicated line is removed.

 I also changed resource-type to point at dynamic memory.

+  if (type)
+    {
+      resource->type = malloc (strlen (type));
+      sprintf (resource->type, type);
+    }

xstrdup() is better because that is how existing code does it.  And
you may want to know that using a variable as the format string is not
a good practice for secure code.
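There are two problems in the snippet above, in fact: besides the variable format string, `malloc (strlen (type))` leaves no room for the terminating NUL that sprintf writes, a one-byte heap overflow. A copying helper in the spirit of xstrdup sidesteps both; this is an illustrative re-implementation, not wget's actual function:

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate a string, aborting on allocation failure.  The copy always
   includes room for the terminating NUL, and no format string is
   interpreted, unlike the sprintf-based version. */
static char *
xstrdup_demo (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = malloc (n);
  if (!p)
    abort ();
  memcpy (p, s, n);
  return p;
}
```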

yousong


 They say third's time's the charm. :) I hope it's ok now.

 Regards,


 Jure Grabnar





Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-21 Thread Yousong Zhou
Hi, Jure.

On 21 March 2014 03:23, Jure Grabnar grabna...@gmail.com wrote:
 Thank you for you feedback Darshit. I changed my proposal according to your
 advices. Hopefully a new version is better.

 I'm also sending corrected patches, again thanks to your review, Darshit.
 First patch allows Metalink to have optional argument type in url
 field. Where type is not present, it extracts protocol type from URL string.


On the 1st patch: a static char * value should not be assigned to
resource->type, which will later be free()'ed.


   yousong



Re: [Bug-wget] [PATCH v6 0/5] Make wget capable of starting downloads from a specified position.

2014-03-21 Thread Yousong Zhou
On 21 March 2014 19:34, Giuseppe Scrivano gscriv...@gnu.org wrote:
 I've done some more tests and now pushed!

Finally.  Thank you, Tim, Darshit, Giuseppe, for your time and
attention on this.


   yousong



[Bug-wget] [PATCH v6 0/5] Make wget capable of starting downloads from a specified position.

2014-03-19 Thread Yousong Zhou
This series tries to add an option `--start-pos' for specifying the starting
position of an HTTP or FTP download.  Also included are 3 fixes for the test
infrastructure and 3 test cases for the new option.

With the new option, a user-specified zero-based offset value can be given,
instead of deriving it from the existing file, which is what --continue
currently does.  When this option and --continue are both specified, which
does not make much sense, wget will warn and proceed as if --continue were
not there.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v5 -> v6

- Fix a typo in version 5 of the patch for fixing TYPE and RETR
  commands handling in FTP test server.
- Fix test for --https-only option by adding feature constraint on
  HTTPS support.

v4 -> v5

- Reworked the description in doc with kind suggestions from Tim
  Ruehsen.
- Disable --start-pos when WARC options are used.
- When --start-pos and --continue are both specified, emit a warning,
  use --start-pos and disable --continue, then proceed.
- Add 2 fixes for the test infrastructure.
- Add 3 test cases for the new option.

v3 -> v4

In doc/wget.texi and wget usage output, explicitly note that
--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah dar...@gmail.com that
server support for resuming download is required, so adding this into
doc/wget.texi.

Yousong Zhou (5):
  Make wget capable of starting downloads from a specified position.
  Tests: fix TYPE and RETR command handling.
  Tests: exclude existing files from the check of unexpected downloads.
  Tests: Add test cases for option --start-pos.
  Tests: Add constraint on https for --https-only test.

 doc/ChangeLog  |4 ++
 doc/wget.texi  |   16 ++
 src/ChangeLog  |7 
 src/ftp.c  |2 +
 src/http.c |2 +
 src/init.c |4 ++
 src/main.c |   18 +--
 src/options.h  |1 +
 tests/ChangeLog|   21 +
 tests/FTPServer.pm |   12 ---
 tests/Test--httpsonly-r.px |2 +
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/WgetTest.pm.in   |5 ++-
 tests/run-px   |3 ++
 16 files changed, 233 insertions(+), 9 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

-- 
1.7.2.5




[Bug-wget] [PATCH v6 4/5] Tests: Add test cases for option --start-pos.

2014-03-19 Thread Yousong Zhou

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 tests/ChangeLog|7 
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/run-px   |3 ++
 5 files changed, 155 insertions(+), 0 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

diff --git a/tests/ChangeLog b/tests/ChangeLog
index d23e76e..f2e80e5 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,12 @@
 2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
 
+   * Test--start-pos.px: Test --start-pos for HTTP downloads.
+   * Test-ftp--start-pos.px: Test --start-pos for FTP downloads.
+   * Test--start-pos--continue.px: Test the case when --start-pos and
+ --continue were both specified.
+
+2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
+
* Wget.pm.in: Exclude existing files from the check of unexpected
  downloads.
 
diff --git a/tests/Test--start-pos--continue.px 
b/tests/Test--start-pos--continue.px
new file mode 100755
index 000..09b8ced
--- /dev/null
+++ b/tests/Test--start-pos--continue.px
@@ -0,0 +1,57 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use HTTPTest;
+
+
+###
+
+my $existingfile = <<EOF;
+content should be preserved.
+EOF
+
+my $wholefile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+    '/somefile.txt' => {
+        code => 206,
+        msg => "Dontcare",
+        headers => {
+            "Content-type" => "text/plain",
+        },
+        content => $wholefile,
+    },
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 --continue --debug http://localhost:{{port}}/somefile.txt";
+
+my $expected_error_code = 0;
+
+my %existing_files = (
+    'somefile.txt' => {
+        content => $existingfile,
+    },
+);
+
+my %expected_downloaded_files = (
+    'somefile.txt.1' => {
+        content => substr($wholefile, 1),
+    },
+);
+
+###
+
+my $the_test = HTTPTest->new (name => "Test--start-pos--continue",
+                              input => \%urls,
+                              cmdline => $cmdline,
+                              errcode => $expected_error_code,
+                              existing => \%existing_files,
+                              output => \%expected_downloaded_files);
+exit $the_test->run();
+
+# vim: et ts=4 sw=4
+
+
+
diff --git a/tests/Test--start-pos.px b/tests/Test--start-pos.px
new file mode 100755
index 000..4962c82
--- /dev/null
+++ b/tests/Test--start-pos.px
@@ -0,0 +1,46 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use HTTPTest;
+
+
+###
+
+my $dummyfile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+    '/dummy.txt' => {
+        code => 206,
+        msg => "Dontcare",
+        headers => {
+            "Content-Type" => "text/plain",
+        },
+        content => $dummyfile
+    },
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 http://localhost:{{port}}/dummy.txt";
+
+my $expected_error_code = 0;
+
+my %expected_downloaded_files = (
+    'dummy.txt' => {
+        content => substr($dummyfile, 1),
+    }
+);
+
+###
+
+my $the_test = HTTPTest->new (name => "Test--start-pos",
+                              input => \%urls,
+                              cmdline => $cmdline,
+                              errcode => $expected_error_code,
+                              output => \%expected_downloaded_files);
+exit $the_test->run();
+
+# vim: et ts=4 sw=4
+
+
diff --git a/tests/Test-ftp--start-pos.px b/tests/Test-ftp--start-pos.px
new file mode 100755
index 000..5062377
--- /dev/null
+++ b/tests/Test-ftp--start-pos.px
@@ -0,0 +1,42 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use FTPTest;
+
+
+###
+
+my $dummyfile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+    '/dummy.txt' => {
+        content => $dummyfile
+    },
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 ftp://localhost:{{port}}/dummy.txt";
+
+my $expected_error_code = 0;
+
+my %expected_downloaded_files = (
+    'dummy.txt' => {
+        content => substr($dummyfile, 1),
+    }
+);
+
+###
+
+my $the_test = FTPTest->new (name => "Test-ftp--start-pos",
+                             input => \%urls,
+                             cmdline => $cmdline,
+                             errcode => $expected_error_code,
+                             output

[Bug-wget] [PATCH v6 3/5] Tests: exclude existing files from the check of unexpected downloads.

2014-03-19 Thread Yousong Zhou

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 tests/ChangeLog  |5 +
 tests/WgetTest.pm.in |5 -
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index a7db249..d23e76e 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,10 @@
 2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
 
+   * Wget.pm.in: Exclude existing files from the check of unexpected
+ downloads.
+
+2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
+
* FTPServer.pm: Fix the handling of TYPE command and avoid endless
loop when doing binary mode RETR.
 
diff --git a/tests/WgetTest.pm.in b/tests/WgetTest.pm.in
index 58ad140..092777e 100644
--- a/tests/WgetTest.pm.in
+++ b/tests/WgetTest.pm.in
@@ -256,7 +256,10 @@ sub _verify_download {
 # make sure no unexpected files were downloaded
     chdir ("$self->{_workdir}/$self->{_name}/output");

-    __dir_walk('.', sub { push @unexpected_downloads, $_[0] unless (exists $self->{_output}{$_[0]}) }, sub { shift; return @_ } );
+    __dir_walk('.',
+               sub { push @unexpected_downloads,
+                          $_[0] unless (exists $self->{_output}{$_[0]} || $self->{_existing}{$_[0]}) },
+               sub { shift; return @_ } );
     if (@unexpected_downloads) {
         return "Test failed: unexpected downloaded files [" . join(', ', @unexpected_downloads) . "]\n";
     }
 }
-- 
1.7.2.5




[Bug-wget] [PATCH v6 2/5] Tests: fix TYPE and RETR command handling.

2014-03-19 Thread Yousong Zhou
 - FTPServer.pm's handling of the TYPE command would ignore binary-mode
   transfer requests.
 - The FTP server would run into an endless loop, sending the same content
   forever.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 tests/ChangeLog|5 +
 tests/FTPServer.pm |   12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index 6730169..a7db249 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
+
+   * FTPServer.pm: Fix the handling of TYPE command and avoid endless
+   loop when doing binary mode RETR.
+
 2014-01-23  Lars Wendler  polynomia...@gentoo.org (tiny change)
 
* Test--post-file.px: Do not fail when wget has no debug support.
diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
index 2ac72e3..1603caa 100644
--- a/tests/FTPServer.pm
+++ b/tests/FTPServer.pm
@@ -298,12 +298,13 @@ sub _RETR_command
     # What mode are we sending this file in?
     unless ($conn->{type} eq 'A') # Binary type.
     {
-        my ($r, $buffer, $n, $w);
-
+        my ($r, $buffer, $n, $w, $sent);

         # Copy data.
-        while ($buffer = substr($content, 0, 65536))
+        $sent = 0;
+        while ($sent < length($content))
         {
+            $buffer = substr($content, $sent, 65536);
             $r = length $buffer;

             # Restart alarm clock timer.
@@ -330,6 +331,7 @@ sub _RETR_command
                 print {$conn->{socket}} "426 Transfer aborted. Data connection closed.\r\n";
                 return;
             }
+            $sent += $r;
         }

         # Cleanup and exit if there was an error.
@@ -410,9 +412,9 @@ sub _TYPE_command

     # See RFC 959 section 5.3.2.
     if ($type =~ /^([AI])$/i) {
-        $conn->{type} = 'A';
+        $conn->{type} = $1;
     } elsif ($type =~ /^([AI])\sN$/i) {
-        $conn->{type} = 'A';
+        $conn->{type} = $1;
     } elsif ($type =~ /^L\s8$/i) {
         $conn->{type} = 'L8';
     } else {
-- 
1.7.2.5
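The shape of the fixed RETR loop, reduced to a sketch in C (an illustrative function, not the Perl code itself): keep an explicit offset so each iteration operates on a fresh chunk, guaranteeing termination, unlike the old `substr($content, 0, 65536)` which re-sent the first chunk forever.

```c
#include <stddef.h>
#include <string.h>

/* Copy src to dst in fixed-size chunks, advancing an explicit offset,
   the same pattern as the FTPServer.pm fix above.  memcpy stands in
   for the per-chunk socket write. */
static size_t
chunked_copy (char *dst, const char *src, size_t len, size_t chunk)
{
  size_t sent = 0;
  while (sent < len)
    {
      size_t n = len - sent < chunk ? len - sent : chunk;
      memcpy (dst + sent, src + sent, n);
      sent += n;   /* progress is guaranteed, so the loop terminates */
    }
  return sent;
}
```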




[Bug-wget] [PATCH v6 1/5] Make wget capable of starting downloads from a specified position.

2014-03-19 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a HTTP or FTP download.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 doc/ChangeLog |4 
 doc/wget.texi |   16 
 src/ChangeLog |7 +++
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|4 
 src/main.c|   18 +++---
 src/options.h |1 +
 8 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 58d1439..68629c6 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2014-02-10  Yousong Zhou  yszhou4t...@gmail.com
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-12-29  Giuseppe Scrivano  gscri...@redhat.com
 
* wget.texi: Update to GFDL 1.3.
diff --git a/doc/wget.texi b/doc/wget.texi
index 6a8c6a3..0b23bda 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,22 @@ Another instance where you'll get a garbled file if you 
try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start downloading at zero-based position @var{OFFSET}.  Offset may be expressed
+in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc.
+
+@samp{--start-pos} has higher precedence over @samp{--continue}. When
+@samp{--start-pos} and @samp{--continue} are both specified, wget will emit a
+warning then proceed as if @samp{--continue} was absent.
+
+Server support for continued download is required, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index d3ac754..9b10ee8 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,10 @@
+2014-03-19  Yousong Zhou  yszhou4t...@gmail.com
+
+   * init.c, main.c, options.h: Add option --start-pos for specifying
+   start position of a download.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2014-03-04  Giuseppe Scrivano  gscri...@redhat.com
 
* http.c (modify_param_value, extract_param): Aesthetic change.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..5282588 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, 
ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
     restval = 0;
+  else if (opt.start_pos >= 0)
+    restval = opt.start_pos;
   else if (opt.always_rest
            && stat (locf, &st) == 0
            && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index cd2bd15..8bba70d 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3121,6 +3121,8 @@ Spider mode enabled. Check if remote file exists.\n));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos >= 0)
+    hstat.restval = opt.start_pos;
   else if (opt.always_rest
            && got_name
            && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 56fef50..9ed72b2 100644
--- a/src/init.c
+++ b/src/init.c
@@ -270,6 +270,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",        &opt.spanhost,          cmd_boolean },
   { "spider",           &opt.spider,            cmd_boolean },
+  { "startpos",         &opt.start_pos,         cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",          NULL,                   cmd_spec_timeout },
   { "timestamping",     &opt.timestamping,      cmd_boolean },
@@ -406,6 +407,9 @@ defaults (void)
   opt.warc_cdx_dedup_filename = NULL;
   opt.warc_tempdir = NULL;
   opt.warc_keep_log = true;
+
+  /* Use a negative value to mark the absence of --start-pos option */
+  opt.start_pos = -1;
 }
 
 /* Return the user's home directory (strdup-ed), or NULL if none is
diff --git a/src/main.c b/src/main.c
index 3ce7583..39fcff4 100644
--- a/src/main.c
+++ b/src/main.c
@@ -276,6 +276,7 @@ static struct cmdline_option option_data[] =
    { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
    { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
    { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+    { "start-pos", 0, OPT_VALUE, "startpos", -1 },
    { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
    { "timeout", 'T', OPT_VALUE, "timeout", -1 },
    { "timestamping", 'N', OPT_BOOLEAN, "timestamping", -1 },
@@ -486,6 +487,8 @@ Download:\n"),
    N_("\
  -c,  --continue                resume getting a partially-downloaded file.\n"),
    N_("\
+       --start-pos=OFFSET        start downloading from zero-based position OFFSET.\n"),
+    N_("\
       --progress=TYPE

Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-14 Thread Yousong Zhou
. Simply don't add them
  to the git commit. Use a local .gitignore file to handle it
  2. You can and should split this patch. I'm assuming it's the same stuff
  as before, and that can be split. Use your imagination
  3. The whitespace errors imply trailing whitespace. This happens when you
  have extra whitespace characters at the end of a
  line. Usually not a good idea since these are characters that cannot be
  seen. You should eliminate them. My ViM editor
  simply highlights all trailing whitespaces so I always know if they are
  there. Also, you can configure your git to explicitly
  highlight trailing whitespaces in its diff output (Assuming you're
 using a
  git shell, not a GUI, in which case I have no idea.)
 
  Nervously, Chen
 
  Don't worry. Everyone faces problems with these items in the beginning.
  It's not something you are used to.
 
 
 
 
  2014-03-10 16:34 GMT+08:00 陈子杭 (Zihang Chen) chsc4...@gmail.com:
 
 
 
 
  2014-03-10 16:17 GMT+08:00 Darshit Shah dar...@gmail.com:
 
 
 
 
  On Mon, Mar 10, 2014 at 8:46 AM, 陈子杭 (Zihang Chen) 
 chsc4...@gmail.com
  wrote:
 
  Hi Yousong,
 
  So sorry about the line endings, I'll have to do a thorough check.
 
  I'm not sure about the line endings since my git and vim
 configuration
  simply do the magic
  of conversions for me. But if Yousong says do, do look into it.
 
  However, you seem to have added a huge amount of those especially in
  your 2nd patch.
 
  I do however, very strongly suggest that you get access to some sort
 of
  a linux system. It will
  make your life so much easier. Autoconf takes ages to run on Windows
 in
  a cygwin shell.
 
 
 
  BTW, the pyc files in 0001.patch were deleted in the second commit.
 
 
  It would be better if you just did not have them there. It would
  clutter *everyone's* git repos
  if the .pyc files were there and later deleted. Because git will
 leave
  a snapshot of each
  commit in the history. Keep a .gitignore file handy. Those are very
  important. You'll get
  good ones for starts from github's own gitignore repository.
 
  Got it. But I wonder where to put the .gitignore file. Should I use
 the
  one in the `wget` directory or
  get a new one under `testenv`?
 
 
 
  Also, we usually expect a ChangeLog entry for *every* patch being
 sent.
  So, please keep that
  in mind too. And there's also the 80 characters per line limit we
 like
  to follow for all files.
 
  I'll keep that in mind.
 
  The chief reason was that older terminals could only display 80
  characters. Now, the reason is
  that it allows you to have two vertical windows with code
  simultaneously without any line wraps.
 
  And do follow Yousong's advice on organizing your patchset. Ask for
  help and you shall get it.
  Large, single commits are seldom looked upon favourably.
 
 
  I'll try to make my commits smaller next time. Work till now is not
  likely to be divided into small
  commits though ;(
 
  And thanks very much for the advice!
 
 
  2014-03-10 15:38 GMT+08:00 Yousong Zhou yszhou4t...@gmail.com:
 
   Hi, Zihang,
  
   On 10 March 2014 13:05, 陈子杭 (Zihang Chen) chsc4...@gmail.com
   wrote:
Hi, Darshit.
I fixed the line ending using git config --global autocrlf
 input.
Line
endings should be lf now. I also added some documentation. File
modes for
Test-*.py are 755 now.
   
  
   I just did a quick check on the patch and the line endings are
 still
   wrong, e.g. testenv/test/http_test.py
  
   Also, .pyc files should not be included, right?
  
   I do not have much experience with parallel-wget, but you can
   enhance
   organizing your commits by following how existing ones in the
   repository were written.
  
  
  yousong
  
 
 
 
  --
  Regards,
  Chen Zihang,
  Computer School of Wuhan University
  ---
  此致
  陈子杭
  武汉大学计算机学院
 
 
 
 
  --
  Thanking You,
  Darshit Shah
 
 
 
 
  --
  Regards,
  Chen Zihang,
  Computer School of Wuhan University
  ---
  此致
  陈子杭
  武汉大学计算机学院
 
 
 
 
  --
  Regards,
  Chen Zihang,
  Computer School of Wuhan University
  ---
  此致
  陈子杭
  武汉大学计算机学院
 
 
 
 
  --
  Thanking You,
  Darshit Shah
 
 
 
 
  --
  Regards,
  Chen Zihang,
  Computer School of Wuhan University
  ---
  此致
  陈子杭
  武汉大学计算机学院
 



 --
 Thanking You,
 Darshit Shah




 --
 Regards,
 Chen Zihang,
 Computer School of Wuhan University
 ---
 此致
 陈子杭
 武汉大学计算机学院



Re: [Bug-wget] [GSoC PATCH 11/11] in conf, rename register to rule and hook

2014-03-14 Thread Yousong Zhou
On 14 March 2014 21:28, 陈子杭 (Zihang Chen) chsc4...@gmail.com wrote:
 So sorry I flooded the mailing list. I thought --chain-reply-to is turned
 on by default :(

I think --no-chain-reply-to is okay.


   yousong



Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-10 Thread Yousong Zhou
Hi, Zihang,

On 10 March 2014 13:05, 陈子杭 (Zihang Chen) chsc4...@gmail.com wrote:
 Hi, Darshit.
 I fixed the line ending using git config --global autocrlf input. Line
 endings should be lf now. I also added some documentation. File modes for
 Test-*.py are 755 now.


I just did a quick check on the patch and the line endings are still
wrong, e.g. testenv/test/http_test.py

Also, .pyc files should not be included, right?

I do not have much experience with parallel-wget, but you can improve
the organization of your commits by following how existing ones in the
repository were written.


   yousong



Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-08 Thread Yousong Zhou
Hi, Zihang and Darshit, and all.

On 9 March 2014 09:39, Darshit Shah dar...@gmail.com wrote:
 Hi Zihang,


 I just had a brief glance through the whole commit. That's a very large
 change! It's essentially the same code with lots of moving around and
 cosmetic changes.

 However, I do have a couple of issues with it:
 1. I found it really difficult to follow the code. You should edit the
 README file to reflect the current scenario and how a developer should
 follow it.
 2. It seems like you've created some really nice abstractions; it would
 be very nice to explain them so the developers for Wget know what to look at
 and what to edit.

Hi, Zihang. The patch is really big as a single commit.  You'd better
split it into multiple small ones each for a single purpose, without
breaking the code with each commit if possible.  That way we can refer
to and comment on the code more easily.

 3. While the code surely is more pythonic, it creates a slight problem.
 It's *more* pythonic. Most people who have to deal with this code are not
 users who use Python everyday. I think, a little lesser of strict Python
 syntax and a little more of simpler syntax will allow non-Python developers
 to more easily follow the code. The point of using Python to rewrite the
 old test suite was that Perl was a bit too cryptic and people had to spend
 too much time understanding the code first before they could edit it. I
 don't want to repeat that with having truly pythonic code which takes more
 time to follow for a C developer.

To be honest, I am fine with the current Perl implementation.  My last
several patches for the Perl-based test framework are my first try
with Perl.  It does not take me much time to understand the design and
modify a few lines of code.  I think documentation or self-explaining
code is the solution.


 Others, please chime in on this. I like the overall restructuring though.
 And if the abstractions do work the way I think they do, I believe this
 could be a good idea. I'll look at it in much more detail when I get the
 time.


 On Sun, Mar 9, 2014 at 2:22 AM, Darshit Shah dar...@gmail.com wrote:

 Hi,

 Thanks for the refactoring. However, you've included makefile and
 makefile.in, which are autogenerated files and should not be committed.

 Also, your patch has trailing whitespace errors. And I don't think you've
 added an entry to ChangeLog either. Please look into these. I haven't seen
 your patch yet. The Makefile errors mean I can't apply it without a lot of
 extra work.


Hi Zihang, looks like your development environment has to be configured
right for this.  The newline character in your code is now '\r\n',
which should be '\n'.  There are mode changes from 100755 to 100644 in
the git commit, which is not right.  Those .py files should retain
their executable attributes.
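Both problems (CRLF newlines and lost executable bits) can be checked mechanically before committing. Below is a minimal sketch in Python, the language of the new test suite; the helper name is made up for illustration and is not part of wget's tree:

```python
import os
import stat


def check_patch_hygiene(path):
    """Report two common patch problems mentioned above: CRLF line
    endings (newlines should be plain LF) and a missing owner-executable
    bit (Test-*.py scripts should stay mode 755)."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        # True when any line still ends in CRLF instead of plain LF
        "has_crlf": b"\r\n" in data,
        # True when the owner execute bit is set on the file
        "owner_executable": bool(os.stat(path).st_mode & stat.S_IXUSR),
    }
```

Running something like this over the testenv/*.py files before `git commit` would catch both issues early.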


   yousong



Re: [Bug-wget] Using GNU Wget 1.13.4 on an https page...

2014-03-05 Thread Yousong Zhou
On 5 March 2014 22:33, Pauline_FTP@dmin learningadvoc...@gmail.com wrote:

 Hi,

 Upon reading the manual regarding the topic above, I realized that I am at
 a loss of how to begin.

 The information is so overwhelming that I feel I would have to learn an
 entirely new language.
 Obviously, I am a novice at this.

 What I want to do is try GNU Wget 1.13.4 on an https page that is timing
 out before I can view and use it.


- Does it have to be version 1.13.4?
- Hmm, so you just want to download the page source?

 Are there any shortcuts or tips you can advise for me to get the Wget set
 up easily to use it to so that I can gain access to this https page?

 My system is Windows 7 and I use the Firefox browser. (I won't use the
 Chrome due to security issues that are too cumbersome to fix after they
 happen.)

Normally wget will work out of the box without extra configuration.  You
can get wget for Windows from the GNUWin32 project.  Does the page need
authentication or other parameters to gain access?

FYI, you should be able to view the page source and the whole network
activity with Firebug or other extensions on Firefox.

I am curious about the cumbersome security issues that will happen
after accessing an https page on Chrome.


 yousong



Re: [Bug-wget] [bug-wget] Unable to execute the Test Suite

2014-02-21 Thread Yousong Zhou

Hi,

On Fri, 21 Feb 2014, Darshit Shah wrote:


Hi all,


I was trying to run the test suite on Wget, but it keeps failing due to the
new submodule. At first I thought the issue was probably with the
parallel-wget branch, so I switched to master. Yet the same problem. Just
as a control test, I created a new clone of the repository and I am still
facing the same problem.

The error output is:

echo 1.15.6-d682 > .version-t && mv .version-t .version
if test -d ./.git && \
   git --version >/dev/null 2>&1; then \
  cd . && \
  git submodule --quiet foreach \
      test '"$(git rev-parse "$sha1")"' \
      = '"$(git merge-base origin "$sha1")"' \
    || { echo 'maint.mk: found non-public submodule commit' >&2; \
         exit 1; }; \
else \
  : ; \
fi
Stopping at 'gnulib'; script returned non-zero status.
maint.mk: found non-public submodule commit
maint.mk:1394: recipe for target 'public-submodule-commit' failed
make: *** [public-submodule-commit] Error 1


This happens only when running `make check` and not when trying to
otherwise compile from source.


Mine worked fine after doing `git clean -f -d'.  Have you tried running the 
command manually to see the actual output of each element?


  git submodule foreach \
  test '$(git rev-parse $sha1)' \
  = '$(git merge-base origin $sha1)'\

Or something like

  git submodule foreach \
  echo '$name, $path, $sha1'

which produces

  yousong@jumper:~/wget$ git submodule foreach  \
 echo '$name, $path, $sha1'
  Entering 'gnulib'
  gnulib, gnulib, 0ac90c5a98030c998f3e1db3a0d7f19d4630b6b6

on my machine.



yousong



Anyone know the reasons for this?

--
Thanking You,
Darshit Shah





Re: [Bug-wget] wget confused by URL

2014-02-20 Thread Yousong Zhou
Hi,

On Thu, 20 Feb 2014, James Macomber wrote:

 Hi,
 
 May be my n00bness, but I can't seem to get the syntax right for this
 command or the command is getting confused by my values.
 
 I am using the win86_64 version 1.11.4.
 
 I am calling wget -r -i C:\Users\macombej\Desktop\wgeturl.txt -S -o
 C:\Users\macombej\Desktop\wgetresponse.txt
 
 wgeturl.txt looks like this:
 
 http://u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
 
 I have tried it with username/password in the proper syntax for the above
 URL, but this doesn't seem to matter either.
 
 and wgetresponse.txt shows this:
 
 --2014-02-20 22:16:23--
 http://u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
 Resolving u.eq2wire.com... 67.23.252.182
 Connecting to u.eq2wire.com|67.23.252.182|:80... connected.
 HTTP request sent, awaiting response...
   HTTP/1.1 200 OK
   Date: Fri, 21 Feb 2014 03:16:45 GMT
   Server: Apache
   X-Powered-By: PHP/5.4.23
   Refresh: 0;url=http://u.eq2wire.com/soe/item_search_results

Looks like wget didn't understand this header very well?


yousong

   Set-Cookie:
 ci_session=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%5c5f724a6c93947f470361ed6c37e8%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%22108.48.199.124%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A11%3A%22Wget%2F1.11.4%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1392952605%3B%7D2fdb4ad7da33521f95643c3980fe9922;
 expires=Sat, 22-Feb-2014 03:16:45 GMT; path=/
   Set-Cookie:
 ci_session=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%5c5f724a6c93947f470361ed6c37e8%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%22108.48.199.124%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A11%3A%22Wget%2F1.11.4%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1392952605%3B%7D2fdb4ad7da33521f95643c3980fe9922;
 expires=Sat, 22-Feb-2014 03:16:45 GMT; path=/
   Vary: Accept-Encoding
   Content-Length: 0
   Connection: close
   Content-Type: text/html
 Length: 0 [text/html]
 Saving to: `
 u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
 '
 
  0K0.00 =0s
 
 2014-02-20 22:16:23 (0.00 B/s) - `
 u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1'
 saved [0/0]
 
 I have compared this to wireshark captures and these are the first two
 cookies that get pulled, but all the rest of the html code values are not
 getting pulled.
 
 Any idea what I am missing or why this may not pull the page values I get
 with the same URL in a browser?
 



[Bug-wget] [PATCH v5 0/4] Make wget capable of starting downloads from a specified position.

2014-02-13 Thread Yousong Zhou
This series tries to add an option `--start-pos' for specifying the starting
position of an HTTP or FTP download.  Also included are 2 fixes for the test
infrastructure and 3 test cases for the new option.

With the new option, a zero-based offset can be specified by the user,
instead of being derived from an existing file, which is what --continue
currently does.  When this option and --continue are both specified, which does
not make much sense, wget will warn and proceed as if --continue were not there.
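The precedence rule can be sketched in a few lines of Python (names are mine, not wget's; in the patch itself the same decision is made in ftp.c and http.c, with opt.start_pos == -1 marking an absent --start-pos):

```python
def restart_offset(start_pos, always_rest, local_size):
    """Choose the byte offset a download restarts from (sketch).

    start_pos   -- value of --start-pos, or -1 when the option is absent
    always_rest -- True when --continue was given
    local_size  -- size of an existing local file, or None if there is none
    """
    if start_pos >= 0:
        # --start-pos wins; --continue (if also given) is ignored
        return start_pos
    if always_rest and local_size is not None:
        # --continue derives the offset from the existing file
        return local_size
    return 0  # plain download from the beginning
```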

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v4 - v5

- Reworked the description in doc with kind suggestions from Tim
  Ruehsen.
- Disable --start-pos when WARC options are used.
- When --start-pos and --continue are both specified, emit a warning,
  use --start-pos and disable --continue, then proceed.
- Add 2 fixes for the test infrastructure.
- Add 3 test cases for the new option.

v3 - v4

In doc/wget.texi and wget usage output, explicitly note that
--start-pos is zero-based.

v2 - v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

v1 - v2

It was kindly pointed out by Darshit Shah dar...@gmail.com that
server support for resuming download is required, so adding this into
doc/wget.texi.

Yousong Zhou (4):
  Make wget capable of starting downloads from a specified position.
  Tests: fix TYPE and RETR command handling.
  Tests: exclude existing files from the check of unexpected downloads.
  Tests: Add test cases for option --start-pos.

 doc/ChangeLog  |4 ++
 doc/wget.texi  |   16 ++
 src/ChangeLog  |7 
 src/ftp.c  |2 +
 src/http.c |2 +
 src/init.c |4 ++
 src/main.c |   18 +--
 src/options.h  |1 +
 tests/ChangeLog|   16 ++
 tests/FTPServer.pm |   12 ---
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/WgetTest.pm.in   |5 ++-
 tests/run-px   |3 ++
 15 files changed, 226 insertions(+), 9 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

-- 
1.7.2.5




[Bug-wget] [PATCH v5 1/4] Make wget capable of starting downloads from a specified position.

2014-02-13 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying the starting position
of an HTTP or FTP download.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 doc/ChangeLog |4 
 doc/wget.texi |   16 
 src/ChangeLog |7 +++
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|4 
 src/main.c|   18 +++---
 src/options.h |1 +
 8 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 58d1439..68629c6 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2014-02-10  Yousong Zhou  yszhou4t...@gmail.com
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-12-29  Giuseppe Scrivano  gscri...@redhat.com
 
* wget.texi: Update to GFDL 1.3.
diff --git a/doc/wget.texi b/doc/wget.texi
index 6a8c6a3..0b23bda 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,22 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start downloading at zero-based position @var{OFFSET}.  Offset may be expressed
+in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc.
+
+@samp{--start-pos} has higher precedence over @samp{--continue}. When
+@samp{--start-pos} and @samp{--continue} are both specified, wget will emit a
+warning then proceed as if @samp{--continue} was absent.
+
+Server support for continued download is required, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index b7b6753..6615ad7 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,10 @@
+2014-02-10  Yousong Zhou  yszhou4t...@gmail.com
+
+   * init.c, main.c, options.h: Add option --start-pos for specifying
+   start position of a download.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2014-02-06  Giuseppe Scrivano  gscri...@redhat.com
 
* main.c (print_version): Move copyright year out of the localized
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..5282588 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
     restval = 0;
+  else if (opt.start_pos >= 0)
+    restval = opt.start_pos;
   else if (opt.always_rest
            && stat (locf, &st) == 0
            && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 5715df6..0bede9d 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3101,6 +3101,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos >= 0)
+    hstat.restval = opt.start_pos;
   else if (opt.always_rest
            && got_name
            && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 56fef50..9ed72b2 100644
--- a/src/init.c
+++ b/src/init.c
@@ -270,6 +270,7 @@ static const struct {
  { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
  { "spanhosts",        &opt.spanhost,          cmd_boolean },
  { "spider",           &opt.spider,            cmd_boolean },
+  { "startpos",         &opt.start_pos,         cmd_bytes },
  { "strictcomments",   &opt.strict_comments,   cmd_boolean },
  { "timeout",          NULL,                   cmd_spec_timeout },
  { "timestamping",     &opt.timestamping,      cmd_boolean },
@@ -406,6 +407,9 @@ defaults (void)
   opt.warc_cdx_dedup_filename = NULL;
   opt.warc_tempdir = NULL;
   opt.warc_keep_log = true;
+
+  /* Use a negative value to mark the absence of --start-pos option */
+  opt.start_pos = -1;
 }
 
 /* Return the user's home directory (strdup-ed), or NULL if none is
diff --git a/src/main.c b/src/main.c
index 3ce7583..39fcff4 100644
--- a/src/main.c
+++ b/src/main.c
@@ -276,6 +276,7 @@ static struct cmdline_option option_data[] =
    { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
    { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
    { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+    { "start-pos", 0, OPT_VALUE, "startpos", -1 },
    { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
    { "timeout", 'T', OPT_VALUE, "timeout", -1 },
    { "timestamping", 'N', OPT_BOOLEAN, "timestamping", -1 },
@@ -486,6 +487,8 @@ Download:\n"),
    N_("\
  -c,  --continue                resume getting a partially-downloaded file.\n"),
    N_("\
+       --start-pos=OFFSET        start downloading from zero-based position OFFSET.\n"),
+    N_("\
       --progress=TYPE

[Bug-wget] [PATCH v5 3/4] Tests: exclude existing files from the check of unexpected downloads.

2014-02-13 Thread Yousong Zhou

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 tests/ChangeLog  |5 +
 tests/WgetTest.pm.in |5 -
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index a7db249..d23e76e 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,10 @@
 2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
 
+   * WgetTest.pm.in: Exclude existing files from the check of unexpected
+     downloads.
+
+2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
+
* FTPServer.pm: Fix the handling of TYPE command and avoid endless
loop when doing binary mode RETR.
 
diff --git a/tests/WgetTest.pm.in b/tests/WgetTest.pm.in
index 58ad140..092777e 100644
--- a/tests/WgetTest.pm.in
+++ b/tests/WgetTest.pm.in
@@ -256,7 +256,10 @@ sub _verify_download {
     # make sure no unexpected files were downloaded
     chdir ("$self->{_workdir}/$self->{_name}/output");
 
-    __dir_walk('.', sub { push @unexpected_downloads, $_[0] unless (exists $self->{_output}{$_[0]}) }, sub { shift; return @_ } );
+    __dir_walk('.',
+               sub { push @unexpected_downloads,
+                     $_[0] unless (exists $self->{_output}{$_[0]} || $self->{_existing}{$_[0]}) },
+               sub { shift; return @_ } );
     if (@unexpected_downloads) {
         return "Test failed: unexpected downloaded files [" . join(', ', @unexpected_downloads) . "]\n";
     }
-- 
1.7.2.5




[Bug-wget] [PATCH v5 2/4] Tests: fix TYPE and RETR command handling.

2014-02-13 Thread Yousong Zhou
 - FTPServer.pm's handling of the TYPE command would ignore binary mode
   transfer requests.
 - The FTP server would run into an endless loop, sending the same content
   forever.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 tests/ChangeLog|5 +
 tests/FTPServer.pm |   12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index 6730169..a7db249 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
+
+   * FTPServer.pm: Fix the handling of TYPE command and avoid endless
+   loop when doing binary mode RETR.
+
 2014-01-23  Lars Wendler  polynomia...@gentoo.org (tiny change)
 
* Test--post-file.px: Do not fail when wget has no debug support.
diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
index 2ac72e3..7e9e18d 100644
--- a/tests/FTPServer.pm
+++ b/tests/FTPServer.pm
@@ -298,12 +298,13 @@ sub _RETR_command
     # What mode are we sending this file in?
     unless ($conn->{type} eq 'A') # Binary type.
     {
-        my ($r, $buffer, $n, $w);
-
+        my ($r, $buffer, $n, $w, $sent);
 
         # Copy data.
-        while ($buffer = substr($content, 0, 65536))
+        $sent = 0;
+        while ($sent < length($content))
         {
+            $buffer = substr($content, 0, 65536);
             $r = length $buffer;
 
 # Restart alarm clock timer.
@@ -330,6 +331,7 @@ sub _RETR_command
             print {$conn->{socket}} "426 Transfer aborted. Data connection closed.\r\n";
             return;
         }
+        $sent += $r;
 }
 
 # Cleanup and exit if there was an error.
@@ -410,9 +412,9 @@ sub _TYPE_command
 
 # See RFC 959 section 5.3.2.
     if ($type =~ /^([AI])$/i) {
-        $conn->{type} = 'A';
+        $conn->{type} = $1;
     } elsif ($type =~ /^([AI])\sN$/i) {
-        $conn->{type} = 'A';
+        $conn->{type} = $1;
     } elsif ($type =~ /^L\s8$/i) {
         $conn->{type} = 'L8';
 } else {
-- 
1.7.2.5




[Bug-wget] [PATCH v5 2/4] Tests: fix TYPE and RETR command handling.

2014-02-13 Thread Yousong Zhou
 - FTPServer.pm's handling of the TYPE command would ignore binary mode
   transfer requests.
 - The FTP server would run into an endless loop, sending the same content
   forever.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
 tests/ChangeLog|5 +
 tests/FTPServer.pm |   12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index 6730169..a7db249 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-13  Yousong Zhou  yszhou4t...@gmail.com
+
+   * FTPServer.pm: Fix the handling of TYPE command and avoid endless
+   loop when doing binary mode RETR.
+
 2014-01-23  Lars Wendler  polynomia...@gentoo.org (tiny change)
 
* Test--post-file.px: Do not fail when wget has no debug support.
diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
index 2ac72e3..1603caa 100644
--- a/tests/FTPServer.pm
+++ b/tests/FTPServer.pm
@@ -298,12 +298,13 @@ sub _RETR_command
     # What mode are we sending this file in?
     unless ($conn->{type} eq 'A') # Binary type.
     {
-        my ($r, $buffer, $n, $w);
-
+        my ($r, $buffer, $n, $w, $sent);
 
         # Copy data.
-        while ($buffer = substr($content, 0, 65536))
+        $sent = 0;
+        while ($sent < length($content))
         {
+            $buffer = substr($content, $sent, 65536);
             $r = length $buffer;
 
 # Restart alarm clock timer.
@@ -330,6 +331,7 @@ sub _RETR_command
             print {$conn->{socket}} "426 Transfer aborted. Data connection closed.\r\n";
             return;
         }
+        $sent += $r;
 }
 
 # Cleanup and exit if there was an error.
@@ -410,9 +412,9 @@ sub _TYPE_command
 
 # See RFC 959 section 5.3.2.
     if ($type =~ /^([AI])$/i) {
-        $conn->{type} = 'A';
+        $conn->{type} = $1;
     } elsif ($type =~ /^([AI])\sN$/i) {
-        $conn->{type} = 'A';
+        $conn->{type} = $1;
     } elsif ($type =~ /^L\s8$/i) {
         $conn->{type} = 'L8';
 } else {
-- 
1.7.2.5




Re: [Bug-wget] [PATCH v5 2/4] Tests: fix TYPE and RETR command handling.

2014-02-13 Thread Yousong Zhou
Please use this newly sent version of the patch.  The old one is incorrect.

On 14 February 2014 10:27, Yousong Zhou yszhou4t...@gmail.com wrote:
 diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
 index 2ac72e3..1603caa 100644
 --- a/tests/FTPServer.pm
 +++ b/tests/FTPServer.pm
 @@ -298,12 +298,13 @@ sub _RETR_command
   # What mode are we sending this file in?
   unless ($conn->{type} eq 'A') # Binary type.
   {
  -my ($r, $buffer, $n, $w);
  -
  +my ($r, $buffer, $n, $w, $sent);
 
   # Copy data.
  -while ($buffer = substr($content, 0, 65536))
  +$sent = 0;
  +while ($sent < length($content))
   {
  +$buffer = substr($content, $sent, 65536);

It was:

 $buffer = substr($content, 0, 65536);


  $r = length $buffer;

  # Restart alarm clock timer.


   yousong
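The whole fix is the `substr` offset: slicing from position 0 on every pass resends the first 64 KiB forever, while slicing from the running total terminates. A Python sketch of the corrected loop (the function name and `send` callback are illustrative, not wget code):

```python
def send_in_chunks(content, send, chunk_size=65536):
    """Send `content` through the `send` callback in fixed-size chunks.

    Mirrors the corrected RETR loop: the slice starts at `sent`, so each
    iteration makes progress and the loop ends once everything is out.
    """
    sent = 0
    while sent < len(content):
        buf = content[sent:sent + chunk_size]  # was content[0:chunk_size]
        send(buf)
        sent += len(buf)
    return sent
```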



[Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2014-02-06 Thread Yousong Zhou
On Thursday, February 6, 2014, Tim Ruehsen tim.rueh...@gmx.de wrote:

 Hi Yousong,

 please don't forget to send your posts to the mailing list.


Sorry for that.



 On Thursday 06 February 2014 10:27:37 Yousong Zhou wrote:
  On Wednesday, February 5, 2014, Tim Ruehsen tim.rueh...@gmx.de wrote:
   First of all, thanks for your contribution.
  
   I have some little remarks / questions:
  
   - The documentation is not quite right: when using --start-pos and the
   file
   already exists, wget creates as expected a file.1.
   But your docs say, --start-pos would overwrite an existing file !?
   Could you make this point clear ?
 
  Yes, 'overwrite' is wrong.
 
   - The combination with --continue works for me as expected. It would
   simply
   append the downloaded bytes to the existing file. Maybe you should
   document
   that as well. At least your sentence ... it would override the
 behavior
   of --
   continue seems not to be correct.
 
  Sorry for the confusion.  --continue will detect size of existing file,
  then continue as if an equivalent --start-pos was specified.  By
 'override'
  I mean the new option has higher precedence over --continue.  Other than
  that, all existing behaviors of wget are supposed to remain unchanged.
 
   - What about extending the option to something like
   --range=STARTPOS[-ENDPOS]
   ?
 
  You mean change the option name to 'range'?  IIRC, that's how curl does
 it.
   I am okay with --start-pos. ;)

 I just wanted to mention a possible 'ENDPOS'. In that case --start-pos
 isn't
 appropriate any more and --range seems natural to me.


I thought ENDPOS was in the patch, but after a quick look at it I remembered
why I decided it should be --start-pos, without LEN or ENDPOS.

 - I thought the current implementation is simple, neat and easy.  A few
lines of code were enough at the time.
 - The code of wget is old and mature enough that mimicking curl's --range is
very likely to pose many compatibility and maintenance issues.  This is not
what we want.
 - I actually thought about LENGTH, but not ENDPOS, and the main reason I
gave myself for not implementing it is that we can achieve a length limit with
other utilities, e.g. dd.

So I do not intend to implement a curl-like --range option in wget.
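Delegating the length limit to other utilities amounts to a bounded copy on the consumer side, which is what `dd` or `head -c` do on a pipe. A small Python sketch of the same idea (illustrative only, not wget code):

```python
def copy_limited(src, dst, limit, chunk_size=8192):
    """Copy at most `limit` bytes from file object `src` to `dst`.

    The producer may keep sending; the consumer simply stops reading at
    the cap.  This is roughly what piping wget's output through
    `head -c N` achieves, without any support inside wget itself.
    """
    copied = 0
    while copied < limit:
        buf = src.read(min(chunk_size, limit - copied))
        if not buf:  # source exhausted before reaching the cap
            break
        dst.write(buf)
        copied += len(buf)
    return copied
```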



 I just took a look at curl's man page. The curl people did it the right
 way.
 Especially their hint about multipart responses is of value (i didn't know
 that). Such cases would likely need special handling in Wget.


I didn't know that either.


 
   - If you want to brush up your patch, add a test-case for it for the
 new
   Python based test suite. I guess, Darshit can give you a helping hand,
 if
   you
   request it.
 
  Will do once I get the time.
 
   Tim
 
  Thank you for looking at this.
 
  yousong
 
  --start-pos is zero-based.
   
v2 - v3
   
Fix a typo and add description text for the new option into
 the
   
usage output.  Thank Darshit Shah dar...@gmail.com javascript:;
 for
  
   the suggestions.
  
v1 - v2
   
It was kindly pointed out by Darshit Shah
dar...@gmail.comjavascript:;
  
   that
  
server support for resuming download is required, so adding
 this
   
into doc/wget.texi.
   
 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)




Re: [Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2014-02-04 Thread Yousong Zhou
Hi, can this feature be picked up?  Months have passed and I think a ping
will be good.  :)

yousong

On Monday, December 23, 2013, Yousong Zhou yszhou4t...@gmail.com wrote:

 This patch adds an option `--start-pos' for specifying the starting position
 of a download, both for HTTP and FTP.  When specified, the newly added
 option would override `--continue'.  Apart from that, no existing code
 should be affected.

 Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
 ---
 v3 -> v4

 In doc/wget.texi and wget usage output, explicitly note that
 --start-pos is zero-based.

 v2 -> v3

 Fix a typo and add description text for the new option into the usage
 output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

 v1 -> v2

 It was kindly pointed out by Darshit Shah dar...@gmail.com that
 server support for resuming download is required, so adding this into
 doc/wget.texi.

  doc/ChangeLog |4 
  doc/wget.texi |   17 +
  src/ChangeLog |9 +
  src/ftp.c |2 ++
  src/http.c|2 ++
  src/init.c|1 +
  src/main.c|3 +++
  src/options.h |1 +
  8 files changed, 39 insertions(+), 0 deletions(-)




Re: [Bug-wget] Bug report: --content-disposition option disables --continue

2014-01-01 Thread Yousong Zhou
On 2 January 2014 07:08, Eternal Sorrow sergam...@inbox.ru wrote:
 When I set option content-disposition either in command line or in
 wgetrc, wget refuses to resume download of partially-downloaded file
 with the --continue command line option and starts the download from the beginning.

I tried the following command and it worked.

wget -d -c --quota=1 --content-disposition
http://greenbytes.de/tech/tc2231/attfnboth.asis

--content-disposition support in wget is experimental. I think many
cases are not covered.

It will help if the following information can be provided.
 - Content-Disposition header from the server response.
 - The filename of the partially downloaded file.
 - The filename wget tried to write into.
 - If possible, the minimal command to reproduce.


   yousong



[Bug-wget] [PATCH v3] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah dar...@gmail.com that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)
From f7266cc18fbea1d07b25c1bd25662a5a71920520 Mon Sep 17 00:00:00 2001
From: Yousong Zhou yszhou4t...@gmail.com
Date: Fri, 20 Dec 2013 23:17:43 +0800
Subject: [PATCH v3] Make wget capable of starting download from a specified position.

This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v2 -> v3

	Fix a typo and add description text for the new option into the usage
	output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

v1 -> v2

	It was kindly pointed out by Darshit Shah dar...@gmail.com that
	server support for resuming download is required, so adding this into
	doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  tim.rueh...@gmx.de
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..87fef7c 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,23 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
+Server support for resuming download is needed, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+	* options.h: Add option --start-pos to specify start position of
+	  a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  gscri...@redhat.com
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
     restval = 0;
+  else if (opt.start_pos)
+    restval = opt.start_pos;
   else if (opt.always_rest
            && stat (locf, &st) == 0
            && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
     hstat.restval = hstat.len;
+  else if (opt.start_pos)
+    hstat.restval = opt.start_pos;
   else if (opt.always_rest
            && got_name
            && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
 Hi,
 
 Am 21.12.2013 um 10:24 schrieb Yousong Zhou yszhou4t...@gmail.com:
  In my situation, wget was triggered on the remote machine like the
  following:
  
 wget -O - --start-pos $OFFSET $URL | nc -lp 7193
  
  Then on local machine, I would download with:
  
 nc localhost 7193
  
  Before these, a local forwarding tunnel has been setup with ssh to make
  this possible.  So in this case, there was no local file on the machine
  where wget was triggered and `--continue' will not work.  I am sure
  there are other cases `--start-pos' would be useful and that
  `--start-pos' would make wget more complete.
 
 
 When I just look at your problem it seems to be easier to set up the tunnel
 slightly different and pull with standard wget. If the URL looks like
   http://host:port/rest
 and you set up the tunnel with
   ssh -L 7193:host:port machine-where-you-called-wget-previously
 and then just
   wget http://localhost:7193/rest
 the range requests would be sent by wget just fine to the initial server and
 you could also safely use -c on further wget invocations (or with proper 
 values
 for -t / -T automatically).

Just tried this approach.  It did not work out as expected because the
HTTP server responded with multiple levels of redirection, so the host
part changed on the fly.  Anyway, thank you for your time.


yousong




[Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v3 -> v4

In doc/wget.texi and wget usage output, explicitly note that
--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah dar...@gmail.com that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)

From d8fd955d161bd8ba17ac97cbcf7a3ed316e00630 Mon Sep 17 00:00:00 2001
From: Yousong Zhou yszhou4t...@gmail.com
Date: Fri, 20 Dec 2013 23:17:43 +0800
Subject: [PATCH v4] Make wget capable of starting download from a specified position.

This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v3 -> v4

	In doc/wget.texi and wget usage output, explicitly note that
	--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah dar...@gmail.com for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah dar...@gmail.com that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  tim.rueh...@gmx.de
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..5094c26 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,23 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start downloading at zero-based position @var{OFFSET}.  Offset may be expressed
+in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
+Server support for resuming download is needed, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+	* options.h: Add option --start-pos to specify start position of
+	  a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  gscri...@redhat.com
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
     restval = 0;
+  else if (opt.start_pos)
+    restval = opt.start_pos;
   else if (opt.always_rest
            && stat (locf, &st) == 0
            && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 01:51:04PM +0530, Darshit Shah wrote:
 I have a few comments on the patch. Commenting inline.

Thank you.

 
 On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou yszhou4t...@gmail.com wrote:
 
  This patch adds an option `--start-pos' for specifying starting position
  of a download, both for HTTP and FTP.  When specified, the newly added
  option would override `--continue'.  Apart from that, no existing code
  should be affected.
 
  Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
  ---
  Hi,
 
  I found myself needing this feature when I was trying to tunnel the
  download of a big file (several gigabytes) from a remote machine back
  to local through a somewhat flaky connection.  It's a pain both for
  the server and local network users if we have to repeat the
  previously downloaded part in case the connection hangs or breaks.
  Specifying a 'Range: ' header is not an option for wget (an integrity
  check in the code would fail), and curl is not fast enough.  So I
  decided to make this patch in hope that it can also be useful to
  someone else.
 
  What integrity check would fail on using the Range header?  And if you
 already have a partially downloaded file, why isn't using the
 --continue switch an option?

`--continue` only works if there is already a partially downloaded file
on disk.  Otherwise, specifying `-c' will only tell wget to start from
scratch.

By 'Range: ' header I mean headers specified by `--header'.  If the
server sends back a 'Content-Range: ' header in the response, wget would
think that it's unexpected or not matching what's already on the disk
(which would be zero if there is no file on disk).  If I read the code
right, the check is at `http.c:gethttp()':

2744   if ((contrange != 0 && contrange != hs->restval)
2745       || (H_PARTIAL (statcode) && !contrange))
2746     {
2747       /* The Range request was somehow misunderstood by the server.
2748          Bail out.  */
2749       xfree_null (type);
2750       CLOSE_INVALIDATE (sock);
2751       xfree (head);
2752       return RANGEERR;
2753     }
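A toy shell restatement of that check may make the failure mode clearer: when the server answers 206 with a Content-Range start that differs from the restart offset wget expected, wget gives up with RANGEERR.  The values below are invented for illustration; the variable names mirror the C code.

```shell
# Mirror of the gethttp() bail-out condition quoted above.
contrange=500   # start offset the server claims in Content-Range
restval=0       # offset wget expected (0: no partial file on disk)
if [ "$contrange" -ne 0 ] && [ "$contrange" -ne "$restval" ]; then
  echo "RANGEERR: server offset $contrange != expected $restval"
fi
```

This is why an externally supplied `--header "Range: ..."` trips the check: wget computed restval itself (usually 0) and has no idea the user asked for a different offset.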

In my situation, wget was triggered on the remote machine like the
following:

wget -O - --start-pos $OFFSET $URL | nc -lp 7193

Then on local machine, I would download with:

nc localhost 7193

Before these, a local forwarding tunnel has been setup with ssh to make
this possible.  So in this case, there was no local file on the machine
where wget was triggered and `--continue' will not work.  I am sure
there are other cases `--start-pos' would be useful and that
`--start-pos' would make wget more complete.

 
 yousong
 
   doc/ChangeLog |4 
   doc/wget.texi |   14 ++
   src/ChangeLog |9 +
   src/ftp.c |2 ++
   src/http.c|2 ++
   src/init.c|1 +
   src/main.c|1 +
   src/options.h |1 +
   8 files changed, 34 insertions(+), 0 deletions(-)
 
  diff --git a/doc/ChangeLog b/doc/ChangeLog
  index 3b05756..df103c8 100644
  --- a/doc/ChangeLog
  +++ b/doc/ChangeLog
  @@ -1,3 +1,7 @@
  +2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
  +
  +   * wget.texi: Add documentation for --start-pos.
  +
   2013-10-06  Tim Ruehsen  tim.rueh...@gmx.de
 
  * wget.texi: add/explain quoting of wildcard patterns
  diff --git a/doc/wget.texi b/doc/wget.texi
  index 4a1f7f1..166ea08 100644
  --- a/doc/wget.texi
  +++ b/doc/wget.texi
  @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
   Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
   servers that support the @code{Range} header.
 
  +@cindex offset
  +@cindex continue retrieval
  +@cindex incomplete downloads
  +@cindex resume download
  +@cindex start position
  +@item --start-pos=@var{OFFSET}
  +Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
  +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
  +
  +When specified, it would override the behavior of @samp{--continue}.  When
  +using this option, you may also want to explicitly specify an output filename
  +with @samp{-O FILE} in order to not overwrite an existing partially downloaded
  +file.
  +
   @cindex progress indicator
   @cindex dot style
   @item --progress=@var{type}
  diff --git a/src/ChangeLog b/src/ChangeLog
  index 42ce3e4..ab8a496 100644
  --- a/src/ChangeLog
  +++ b/src/ChangeLog
  @@ -1,3 +1,12 @@
  +2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
  +
  +   * options.h: Add option --start-pos to specify start position of
  + a download.
  +   * main.c: Same purpose as above.
  +   * init.c: Same purpose as above.
  +   * http.c: Utilize opt.start_pos for HTTP download.
  +   * ftp.c: Utilize opt.start_pos for FTP retrieval.
  +
   2013-11-02  Giuseppe Scrivano  gscri...@redhat.com
 
  * http.c (gethttp): Increase max header value length

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
 Hi,
 
 Am 21.12.2013 um 10:24 schrieb Yousong Zhou yszhou4t...@gmail.com:
  In my situation, wget was triggered on the remote machine like the
  following:
  
 wget -O - --start-pos $OFFSET $URL | nc -lp 7193
  
  Then on local machine, I would download with:
  
 nc localhost 7193
  
  Before these, a local forwarding tunnel has been setup with ssh to make
  this possible.  So in this case, there was no local file on the machine
  where wget was triggered and `--continue' will not work.  I am sure
  there are other cases `--start-pos' would be useful and that
  `--start-pos' would make wget more complete.
 
 
 When I just look at your problem it seems to be easier to set up the tunnel
 slightly different and pull with standard wget. If the URL looks like
   http://host:port/rest
 and you set up the tunnel with
   ssh -L 7193:host:port machine-where-you-called-wget-previously
 and then just
   wget http://localhost:7193/rest
 the range requests would be sent by wget just fine to the initial server and
 you could also safely use -c on further wget invocations (or with proper 
 values
 for -t / -T automatically).

Yes, this is a more sensible way of doing the download once the tunnel
is up.  Thank you for pointing this out.  Really, one thing that has
caught much of my concern when redirecting netcat stdout to a file is
that it may not be good for my hard disk, with hours of continuous disk
writing.  I may count on wget's behavior for this ;)

In other cases, `--start-pos' may still come in handy, for example when
trying to peek at just parts of each file on a server without fully
downloading them, or when doing parallel downloads with some simple
scripting.  Just to name a few I came up with.
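As a sketch of the parallel-download idea (the wget line is hypothetical: it assumes a build carrying this patch and a server honouring Range; the offset/length bookkeeping itself is shown offline on a local stand-in file):

```shell
# With the patch, each worker could fetch one fixed-size slice, e.g.:
#   wget -q -O - --start-pos $((i * len)) "$URL" | head -c "$len" > part.$i &
# The same slice arithmetic, demonstrated on a 10-byte local file:
printf '0123456789' > /tmp/whole
len=5
for i in 0 1; do
  off=$((i * len))
  # tail -c +K is 1-based, hence off+1; head -c caps the slice length.
  tail -c +"$((off + 1))" /tmp/whole | head -c "$len" > "/tmp/part.$i"
done
# Concatenating the slices in order reproduces the original byte-for-byte.
cat /tmp/part.0 /tmp/part.1 > /tmp/reassembled
```

Since --start-pos only sets the starting offset, each worker still has to cap its own slice length (head -c above), and the final file is stitched together with cat.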


yousong

 
 
 Best regards
 
   -- Dago
 
 -- 
 You don't become great by trying to be great, you become great by wanting to 
 do something,
 and then doing it so hard that you become great in the process. - xkcd #896
 





[Bug-wget] [PATCH v2] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v1 -> v2

It was kindly pointed out by Darshit Shah dar...@gmail.com that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 37 insertions(+), 0 deletions(-)
From 93152cb081f529762a364eea67115f654cd6fda4 Mon Sep 17 00:00:00 2001
From: Yousong Zhou yszhou4t...@gmail.com
Date: Fri, 20 Dec 2013 23:17:43 +0800
Subject: [PATCH v2] Make wget capable of starting download from a specified position.

This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
v1 -> v2

	It was kindly pointed out by Darshit Shah dar...@gmail.com that
	server support for resuming download is required, so adding this into
	doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  tim.rueh...@gmx.de
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..9151d28 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,23 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
+Serer support for resuming download is needed, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+	* options.h: Add option --start-pos to specify start position of
+	  a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  gscri...@redhat.com
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
     restval = 0;
+  else if (opt.start_pos)
+    restval = opt.start_pos;
   else if (opt.always_rest
            && stat (locf, &st) == 0
            && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
     hstat.restval = hstat.len;
+  else if (opt.start_pos)
+    hstat.restval = opt.start_pos;
   else if (opt.always_rest
            && got_name
            && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",         &opt.spanhost,             cmd_boolean },
   { "spider",            &opt.spider,               cmd_boolean },
+  { "startpos",          &opt.start_pos,            cmd_bytes

[Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-20 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou yszhou4t...@gmail.com
---
Hi, 

I found myself needing this feature when I was trying to tunnel the download of
a big file (several gigabytes) from a remote machine back to local through a
somewhat flaky connection.  It's a pain both for the server and local network
users if we have to repeat the previously downloaded part in case the
connection hangs or breaks.  Specifying a 'Range: ' header is not an option
for wget (an integrity check in the code would fail), and curl is not fast
enough.  So I decided to make this patch in hope that it can also be useful
to someone else.

yousong

 doc/ChangeLog |4 
 doc/wget.texi |   14 ++
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  tim.rueh...@gmx.de
 
* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..166ea08 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  yszhou4t...@gmail.com
+
+   * options.h: Add option --start-pos to specify start position of
+ a download.
+   * main.c: Same purpose as above.
+   * init.c: Same purpose as above.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  gscri...@redhat.com
 
* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
     restval = 0;
+  else if (opt.start_pos)
+    restval = opt.start_pos;
   else if (opt.always_rest
            && stat (locf, &st) == 0
            && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
     hstat.restval = hstat.len;
+  else if (opt.start_pos)
+    hstat.restval = opt.start_pos;
   else if (opt.always_rest
            && got_name
            && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",         &opt.spanhost,             cmd_boolean },
   { "spider",            &opt.spider,               cmd_boolean },
+  { "startpos",          &opt.start_pos,            cmd_bytes },
   { "strictcomments",    &opt.strict_comments,      cmd_boolean },
   { "timeout",           NULL,                      cmd_spec_timeout },
   { "timestamping",      &opt.timestamping,         cmd_boolean },
diff --git a/src/main.c b/src/main.c
index 19d7253..4fbfaee 100644
--- a/src/main.c
+++ b/src/main.c
@@ -281,6 +281,7 @@ static struct cmdline_option option_data[] =
     { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
     { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
     { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+    { "start-pos", 0, OPT_VALUE, "startpos", -1 },
     { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
     { "timeout", 'T