WGET -O Help

2006-05-25 Thread David David
Hi,
   Don't know if this will be answered - but I had to
ask (since I DID read the man page! :)P )
   Symptom: automating my stock research, I type a
command such as "wget -p -H -k -nd -nH -x -Ota.html
-Dichart.finance.yahoo.com -Pbtu 
"http://finance.yahoo.com/q/ta?s=btu&t=6m&l=on&z=l&q=b&p=b,p,s,v&a=m26-12-9,p12,vm&c="";
Which :
   1. Downloads the html page and -O -> outputs to
ta.html as requested!  GOOD
   2. Downloads the external link to the graph in the
page as requested! GOOD!
   3. Outputs the graph to ta.html (replacing original
ta.html)... BAD.

   Reason for using -O is because I want the filename
to be useful instead of [EMAIL PROTECTED]&t=6... blah blah
blah

   4. If I remove -O, it outputs the files into the URL
directories of finance.yahoo.com and
ichart.finance.yahoo.com, but the filename is the funky
URL one. I can live with that - but is there not a
way to get the page AND the external URL pictures with
decent names using something like -O?
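
   As far as I can tell, the root cause is that -O funnels every retrieved
document into the single output file, so the graph lands on top of the page.
The best workaround I've come up with (a sketch, not a real fix) is to drop
-O and rename the main page afterwards:

   wget -p -H -k -nd -nH -x -Dichart.finance.yahoo.com -Pbtu "http://finance.yahoo.com/q/ta?s=btu&t=6m&l=on&z=l&q=b&p=b,p,s,v&a=m26-12-9,p12,vm&c="
   mv btu/ta* btu/ta.html

which still leaves the graph with its funky name.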

-Dave



Re: Wishlist: support the file:/// protocol

2006-06-25 Thread David
  In replies to the post requesting support of the "file://" scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason.

I have a CD with HTML content (it is a CD of abstracts from a scientific conference); however, for space reasons not all the content was included on the CD - there remain links to figures and diagrams on a remote web site. I'd like to create an archive of the complete content locally by having wget retrieve everything and convert the links to point to the retrieved material. Thus the wget functionality when retrieving the local files should work the same as if the files were retrieved from a web server (i.e. the input local file needs to be processed, both local and remote content retrieved, and the copies made of the local and remote files all need to be adjusted to now refer to the local copy rather than the remote content). A simple shell script that runs cp or rsync on local files without any further processing would not achieve this aim.

Regarding where the local files should be copied, I suggest a default scheme similar to current http functionality. For example, if the local source was /source/index.htm, and I ran something like:

   wget.exe -m -np -k file:///source/index.htm

this could be retrieved to ./source/index.htm (assuming that I ran the command from anywhere other than the root directory). On Windows, if the local source file is c:\test.htm, then the destination could be .\c\test.htm. It would probably be fair enough for wget to throw up an error if the source and destination were the same file (and perhaps helpfully suggest that the user changes into a new subdirectory and retries the command).

One additional problem this scheme needs to deal with is when one or more /../ in the path specification results in the destination being above the current parent directory; then the destination would have to be adjusted to ensure the file remained within the parent directory structure. For example, if I am in /dir/dest/ and ran

   wget.exe -m -np -k file://../../source/index.htm

this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm).

-David.
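
P.S. To make the proposed mapping concrete, a hypothetical session (none of this works today, and the paths are only examples) might look like:

   cd /tmp/archive
   wget -m -np -k file:///media/cdrom/index.htm

with the page saved as ./media/cdrom/index.htm, any remote figures fetched alongside it, and the links in the local copies converted to point at the local files.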

Re: Support for file://

2008-09-22 Thread David

Hi Micah,

You're right - this was raised before and in fact it was a feature Mauro 
Tortonesi intended to be implemented for the 1.12 release, but it seems to have 
been forgotten somewhere along the line. I wrote to the list in 2006 describing 
what I consider a compelling reason to support file://. Here is what I wrote 
then:

At 03:45 PM 26/06/2006, David wrote:
In replies to the post requesting support of the "file://" scheme, requests 
were made for someone to provide a compelling reason to want to do this. 
Perhaps the following is such a reason.
I have a CD with HTML content (it is a CD of abstracts from a scientific 
conference), however for space reasons not all the content was included on the 
CD - there remain links to figures and diagrams on a remote web site. I'd like 
to create an archive of the complete content locally by having wget retrieve 
everything and convert the links to point to the retrieved material. Thus the 
wget functionality when retrieving the local files should work the same as if 
the files were retrieved from a web server (i.e. the input local file needs to 
be processed, both local and remote content retrieved, and the copies made of 
the local and remote files all need to be adjusted to now refer to the local 
copy rather than the remote content). A simple shell script that runs cp or 
rsync on local files without any further processing would not achieve this aim.
Regarding where the local files should be copied, I suggest a default scheme 
similar to current http functionality. For example, if the local source was 
/source/index.htm, and I ran something like:
   wget.exe -m -np -k file:///source/index.htm
this could be retrieved to ./source/index.htm (assuming that I ran the command 
from anywhere other than the root directory). On Windows,  if the local source 
file is c:\test.htm,  then the destination could be .\c\test.htm. It would 
probably be fair enough for wget to throw up an error if the source and 
destination were the same file (and perhaps helpfully suggest that the user 
changes into a new subdirectory and retry the command).
One additional problem this scheme needs to deal with is when one or more /../ 
in the path specification results in the destination being above the current 
parent directory; then  the destination would have to be adjusted to ensure the 
file remained within the parent directory structure. For example, if I am in 
/dir/dest/ and ran
   wget.exe -m -np -k file://../../source/index.htm
this could be saved to ./source/index.htm  (i.e. /dir/dest/source/index.htm)
-David. 


At 08:49 AM 3/09/2008, you wrote:


Petri Koistinen wrote:
> Hi,
> 
> It would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it "Needs Discussion" and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.

I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
"groks urls", Wget "W(eb)-gets", and file:// can't really be argued to
be part of the web.

That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.

Even the system "cp" command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
"cp" command, but I might conceivably not mind "file://" support if it
means simple _content_ transfer, and not actual file duplication.

Also in need of addressing is what "recursion" should mean for file://.
Between ftp:// and http://, "recursion" currently means different
things. In FTP, it means "traverse the file hierarchy recursively",
whereas in HTTP it means "traverse links recursively". I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
answering.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/



Website Port problem...

2001-11-29 Thread David

Hi,

I have a problem on using wget, as follows:

I want to download a bunch of files in, say, www.server.com/dir/files, and I found out 
that wget is contacting www.server.com:80, and the files it gets are not what I'm 
looking for.

I typed www.server.com:80/dir/files in netscape and found out that the result I get is 
different from www.server.com/dir/files!

So how can I get the files?

Thx!

DrDave







Re: Website Port problem...

2001-11-29 Thread David

The version I'm using is 1.7.1


On Thu, 29 November 2001, Hrvoje Niksic wrote:

> 
> David <[EMAIL PROTECTED]> writes:
> 
> > I have a problem on using wget, as follows:
> 
> What version of Wget are you using?
> 
> > I want to download a bunch of files in, say,
> > www.server.com/dir/files, and I found out that wget is contacting
> > www.server.com:80, and the files it get is not what I'm looking for.
> 
> I believe this has been fixed in later versions of Wget.







--continue still broken

2005-05-16 Thread David Fritz
This problem seems to have been overlooked:
http://www.mail-archive.com/wget%40sunsite.dk/msg06527.html
http://www.mail-archive.com/wget%40sunsite.dk/msg06560.html
Sorry for not including a patch.


Re: ftp bug in 1.10

2005-06-25 Thread David Fritz
"I64" is a size prefix akin to "ll". One still needs to specify the argument 
type as in "%I64d" as with "%lld".




Checking for broken links

2005-10-16 Thread David Walker

Hi

I am trying to use wget to check for broken links on a web site as follows:

wget --spider --recursive -np -owesc.log http://www.wesc.ac.uk/

but I get back a message "www.wesc.ac.uk/index.html: No such file or 
directory". Can anyone tell me how to fix this, or else suggest another way 
of using wget to check for broken links?
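
(One fallback I have considered, though not tried yet, is to let wget actually
download and then throw everything away, scanning the log for errors afterwards:

   wget -r -np --delete-after -o wesc.log http://www.wesc.ac.uk/
   grep -B 2 "404" wesc.log

but a spider-only check would obviously be preferable.)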


Thanks
David



I want -p to download external links

2005-12-01 Thread David Srbecky
Hello,

When I download a complete website using "wget -rpk -l inf http://...";,
some webpages are incomplete because -p does not follow external links.
I do not want to download external webpages; I only want to download the
external images/files referenced from pages on a.com.

How can I achieve this?


Thank you very much.

Regards,
David Srbecky


wget -N url -O file won't check timestamp

2006-05-25 Thread David Graham
relating to:
GNU Wget 1.10.2 on Debian testing/unstable using Linux kernel 2.6.4

wget -N http://domain.tld/downloadfile -O outputfile
downloads outputfile

Running it again downloads the file again regardless of timestamp. It does not check
outputfile's timestamp against downloadfile, as prescribed by -N.

wget -N without -O works as intended.

Thanks.

- -
David "cdlu" Graham  -  [EMAIL PROTECTED]
Guelph, Ontario - http://www.railfan.ca/



DNS through proxy with wget

2006-08-18 Thread Karr, David
Inside our firewall, we can't do simple DNS lookups for hostnames
outside of our firewall.  However, I can write a Java program that uses
commons-httpclient, specifying the proxy credentials, and my URL
referencing an external host name will connect to that host perfectly
fine, obviously resolving the DNS name under the covers.

If I then use wget to do a similar request, even if I specify the proxy
credentials, it fails to find the host.  If I instead plug in the IP
address instead of the hostname, it works fine.

I noticed that the command-line options for wget allow me to specify the
proxy user and password, but they don't have a way to specify the proxy
host and port.

Am I missing something, or is this a flaw (or missing feature) in wget?
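
(For completeness, what I have been running is roughly the following, with the
host names changed to placeholders; the proxy host and port themselves are only
supplied through the environment:

   export http_proxy=http://proxy.example.com:8080/
   wget --proxy-user=myuser --proxy-passwd=mypass http://external.example.org/some/page.html
)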


RE: wget 1.11 beta 1 released

2006-08-22 Thread Karr, David
Does this happen to resolve the issue I asked about a few days ago (no
response yet) where DNS doesn't resolve in the presence of an
authenticated proxy?

> -Original Message-
> From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, August 22, 2006 8:01 AM
> To: wget@sunsite.dk
> Subject: wget 1.11 beta 1 released
> 
> 
> hi to everybody,
> 
> i've just released wget 1.11 beta 1:
> 
> ftp://alpha.gnu.org/pub/pub/gnu/wget/wget-1.11-beta-1.tar.gz
> 
> you're very welcome to try it and report every bug you might 
> encounter.


.listing files and ftp_proxy

2006-11-02 Thread David Creasy

Hi,
I've looked, but been unable to find the answer to this rather simple 
question. (It's been asked before, but I can't see an answer.)


wget --passive-ftp --dont-remove-listing -d "ftp://ftp.ebi.ac.uk/";

gives me a .listing file, but:

wget -e ftp_proxy=http://proxy:1234 --passive-ftp --dont-remove-listing 
-d "ftp://ftp.ebi.ac.uk/";


just gives me the index.html file and no .listing file.
Using alternate ways of specifying the proxy server doesn't make any 
difference.


Is there any easy fix for this, or is it the same as:
http://www.mail-archive.com/wget@sunsite.dk/msg08572.html

Thanks in advance for any advice,

David


Re: .listing files and ftp_proxy

2006-11-07 Thread David Creasy
I realise that I may not have provided enough information to get an 
answer to this...


I've tried this using the latest version (1.10.2) on Debian Linux 3.1
However, I've also tried with a variety of earlier versions on other 
platforms and it looks as though it has never worked on any platform.


If anybody knows if this is a bug or something that just can't/won't be 
fixed, I'd be very, very grateful for an answer. We use wget a lot, and 
it's just perfect for our needs. However, some of our customers are 
stuck behind a proxy and can't use the scripts we've developed that use 
wget because of this problem.


Thanks,
David

David Creasy wrote:

Hi,
I've looked, but been unable to find the answer to this rather simple 
question. (It's been asked before, but I can't see an answer.)


wget --passive-ftp --dont-remove-listing -d "ftp://ftp.ebi.ac.uk/";

gives me a .listing file, but:

wget -e ftp_proxy=http://proxy:1234 --passive-ftp --dont-remove-listing 
-d "ftp://ftp.ebi.ac.uk/";


just gives me the index.html file and no .listing file.
Using alternate ways of specifying the proxy server doesn't make any 
difference.


Is there any easy fix for this, or is it the same as:
http://www.mail-archive.com/wget@sunsite.dk/msg08572.html

Thanks in advance for any advice,

David


--
David Creasy



Windows WGET 1.10.2 - two bugs

2007-05-10 Thread David MacMillan

Re: Windows Wget 1.10.2 - two bugs

Bug 1) Wget's manual says as shown below, but Windows Wget does not 
generate the file.1 and file.2 - it just overwrites.


To reproduce the problem:
WGet -S -N http://www.pjm.com/pub/account/lmpgen/lmppost.html

Wget will keep overwriting the local file each time the web page's 
timestamp updates, rather than creating numbered versions.


from manual:
-nc
--no-clobber
If a file is downloaded more than once in the same directory, 
Wget's behavior depends on a few options, including -nc. In certain 
cases, the local file will be clobbered, or overwritten, upon repeated 
download. In other cases it will be preserved.


When running Wget without -N, -nc, or -r, downloading the same file 
in the same directory will result in the original copy of file being 
preserved and the second copy being named file.1. If that file is 
downloaded yet again, the third copy will be named file.2, and so on. 
When -nc is specified, this behavior is suppressed, and Wget will refuse 
to download newer copies of file. Therefore, "no-clobber" is actually a 
misnomer in this mode--it's not clobbering that's prevented (as the 
numeric suffixes were already preventing clobbering), but rather the 
multiple version saving that's prevented.


When running Wget with -r, but without -N or -nc, re-downloading a 
file will result in the new copy simply overwriting the old. Adding -nc 
will prevent this behavior, instead causing the original version to be 
preserved and any newer copies on the server to be ignored.


When running Wget with -N, with or without -r, the decision as to 
whether or not to download a newer copy of a file depends on the local 
and remote timestamp and size of the file (see Time-Stamping.). -nc may 
not be specified at the same time as -N.


Note that when -nc is specified, files with the suffixes .html or 
.htm will be loaded from the local disk and parsed as if they had been 
retrieved from the Web.


---

Bug 2) (Sort of a bug - or a feature request) Normal Windows protocol is 
to have an ERRORLEVEL returned, whose value indicates program success or 
failure. WGET is not doing this - in the above example, the ERRORLEVEL 
is the same for the two cases of retrieving an updated page and not 
retrieving an updated page (i.e. page is unchanged). This makes it 
impossible to build a batch file whose behavior is conditional on a new 
file being downloaded.
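
To illustrate, the sort of batch logic this would enable (purely hypothetical,
since today the ERRORLEVEL is identical in both cases) is:

   wget -S -N http://www.pjm.com/pub/account/lmpgen/lmppost.html
   if errorlevel 1 goto nonewfile
   rem ... process the freshly downloaded lmppost.html here ...
   :nonewfile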


---

Otherwise a great program. Very useful.

David MacMillan


Re: .1, .2 before suffix rather than after

2007-11-29 Thread David Ginger
> i totally agree with hrvoje here. also note that changing wget
> unique-name-finding algorithm can potentially break lots of wget-based
> scripts out there. i think we should leave these kind of changes for wget2
> - or wget-on-steroids or however you want to call it ;-)

So can I ask, is a wget2 actually being developed?



Re: wget2

2007-11-29 Thread David Ginger
On Friday 30 November 2007 00:02:25 Micah Cowan wrote:
> Alan Thomas wrote:
> > What is wget2?   Any plans to move to Java?   (Of course, the latter
> > will not be controversial.  :)
>
> Java is not likely. The most likely language is probably still C,
> especially as that's where our scant human resource assets are
> specialized currently. I have toyed with thoughts of C++ or Python,
> however - especially as the use of higher-level languages could allow
> more rapid development, which is nice, given our (again) scant assets.

I'd vote for Python :-)

> :) The truth is, it's too early to say, given that work hasn't even
>
> begun to have... begun. :D
>
> C still remains by far the most portable language (though of course,
> writing it portably is tricky ;) ). But that's a bigger issue for the
> existing Wget's purposes probably, than "new-fangled Wget 2".
>
> For information on what is planned for "Wget 2", check out the "Next
> Generation" and "Unofficially Supported" sections of this page:
> http://wget.addictivecode.org/FeatureSpecifications, and particularly,
> this thread: http://www.mail-archive.com/wget%40sunsite.dk/index.html#10511

Thanks for the links:-)

I really liked this idea  -
  "An API for developers to write their own dynamically-loaded plugins"

What I'm looking at wget for is saving streamed mp3 from a radio station, 
crazy but true.. such is life.




Re: Wget for MP3 streams

2007-11-29 Thread David Ginger
On Friday 30 November 2007 01:03:06 Micah Cowan wrote:
> David Ginger wrote:
> > What I'm looking at wget for is saving streamed mp3 from a radio station,
> > crazy but true.. such is life.
>
> Isn't that already possible now? Provided that the transport is HTTP,
> that is?

Yes and No . . . 

Yes I can save a stream,

But, not everything works as expected, some of wget's features kick in.

Like, I can't get the quota to work no matter how much I fiddle and tinker.






Re: Wget for MP3 streams

2007-11-30 Thread David Ginger
On Friday 30 November 2007 03:38:54 Micah Cowan wrote:
> David Ginger wrote:
> > On Friday 30 November 2007 01:03:06 Micah Cowan wrote:
> >> David Ginger wrote:
> >>> What I'm looking at wget for is saving streamed mp3 from a radio
> >>> station, crazy but true.. such is life.
> >>
> >> Isn't that already possible now? Provided that the transport is HTTP,
> >> that is?
> >
> > Yes and No . . .
> >
> > Yes I can save a stream,
> >
> > But, not everything works as expected, some of wget's features kick in.
> >
> > Like, I cant get the quota to work no matter how much I fiddle and
> > tinker.

> Not too surprising, since the documentation points out that the quota
> never affects the downloading of a single file. :\

So I downloaded the source code . . . and subscribed to the mailing list to 
find out why :-)



Re: wget2

2007-11-30 Thread David Ginger
On Friday 30 November 2007 13:45:08 Mauro Tortonesi wrote:
> On Friday 30 November 2007 11:59:45 Hrvoje Niksic wrote:
> > Mauro Tortonesi <[EMAIL PROTECTED]> writes:
> > >> I vote we stick with C. Java is slower and more prone to environmental
> > >> problems.
> > >
> > > not really. because of its JIT compiler, Java is often as fast as
> > > C/C++, and sometimes even significantly faster.
> >
> > Not if you count startup time, which is crucial for a program like
> > Wget.  Memory use is also incomparable.
>
> right. i was not suggesting to implement wget2 in Java, anyway ;-)
>
> but we could definitely make good use of dynamic languages such as Ruby (my
> personal favorite) or Python, at least for rapid prototyping purposes. both
> Ruby and Python support event-driven I/O (http://rubyeventmachine.com for
> Ruby, and http://code.google.com/p/pyevent/ for Python) and asynch DNS
> (http://cares.rubyforge.org/ for Ruby and
> http://code.google.com/p/adns-python/ for Python) and both are relatively
> easy to interface with C code.

> writing a small prototype for wget2 in Ruby or Python at first, and then
> incrementally rewrite it in C would save us a lot of development time,
> IMVHO.

> what do you think?

Python.




Re: Work on your computer! Register Key: QD5V56G5

2007-12-07 Thread David Ginger
On Friday 07 December 2007 12:35:32 Jerrold Massey wrote:
> JOB IN OUR COMPANY Dating Team company:

So which switch option makes wget a hot date then ?

--babe ?



Hello, All and bug #21793

2008-09-08 Thread David Coon
Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out
with wget.  This will be my first time contributing to any kind of free or
open source software, so I may have some basic questions down the line about
best practices and such, though I'll try to keep that to a minimum.

Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try
to tackle bug #21793 <https://savannah.gnu.org/bugs/?21793>.

-David A Coon


Spam

2001-01-09 Thread David VanHorn


Is anyone else getting this junk, with the wget servers as the intermediary?


Return-Path: <[EMAIL PROTECTED]>
Received: from sunsite.auc.dk (sunsite.dk [130.225.51.30])
by www.cedar.net (8.9.3/SCO5.0.4) with SMTP id XAA21229
for <[EMAIL PROTECTED]>; Tue, 9 Jan 2001 23:19:41 GMT
Received: (qmail 29057 invoked by alias); 9 Jan 2001 23:19:06 -
Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm
Precedence: bulk
Delivered-To: mailing list [EMAIL PROTECTED]
Received: (qmail 29051 invoked from network); 9 Jan 2001 23:19:05 -
From: [EMAIL PROTECTED]
Subject: Incredible Home e-Business Opportunity!
Message-ID: <[EMAIL PROTECTED]>
Date: Sat, 06 Jan 2001 16:43:40 -0500
To: [EMAIL PROTECTED]
Content-Type: text/plain; charset="iso-8859-1"
Reply-To: [EMAIL PROTECTED]


Dear Friend,
Perhaps this might be of interest to you.
If not, please disregard.


--
Where's dave? http://www.findu.com/cgi-bin/find.cgi?kc6ete-9





images with absolute references

2001-05-22 Thread David Killick

In the page:
www.objectmentor.com/publications/articlesbysubject.html
there are images that have absolute URLs (ie. http://www.objectmentor.com...) 
that are not downloaded when the -p option is specified. I had understood that 
this is what the -p and -k options do.

If I have misunderstood the -p and -k options, or misconfigured something, 
please excuse me.

Thanks for your time..

PS. The wget command line I used was:
wget -H www.objectmentor.com/publications/articlesbysubject.html

and my .wgetrc is:

# WGet RC file to implement the command line parameters:
# wget -P webcache -p -nc -l1 -k -A gif,jpg,png

# Accept the following file types
accept = gif,jpg,png

# Convert links locally
convert_links = on

# Use FTP
follow_ftp = off

# Preserve existing files
noclobber = on

# Ignore the sometimes incorrect content length header.
ignore_length = off

# Get the requisites for each page
page_requisites = on

# Use recursive retrieval
recursive = off

# Levels to recurse
reclevel = 1

# Timestamp files
timestamping = off

# Build the directory tree locally
dirstruct = on

# Top of the local directory tree
dir_prefix = webcache

# --EOF --
~|~\ /~\ | | |~~Dave Killick[EMAIL PROTECTED]
 | | |_| | | |--+44 (0)1225 475235
_|_/ | | \_/ |__IPL Information Processing Limited



images with absolute references (more info)

2001-05-22 Thread David Killick

Sorry, following on from my earlier message, I forgot to mention that my wget 
version is:
GNU Wget 1.6

and if it is of any consequence, the output from 'uname -srvmpi' is
SunOS 5.7 Generic_106542-05 i86pc i386 i86pc

Thanks.
--- Forwarded message follows ---
From:   David Killick <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject:images with absolute references
Date sent:  Tue, 22 May 2001 11:57:21 +0100

In the page:
www.objectmentor.com/publications/articlesbysubject.html
there are images that have absolute URLs (ie. http://www.objectmentor.com...) 
that are not downloaded when the -p option is specified. I had understood that 
this is what the -p and -k options do.

If I have misunderstood the -p and -k options, or misconfigured something, 
please excuse me.

Thanks for your time..

PS. The wget command line I used was:
wget -H www.objectmentor.com/publications/articlesbysubject.html

and my .wgetrc is:

# WGet RC file to implement the command line parameters:
# wget -P webcache -p -nc -l1 -k -A gif,jpg,png

# Accept the following file types
accept = gif,jpg,png

# Convert links locally
convert_links = on

# Use FTP
follow_ftp = off

# Preserve existing files
noclobber = on

# Ignore the sometimes incorrect content length header.
ignore_length = off

# Get the requisites for each page
page_requisites = on

# Use recursive retrieval
recursive = off

# Levels to recurse
reclevel = 1

# Timestamp files
timestamping = off

# Build the directory tree locally
dirstruct = on

# Top of the local directory tree
dir_prefix = webcache

# --EOF --
--- End of forwarded message ---
~|~\ /~\ | | |~~Dave Killick[EMAIL PROTECTED]
 | | |_| | | |--+44 (0)1225 475235
_|_/ | | \_/ |__IPL Information Processing Limited



tags

2001-06-14 Thread David Killick

We have been using wget with the -p option to retrieve page requisites.
We have noticed that it does not appear to work when  tag is 
encountered in the requested page.
The tag and its href are copied verbatim, and required images etc. are 
not retrieved and mapped locally.
By way of example, one of the pages in question is:
http://www.howstuffworks.com/ethernet2.htm


~|~\ /~\ | | |~~Dave Killick[EMAIL PROTECTED]
 | | |_| | | |--+44 (0)1225 475235
_|_/ | | \_/ |__IPL Information Processing Limited



Re: wget timestamping (-N) bug/feature?

2001-08-04 Thread David VanHorn

At 07:11 PM 8/4/01 -0500, Mengmeng Zhang wrote:
> > Say, I have a index.html which is not changed, but some of the pages
> > linked from this page might be changed. When I use -N option to retrieve
> > index.html recursively, wget will quit after find out that index.html is
> > not changed, without following the url in index.html, and thus missed the
> > fact that some other pages being linked by index.html might have been
> > changed.
>
>Every version of wget I've used will indeed process index.html correctly
>and follow all the links. Can you give an example of the command line
>you're using where the links are not followed?
>
>MZ

Any chance you guys could either do this ON the wget group, or OFF it?
CCing the group creates messages to everyone on the group, that aren't 
filterable.
At least, eudora can't filter on CC fields.


--
Dave's Engineering Page: http://www.dvanhorn.org

I would have a link to http://www.findu.com/cgi-bin/find.cgi?KC6ETE-9 here 
in my signature line, but due to the inability of sysadmins at TELOCITY to 
differentiate a signature line from the text of an email, I am forbidden to 
have it.





Redirection spans hosts unconditionally

2001-10-28 Thread David Nesting

I am seeing some anomalous behavior with wget with respect to mirroring
(-m) a site and trying to keep that mirror local to the source domain.
There are a couple of CGI scripts that inevitably get called that end up
issuing redirects off-site.  These redirects are followed even though
--span-hosts is not supplied, and even if the destination domains are
added via the --exclude-domains option.

A test case is up at http://fastolfe.net/misc/wget-bug/.
Spidering http://fastolfe.net/misc/wget-bug/normal will
correctly ignore the *link* to www.example.com, but spidering
http://fastolfe.net/misc/wget-bug/redirected ends up following a local
link that results in a redirection.  This redirection is followed
unconditionally.

In this case, www.example.com doesn't exist, but if this were a normal
domain, wget would still fetch the page and store it locally (creating
a www.example.com directory, etc.).

I am using GNU Wget 1.7 installed via RPM as wget-1.7-3mdk on Linux
2.4.12 i686.

Thanks!

-- 
 == David Nesting WL7RO Fastolfe [EMAIL PROTECTED] http://fastolfe.net/ ==
 fastolfe.net/me/pgp-key A054 47B1 6D4C E97A D882  C41F 3065 57D9 832F AB01



Differences between "wget" and "cURL"?

2001-11-19 Thread Karr, David

I've noticed a tool recently called "cURL" that seems to be in the same
"space" as "wget".  Could someone give me a basic overview of how these two
things are different?



Re: Unsubscribing

2001-11-24 Thread David VanHorn


>
>Hi David,
>
>please present us the following fact:
>
>Where did you send your request to unsubscribe (exact E-mail address)?

[EMAIL PROTECTED]
--
Dave's Engineering Page: http://www.dvanhorn.org

Got a need to read Bar codes?  http://www.barcodechip.com
Bi-directional read of UPC-A, UPC-E, EAN-8, EAN-13, JAN, and Bookland, with 
two or five digit supplemental codes, in an 8 pin chip, with NO external parts.





Re: Unsubscribing

2001-11-24 Thread David VanHorn

At 05:11 AM 11/24/01 +, Byran wrote:
>THIS list clogging up your email account?

Not exactly, but I tried several times to unsubscribe recently, to no avail.

--
Dave's Engineering Page: http://www.dvanhorn.org

Got a need to read Bar codes?  http://www.barcodechip.com
Bi-directional read of UPC-A, UPC-E, EAN-8, EAN-13, JAN, and Bookland, with 
two or five digit supplemental codes, in an 8 pin chip, with NO external parts.





Re: Unsubscribing

2001-11-24 Thread David VanHorn

At 10:47 PM 11/23/01 +, Neil Osborne wrote:
>Hello All,
>
>I want to unsubscribe from this mail list - however despite several mails
>with unsubscribe in both subject and body, I still keep receiving mail, and
>it's clogging up my mail account. Can anyone help please ?
>
>Thanks

I'm in the same condition, I've unsubscribed.

Please release me, let me gooo.. :)

--
Dave's Engineering Page: http://www.dvanhorn.org

Got a need to read Bar codes?  http://www.barcodechip.com
Bi-directional read of UPC-A, UPC-E, EAN-8, EAN-13, JAN, and Bookland, with 
two or five digit supplemental codes, in an 8 pin chip, with NO external parts.





wget segfault on ppc

2001-10-08 Thread David Roundy

Hello.  I have a patch to fix a problem with wget segfaulting on the
powerpc platform.  It happens in the logvprintf routine, due to differences
in the handling of va_lists on ppc vs. x86.  The problem was that it was
reusing a va_list after it had already been exhausted, and the following
fix should be portable on at least any platform using gcc.

I'm afraid the patch may be malformed wrt whitespace, but it is small
enough that you shouldn't have a problem applying it by hand.

--- wget-1.7.old/src/log.c  Sun May 27 12:35:05 2001
+++ wget-1.7/src/log.c  Fri Sep 28 09:29:48 2001
@@ -280,9 +280,12 @@
 static void
 logvprintf (enum log_options o, const char *fmt, va_list args)
 {
+  va_list all_the_args;
+
   CHECK_VERBOSE (o);
   CANONICALIZE_LOGFP_OR_RETURN;

+  __va_copy(all_the_args,args);
   /* Originally, we first used vfprintf(), and then checked whether
  the message needs to be stored with vsprintf().  However, Watcom
  C didn't like ARGS being used twice, so now we first vsprintf()
@@ -310,7 +313,9 @@
  the systems where vsnprintf() is not available, we use
  the implementation from snprintf.c which does return the
  correct value.  */
- int numwritten = vsnprintf (write_ptr, available_size, fmt, args);
+  int numwritten;
+  __va_copy(args,all_the_args);
+ numwritten = vsnprintf (write_ptr, available_size, fmt, args);

  /* vsnprintf() will not step over the limit given by
  available_size.  If it fails, it will return either -1



-- 
David Roundy
http://civet.berkeley.edu/droundy/





html-parse.c

2001-10-10 Thread David Edmondson


Hello, I had to do the following to get wget to compile on 
ppc-apple-darwin

diff src/html-parse.c ../wget-1.7.fixed/src/html-parse.c
435c435
< assert (ch == '\'' || ch == '"');
---
 > assert (ch == '\'' || ch == '\"');

Regards, Dave




wget reject lists

2002-01-29 Thread David McCabe

Hello,

I am not subscribed to this list, so please CC me on your answers.

I am having a hell of a time getting the reg-ex stuff to work with the -A or -R
options. If I supply this option to my wget command:

-R 1*

Everything works as expected. Same with this:

-R 2*

Now, if I do this:

-R 1*,2*

I get all the files beginning with 1. if I do this:

-R 2*,1* 

I get all the files beginning with 2. No combination of quoting makes any
difference whatsoever. Anybody have any clues, before I give up and look for
another tool??

BTW, wget is 1.8.1 (same thing with 1.8) compiled from source on Solaris 8,
using gcc 2.95.3 package from sunfreeware.com

--
David McCabeSenior Systems Analyst
Network and Communications Services, McGill University
Montreal, Quebec, Canada[EMAIL PROTECTED]

If you stop having sex, drinking and smoking, You don't live longer... 
It just seems like it



inconsistency between man page and --help

2002-05-15 Thread David Rostenne

Hello!

In version 1.8.1 of GNU Wget...

I found that in the --help there is;

   --limit-rate=RATElimit download rate to RATE.

But no reference is made to it in the man page. I checked and made 
sure the man page was for the same version ;-)

So, please fix! And, uh is there anything I need to know about 
--limit-rate, or should I assume that if I use 50k it'll work?
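
(For the record, the sort of invocation I have in mind is simply:

   wget --limit-rate=50k http://example.com/some-big-file.iso

with example.com standing in for the real server.)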

cheers,

dAVE



Redirection cycle detected using wget 1.8.2

2002-06-24 Thread David Woodyard

I got the message 'Redirection cycle detected' when I tried to download a 
file. The download aborted. I have looked for a solution and have not found 
one. Any help will be greatly appreciated.

Please 'CC' me on reply as I am not currently subscribed.

Thanks again,

David



Trouble with Yahoo

2002-10-16 Thread David McNab

Hi,

I'm trying to build a locally browsable mirror of Yahoo's PDA-friendly
portal - http://wap.oa.yahoo.com

Sadly, the result is a set of local pages with non-working inter-page
links.

I've tried combinations of -k, -F, -E.
But what happens is that the links don't match up with the actual stored
documents.

Running wget with '-l 1 -r' will take 20 seconds, and provide a good
example of the problem. When you load the main page, your links will
404.

Can someone please advise if there are any wget options which can remedy
the problem - or is this a bug, or is Yahoo's site structure beyond the
scope of wget?

Cheers
David






error fetching some files

2002-12-22 Thread David Magda

Hello,

This isn't a bug in wget per se, but wget's current behaviour may result
in not being able to download some files. I am using wget 1.8.2.

Some FTP servers have set up permissions so that you cannot do an 'ls',
or a 'cd' into a directory. You can fetch the file directly from the
root directory ('/'), but cannot view the directory contents. You cannot
even go into the directories even if you know their names before hand.

An example of this is an FTP server which holds the GNU/Linux version of
the "Return to Castle Wolfenstein" server. The original link is [1], but
it is redirected to [2].

Would it be possible to add a command-line option to tell wget, after
logging in, to issue the RETR command using the full pathname instead of
issuing the CWD command?

Thank you for your time.

[1] http://www.wolfensteinx.com/dl.php?file=wolflinux&download=true
[2] ftp://dl:[EMAIL PROTECTED]/wolfx/demos/linux/wolfmptest-dedicated.x86.run

-- 
David Magda 
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well 
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI



large file

2002-12-26 Thread Allouche David

When large files (size > 2 GB) are downloaded, the wget 1.8.2 release crashes.

Is it possible to compile the latest release with a large-file support option?






@@@
 Allouche David   Tel:+33 (0)5 61 28 52 77
  Fax:+33 (0)5 61 28 53 35
--
  GENOPOLE TOULOUSE 
--
BIA chem. de Borde Rouge BP 27 - 31326 Castanet Tolosan
Cedex , france :   e-mail : [EMAIL PROTECTED]
@@




Not 100% RFC 1738 compliance for FTP URLs => bug

2003-03-07 Thread David Balazic
Hi!

I noticed that wget ( 1.8.2 ) does not conform 100% to RFC 1738 when
handling FTP URLs :

wget ftp://user1:[EMAIL PROTECTED]/x/y/foo

does this :

USER user1
PASS secret1
SYST
PWD ( let's say this returns "/home/user1" )
TYPE I
CWD /home/user1/x/y
PORT 11,22,33,44,3,239
RETR foo

Why does it prepend the current working directory to the path ?

wget does "CWD /home/user1/x/y" , while RFC 1738 suggests :
CWD x
CWD y

?

This _usually_ produces the same result, except :

 - ftp://user1:[EMAIL PROTECTED]//x/y/foo
 wget : CWD /x/y
 rfc :
   CWD   # empty parameter ! this usually puts one in the $HOME
directory
   CWD x
   CWD y

  So wget will try to fetch the file /x/y/foo , while an RFC
compliant
  program would fetch $HOME/x/y/foo

 - non unix and other "weird" systems. Example :

wget
"ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT"

does not work. Also the following variations don't work either :

wget
"ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT"
wget
"ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT"
wget
"ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT"

Using a regular ftp client, the following works:

open connection & log in :

 - first possibility :

get DAD4:[perl5]FREEWARE_README.TXT

 - second :

cd DAD4:[perl5]
get FREEWARE_README.TXT

Another example with more directory levels :

get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE
or
cd DAD4:[MTOOLS.AXP_EXE]
get MTOOLS.EXE
or
cd DAD4:[MTOOLS]
cd AXP_EXE
get MTOOLS.EXE


I recommend removing the "cool&smart" code and sticking to RFCs :-)

-- 
David Balazic
--
"Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore
Logan
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -


Not 100% RFC 1738 compliance for FTP URLs => bug

2003-03-13 Thread David Balazic
As I got no response on [EMAIL PROTECTED], I am resending my report here.

--

Hi!

I noticed that wget ( 1.8.2 ) does not conform 100% to RFC 1738 when
handling FTP URLs :

wget ftp://user1:[EMAIL PROTECTED]/x/y/foo

does this :

USER user1
PASS secret1
SYST
PWD ( let's say this returns "/home/user1" )
TYPE I
CWD /home/user1/x/y
PORT 11,22,33,44,3,239
RETR foo

Why does it prepend the current working directory to the path ?

wget does "CWD /home/user1/x/y" , while RFC 1738 suggests :
CWD x
CWD y

?

This _usually_ produces the same result, except :

 - ftp://user1:[EMAIL PROTECTED]//x/y/foo
 wget : CWD /x/y
 rfc :
   CWD   # empty parameter ! this usually puts one in the $HOME
directory
   CWD x
   CWD y

  So wget will try to fetch the file /x/y/foo , while an RFC
compliant
  program would fetch $HOME/x/y/foo

 - non unix and other "weird" systems. Example :

wget
"ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT"

does not work. Also the following variations don't work either :

wget
"ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT"
wget
"ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT"
wget
"ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT"

Using a regular ftp client, the following works:

open connection & log in :

 - first possibility :

get DAD4:[perl5]FREEWARE_README.TXT

 - second :

cd DAD4:[perl5]
get FREEWARE_README.TXT

Another example with more directory levels :

get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE
or
cd DAD4:[MTOOLS.AXP_EXE]
get MTOOLS.EXE
or
cd DAD4:[MTOOLS]
cd AXP_EXE
get MTOOLS.EXE


I recommend removing the "cool&smart" code and sticking to RFCs :-)

-- 
David Balazic
--
"Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore
Logan
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -


Re: Not 100% RFC 1738 compliance for FTP URLs => bug

2003-03-13 Thread David Balazic
Max Bowsher wrote:
> 
> David Balazic wrote:
> > As I got no response on [EMAIL PROTECTED], I am resending my report
> > here.
> 
> One forwards to the other. The problem is that the wget maintainer is
> absent, and likely to continue to be so for several more months.
> 
> As a result, wget development is effectively stalled.

So it is "do it yourself" , huh ? :-)

> Max.
> 
> >
> > --
> >
> > Hi!
> >
> > I noticed that wget ( 1.8.2 ) does not conform 100% to RFC 1738 when
> > handling FTP URLs :
> >
> > wget ftp://user1:[EMAIL PROTECTED]/x/y/foo
> >
> > does this :
> >
> > USER user1
> > PASS secret1
> > SYST
> > PWD ( let's say this returns "/home/user1" )
> > TYPE I
> > CWD /home/user1/x/y
> > PORT 11,22,33,44,3,239
> > RETR foo
> >
> > Why does it prepend the current working directory to the path ?
> >
> > wget does "CWD /home/user1/x/y" , while RFC 1738 suggests :
> > CWD x
> > CWD y
> >
> > ?
> >
> > This _usually_ results in the same results, except :
> >
> >  - ftp://user1:[EMAIL PROTECTED]//x/y/foo
> >  wget : CWD /x/y
> >  rfc :
> >CWD   # empty parameter ! this usually puts one in the $HOME
> > directory
> >CWD x
> >CWD y
> >
> >   So wget will try to fetch the file /x/y/foo , while an RFC
> > compliant
> >   program would fetch $HOME/x/y/foo
> >
> >  - non unix and other "weird" systems. Example :
> >
> > wget
> >
> "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT"
> >
> > does not work. Also the following variations don't work either :
> >
> > wget
> > "ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT"
> > wget
> >
> "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT"
> > wget
> > "ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT"
> >
> > Using a regular ftp client , the follwoing works :
> >
> > open connection & log in :
> >
> >  - first possibility :
> >
> > get DAD4:[perl5]FREEWARE_README.TXT
> >
> >  - second :
> >
> > cd DAD4:[perl5]
> > get FREEWARE_README.TXT
> >
> > Another example with more directory levels :
> >
> > get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE
> > or
> > cd DAD4:[MTOOLS.AXP_EXE]
> > get MTOOLS.EXE
> > or
> > cd DAD4:[MTOOLS]
> > cd AXP_EXE
> > get MTOOLS.EXE
> >
> >
> > I recommend removing the "cool&smart" code and stick to RFCs :-)


-- 
David Balazic
--
"Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore
Logan
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -


unreasonable not to doc ascii vs. binary in the --help text

2003-08-18 Thread Mark David
When I look at the long help for wget, there's no mention of how to arrange
for ascii vs. binary download. It should be under FTP options.  I've used
FTP for almost 20 years, and the ASCII or BINARY commands are the two most
common commands outside of get and put.  I think it's pretty unreasonable
not to mention "binary" or "ascii" it in the --help output at all.  For
simple file downloads on the command line, which is when you need this help
desk to help you, the way to specify binary is crucial.

For your reference, the full help text (& version info) follows the end of
this message.

Thanks for your consideration,

Mark David

wget --help
GNU Wget 1.8.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version   display the version of Wget and exit.
  -h,  --help  print this help.
  -b,  --backgroundgo to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debugprint debug output.
  -q,  --quietquiet (no output).
  -v,  --verbose  be verbose (this is the default).
  -nv, --non-verbose  turn off verboseness, without being quiet.
  -i,  --input-file=FILE  download URLs found in FILE.
  -F,  --force-html   treat input file as HTML.
  -B,  --base=URL prepends URL to relative links in -F -i file.
   --sslcertfile=FILE optional client certificate.
   --sslcertkey=KEYFILE   optional keyfile for this certificate.
   --egd-file=FILEfile name of the EGD socket.

Download:
   --bind-address=ADDRESS   bind to ADDRESS (hostname or IP) on local
host.
  -t,  --tries=NUMBER   set number of retries to NUMBER (0
unlimits).
  -O   --output-document=FILE   write documents to FILE.
  -nc, --no-clobber don't clobber existing files or use .#
suffixes.
  -c,  --continue   resume getting a partially-downloaded file.
   --progress=TYPE  select progress gauge type.
  -N,  --timestamping   don't re-retrieve files unless newer than
local.
  -S,  --server-responseprint server response.
   --spider don't download anything.
  -T,  --timeout=SECONDSset the read timeout to SECONDS.
  -w,  --wait=SECONDS   wait SECONDS between retrievals.
   --waitretry=SECONDS  wait 1...SECONDS between retries of a
retrieval.
   --random-waitwait from 0...2*WAIT secs between
retrievals.
  -Y,  --proxy=on/off   turn proxy on or off.
  -Q,  --quota=NUMBER   set retrieval quota to NUMBER.
   --limit-rate=RATElimit download rate to RATE.

Directories:
  -nd  --no-directoriesdon't create directories.
  -x,  --force-directories force creation of directories.
  -nH, --no-host-directories   don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
   --cut-dirs=NUMBER   ignore NUMBER remote directory
components.

HTTP options:
   --http-user=USER  set http user to USER.
   --http-passwd=PASSset http password to PASS.
  -C,  --cache=on/off(dis)allow server-cached data (normally
allowed).
  -E,  --html-extension  save all text/html documents with .html
extension.
   --ignore-length   ignore `Content-Length' header field.
   --header=STRING   insert STRING among the headers.
   --proxy-user=USER set USER as proxy username.
   --proxy-passwd=PASS   set PASS as proxy password.
   --referer=URL include `Referer: URL' header in HTTP request.
  -s,  --save-headerssave the HTTP headers to file.
  -U,  --user-agent=AGENTidentify as AGENT instead of Wget/VERSION.
   --no-http-keep-alive  disable HTTP keep-alive (persistent
connections).
   --cookies=off don't use cookies.
   --load-cookies=FILE   load cookies from FILE before session.
   --save-cookies=FILE   save cookies to FILE after session.

FTP options:
  -nr, --dont-remove-listing   don't remove `.listing' files.
  -g,  --glob=on/off   turn file name globbing on or off.
   --passive-ftp   use the "passive" transfer mode.
   --retr-symlinks when recursing, get linked-to files (not
dirs).

Recursive retrieval:
  -r,  --recursive  recursive web-suck -- use with care!
  -l,  --level=NUMBER   maximum recursion depth (inf or 0 for infinite).
   --delete-after   delete files locally after downloading them.
  -k,  --convert-links  convert non-relative links to relative.
  -K,  --backup-converted   before converting file X, back

RE: unreasonable not to doc ascii vs. binary in the --help text

2003-08-18 Thread Mark David
You said: The type selection is rarely needed ...

This is untrue. I just tried this out using wget on Windows.

If you don't tack on ;type=a onto the end when transferring a text
file from unix to Windows, the file's line endings will not be
converted from unix (LF) to Windows (CRLF) conventions.
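
For example (the host and path here are only placeholders), the form I mean is:

   wget "ftp://ftp.example.com/pub/README;type=a"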

If you look at the file in applications that just follow windows
conventions, e.g., Notepad, the lines will not be broken in the
display.  Some applications (e.g., web browsers) follow a 
liberal interpretation of line endings, which helps overcome 
this problem, but many do not, including programs that read
ascii (text) files as data, and will silently but fatally malfunction
if the CR is not there in front of the LF.

So, with unix to Windows transfer of text files being obviously
an extremely common case, this clearly deserves a few lines in
your --help documentation.  It can hardly violate any length
limit for that text -- there seems to be none, since the
text goes on and on and documents such seldom needed options 
as passive mode:

  --passive-ftp   use the "passive" transfer mode.

And many others that don't deserve as much attention as ascii
vs. binary transfer.

Thanks,

Mark


-Original Message-
From: Maciej W. Rozycki [mailto:[EMAIL PROTECTED]
Sent: Mon, August 18, 2003 11:22 AM
To: Mark David
Cc: '[EMAIL PROTECTED]'
Subject: Re: unreasonable not to doc ascii vs. binary in the --help text


On Mon, 18 Aug 2003, Mark David wrote:

> When I look at the long help for wget, there's no mention of how to
arrange
> for ascii vs. binary download. It should be under FTP options.  I've used
> FTP for almost 20 years, and the ASCII or BINARY commands are the two most
> common commands outside of get and put.  I think it's pretty unreasonable
> not to mention "binary" or "ascii" it in the --help output at all.  For
> simple file downloads on the command line, which is when you need this
help
> desk to help you, the way to specify binary is crucial.

 The default download type wget uses is binary.  If you want another type,
then ";type=X" ("X" denotes the desired type; e.g. "i" is binary and "a"
is ASCII) can be appended to a URL.  It's all documented within wget's
info pages.  The type selection is rarely needed -- typically for
downloading a text file from an EBCDIC host -- so including it with the
short help reference would seem to be overkill. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+


RE: Content-Disposition Take 3

2003-09-08 Thread Newman, David

"Hrvoje Niksic" <mailto:[EMAIL PROTECTED]> writes:

> "Newman, David" <[EMAIL PROTECTED]> writes:
> 
> > This is my third attempt at a Content-Disposition patch and if it
> > isn't acceptable yet, I'm sure it is pretty close.
> 
> Thanks.  Note that I and other (co-)maintainers have been away for
> some time, so if your previous attempts have been ignored, it might not
> have been for lack of quality in your contribution.

Actually my last attempt was early last year and it was a total hack.  :-)

> > However, with the --content-disposition option wget will
> > instead process the header
> >
> > Content-Disposition: attachment; filename="joemama.txt"
> >
> > and change the local filename to "joemama.txt"
> 
> The thing that worries me about this patch is that in some places
> Wget's actions depend on transforming the URL to the output file
> name.  I'm having in mind options like `-c' and `-nc'.  Won't your
> patch break those?

I did have in the back of my mind the consequences of other options
that may have been given.  I ended up conceding that if the user
specified --content-disposition that they really wanted the filename
specified within the header.  Of course, I now concede that I
had not considered the effects of -nc or the lack of -nc.

Hmmm.  I can easily change the patch such that if the file specified
in Content-Disposition: already exists that a numerical extension
is added to the name in the absence of -nc.  However, if -nc is
present, that would imply that if the file exists wget should not
download the content.  But the filename isn't known until the
web server has already been contacted.  So I would have to ask
how would I abort the current transfer?

As far as --continue is concerned I don't know if that option is
valid in this context.  Meaning, Content-Disposition is usually
used with generated content (I think).  Like in my example, test.php
generates all the content.  And the only other place I've seen it
is when downloading Solaris patches the URL is something like
patchDownload.pl?target=\&method=h and it sets the
name of the zip file it gives you in the header, i.e. patchid.zip.
I just tried it and wget fails to continue the download of a
patch but refuses to truncate the existing file.  Should I just
track down the code that handles this case and duplicate it?

-Dave



Error in wget-1.9-b5.zip

2003-10-15 Thread David Drobny
Error in wget-1.9-b5.zip

--17:46:21--  http://www.digitalplayground.com/freepage.php?tgpid=008d&refid=393627
   => `/tmp2/www.digitalplayground.com/[EMAIL PROTECTED]&refid=393627'
Resolving www.digitalplayground.com... 64.38.205.100
Connecting to www.digitalplayground.com[64.38.205.100]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.yourworstenemy.com?tgpid=008d&refid=393627 [following]
--17:46:23--  http://www.yourworstenemy.com/?tgpid=008d&refid=393627
   => `/tmp2/www.yourworstenemy.com/[EMAIL PROTECTED]&refid=393627'
Resolving www.yourworstenemy.com... failed: Unknown error.

FINISHED --17:46:36--
Downloaded: 0 bytes in 0 files
Converted 0 files in 0.00 seconds.


wget can't get the following site

2004-01-09 Thread David C.
Hi, all
 
Please CC me when you reply.  I'm not subscribed to this list.
 
I'm new to wget.  When I tried getting the following using wget, 
 
wget 
http://quicktake.morningstar.com/Stock/Income10.asp?Country=USA&Symbol=JNJ&stocktab=finance
 
I got the errors below:
 
--22:58:29--  http://quicktake.morningstar.com:80/Stock/Income10.asp?Country=USA
   => [EMAIL PROTECTED]'
Connecting to quicktake.morningstar.com:80... connected!
HTTP request sent, awaiting response... 302 Object moved
Location: http://quote.morningstar.com/switch.html?ticker= [following]
--22:58:30--  http://quote.morningstar.com:80/switch.html?ticker=
   => [EMAIL PROTECTED]'
Connecting to quote.morningstar.com:80... connected!
HTTP request sent, awaiting response... 302 Object moved
Location: TickerNotFound.html [following]
TickerNotFound.html: Unknown/unsupported protocol.
'Symbol' is not recognized as an internal or external command,
operable program or batch file.
'stocktab' is not recognized as an internal or external command,
operable program or batch file.
 
Is this a bug in wget?  Or is there something I can do so that wget can get the site?
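
(The only workaround I have thought of so far - not yet tested - is to quote the
whole URL so that the shell does not treat the & characters as command separators:

   wget "http://quicktake.morningstar.com/Stock/Income10.asp?Country=USA&Symbol=JNJ&stocktab=finance"
)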
 
Please help!  Thanks in advance.
 



Calling wget in C++

2004-01-28 Thread David C.
Hi, all
 
Please CC me when you reply.  I'm not subscribed to this list.
 
I have two questions:

1) I am writing a C++ program that calls wget using execv.  After wget gets the 
requested page, control does not return to execute the rest of my program 
after the call.  Here's what my code looks like:
 
char* arg_list[] = {"wget", args, NULL};
int result = execv("wget.exe", arg_list); 
// rest of my code
.
 
As noted above, after execv runs wget, it does not return to execute the rest of my 
program.  Does anyone know how to fix this?
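
(A sketch of the direction I suspect is needed - untested, and mirroring the
fragment above rather than being a complete program: execv replaces the calling
process, so nothing after it can ever run.  On Windows the spawn family runs the
child and then returns:

#include <process.h>

const char* arg_list[] = {"wget", args, NULL};
intptr_t result = _spawnv(_P_WAIT, "wget.exe", arg_list);  // returns wget's exit status
// rest of my code runs here once wget has finished

Is that the right approach?)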
 
2) When I use wget to get the following url, it sometimes (not all the time!) gives me 
the error below:
 
C:\>wget -O test.html "http://screen.yahoo.com/b?dvy=2/100&pe=0/200&b=1&z=d
vy&db=stocks&vw=1"
--23:24:57--  http://screen.yahoo.com:80/b?dvy=2/100&pe=0/200&b=1&z=dvy&db=stock
s&vw=1
   => `test.html'
Connecting to screen.yahoo.com:80... connected!
HTTP request sent, awaiting response...
End of file while parsing headers.
Giving up.
 
 
The page I requested is not downloaded.  But sometimes it works.  Any ideas how to fix 
this?
 
Thanks in advance!
 
David

 




Re: [PATCH] implementation of determine_screen_width() for Windows

2004-01-28 Thread David Fritz
Herold Heiko wrote:

From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
..

Yes.  Specifically, Unix's SIGWINCH simply sets a flag that means
"window size might have changed, please check it out".  That is
because checking window size on each refresh would perform an
unnecessary ioctl.
One thing we could do for Windows is check for window size every
second or so.


I agree, but I have no idea how taxing those GetStdHandle() and
GetConsoleScreenBufferInfo() are.
Maybe David can shed more light on this, or even profile a bit.
Possibly the handle could be cached, saving at least the GetStdHandle() bit.
Heiko

Yes, GetStdHandle() would only need to be called once unless the handle 
were to change during execution (fork_to_background()?).

I haven't done any exhaustive profiling but the attached patch doesn't 
seem to affect performance. It calls determine_screen_width() every time 
the progress bar is updated (~5 times per second?).

Note: I'm not suggesting we use the patch as-is, it's just a test.

It might be possible to implement something similar to SIGWINCH using 
WinEvents, but that's not really what they were designed for. They were 
designed to be used by "accessibility" software (screen readers, etc.), 
and it may not be available on older versions of Windows.

How often do people change the size of the screen buffer while a command 
is running?

Index: progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.43
diff -u -r1.43 progress.c
--- progress.c  2004/01/28 01:02:26 1.43
+++ progress.c  2004/01/28 19:37:50
@@ -579,6 +579,22 @@
 /* Don't update more often than five times per second. */
 return;
 
+#ifdef WINDOWS
+{
+  int old_width = screen_width;
+  screen_width = determine_screen_width ();
+  if (!screen_width)
+   screen_width = DEFAULT_SCREEN_WIDTH;
+  else if (screen_width < MINIMUM_SCREEN_WIDTH)
+   screen_width = MINIMUM_SCREEN_WIDTH;
+  if (screen_width != old_width)
+   {
+ bp->width = screen_width - 1;
+ bp->buffer = xrealloc (bp->buffer, bp->width + 1);
+   }
+}
+#endif
+
   create_image (bp, dltime);
   display_image (bp->buffer);
   bp->last_screen_update = dltime;


Re: [PATCH] implementation of determine_screen_width() for Windows

2004-01-29 Thread David Fritz
Hrvoje Niksic wrote:

This patch should fix both problems.

Great, thanks



[PATCH] periodic screen width check under Windows

2004-01-29 Thread David Fritz
Herold Heiko wrote:


How often do people change the size of the screen buffer 
while a command 
is running?


Rarely I think, for example when you notice a huge file is being downloaded
slowly and you enlarge the window in order to have a better granularity on
the progress bar.
Probably, instead of risking a performance drawback on some (slow) machines, a
better way would be to call it rarely (every 5 seconds or so would still be
enough, I think).
Heiko

Right. The previous patch was kind of a worst-case test.

Attached is a patch that checks the screen width approximately every two 
seconds in the Windows build. I don't know if this is what Hrvoje had in 
mind. And of course the interval can be tweaked.

Cheers

Index: progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.43
diff -u -r1.43 progress.c
--- progress.c  2004/01/28 01:02:26 1.43
+++ progress.c  2004/01/29 20:20:35
@@ -452,6 +452,11 @@
   double last_screen_update;   /* time of the last screen update,
   measured since the beginning of
   download. */
+#ifdef WINDOWS
+  double last_screen_width_check; /* time of the last screen width
+  check, measured since the
+  beginning of download. */
+#endif /* WINDOWS */
 
   int width;   /* screen width we're using at the
   time the progress gauge was
@@ -555,6 +560,15 @@
 bp->total_length = bp->initial_length + bp->count;
 
   update_speed_ring (bp, howmuch, dltime);
+
+#ifdef WINDOWS
+  /* Under Windows, check to see if the screen width has changed no more
+ than once every two seconds. */
+  if (dltime - bp->last_screen_width_check > 2000) {
+received_sigwinch = 1;
+bp->last_screen_width_check = dltime;
+  }
+#endif /* WINDOWS */
 
   /* If SIGWINCH (the window size change signal) been received,
  determine the new screen size and update the screen.  */


Re: Startup delay on Windows

2004-02-08 Thread David Fritz
Petr Kadlec wrote:
> I have traced the problem down to search_netrc() in netrc.c, where the
> program is trying to find the file using stat(). But as home_dir()
> returns "C:\" on Windows, the filename constructed looks like
> "C:\/.netrc", which is then probably interpreted by Windows as a name of
> a remote file, so Windows are trying to look around on the network, and
> continue only after some timeout.
I'm curious as to what operating system and compiler you are using.  I 
tried briefly to reproduce this under Windows 2000 with MSVC 7.1 and 
could not.  I would regard this as a bug in the implementation of 
stat(), not Wget.  BTW, this has come up before:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04440.html

Hrvoje Niksic wrote:

Thanks tracing this one.  It would never have occurred to me that the
file name "c:\/foo" could cause such a problem.
It really shouldn't; it seems perfectly valid (albeit strange) to me. 
Though, I guess, it behooves us to work around compiler/library bugs.

I see two different bugs here:

1. The routine that "merges" the .netrc file name with the directory
   name should be made aware of Windows, so that it doesn't append
   another backslash if a backslash is already present at the end of
   directory name returned by home_dir.  (In fact, the same logic
   could be applied to slashes following Unix directory names.)
*AFAIK*, Windows should only treat two consecutive slashes specially if 
they are at the beginning of a file name. (Windows might not like more 
than one slash between a machine and share name, but that's not really 
relevant.)  Otherwise, they should be equivalent to a single slash. All 
this irrespective of whether the slashes are forward (/) or backward (\).

2. home_dir() should really be fixed to return something better than
   `c:\' unconditionally, as is currently the case.  The comment in
   the source says:
  home = "C:\\";
  /*  Maybe I should grab home_dir from registry, but the best
 that I could get from there is user's Start menu.  It sucks!  */
   This comment was not written by me, but by (I think) Darko Budor,
   who wrote the original Windows support.  Under Windows 2000 and XP,
   there have to be better choices of home directory.  For instance,
   Cygwin considers `c:\Documents and Settings\USERNAME' to be the
   home directory.
From Cygwin's /etc/profile:

# Here is how HOME is set, in order of priority, when starting from Windows
# 1) From existing HOME in the Windows environment, translated to a 
Posix path
# 2) from /etc/passwd, if there is an entry with a non empty directory field
# 3) from HOMEDRIVE/HOMEPATH
# 4) / (root)

If things were installed normally, Cygwin will consider /home/username 
to be the user's home directory.  Under Cygwin, / is usually mounted on 
C:\cygwin, or wherever Cygwin was installed.  But Cygwin is very much 
its own environment.  Already, two of the above methods are unavailable 
to us (2 and 4).

 I wonder if that is reachable through registry...

Does anyone have an idea what we should consider the home dir under
Windows, and how to find it?
I imagine there are a number of ways to go about this.

As it stands now, if I understand correctly, Wget works like this:

When processing .wgetrc under Windows, Wget does the following:

If Wget was built with MSVC, it looks for a file called "wgetrc" in the 
current directory. This is mildly evil. A comment in init.c includes the 
following sentence: "SYSTEM_WGETRC should not be defined under WINDOWS." 
Nonetheless, the MSVC Makefile defines SYSTEM_WGETRC as "wgetrc". 
AFAICT, Wget won't do this if it was built with one of the other Windows 
Makefiles.

Wget then processes the users .wgetrc. Under Windows, Wget ignores $HOME 
and looks for a file called wget.ini in the directory of the Wget binary.

Under Windows, if $HOME is defined home_dir() will return that, 
otherwise it returns `C:\'.  Wget uses the directory returned by 
home_dir() when looking for .netrc and when resolving ~.

So currently Wget's behavior is inconsistent, both with its behavior on 
other platforms, and with itself (the handling of .wgetrc and .netrc).

If we wanted to do things the NT way, we could, essentially, treat 
"C:\Documents and Settings\username\Application Data\Wget" as HOME and 
"C:\Documents and Settings\All Users\Application Data\Wget" as /etc. 
The above directories are just examples of typical locations; we would, 
of course, resolve the directories correctly.  But then what would we do 
if $HOME *is* defined? Ignore it? That would seem the `Windows' thing to do.

The directories themselves could be resolved using 
SHGetSpecialFolderPath() or its like.  The entry points would have to be 
resolved dynamically, as they may not be available on ancient Windows 
installations.  We could then fall back to the registry or the 
environment or something else.
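
A minimal sketch of that approach (illustrative only, not actual Wget code;
assumes <shlobj.h> and shell32 are available, and links the entry point
statically rather than resolving it dynamically):

#include <windows.h>
#include <shlobj.h>
#include <stdlib.h>

/* Return the per-user Application Data directory, falling back to the
   environment if the shell API call fails.  May return NULL.  */
static const char *
user_appdata_dir (void)
{
  static char buf[MAX_PATH];
  if (SHGetSpecialFolderPathA (NULL, buf, CSIDL_APPDATA, FALSE))
    return buf;   /* e.g. C:\Documents and Settings\user\Application Data */
  return getenv ("APPDATA");
}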

The user could always define $WGETRC and put .wgetrc anywhere he/she 
pleased. But what about .netrc? And doe

[PATCH] MSVC Makefiles

2004-02-08 Thread David Fritz
Attached is a patch for the MSVC Makefiles.  I have tested it with MSVC 6.0sp5
for 80x86 and MSVC 7.1 for 80x86 under Windows 2000.
One change of note: I changed the Makefile to use batch rules. This
significantly decreases the time required to build. It might not be supported by
ancient versions of nmake.exe, I don't know.
I'm hoping others can test these changes, especially with older versions of MSVC.

Cheers,
David Fritz
2004-02-09  David Fritz  <[EMAIL PROTECTED]>

	* configure.bat.in: Don't clear the screen.

* windows/README: Add introductory paragraph.  Re-word a few
sentences.  Correct minor typographical errors.  Use consistent
capitalization of Wget, SSL, and OpenSSL.  Refer to Microsoft
Visual C++ as MSVC instead of VC++.  Mention the --msvc option to
configure.bat.  Reflow paragraphs.
* windows/Makefile.top: Use tabs instead of spaces.  Ignore errors
in clean rules.  Use lowercase filenames when building distribution
.zip archive.
* windows/Makefile.doc: Use tabs instead of spaces.  Ignore errors
in clean rules.
* windows/Makefile.src: Clean-up clean rules.  Use tabs instead of
spaces.  Link against gdi32.lib.  Don't define SYSTEM_WGETRC.
Remove unused macros.  Remove anachronistic and superfluous linker
flags.  Don't rename wget.exe to all upper-case.  Add
`preprocessor' conditionals for SSL and newer MSVC options.  Use
batch rules.  Don't suppress all warnings.



Index: configure.bat.in
===
RCS file: /pack/anoncvs/wget/configure.bat.in,v
retrieving revision 1.4
diff -u -r1.4 configure.bat.in
--- configure.bat.in2003/10/26 00:19:04 1.4
+++ configure.bat.in2004/02/09 05:29:50
@@ -26,7 +26,6 @@
 rem file, but you are not obligated to do so.  If you do not wish to do
 rem so, delete this exception statement from your version.
 
-cls
 if .%1 == .--borland goto :borland
 if .%1 == .--mingw goto :mingw
 if .%1 == .--msvc goto :msvc
Index: windows/Makefile.doc
===
RCS file: /pack/anoncvs/wget/windows/Makefile.doc,v
retrieving revision 1.4
diff -u -r1.4 Makefile.doc
--- windows/Makefile.doc2002/05/18 02:16:35 1.4
+++ windows/Makefile.doc2004/02/09 05:29:51
@@ -28,22 +28,22 @@
 # You probably need makeinfo and perl, see the README in the main
 # windows directory.
 
-RM = del
-CP = copy
-ATTRIB = attrib
-
-MAKEINFO = makeinfo.exe
-TEXI2POD = texi2pod.pl
-POD2MAN  = pod2man
-
-SAMPLERCTEXI = sample.wgetrc.munged_for_texi_inclusion
-WGETHLP  = wget.hlp
-WGETINFO = wget.info
-WGETTEXI = wget.texi
-WGETHTML = wget.html
-WGETPOD  = wget.pod
-manext   = 1
-MAN  = wget.$(manext)
+RM = -del
+CP = copy
+ATTRIB = attrib
+
+MAKEINFO   = makeinfo.exe
+TEXI2POD   = texi2pod.pl
+POD2MAN= pod2man
+
+SAMPLERCTEXI   = sample.wgetrc.munged_for_texi_inclusion
+WGETHLP= wget.hlp
+WGETINFO   = wget.info
+WGETTEXI   = wget.texi
+WGETHTML   = wget.html
+WGETPOD= wget.pod
+manext = 1
+MAN= wget.$(manext)
 
 all: $(WGETHLP) $(WGETINFO) $(WGETHTML)
 
@@ -76,10 +76,10 @@
 hcrtf -xn wget.hpj
 
 clean:
-$(RM) *.bak
-$(RM) *.hpj
-$(RM) *.rtf
-$(RM) *.ph
+   $(RM) *.bak
+   $(RM) *.hpj
+   $(RM) *.rtf
+   $(RM) *.ph
$(RM) $(SAMPLERCTEXI)
$(RM) $(MAN)
$(RM) $(TEXI2POD)
Index: windows/Makefile.src
===
RCS file: /pack/anoncvs/wget/windows/Makefile.src,v
retrieving revision 1.21
diff -u -r1.21 Makefile.src
--- windows/Makefile.src2003/11/21 08:48:45 1.21
+++ windows/Makefile.src2004/02/09 05:29:51
@@ -1,4 +1,4 @@
-# Makefile for `wget' utility for MSVC 4.0
+# Makefile for `wget' utility for MSVC
 # Copyright (C) 1995, 1996, 1997 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
@@ -25,44 +25,49 @@
 # file, but you are not obligated to do so.  If you do not wish to do
 # so, delete this exception statement from your version.
 
-#
-# Version: 1.4.4
-#
-
-#Comment these if you don't have openssl available - however https
-#won't work.
-SSLDEFS=   /DHAVE_SSL
-SSLLIBS=   libeay32.lib ssleay32.lib
-SSLSRC =   gen_sslfunc.c
-SSLOBJ =   gen_sslfunc$o
-
-SHELL = command
-
-VPATH   = .
-o   = .obj
-OUTDIR  = .
-
-CC   = cl
-LD   = link
-
-CFLAGS   = /nologo /MT /W0 /O2
-#DEBUGCF  = /DENABLE_DEBUG /Zi /Od #/Fd /FR
-CPPFLAGS = 
-DEFS = /DWINDOWS /D_CONSOLE /DHAVE_CONFIG_H /DSYSTEM_WGETRC=\"wgetrc\"
-LDFLAGS  = /subsystem:console /incremental:no /warn:3
-#DEBUGLF  = /pdb:wget.pdb

Re: Startup delay on Windows

2004-02-16 Thread David Fritz
I'd be content with the following logic:

Don't process a `system' wgetrc. If $HOME is not defined, use the 
directory the Wget executable is in as $HOME (what home_dir() returns).
If $HOME/.wgetrc exists, use that; otherwise look for wget.ini in the 
directory the executable is in, regardless of $HOME.

We would retain wget.ini support for backward compatibility, and support 
.wgetrc for consistency with other platforms and with the handling of 
.netrc.  This would only break things if people had $HOME defined and it 
contained a .wgetrc and they expected the Windows port to ignore it.

As a side-effect, this would also resolve the above issue.

I went ahead and implemented this.  I figure at least it will work as an interim 
solution.

2004-02-16  David Fritz  <[EMAIL PROTECTED]>

* init.c (home_dir): Use aprintf() instead of xmalloc()/sprintf().
Under Windows, if $HOME is not defined, use the directory that
contains the Wget binary instead of hard-coded `C:\'.
(wgetrc_file_name): Under Windows, look for $HOME/.wgetrc then, if
not found, look for wget.ini in the directory of the Wget binary.
* mswindows.c (ws_mypath): Employ slightly more robust methodology.
Strip trailing path separator.

Index: src/init.c
===
RCS file: /pack/anoncvs/wget/src/init.c,v
retrieving revision 1.91
diff -u -r1.91 init.c
--- src/init.c  2003/12/14 13:35:27 1.91
+++ src/init.c  2004/02/16 15:58:36
@@ -1,5 +1,5 @@
 /* Reading/parsing the initialization file.
-   Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001, 2003
+   Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001, 2003, 2004
Free Software Foundation, Inc.
 
 This file is part of GNU Wget.
@@ -314,9 +314,9 @@
return NULL;
   home = pwd->pw_dir;
 #else  /* WINDOWS */
-  home = "C:\\";
-  /*  Maybe I should grab home_dir from registry, but the best
-that I could get from there is user's Start menu.  It sucks!  */
+  /* Under Windows, if $HOME isn't defined, use the directory where
+ `wget.exe' resides.  */
+  home = ws_mypath ();
 #endif /* WINDOWS */
 }
 
@@ -347,27 +347,24 @@
   return xstrdup (env);
 }
 
-#ifndef WINDOWS
   /* If that failed, try $HOME/.wgetrc.  */
   home = home_dir ();
   if (home)
-{
-  file = (char *)xmalloc (strlen (home) + 1 + strlen (".wgetrc") + 1);
-  sprintf (file, "%s/.wgetrc", home);
-}
+file = aprintf ("%s/.wgetrc", home);
   xfree_null (home);
-#else  /* WINDOWS */
-  /* Under Windows, "home" is (for the purposes of this function) the
- directory where `wget.exe' resides, and `wget.ini' will be used
- as file name.  SYSTEM_WGETRC should not be defined under WINDOWS.
-
- It is not as trivial as I assumed, because on 95 argv[0] is full
- path, but on NT you get what you typed in command line.  --dbudor */
-  home = ws_mypath ();
-  if (home)
+
+#ifdef WINDOWS
+  /* Under Windows, if we still haven't found .wgetrc, look for the file
+ `wget.ini' in the directory where `wget.exe' resides; we do this for
+ backward compatibility with previous versions of Wget.
+ SYSTEM_WGETRC should not be defined under WINDOWS.  */
+  if (!file || !file_exists_p (file))
 {
-  file = (char *)xmalloc (strlen (home) + strlen ("wget.ini") + 1);
-  sprintf (file, "%swget.ini", home);
+  xfree_null (file);
+  file = NULL;
+  home = ws_mypath ();
+  if (home)
+   file = aprintf ("%s/wget.ini", home);
 }
 #endif /* WINDOWS */
 
Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.22
diff -u -r1.22 mswindows.c
--- src/mswindows.c 2003/11/03 21:57:03 1.22
+++ src/mswindows.c 2004/02/16 15:58:37
@@ -1,5 +1,5 @@
 /* mswindows.c -- Windows-specific support
-   Copyright (C) 1995, 1996, 1997, 1998  Free Software Foundation, Inc.
+   Copyright (C) 1995, 1996, 1997, 1998, 2004  Free Software Foundation, Inc.
 
 This file is part of GNU Wget.
 
@@ -199,22 +199,25 @@
 ws_mypath (void)
 {
   static char *wspathsave = NULL;
-  char buffer[MAX_PATH];
-  char *ptr;
 
-  if (wspathsave)
+  if (!wspathsave)
 {
-  return wspathsave;
-}
+  char buf[MAX_PATH + 1];
+  char *p;
+  DWORD len;
+
+  len = GetModuleFileName (GetModuleHandle (NULL), buf, sizeof (buf));
+  if (!len || (len >= sizeof (buf)))
+return NULL;
+
+  p = strrchr (buf, PATH_SEPARATOR);
+  if (!p)
+return NULL;
 
-  if (GetModuleFileName (NULL, buffer, MAX_PATH) &&
-  (ptr = strrchr (buffer, PATH_SEPARATOR)) != NULL)
-{
-  *(ptr + 1) = '\0';
-  wspathsave = xstrdup (buffer);
+  *p = '\0';
+  wspathsave = xstrdup (buf);
 }
-  else
-wspathsave = NULL;
+
   return wspathsave;
 }
 


[PATCH] Don't launch the Windows help file in response to --help

2004-02-20 Thread David Fritz
Attached is a patch that removes the ws_help() function from mswindows.[ch] and 
the call to it from print_help() in main.c.  Also attached is an alternate patch 
that will fix ws_help(), which I neglected to update when I changed ws_mypath().

I find this behavior inconsistent with the vast majority of other command line 
tools.  It's something akin to popping-up a web browser with the HTML version of 
the docs in response to `wget --help' when running in a terminal under X.

Feedback from users would be appreciated.

Note: The previous change to ws_mypath() broke ws_help(), so one of the attached 
patches should be applied.

2004-02-20  David Fritz  <[EMAIL PROTECTED]>

	* main.c (print_help): Remove call to ws_help().

	* mswindows.c (ws_help): Remove.

	* mswindows.h (ws_help): Remove.

Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.23
diff -u -r1.23 mswindows.c
--- src/mswindows.c 2004/02/17 15:37:31 1.23
+++ src/mswindows.c 2004/02/20 16:17:34
@@ -229,8 +229,8 @@
   if (mypath)
 {
   struct stat sbuf;
-  char *buf = (char *)alloca (strlen (mypath) + strlen (name) + 4 + 1);
-  sprintf (buf, "%s%s.HLP", mypath, name);
+  char *buf = (char *)alloca (strlen (mypath) + strlen (name) + 5 + 1);
+  sprintf (buf, "%s/%s.HLP", mypath, name);
   if (stat (buf, &sbuf) == 0) 
{
   printf (_("Starting WinHelp %s\n"), buf);
Index: src/main.c
===
RCS file: /pack/anoncvs/wget/src/main.c,v
retrieving revision 1.110
diff -u -r1.110 main.c
--- src/main.c  2003/12/14 13:35:27 1.110
+++ src/main.c  2004/02/20 16:25:03
@@ -621,9 +621,6 @@
   for (i = 0; i < countof (help); i++)
 fputs (_(help[i]), stdout);
 
-#ifdef WINDOWS
-  ws_help (exec_name);
-#endif
   exit (0);
 }
 
Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.23
diff -u -r1.23 mswindows.c
--- src/mswindows.c 2004/02/17 15:37:31 1.23
+++ src/mswindows.c 2004/02/20 16:25:04
@@ -222,28 +222,6 @@
 }
 
 void
-ws_help (const char *name)
-{
-  char *mypath = ws_mypath ();
-
-  if (mypath)
-{
-  struct stat sbuf;
-  char *buf = (char *)alloca (strlen (mypath) + strlen (name) + 4 + 1);
-  sprintf (buf, "%s%s.HLP", mypath, name);
-  if (stat (buf, &sbuf) == 0) 
-   {
-  printf (_("Starting WinHelp %s\n"), buf);
-  WinHelp (NULL, buf, HELP_INDEX, 0);
-}
-  else
-{
-  printf ("%s: %s\n", buf, strerror (errno));
-}
-}
-}
-
-void
 ws_startup (void)
 {
   WORD requested;
Index: src/mswindows.h
===
RCS file: /pack/anoncvs/wget/src/mswindows.h,v
retrieving revision 1.13
diff -u -r1.13 mswindows.h
--- src/mswindows.h 2003/11/06 01:12:02 1.13
+++ src/mswindows.h 2004/02/20 16:25:05
@@ -159,7 +159,6 @@
 void ws_changetitle (const char*, int);
 void ws_percenttitle (double);
 char *ws_mypath (void);
-void ws_help (const char *);
 void windows_main_junk (int *, char **, char **);
 
 /* Things needed for IPv6; missing in . */


Re: wget: Option -O not working in version 1.9 ?

2004-02-27 Thread David Fritz
Michael Bingel wrote:
Hi there,

I was looking for a tool to retrieve web pages and print them to 
standard out.  As a Windows user I tried wget from Cygwin, but it created a 
file and I could not find the option to redirect output to standard out.

Then I browsed throught the online documentation and found the -O option 
in the manual 
(http://www.gnu.org/software/wget/manual/wget-1.8.1/html_mono/wget.html#IDX131). 

I thought great, problem solved, but Cygwin wget version 1.9 does not 
accept "-O", although the NEWS file does not state removal of this feature.

Your official site (http://www.gnu.org/software/wget/wget.html) states 
1.8 as the latest version, with the option -O in the manual, so why does 
Cygwin have version 1.9 that does not support -O?

kind regards,
Mike
It might help if you would post the invocations you tried and also the output 
Wget produced.

But, I'd guess you probably had a non-option argument before -O.  For a while 
now, the version of getopt_long() included with Cygwin has had argument 
permutation disabled by default.  (It's recently been re-enabled in CVS, but not 
yet in a released version.)  So, under Cygwin you'll need to place all option 
arguments before any non-option arguments.  Like this, for instance:

$ wget -O - http://www.gnu.org/

HTH, Cheers



Re: wget: Option -O not working in version 1.9 ?

2004-02-27 Thread David Fritz
Hrvoje Niksic wrote:

David Fritz writes:

But, I'd guess you probably had a non-option argument before -O.
For a while now, the version of getopt_long() included with Cygwin
has had argument permutation disabled by default.


What on Earth were they thinking?!  
:)  Well, ultimately, I can only speculate, but I recently grep'd through the 
archive of Cygwin mailing lists trying to understand just why this change was 
made.  It seems the Cygwin RCM (see http://cygwin.com/acronyms/) found that 
argument permutation was causing problems with one of the utilities distributed 
with Cygwin that takes another command as its argument.  He felt it was a good 
idea to hard-code POSIXLY_CORRECT into the getopt* code.  Leaving aside the 
fact that getopt() is in POSIX and argument permutation is not, getopt_long() 
is a GNU invention, and disabling argument permutation for it is just weird.

I tried to point this out a few months ago and was ignored.  Recently, a Mingw 
maintainer reverted the change to their copy of getopt* (which some parts of 
Cygwin use) and that broke things again.  This time they seem to have fixed 
things the right way and argument permutation is enabled for getopt_long() 
again.  Hopefully it will stay that way.

I've never considered the
possibility that someone would be stupid enough to do that.  Maybe
Wget should revert to simply using its own implementation of
getopt_long unconditionally.
That's the solution some of the Cygwin package maintainers have used.  But 
hopefully by the time the next version of Wget is released it won't be a problem 
anymore.


(It's recently been re-enabled in CVS, but not yet in a released
version.)  So, under Cygwin you'll need to place all option
arguments before any non-option arguments.


I never thought I'd see those particular instructions applied to Wget.
I've always hated the traditional Unix requirement for all options to
precede all non-options.
If Cygwin thinks it knows better how to handle command-line options,
they should have the decency to name the function something other than
getopt_long.
Agreed.

Cheers



Re: Windows titlebar fix

2004-03-02 Thread David Fritz
Gisle Vanem wrote:

ws_percenttitle() should not be called in quiet mode since ws_changetitle() 
AFAICS is only called in verbose mode. That caused an assert in 
mswindows.c. An easy patch:

--- CVS-latest\src\retr.c   Sun Dec 14 14:35:27 2003
+++ src\retr.c  Tue Mar 02 21:18:55 2004
@@ -311,7 +311,7 @@
   if (progress)
progress_update (progress, ret, wtimer_read (timer));
 #ifdef WINDOWS
-  if (toread > 0)
+  if (toread > 0 && !opt.quiet)
ws_percenttitle (100.0 *
 (startpos + sum_read) / (startpos + toread));
 #endif
--gv


This is because of a patch I recently submitted.  The code in ws_percenttitle() 
used to just return if the relevant pointers were null.  I replaced that with 
the assert()s.  I failed to notice that ws_changetitle() is only called when 
opt.verbose is non-zero.  (After I removed the call to it from main().)  Sorry 
about that.

We could also fix this by calling ws_changetitle() unconditionally.  Should the 
title bar be affected by verbosity?

One minor nit regarding the above patch:  It should use opt.verbose instead of 
!opt.quiet so Wget won't assert when -nv is used.




Re: Windows titlebar fix

2004-03-02 Thread David Fritz
The attached patch will cause ws_percenttitle() to do nothing if the relevant 
variables have not been initialized.  This is what the code did before I changed 
it.  I changed it because it seemed that the code was making allowances for 
things that should not happen and I thought an assert would be more appropriate. 
 Though, I guess doing thing this way will make the code more robust against 
future changes (and thus make the Windows port less of a maintenance burden). 
The patch also clamps the percentage value instead of just returning when it's 
out-of-range.  This is so it will update the title to display the percentage as 
100 in the arcane case when the previous percentage was < 100 and the new 
percentage is > 100.  It also includes Gisle Vanem's fix for retr.c.

[I know my assignment is pending but hopefully this patch is small enough to 
squeak-by until it's been processed.]

2004-03-02  David Fritz  <[EMAIL PROTECTED]>

* retr.c (fd_read_body): Under Windows, only call
ws_percenttitle() if verbose.  Fix by Gisle Vanem.
* mswindows.c (ws_percenttitle): Guard against future changes by
doing nothing if the proper variables have not been initialized.
Clamp percentage value.




Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.27
diff -u -r1.27 mswindows.c
--- src/mswindows.c 2004/02/26 14:34:17 1.27
+++ src/mswindows.c 2004/03/03 03:21:27
@@ -180,19 +180,24 @@
 void
 ws_percenttitle (double percentage_float)
 {
-  int percentage = (int) percentage_float;
+  int percentage;
 
-  /* Only update the title when the percentage has changed.  */
-  if (percentage == old_percentage)
+  if (!title_buf || !curr_url)
 return;
 
-  old_percentage = percentage;
+  percentage = (int) percentage_float;
 
+  /* Clamp percentage value.  */
+  if (percentage < 0)
+percentage = 0;
   if (percentage > 100)
+percentage = 100;
+
+  /* Only update the title when the percentage has changed.  */
+  if (percentage == old_percentage)
 return;
 
-  assert (title_buf != NULL);
-  assert (curr_url != NULL);
+  old_percentage = percentage;
 
   sprintf (title_buf, "Wget [%d%%] %s", percentage, curr_url);
   SetConsoleTitle (title_buf);
Index: src/retr.c
===
RCS file: /pack/anoncvs/wget/src/retr.c,v
retrieving revision 1.84
diff -u -r1.84 retr.c
--- src/retr.c  2003/12/14 13:35:27 1.84
+++ src/retr.c  2004/03/03 03:21:31
@@ -311,7 +311,7 @@
   if (progress)
progress_update (progress, ret, wtimer_read (timer));
 #ifdef WINDOWS
-  if (toread > 0)
+  if (opt.verbose && toread > 0)
ws_percenttitle (100.0 *
 (startpos + sum_read) / (startpos + toread));
 #endif







Suggestion to add an switch on timestamps

2004-03-16 Thread david-zhan
Suggestion to add an switch on timestamps

 

Dear Sir/Madam:

 

WGET is popular FTP software for UNIX.  But after the files are downloaded
for the first time, WGET always gives them dates and times matching those on
the remote server.  If WGET is executed in a temporary directory in which files
are deleted according to their dates, files whose server timestamps are seven
days old will be deleted automatically as soon as they finish downloading.  I
suggest that an option on timestamps be added to WGET so that users can have
the current date and time applied to newly downloaded files.
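
Until such an option exists, one possible workaround (the URL and file name
below are just examples) is to reset the timestamp of the downloaded file
afterwards so that cleanup tools see a current date:

  wget ftp://ftp.example.com/pub/data.gz && touch data.gz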

 

Thank you for your kind attention.



[PATCH] A working implementation of fork_to_background() under Windows – please test

2004-03-19 Thread David Fritz
Attached is an implementation of fork_to_background() for Windows that (I hope) 
has the desired effect under both 9x and NT.

_This is a preliminary patch and needs to be tested._

The patch is dependent upon the fact that the only time fork_to_background() is 
called is on start-up when -b is specified.

Windows of course does not support the fork() call, so it must be simulated. 
This can be done by creating a new process and using some form of inter-process 
communication to transfer the state of the old process to the new one.  This 
requires the parent and child to cooperate and when done in a general way (such 
as by Cygwin) requires a lot of work.

However, with Wget since we have a priori knowledge of what could have changed 
in the parent by the time we call fork(), we could implement a special purpose 
fork() that only passes to the child the things that we know could have changed. 
 (The initialization done by the C run-time library, etc. would be performed 
anew in the child, but hold on a minute.)

The only real work done by Wget before calling fork() is the reading of wgetrc 
files and the processing of command-line arguments.  Passing this information 
directly to the child would be possible, but the implementation would be complex 
and fragile. It would need to be updated as changes are made to the main code.

It would be much simpler to simply perform the initialization (reading of config 
files, processing of args, etc.) again in the child.  This would have a small 
performance impact and introduce some race-conditions, but I think the 
advantages (having –b work) outweigh the disadvantages.

The implementation is, I hope, fairly straightforward.  I have attempted to 
explain it in moderate detail in an attached README.

I'm hoping others can test it with various operating systems and compilers. 
Also, any feedback regarding the design or implementation would be welcome.  Do 
you feel this is the right way to go about this?

Cheers,
David Fritz
2004-03-19  David Fritz  <[EMAIL PROTECTED]>

* mswindows.c (make_section_name, fake_fork, fake_fork_child): New
functions.
(fork_to_background): Replace with new implementation.


Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.29
diff -u -r1.29 mswindows.c
--- src/mswindows.c 2004/03/19 23:54:27 1.29
+++ src/mswindows.c 2004/03/20 01:34:15
@@ -131,10 +131,240 @@
   FreeConsole ();
 }
 
+/* Construct the name for a named section (a.k.a `file mapping') object.
+   The returned string is dynamically allocated and needs to be xfree()'d.  */
+static char *
+make_section_name (DWORD pid)
+{
+return aprintf("gnu_wget_fake_fork_%lu", pid);
+}
+
+/* This structure is used to hold all the data that is exchanged between
+   parent and child.  */
+struct fake_fork_info
+{
+  HANDLE event;
+  int changedp;
+  char lfilename[MAX_PATH + 1];
+};
+
+/* Determines if we are the child and if so performs the child logic.
+   Return values:
+ < 0  error
+ 0parent
+ > 0  child
+*/
+static int
+fake_fork_child (void)
+{
+  HANDLE section, event;
+  struct fake_fork_info *info;
+  char *name;
+  DWORD le;
+
+  name = make_section_name (GetCurrentProcessId ());
+  section = OpenFileMapping (FILE_MAP_WRITE, FALSE, name);
+  le = GetLastError ();
+  xfree (name);
+  if (!section)
+{
+  if (le == ERROR_FILE_NOT_FOUND)
+return 0;   /* Section object does not exist; we are the parent.  */
+  else
+return -1;
+}
+
+  info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0);
+  if (!info)
+{
+  CloseHandle (section);
+  return -1;
+}
+
+  event = info->event;
+
+  if (!opt.lfilename)
+{
+  opt.lfilename = unique_name (DEFAULT_LOGFILE, 0);
+  info->changedp = 1;
+  strncpy (info->lfilename, opt.lfilename, sizeof (info->lfilename));
+  info->lfilename[sizeof (info->lfilename) - 1] = '\0';
+}
+  else
+info->changedp = 0;
+
+  UnmapViewOfFile (info);
+  CloseHandle (section);
+
+  /* Inform the parent that we've done our part.  */
+  if (!SetEvent (event))
+  return -1;
+
+  CloseHandle (event);
+  return 1; /* We are the child.  */
+}
+
+
+static void
+fake_fork (void)
+{
+  char *cmdline, *args;
+  char exe[MAX_PATH + 1];
+  DWORD exe_len, le;
+  SECURITY_ATTRIBUTES sa;
+  HANDLE section, event, h[2];
+  STARTUPINFO si;
+  PROCESS_INFORMATION pi;
+  struct fake_fork_info *info;
+  char *name;
+  BOOL rv;
+
+  event = section = pi.hProcess = pi.hThread = NULL;
+
+  /* Get command line arguments to pass to the child process.
+ We need to skip the name of the command (what amounts to argv[0]).  */
+  cmdline = GetCommandLine ();
+  if (*cmdline == '"')
+{
+  args = strchr (cmdline + 1, '"');
+  if (a

Re: [PATCH] A working implementation of fork_to_background() under Windows – please test

2004-03-23 Thread David Fritz
Herold Heiko wrote:
MSVC binary at http://xoomer.virgilio.it/hherold/ for public testing.
I performed only basic tests on NT4 sp6a, everything performed fine as
expected.
Thank you much for testing and hosting the binary.

Some ideas on this thing:
I'll respond to your points out-of-order.

In verbose mode the child should probably acknowledge in the log file the
fact that it was invoked as the child.
The current patch attempts to emulate the behavior of the Unix version.  AFAICT, 
this and the following suggestion apply equally well to the existing (Unix) code.

In quiet mode the parent log (child pid, "log on wget-log" or whatever)
probably should not be printed.
Also, perhaps in quiet mode the child should not automatically set a log file if 
none was specified.  IIUC, the log file would always be empty.

In debug mode the client should probably also log the name of the section
object and any information retrieved from it (currently the flag only).
Sure, I could add a number of debug prints.

A possible fix for the wgetrc race condition could be caching the content of
the whole wgetrc in the parent and transmit it in the section object to the
child, a bit messy I must admit but a possible solution if that race
condition is considered a Bad Thing.
That would work, but would require making changes to the main code and would 
require performing the child detection logic much earlier (even before we know 
if -b was specified).  We could also exploit Windows file-sharing semantics or 
file locking features to guarantee the config files can't change.  I'm unsure 
such complexity is necessary.
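
For what it's worth, a minimal sketch of the file-sharing idea (illustrative
only, not proposed code): the parent could open the wgetrc with a share mode
that excludes writers and hold the handle until the child signals that it has
finished parsing its configuration.

#include <windows.h>

/* While the returned handle stays open, other processes cannot open PATH
   for writing, so the child will read the same configuration the parent
   saw.  Returns INVALID_HANDLE_VALUE on failure. */
static HANDLE
hold_config_readonly (const char *path)
{
  return CreateFileA (path, GENERIC_READ,
                      FILE_SHARE_READ,          /* no FILE_SHARE_WRITE */
                      NULL, OPEN_EXISTING,
                      FILE_ATTRIBUTE_NORMAL, NULL);
}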

About the only scenario I could think of is where you have a script that
creates a custom wgetrc, runs wget, and then changes the wgetrc: introduce -b,
and the script could change the wgetrc after running wget but before the child
has parsed it; a rather remote but possible scenario.
In this scenario, the script would have to wait for the parent to terminate to 
avoid a race, even with the Unix version.  With this patch the child would have 
necessarily finished reading any wgetrc files before the parent terminates.  So 
there shouldn't be a problem.

Thanks again,
David Fritz



Re: [PATCH] A working implementation of fork_to_background() under Windows – please test

2004-03-24 Thread David Fritz
Hrvoje Niksic wrote:
For now I'd start with applying David's patch, so that people can test
its functionality.  It is easy to fix the behavior of `wget -q -b'
later.
David, can I apply your patch now?
Sure.

The attached patch corrects a few minor formatting details but is otherwise 
identical to the previous one.

Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.29
diff -u -r1.29 mswindows.c
--- src/mswindows.c 2004/03/19 23:54:27 1.29
+++ src/mswindows.c 2004/03/24 17:50:32
@@ -131,10 +131,240 @@
   FreeConsole ();
 }
 
+/* Construct the name for a named section (a.k.a. `file mapping') object.
+   The returned string is dynamically allocated and needs to be xfree()'d.  */
+static char *
+make_section_name (DWORD pid)
+{
+  return aprintf ("gnu_wget_fake_fork_%lu", pid);
+}
+
+/* This structure is used to hold all the data that is exchanged between
+   parent and child.  */
+struct fake_fork_info
+{
+  HANDLE event;
+  int changedp;
+  char lfilename[MAX_PATH + 1];
+};
+
+/* Determines if we are the child and if so performs the child logic.
+   Return values:
+ < 0  error
+   0  parent
+ > 0  child
+*/
+static int
+fake_fork_child (void)
+{
+  HANDLE section, event;
+  struct fake_fork_info *info;
+  char *name;
+  DWORD le;
+
+  name = make_section_name (GetCurrentProcessId ());
+  section = OpenFileMapping (FILE_MAP_WRITE, FALSE, name);
+  le = GetLastError ();
+  xfree (name);
+  if (!section)
+{
+  if (le == ERROR_FILE_NOT_FOUND)
+return 0;   /* Section object does not exist; we are the parent.  */
+  else
+return -1;
+}
+
+  info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0);
+  if (!info)
+{
+  CloseHandle (section);
+  return -1;
+}
+
+  event = info->event;
+
+  if (!opt.lfilename)
+{
+  opt.lfilename = unique_name (DEFAULT_LOGFILE, 0);
+  info->changedp = 1;
+  strncpy (info->lfilename, opt.lfilename, sizeof (info->lfilename));
+  info->lfilename[sizeof (info->lfilename) - 1] = '\0';
+}
+  else
+info->changedp = 0;
+
+  UnmapViewOfFile (info);
+  CloseHandle (section);
+
+  /* Inform the parent that we've done our part.  */
+  if (!SetEvent (event))
+return -1;
+
+  CloseHandle (event);
+  return 1; /* We are the child.  */
+}
+
+
+static void
+fake_fork (void)
+{
+  char *cmdline, *args;
+  char exe[MAX_PATH + 1];
+  DWORD exe_len, le;
+  SECURITY_ATTRIBUTES sa;
+  HANDLE section, event, h[2];
+  STARTUPINFO si;
+  PROCESS_INFORMATION pi;
+  struct fake_fork_info *info;
+  char *name;
+  BOOL rv;
+
+  event = section = pi.hProcess = pi.hThread = NULL;
+
+  /* Get command line arguments to pass to the child process.
+ We need to skip the name of the command (what amounts to argv[0]).  */
+  cmdline = GetCommandLine ();
+  if (*cmdline == '"')
+{
+  args = strchr (cmdline + 1, '"');
+  if (args)
+++args;
+}
+  else
+args = strchr (cmdline, ' ');
+
+  /* It's ok if args is NULL, that would mean there were no arguments
+ after the command name.  As it is now though, we would never get here
+ if that were true.  */
+
+  /* Get the fully qualified name of our executable.  This is more reliable
+ than using argv[0].  */
+  exe_len = GetModuleFileName (GetModuleHandle (NULL), exe, sizeof (exe));
+  if (!exe_len || (exe_len >= sizeof (exe)))
+return;
+
+  sa.nLength = sizeof (sa);
+  sa.lpSecurityDescriptor = NULL;
+  sa.bInheritHandle = TRUE;
+
+  /* Create an anonymous inheritable event object that starts out
+ non-signaled.  */
+  event = CreateEvent (&sa, FALSE, FALSE, NULL);
+  if (!event)
+return;
+
+  /* Create the child process detached from the current console and in a
+ suspended state.  */
+  memset (&si, 0, sizeof (si));
+  si.cb = sizeof (si);
+  rv = CreateProcess (exe, args, NULL, NULL, TRUE, CREATE_SUSPENDED |
+  DETACHED_PROCESS, NULL, NULL, &si, &pi);
+  if (!rv)
+goto cleanup;
+
+  /* Create a named section object with a name based on the process id of
+ the child.  */
+  name = make_section_name (pi.dwProcessId);
+  section =
+  CreateFileMapping (INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0,
+ sizeof (struct fake_fork_info), name);
+  le = GetLastError();
+  xfree (name);
+  /* Fail if the section object already exists (should not happen).  */
+  if (!section || (le == ERROR_ALREADY_EXISTS))
+{
+  rv = FALSE;
+  goto cleanup;
+}
+
+  /* Copy the event handle into the section object.  */
+  info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0);
+  if (!info)
+{
+  rv = FALSE;
+  goto cleanup;
+}
+
+  info->event = event;
+
+  UnmapViewOfFile (info);
+
+  /* S

Re: [PATCH] A working implementation of fork_to_background() under Windows – please test

2004-03-24 Thread David Fritz
Hrvoje Niksic wrote:
Thanks for the patch, I've now applied it to CVS.

You might want to add a comment in front of fake_fork() explaining
what it does, and why.  The comment doesn't have to be long, only
several sentences so that someone reading the code later understands
what the heck a "fake fork" is and why we're performing it.


Ok, I'll submit a patch later tonight.  Do you think it would be a good idea to 
include README.fork in windows/ (the directory with the Windows Makefiles, etc. 
in it)?  (If so, I'd like to tweak it a little first, though.)

Thanks



Re: [PATCH] A working implementation of fork_to_background() under Windows – please test

2004-03-24 Thread David Fritz
Hrvoje Niksic wrote:
Thanks for the patch, I've now applied it to CVS.

You might want to add a comment in front of fake_fork() explaining
what it does, and why.  The comment doesn't have to be long, only
several sentences so that someone reading the code later understands
what the heck a "fake fork" is and why we're performing it.
Ok, I hope this is sufficient.

Cheers
Index: src/mswindows.c
===
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.30
diff -u -r1.30 mswindows.c
--- src/mswindows.c 2004/03/24 19:16:08 1.30
+++ src/mswindows.c 2004/03/24 23:52:59
@@ -202,7 +202,14 @@
   return 1; /* We are the child.  */
 }
 
-
+/* Windows doesn't support the fork() call; so we fake it by invoking
+   another copy of Wget with the same arguments with which we were
+   invoked.  The child copy of Wget should perform the same initialization
+   sequence as the parent; so we should have two processes that are
+   essentially identical.  We create a specially named section object that
+   allows the child to distinguish itself from the parent and is used to
+   exchange information between the two processes.  We use an event object
+   for synchronization.  */
 static void
 fake_fork (void)
 {
@@ -343,6 +350,8 @@
   /* We failed, return.  */
 }
 
+/* This is the corresponding Windows implementation of the
+   fork_to_background() function in utils.c.  */
 void
 fork_to_background (void)
 {


Re: wget: strdup: Not enough memory

2004-05-09 Thread David Fritz
Axel Pettinger wrote:

Hrvoje Niksic wrote:

This patch should fix the problem.  Please let me know if it works
for you:
I would like to check it out, but I'm afraid I'm not able to compile
it.
Why not?  What error are you getting?


I have not that much experience with compiling source code ... When I
try to build WGET.EXE (w/o SSL) using MinGW then I get many warnings and
errors in "utils.c" and "log.c", i.e.:
--
gcc -DWINDOWS -DHAVE_CONFIG_H -O3 -Wall -I.   -c -o log.o log.c
log.c:498:26: macro "va_start" requires 2 arguments, but only 1 given
log.c: In function `logprintf':
log.c:498: `va_start' undeclared (first use in this function)
log.c:498: (Each undeclared identifier is reported only once
log.c:498: for each function it appears in.)
log.c:524:30: macro "va_start" requires 2 arguments, but only 1 given
log.c: In function `debug_logprintf':
log.c:524: `va_start' undeclared (first use in this function)
mingw32-make: *** [log.o] Error 1
--
Regards,
Axel Pettinger
I just posted a patch to wget-patches that, hopefully, will fix the mingw build. 
 In the meantime try adding the following lines to config.h.mingw:

#define WGET_USE_STDARG
#define HAVE_SIG_ATOMIC_T
HTH, Cheers




Re: wget: strdup: Not enough memory

2004-05-09 Thread David Fritz
Axel Pettinger wrote:
David Fritz wrote:

Axel Pettinger wrote:


I have not that much experience with compiling source code ... 
When I try to build WGET.EXE (w/o SSL) using MinGW then I get many 


Forgot to mention that the source is 1.9+cvs-dev-200404081407 ...


warnings and errors in "utils.c" and "log.c", i.e.:
[snip]

I just posted a patch to wget-patches that, hopefully, will fix the 
mingw build.
 In the meantime try adding the following lines to config.h.mingw:

#define WGET_USE_STDARG
#define HAVE_SIG_ATOMIC_T


"log.c" seems to be ok now, but there's still an error in "utils.c":

--
gcc -DWINDOWS -DHAVE_CONFIG_H -O3 -Wall -I.   -c -o utils.o utils.c
utils.c:53:20: utime.h: No such file or directory
utils.c: In function `unique_name_1':
utils.c:411: warning: implicit declaration of function `alloca'
utils.c: In function `number_to_string':
utils.c:1271: warning: suggest parentheses around + or - inside shift
mingw32-make: *** [utils.o] Error 1
--
Regards,
Axel Pettinger
Hmm, you might try upgrading to a newer version of mingw (see 
http://www.mingw.org/).  Alternatively, you could try to comment-out the #define 
HAVE_UTIME_H 1 line in config.h.mingw or add a utime.h to your mingw include 
directory that consists of the following line:

#include <sys/utime.h>

HTH, Cheers





Re: Large Files Support for Wget

2004-05-10 Thread David Fritz
IIUC, GNU coreutils uses uintmax_t to store large numbers relating to the file 
system and prints them with something like this:

  char buf[INT_BUFSIZE_BOUND (uintmax_t)];
  printf (_("The file is %s octets long.\n"), umaxtostr (size, buf));
where umaxtostr() has the following prototype:

char *umaxtostr (uintmax_t, char *);

and it returns its second argument (the address of the buffer provided by the 
caller) so it can be used easily as an argument in printf calls.
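
For illustration, the general pattern (this is a sketch, not the coreutils
source) renders the value into the caller's buffer from the end and returns a
pointer to the first digit:

#include <stddef.h>
#include <stdint.h>

/* BUF must be large enough for the longest possible uintmax_t plus the
   terminating NUL (what INT_BUFSIZE_BOUND computes in coreutils). */
static char *
umax_to_string (uintmax_t n, char *buf, size_t bufsize)
{
  char *p = buf + bufsize;
  *--p = '\0';
  do
    *--p = '0' + (char) (n % 10);
  while ((n /= 10) != 0);
  return p;
}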





Re: Input string size limitations

2004-06-02 Thread David Fritz
[redirecting this thread to the general discussion list [EMAIL PROTECTED]]
Laura Sanders wrote:
I am using wget to pass order information, which includes item numbers,
addresses, etc.
I have run into a size limitation on the string I send into wget.
[...]
How are you `sending' the string to Wget?  Under what OS?
If you're running into command-line length limitations you can simply put the 
URL(s) in a file (one per line) and use -i.  If you use `wget -i -', Wget will 
read the list of URLs from stdin; this can be useful in avoiding the need for 
temporary files.
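
A hypothetical example (`generate_order_urls' stands in for whatever produces
the order URLs, one per line):

$ generate_order_urls | wget -i -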

HTH, Cheers



Output error stream if response code != 200

2004-06-02 Thread Karr, David
When testing of posting to web services, if the service returns a SOAP
fault, it will set the response code to 500.  However, the information
in the SOAP fault is still useful.  When wget gets a 500 response code,
it doesn't try to output the "error stream" (as opposed to the "input
stream"), where this information would be provided.  It might be useful
to add a command-line option that specifies to emit the error stream on
error.


--continue breakage and changes

2004-06-02 Thread David Fritz
Because of the way the always_rest logic has been restructured, if a non-fatal 
error occurs in an initial attempt, subsequent retries will forget about 
always_rest and clobber the existing file.  Ouch.

Also, the behavior of -c when downloading from a server that does not support 
ranges has changed since 1.9.1.  (Or seems to have, from looking at the code; I 
haven't actually tested.)  Previously, Wget would bail in such situations:

Continued download failed on this file, which conflicts with `-c'.
Refusing to truncate existing file `%s'
Now, it will re-download the whole file and discard bytes until it gets to the 
right position.  I think this change deserves explicit mention in the NEWS file. 
 There's an entry about the new logic when Range requests fail, but I don't 
think it's obvious that this affects -c.

Also, I think the old behavior was useful in some situations.  If you're short 
on bandwidth it might not be worth it to re-get the whole file.  Especially when 
it's a popular file and there's likely to be another mirror that does support 
Range.  What would you think of an option to disallow start-over retries?



RE: Output error stream if response code != 200

2004-06-05 Thread Karr, David
> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 
> 
> "Karr, David" <[EMAIL PROTECTED]> writes:
> 
> > When testing of posting to web services, if the service 
> returns a SOAP 
> > fault, it will set the response code to 500.  However, the 
> information 
> > in the SOAP fault is still useful.  When wget gets a 500 response 
> > code, it doesn't try to output the "error stream" (as 
> opposed to the 
> > "input stream"), where this information would be provided.  
> It might 
> > be useful to add a command-line option that specifies to emit the 
> > error stream on error.
> 
> I'm not quite sure what you mean by "error stream" here.  The 
> thing is, web errors are usually in HTML, which is next to 
> useless if dumped on stdout by Wget.  But perhaps Wget should 
> show that output anyway when -S is used?

When I speak of the "error stream" as opposed to the "input stream", I
refer to the terms used in the Java API.  The latter is meaningful when
the response code is 2xx, and the error stream is meaningful when the
response code is not in that range.

When HTTP applications are HTML-based, both the input stream and the
error stream are likely to be HTML, so it doesn't make sense to exclude
one, but not the other.

However, when the HTTP application is XML-based, the error stream will
often be meaningful (and interpretable).

I don't know whether it's reasonable to make the error stream show up by
default, or turned on by an option.  It depends a bit on whether
changing that functionality would affect any existing uses.


Headers/resume -s/-c conflict

2004-06-22 Thread David Greaves
Hi
If I specify -s and -c then the resultant file is corrupted if a resume 
occurs because the resume sticks the headers partway through the file.

Additionally, the resume doesn't do a full grab because it miscounts the 
size by ignoring the header bytes.

Is this on anyones to-do list?
David


Re: Headers/resume -s/-c conflict

2004-06-29 Thread David Greaves
David Greaves wrote:
Hi
If I specify -s and -c then the resultant file is corrupted if a 
resume occurs because the resume sticks the headers partway through 
the file.

Additionally, the resume doesn't do a full grab because it miscounts 
the size by ignoring the header bytes.

Is this on anyones to-do list?
David

FYI
I've fixed and sent in a patch to fix this for the cvs version.
I also have a patch for 1.9.1 if anyone wants it.
It's particularly useful for apt-cacher which trips over this bug a lot.
David
Mail me at davidatdgreavesdotcom


2 giga file size limit ?

2004-09-09 Thread david coornaert




Hi all, 
I'm trying to get around this kind of message on  I*86 linux boxes
with wget 1.9.1


--11:12:08--  ftp://ftp.ensembl.org/pub/current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz
           => `current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz'
==> CWD not required.
==> PASV ... done.    ==> RETR RefSNP.txt.table.gz ... done.
Length: -1,212,203,102

The file is actually more than 3 GB.  Since my main goal is to mirror the
whole thing at ensembl.org, it would be very nice if mirroring could be used
with huge files too.


There is no trouble though on Tru64 machines.


here is what the .listing file says for this file on the linux boxes :

total 3753518
-rw-rw-r--   1 0        0       97960 Jul 21 17:05 Assay.txt.table.gz
-rw-rw-r--   1 0        0         279 Jul 21 19:29 CHECKSUMS.gz
-rw-rw-r--   1 0        0   153157540 Jul 21 17:08 ContigHit.txt.table.gz
-rw-rw-r--   1 0        0          32 Jul 21 17:08 DataSource.txt.table.gz
-rw-rw-r--   1 0        0    18359087 Jul 21 17:09 Freq.txt.table.gz
-rw-rw-r--   1 0        0       46848 Jul 21 17:09 GTInd.txt.table.gz
-rw-rw-r--   1 0        0   185265599 Jul 21 17:13 Hit.txt.table.gz
-rw-rw-r--   1 0        0    35914149 Jul 21 17:14 Locus.txt.table.gz
-rw-rw-r--   1 0        0          20 Jul 21 17:14 Pop.txt.table.gz
-rw-rw-r--   1 0        0  3082764194 Jul 21 19:21 RefSNP.txt.table.gz
-rw-rw-r--   1 0        0         195 Jul 21 19:21 Resource.txt.table.gz
-rw-rw-r--   1 0        0    72306055 Jul 21 19:23 Strain.txt.table.gz
-rw-rw-r--   1 0        0     9480171 Jul 21 19:23 SubPop.txt.table.gz
-rw-rw-r--   1 0        0   286116716 Jul 21 19:27 SubSNP.txt.table.gz
-rw-rw-r--   1 0        0       49095 Jul 21 19:23 Submitter.txt.table.gz
-rw-rw-r--   1 0        0        1697 Jul 21 19:27 homo_sapiens_snp_23_34e.sql.gz

You can see that the file is listed correctly, though once the FTP session is
started it reports a negative size.

Any solution?




Re: 2 giga file size limit ?

2004-09-10 Thread david coornaert
Yep, sorry to be a pain; I saw a bit later that this issue has been raised
quite a lot of times in the past.

My point is though that wget compiled on Tru64 OS does work with huge files.

Jonathan Stewart wrote:
Wget doesn't support >2GB files.  It is a known issue that is brought up a lot.
Please patch if you're able, so far no fix has been forthcoming.
Cheers,
Jonathan
- Original Message -
From: david coornaert <[EMAIL PROTECTED]>
Date: Thu, 09 Sep 2004 12:41:31 +0200
Subject: 2 giga file size limit ?
To: [EMAIL PROTECTED]
Hi all, 
I'm trying to get around this kind of message on  I*86 linux boxes
with wget 1.9.1

--11:12:08--  
ftp://ftp.ensembl.org/pub/current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz
   => `current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz'
==> CWD not required.
==> PASV ... done.==> RETR RefSNP.txt.table.gz ... done.
Length: -1,212,203,102
The file is actually more than 3giga, 
since my main goal is to mirror the whole thing @ensembl-org, It
would be very fine  if mirroring could be used with huge files too

There is no trouble though on Tru64 machines.
here is what the .listing file says for this file on the linux boxes :
total 3753518
-rw-rw-r--   1 00 97960 Jul 21 17:05 Assay.txt.table.gz
-rw-rw-r--   1 00   279 Jul 21 19:29 CHECKSUMS.gz
-rw-rw-r--   1 00 153157540 Jul 21 17:08 ContigHit.txt.table.gz
-rw-rw-r--   1 0032 Jul 21 17:08
DataSource.txt.table.gz
-rw-rw-r--   1 00  18359087 Jul 21 17:09 Freq.txt.table.gz
-rw-rw-r--   1 00 46848 Jul 21 17:09 GTInd.txt.table.gz
-rw-rw-r--   1 00 185265599 Jul 21 17:13 Hit.txt.table.gz
-rw-rw-r--   1 00  35914149 Jul 21 17:14 Locus.txt.table.gz
-rw-rw-r--   1 0020 Jul 21 17:14 Pop.txt.table.gz
-rw-rw-r--   1 003082764194 Jul 21 19:21 RefSNP.txt.table.gz
-rw-rw-r--   1 00   195 Jul 21 19:21 Resource.txt.table.gz
-rw-rw-r--   1 00  72306055 Jul 21 19:23 Strain.txt.table.gz
-rw-rw-r--   1 00   9480171 Jul 21 19:23 SubPop.txt.table.gz
-rw-rw-r--   1 00 286116716 Jul 21 19:27 SubSNP.txt.table.gz
-rw-rw-r--   1 00 49095 Jul 21 19:23 Submitter.txt.table.gz
-rw-rw-r--   1 00  1697 Jul 21 19:27
homo_sapiens_snp_23_34e.sql.gz
You can see that the file is appropriately listed , though once the
ftp session is started it reports a negative size..
any solution ?

 




maybe wget bug

2001-04-04 Thread David Christopher Asher

Hello,

I am using wget to invoke a CGI script call, while passing it several
variables.  For example:

wget -O myfile.txt
"http://user:[EMAIL PROTECTED]/myscript.cgi?COLOR=blue&SHAPE=circle"

where myscript.cgi say, makes an image based on the parameters "COLOR" and
"SHAPE".  The problem I am having is when I need to pass a key/value pair
where the value contains the "&" character.  Such as:

wget -O myfile.txt "http://user:[EMAIL PROTECTED]/myscript.cgi?COLOR=blue
& red&SHAPE=circle"

I have tried encoding the "&" as %26, but that does not seem to work (spaces
as %20 works fine).  The error log for the web server shows that the URL
requested does not say %26, but rather "&".  It does not appear to me that
wget is sending the %26 as %26, but perhaps "fixing" it to "&".

I am using GNU wget v1.5.3 with Red Hat 7.0

Thanks!

--
David Christopher Asher







Is there a way to override wgetrc options on command line?

2001-05-31 Thread Humes, David G.

Hello,

I have several cronjobs using wget and the wgetrc file turns on passive-ftp
by default.  I have one site where strangely enough passive ftp does not
work but active does work.  I'd rather leave the passive ftp default set and
just change the one cronjob that requires active ftp.  Is there any way to
tell wget to either disregard the wgetrc file or to override one or more of
its options?

Thanks.





RE: Is there a way to override wgetrc options on command line?

2001-06-01 Thread Humes, David G.

Thanks!  That worked.

--Dave

-Original Message-
From: Hack Kampbjørn [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 01, 2001 2:25 AM
To: Humes, David G.
Cc: '[EMAIL PROTECTED]'
Subject: Re: Is there a way to override wgetrc options on command line?




"Humes, David G." wrote:
> 
> Hello,
> 
> I have several cronjobs using wget and the wgetrc file turns on
passive-ftp
> by default.  I have one site where strangely enough passive ftp does not
> work but active does work.  I'd rather leave the passive ftp default set
and
> just change the one cronjob that requires active ftp.  Is there any way to
> tell wget to either disregard the wgetrc file or to override one or more
of
> its options?
> 
> Thanks.

What about --execute=COMMAND ?

$ wget --help
GNU Wget 1.7-pre1, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version   display the version of Wget and exit.
  -h,  --help  print this help.
  -b,  --backgroundgo to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.
[...]
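
For the cronjob in question, something along these lines should do it (a
sketch; the FTP URL is a placeholder, and passive_ftp is the .wgetrc variable
being overridden):

wget -e passive_ftp=off ftp://ftp.example.com/path/to/file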

-- 
Med venlig hilsen / Kind regards

Hack Kampbjørn   [EMAIL PROTECTED]
HackLine +45 2031 7799



Unsubscribe help please

2001-07-16 Thread Humes, David G.

Hello,

I have tried to unsubscribe several times by sending emails to
[EMAIL PROTECTED], but the wget emails keep coming.  I hate to
bother everyone on the list, but could someone please give me a way to
unsubscribe that works.

Thanks.

--Dave



wget and tag searching

2001-09-11 Thread J. David Bickel

Hi,

I am using the wget functionality in one of my projects to search through
web content.  However, I note that when I try to recurse into a link found in the
page that only differs by a ?tag=pag&st=15, wget seems to ignore
everything after the question mark, thus returning the same content as
before.  I was wondering if this is a known issue and if you might have any
suggestions for how I might work around this so that wget can get
the webpage in question.
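
One thing worth ruling out first (a guess, not a confirmed diagnosis): if the
URL is given to wget unquoted on the command line, the shell treats everything
after the & as a separate command, so the query string never reaches wget at
all. Quoting the URL avoids that, e.g. (example.com being a placeholder):

wget "http://example.com/page?tag=pag&st=15"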

Thanks,
--Dave




RE: parameters in the URL

2002-01-14 Thread David Robinson (AU)


Hey, I remember this feature was in WGETWIN 1.5.3.1
It was really useful. But it is missing from WGET 1.8.1

I would like to see this feature added back into WGET
because at the moment it is completely broken when the
URL contains a question mark '?'.


Kind regards,
David Robinson

-- URL.C (1.5.3.1)

char *url_filename (const struct urlinfo *u)
{
  .
  .
  .
#ifdef WINDOWS
  {
char *p = file;
for (p = file; *p; p++)
  if ( (*p == '%') || (*p == '?') || (*p == '*') )
*p = '@';
  }
#endif /* WINDOWS */
  .
  .
  .
}

-- URL.C (1.8.1)

char *url_filename (const struct url *u)
{
  .
  .
  .
#ifdef WINDOWS
  {
char *p = file;
for (p = file; *p; p++)
  if (*p == '%')
*p = '@';
  }
#endif /* WINDOWS */
  .
  .
  .
}
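
For reference, a minimal sketch of the change being asked for, i.e. restoring
the 1.5.3.1 mapping inside the 1.8.1 loop (untested, and using the same
variables as the snippets above):

#ifdef WINDOWS
  {
    char *p;
    /* Map '%', '?' and '*' to '@', as the 1.5.3.1 code did
       ('?' and '*' are illegal in Win32 filenames). */
    for (p = file; *p; p++)
      if ((*p == '%') || (*p == '?') || (*p == '*'))
        *p = '@';
  }
#endif /* WINDOWS */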

--

From: Herold Heiko
Subject: RE: parameters in the URL
Date: Tue, 18 Dec 2001 07:07:07 -0800 


Older wget versions did some other character translation, but that
had other, bigger side effects, and has been partially superseded by
the current translation table.
Some time ago there were extensive discussions regarding this (at
the time of Dan, iirc), and there was some agreement that there isn't
any perfect solution working for every case unless wget keeps an
external database (say, a file in every directory) recording
exactly which translations have been made, in order to be able to
send back the correct urls to web servers when necessary.
 
Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY


From the wget mailing list archives:
http://www.mail-archive.com/wget@sunsite.dk/msg02314.html



RE: Mapping URLs to filenames

2002-01-15 Thread David Robinson (AU)

Hello


The '%' character is valid within Win32 filenames. The '*' and '?' are not
valid filename characters.

The '%' and '*' are wildcard characters, which is probably why they were
excluded in previous versions.

There will always be problems mapping strings between namespaces, such as
URLs and file systems. WGET could be extended to call an optional shared
library provided by the user. This would permit the user to build a
URL/Filename mapping table however they chose.

In the meantime, however, '?' is problematic for Win32 users. It stops WGET
from working properly whenever it is found within a URL. Can we fix it,
please?


Kind regards
David Robinson

-Original Message-
From: Herold Heiko [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 16 January 2002 03:28
To: Wget Development
Subject: RE: Mapping URLs to filenames


Some comments.

a) escape character remapping -> % not the best choice?
If I understood correctly how you are proposing to remap the urls to
directories and files, we'll need to remap the escape character too, IF
that character is a legal char for urls; otherwise it would not be
immediately obvious whether a %xx was part of the url or a translation made
by wget.
This means that IF a url contained something like somewhat%other we'd get
a file named somewhat%25other (supposing the charset used to
generate the hex values has % at 0x25, as ascii does), but this
also means a url fragment some%20what would map to a file some%2520what
- not a pretty thing. So possibly % is not a good choice.

b) we're treating mostly html - remap to html entities?
Would it be good to map some characters to things like 'agrave' instead
of hex values? Probably not. Forget it.

c) @ on windows
I'm not sure whether on some dos/win* platforms the % was ever an illegal
character.
As you stated, dos/win batch files could generate real ugliness with
files containing, say, %06%20 or something (not that this should ever be part
of a url, but...).
Please note, if some character other than % is the escape char (say @),
some%thing should not be encoded, but some%20thing most definitely should be
on windows (a dangerous three-char combination; a batch file could
later somehow interpret this as "positional parameter number 20"). But
see my later point for the "at least on windows" part.

d) filename length
Why remap only dangerous characters? There are file systems with
file/directory length limitations (minix 14? old unixes 14? dos 8.3?
iso9660 8.3? iso9660+joliet 63? some others at 254? All of these could
have problems with long filenames, urls generated by
cgi/jsp/whatever, and so on).
However, remapping long files/directories to shorter ones creates a BIG
problem (IIRC first raised by Dan): collision - say the current file
system is dossish and supports minimal 8.3 filenames. How do we remap if
we need to save in the same directory both 01234567.htm and
0123456789.htm and lots of similar filenames? Whatever mapping is done,
"later" another file in the same directory could need exactly that name
- which means the only way to have a complete, working mapping between
url fragments and filenames is an external table (some file wget
maintains in every directory).
Note the "every" - if that table were unique for the whole download,
say, in the starting directory, it would not be available anymore if
later only some branch of the downloaded directories is used for a
subsequent run, so the table location must be obvious from the directory
location itself. Having a single, unique master table for the whole
download would mean a lot of splicing and joining when changing parts of
the local copy before a subsequent run. Having a different location (not
in the directory itself) would mean more difficulty when moving those
directories around the local filesystem (you'd need to move the directory and
- somewhere else - the table).

d) "presets"
As you said there's always the odd combination (save as vfat from linux,
save as iso9660 from whatever os, ecc.). Users should not be required to
know exactly what the requirements are (at least for the more usual file
systems - generic "longnames" unix, vfat, fat, ntfs, iso9660, vms, minix
should cover most cases) - they are users, not admins.
Beside the possibility of specifying an exact, manual, detailed setup
(command line probably is too complex, rule file specified from command
line or .wgetrc I'd say), there should be some presets included for
those usual cases mentioned above. Possibly the above+iso9660, too.
This could be as easy as some ruleset files included in the sources,
mentioned in the docs and installed by default (/usr/local/lib/wget or
wherever), or even compiled in, although compiling in any ruleset
different than the default is probably not worth it (to avoid binary
bloating, we need to be able to load external rules a

RE: Mapping URLs to filenames

2002-01-16 Thread David Robinson (AU)


I like this proposal. This would restore the version 1.5.3 behaviour.

David.

-Original Message-
From: Ian Abbott [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 16 January 2002 21:48
To: Wget List
Subject: RE: Mapping URLs to filenames


On 16 Jan 2002 at 8:02, David Robinson (AU) wrote:

> In the meantime, however, '?' is problematic for Win32 users. It stops WGET
> from working properly whenever it is found within a URL. Can we fix it
> please.

My proposal for using escape sequences in filenames for problem
characters is up for discussion at the moment, but I'm not sure if
they really need to be reversible (except that it helps to reduce
the chances of different URLs being saved to the same filename).

Would it be sufficient to map all illegal characters to '@'? For
Windows, the code already changes '%' to '@' and it could just as
easily change '*', '?', etc. to '@' as well.



timestamping

2002-04-15 Thread David C. Anderson

This isn't a bug, but the offer of a new feature.  The timestamping
feature doesn't quite work for us, as we don't keep just the latest
view of a website and we don't want to copy all those files around for
each update.

So I implemented a --changed-since=mmdd[hhmm] flag to only get
files that have changed since then according to the header.  It seems
to work okay, although your extra check for file-size equality in the
timestamping feature makes me wonder whether the date isn't always a good
measure.

One oddity is that if you point wget at a file that's older than the
date at the top level, it won't be gotten and there won't be any urls
to recurse on.  (We're pointing it at an url that changes daily.)

I tested it under Solaris 7, but there is a dependency on time() and
gmtime() that I haven't conditionalized for autoconf, as I am not
familiar with that tool.
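
Not the patch itself, just a rough sketch of the kind of conversion the
time()/gmtime() dependency is about, assuming the argument is a
YYYYMMDD[HHMM]-style string (the exact argument format in the patch may
differ):

#include <string.h>
#include <time.h>

/* Turn a "YYYYMMDD[HHMM]" string into a time_t so it can be compared
   with the Last-Modified time reported by the server. */
static time_t
changed_since_to_time (const char *s)
{
  struct tm tm;
  memset (&tm, 0, sizeof (tm));
  if (sscanf (s, "%4d%2d%2d%2d%2d",
              &tm.tm_year, &tm.tm_mon, &tm.tm_mday,
              &tm.tm_hour, &tm.tm_min) < 3)
    return (time_t) -1;
  tm.tm_year -= 1900;   /* struct tm counts years from 1900 */
  tm.tm_mon  -= 1;      /* and months from 0 */
  tm.tm_isdst = -1;     /* let mktime work out DST */
  return mktime (&tm);  /* local time; timegm() would give UTC where available */
}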

I would like this feature to get carried along with the rest of the
codebase; would you like it?

-dca




spanish characters at name file

2002-09-30 Thread David Cañizares Hernandez

Hi all,
I'm a Spanish guy who is working with this good program, but I'm having problems with some Spanish characters and blanks (only at the beginning) in the file names I'm trying to download from an FTP server. An example could be: "/tmp/camaras y acción.jpg"
If anyone has solved this problem, please tell me; if not, I'm trying to recompile the source with the appropriate modifications, so any help in this area would be much appreciated.
 
Thx 4 all.
:)


--mirror not downloading everything

2002-10-05 Thread David Cañizares Hernandez

Hi all,
I want to mirror an ftp, but there are some files that it can't download:

ftp> pwd
257 "/Admon/datosvo/empresas" is current directory.
ftp> ls
227 Passive Mode (x)
125 Data connection already open; Transfer starting.
10-02-01  11:05AM       <DIR>          Vehiculos Ocasión
226 Transfer complete.

then
wget --mirror ftp://:[EMAIL PROTECTED]/Admon/datosvo/empresas/*

but
==> CWD /Admon/datosvo/empresas/Vehiculos Ocasión ...
No existe el directorio `Admon/datosvo/empresas/Vehiculos Ocasión'.
(directory `Admon/datosvo/empresas/Vehiculos Ocasión' does not exist)


Thanks in advance.
David.








wget -nd -r doesn't work as documented

2003-05-30 Thread David B. Tucker
Doing wget -nd -r doesn't overwrite a file of the same name, as the
documentation claims.  Is there any other way to do this?  Thanks.

Dave


Two cookie bugs and a problem

2004-11-12 Thread David M. Bennett
The challenge is to navigate into an ASP.NET site, which requires a sequence
of GET and POST requests and uses multiple session cookies.

1. Session cookies don't save in the release version (1.9.1).

I downloaded a pre-release version which supports --keep-session-cookies.
However...

2. Only 1 session cookie is saved. The site generates many. Result:
showstopper.

3. You can't issue multiple POSTs in a single command.

Which brings us back to (1). To navigate into a site that requires some
mixture of GET and POST requests to login and find the right page is just
hard work. You have to use 3 cookie switches on every command to --load,
--save and --keep session cookies. 
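
For reference, the kind of command chain involved looks roughly like this (the
host, form fields and file names are placeholders, not the real site):

wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data "user=me&pass=secret" \
     -O login.html "http://example.com/login.aspx"
wget --load-cookies cookies.txt --save-cookies cookies.txt --keep-session-cookies \
     -O report.html "http://example.com/report.aspx"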

I pray for something better, like IJW (It Just Works).

To summarise


All I want is to be able to issue a sequence of GET and POST requests in a
specific order prior to specifying the file(s) to download, and to have
session cookies automatically do the right thing. Is this too much to ask?

David B.




RE: new bug tracking system for GNU wget

2004-12-01 Thread David M. Bennett
> if i don't find any major problem, i am planning to release wget 1.9.2 with
> LFS support and a long list of bugfixes before the end of the year.

Are you planning to fix session cookies?

In the current release version they don't work. In the tip build they nearly
work, but I got problems logging in to an ASP.NET site with multiple session
cookies (only the first one seemed to work).

I don't have a repro case, but I'm hoping you've got some unit test cases on
multiple session cookies and it's in that long list somewhere.

[I'm now using curl, which handles this just fine.]

BTW a compact syntax would be nice, combining the functions of
--load-cookies, --save-cookies and --keep-session-cookies when stringing
together multiple wget commands in one session.

How about -C with a default cookie file name of cookies.txt?

DavidB




RE: Wget 1.11.3 - case sensetivity and URLs

2008-06-19 Thread Coombe, Allan David (DPS)
Thanks everyone for the contributions.

Ultimately, our purpose is to process documents from the site into our
search database, so probably the most important thing is to limit the
number of files being processed.  The case of the URLs in the html
probably wouldn't cause us much concern, but I could see that it might
be useful to "convert" a site for mirroring from a non-case-sensitive
(windows) environment to a case-sensitive (li|u)nix one - this would
need to include translation of urls in content as well as filenames on
disk.

In the meantime - does anyone know of a proxy server that could
translate urls from mixed case to lower case?  I thought that if we
downloaded using wget via such a proxy server we might get the
appropriate result.

The other alternative we were thinking of was to post-process the files
with symlinks for all mixed-case versions of files and directories (I
think someone already suggested this - great minds and all that...). I
assume that wget would correctly follow the symlink to determine the
time/date stamp of the file when deciding whether it requires updating (or
would it use the time/date stamp of the symlink itself?). I also assume that if
wget downloaded the file it would overwrite the symlink and we would
have to run our "convert files to symlinks" process again.

Just to put it in perspective, the actual site is approximately 45 GB
(that's what the administrator said), and wget downloaded > 100 GB
(463,000 files) when I did the first pass.

Cheers
Allan

-Original Message-
From: Micah Cowan [mailto:[EMAIL PROTECTED] 
Sent: Saturday, 14 June 2008 7:30 AM
To: Tony Lewis
Cc: Coombe, Allan David (DPS); 'Wget'
Subject: Re: Wget 1.11.3 - case sensetivity and URLs



Tony Lewis wrote:
> Micah Cowan wrote:
> 
>> Unfortunately, nothing really comes to mind. If you'd like, you could
>> file a feature request at
>> https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option
>> asking Wget to treat URLs case-insensitively.
> 
> To have the effect that Allan seeks, I think the option would have to
> convert all URIs to lower case at an appropriate point in the process.
> 
> I think you probably want to send the original case to the server
> (just in case it really does matter to the server). If you're going to
> treat different case URIs as matching then the lower-case version will
> have to be stored in the hash. The most important part (from the
> perspective that Allan voices) is that the versions written to disk
> use lower case characters.

Well, that really depends. If it's doing a straight recursive download,
without preexisting local files, then all that's really necessary is to
do lookups/stores in the blacklist in a case-normalized manner.

If preexisting files matter, then yes, your solution would fix it.
Another solution would be to scan directory contents for the first name
that matches case insensitively. That's obviously much less efficient,
but has the advantage that the file will match at least one of the
"real" cases from the server.

As Matthias points out, your lower-case normalization solution could be
achieved in a more general manner with a hook. Which is something I was
planning on introducing perhaps in 1.13 anyway (so you could, say, run
sed on the filenames before Wget uses them), so that's probably the
approach I'd take. But probably not before 1.13, even if someone
provides a patch for it in time for 1.12 (too many other things to focus
on, and I'd like to introduce the "external command" hooks as a suite,
if possible).

OTOH, case normalization in the blacklists would still be useful, in
addition to that mechanism. Could make another good addition for 1.13
(because it'll be more useful in combination with the rename hooks).
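
(Not wget's actual code, just a sketch of what "case-normalized lookups/stores
in the blacklist" could mean in practice: normalize the key before it ever
touches the table. The helper name is hypothetical, and strdup is assumed to be
available as on POSIX systems:

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a lower-cased copy of a URL so that the
   blacklist (or any hash table) sees one canonical form.  Caller frees. */
static char *
url_case_key (const char *url)
{
  char *key = strdup (url);
  char *p;
  if (!key)
    return NULL;
  for (p = key; *p; p++)
    *p = tolower ((unsigned char) *p);
  return key;
}
)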

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


RE: Wget 1.11.3 - case sensetivity and URLs

2008-06-21 Thread Coombe, Allan David (DPS)
OK - now I am confused.

I found a Perl-based http proxy (named "HTTP::Proxy", funnily enough)
that has filters to change both the request and response headers and
data.  I modified the response from the web site to lowercase the urls
in the html (actually I lowercased the whole response), and the data that
wget put on disk was fully lowercased - problem solved - or so I
thought.

However, the case of the files on disk is still mixed - so I assume that
wget is not using the URL it originally requested (harvested from the
HTML?) to create directories and files on disk.  So what is it using? An
http header (if so, which one??).

Any ideas??

Cheers
Allan


RE: Wget 1.11.3 - case sensetivity and URLs

2008-06-24 Thread Coombe, Allan David (DPS)
Sorry Guys - just an ID 10 T error on my part.

I think I need to change 2 things in the proxy server.

1.  URLs in the HTML being returned to wget - this works OK
2.  The "Content-Location" header used when the web server reports a
"301 Moved Permanently" response - I think this works OK.

When I reported that it wasn't working I hadn't done both at the same
time.

Cheers

Allan

-Original Message-
From: Micah Cowan [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 25 June 2008 6:44 AM
To: Tony Lewis
Cc: Coombe, Allan David (DPS); 'Wget'
Subject: Re: Wget 1.11.3 - case sensetivity and URLs



Tony Lewis wrote:
> Coombe, Allan David (DPS) wrote:
> 
>> However, the case of the files on disk is still mixed - so I assume 
>> that wget is not using the URL it originally requested (harvested 
>> from the HTML?) to create directories and files on disk.  So what is 
>> it using? A http header (if so, which one??).
> 
> I think wget uses the case from the HTML page(s) for the file name; 
> your proxy would need to change the URLs in the HTML pages to lower 
> case too.

My understanding from David's post is that he claimed to have been doing
just that:

> I modified the response from the web site to lowercase the urls in the
> html (actually I lowercased the whole response) and the data that wget
> put on disk was fully lowercased - problem solved - or so I thought.

My suspicion is it's not quite working, though, as otherwise where would
Wget be getting the mixed-case URLs?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/