WGET -O Help
Hi, I don't know if this will be answered - but I had to ask (since I DID read the man page! :) ) Symptom: automating my stock research, I type a command such as:

wget -p -H -k -nd -nH -x -Ota.html -Dichart.finance.yahoo.com -Pbtu "http://finance.yahoo.com/q/ta?s=btu&t=6m&l=on&z=l&q=b&p=b,p,s,v&a=m26-12-9,p12,vm&c="

Which:
1. Downloads the HTML page and -O outputs it to ta.html as requested! GOOD
2. Downloads the external link to the graph in the page as requested! GOOD!
3. Outputs the graph to ta.html (replacing the original ta.html)... BAD. The reason for using -O is that I want the filename to be useful instead of [EMAIL PROTECTED]&t=6... blah blah blah
4. If I remove -O it outputs the files into the URL directories of finance.yahoo.com and ichart.finance.yahoo.com, but the filename is the funky URL one. I can live with that - but is there not a way to get the page AND the external URL pictures with decent names using something like -O?

-Dave
Re: Wishlist: support the file:/// protocol
In replies to the post requesting support of the file:// scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason. I have a CD with HTML content (it is a CD of abstracts from a scientific conference), however for space reasons not all the content was included on the CD - there remain links to figures and diagrams on a remote web site. I'd like to create an archive of the complete content locally by having wget retrieve everything and convert the links to point to the retrieved material. Thus the wget functionality when retrieving the local files should work the same as if the files were retrieved from a web server (i.e. the input local file needs to be processed, both local and remote content retrieved, and the copies made of the local and remote files all need to be adjusted to now refer to the local copy rather than the remote content). A simple shell script that runs cp or rsync on local files without any further processing would not achieve this aim. Regarding where the local files should be copied, I suggest a default scheme similar to current http functionality. For example, if the local source was /source/index.htm, and I ran something like: wget.exe -m -np -k file:///source/index.htm this could be retrieved to ./source/index.htm (assuming that I ran the command from anywhere other than the root directory). On Windows, if the local source file is c:\test.htm, then the destination could be .\c\test.htm. It would probably be fair enough for wget to throw up an error if the source and destination were the same file (and perhaps helpfully suggest that the user changes into a new subdirectory and retries the command). One additional problem this scheme needs to deal with is when one or more /../ in the path specification results in the destination being above the current parent directory; then the destination would have to be adjusted to ensure the file remained within the parent directory structure. For example, if I am in /dir/dest/ and ran wget.exe -m -np -k file://../../source/index.htm this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm) -David.
Re: Support for file://
Hi Micah, You're right - this was raised before and in fact it was a feature Mauro Tortonesi intended to be implemented for the 1.12 release, but it seems to have been forgotten somewhere along the line. I wrote to the list in 2006 describing what I consider a compelling reason to support file://. Here is what I wrote then: At 03:45 PM 26/06/2006, David wrote: In replies to the post requesting support of the "file://" scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason. I have a CD with HTML content (it is a CD of abstracts from a scientific conference), however for space reasons not all the content was included on the CD - there remain links to figures and diagrams on a remote web site. I'd like to create an archive of the complete content locally by having wget retrieve everything and convert the links to point to the retrieved material. Thus the wget functionality when retrieving the local files should work the same as if the files were retrieved from a web server (i.e. the input local file needs to be processed, both local and remote content retrieved, and the copies made of the local and remote files all need to be adjusted to now refer to the local copy rather than the remote content). A simple shell script that runs cp or rsync on local files without any further processing would not achieve this aim. Regarding where the local files should be copied, I suggest a default scheme similar to current http functionality. For example, if the local source was /source/index.htm, and I ran something like: wget.exe -m -np -k file:///source/index.htm this could be retrieved to ./source/index.htm (assuming that I ran the command from anywhere other than the root directory). On Windows, if the local source file is c:\test.htm, then the destination could be .\c\test.htm. It would probably be fair enough for wget to throw up an error if the source and destination were the same file (and perhaps helpfully suggest that the user changes into a new subdirectory and retries the command). One additional problem this scheme needs to deal with is when one or more /../ in the path specification results in the destination being above the current parent directory; then the destination would have to be adjusted to ensure the file remained within the parent directory structure. For example, if I am in /dir/dest/ and ran wget.exe -m -np -k file://../../source/index.htm this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm) -David. At 08:49 AM 3/09/2008, you wrote: Petri Koistinen wrote: > Hi, > > It would be nice if wget would also support file://. Feel free to file an issue for this (I'll mark it "Needs Discussion" and set at low priority). I'd thought there was already an issue for this, but can't find it (either open or closed). I know this has come up before, at least. I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which "groks urls", Wget "W(eb)-gets", and file:// can't really be argued to be part of the web. That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system "cp" command is likely to handle things much better than Wget.
In particular, special OS-specific, extended file attributes, extended permissions and the like, are among the things that existing system tools probably handle quite well, and that Wget is unlikely to. I don't really want Wget to be in the business of duplicating the system "cp" command, but I might conceivably not mind "file://" support if it means simple _content_ transfer, and not actual file duplication. Also in need of addressing is what "recursion" should mean for file://. Between ftp:// and http://, "recursion" currently means different things. In FTP, it means "traverse the file hierarchy recursively", whereas in HTTP it means "traverse links recursively". I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/
Website Port problem...
Hi, I have a problem using wget, as follows: I want to download a bunch of files in, say, www.server.com/dir/files, and I found out that wget is contacting www.server.com:80, and the files it gets are not what I'm looking for. I typed www.server.com:80/dir/files in Netscape and found out that the result I get is different from www.server.com/dir/files! So how can I get the files? Thx! DrDave
Re: Website Port problem...
The version I'm using is 1.7.1 On Thu, 29 November 2001, Hrvoje Niksic wrote: > > David <[EMAIL PROTECTED]> writes: > > > I have a problem on using wget, as follows: > > What version of Wget are you using? > > > I want to download a bunch of files in, say, > > www.server.com/dir/files, and I found out that wget is contacting > > www.server.com:80, and the files it get is not what I'm looking for. > > I believe this has been fixed in later versions of Wget.
--continue still broken
This problem seems to have been overlooked: http://www.mail-archive.com/wget%40sunsite.dk/msg06527.html http://www.mail-archive.com/wget%40sunsite.dk/msg06560.html Sorry for not including a patch.
Re: ftp bug in 1.10
"I64" is a size prefix akin to "ll". One still needs to specify the argument type as in "%I64d" as with "%lld".
Checking for broken links
Hi, I am trying to use wget to check for broken links on a web site as follows: wget --spider --recursive -np -owesc.log http://www.wesc.ac.uk/ but I get back the message "www.wesc.ac.uk/index.html: No such file or directory". Can anyone tell me how to fix this, or else suggest another way of using wget to check for broken links? Thanks David
I want -p to download external links
Hello, When I download a complete website using "wget -rpk -l inf http://...", some webpages are incomplete because -p does not follow external links. I do not want to download external webpages, I only want to download external images/files referenced in the domain a.com. How can I achieve this? Thank you very much. Regards, David Srbecky
wget -N url -O file won't check timestamp
Relating to: GNU Wget 1.10.2 on Debian testing/unstable using Linux kernel 2.6.4. wget -N http://domain.tld/downloadfile -O outputfile downloads outputfile. Doing it again does it again regardless of timestamp. It does not check outputfile's timestamp against downloadfile, as prescribed by -N. wget -N without -O works as intended. Thanks. - - David "cdlu" Graham - [EMAIL PROTECTED] Guelph, Ontario - http://www.railfan.ca/
DNS through proxy with wget
Inside our firewall, we can't do simple DNS lookups for hostnames outside of our firewall. However, I can write a Java program that uses commons-httpclient, specifying the proxy credentials, and my URL referencing an external host name will connect to that host perfectly fine, obviously resolving the DNS name under the covers. If I then use wget to do a similar request, even if I specify the proxy credentials, it fails to find the host. If I instead plug in the IP address instead of the hostname, it works fine. I noticed that the command-line options for wget allow me to specify the proxy user and password, but they don't have a way to specify the proxy host and port. Am I missing something, or is this a flaw (or missing feature) in wget?
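(Not an answer from the thread, just a pointer for anyone hitting the same wall: wget takes the proxy host and port from the environment or from a wgetrc/-e setting rather than from a dedicated switch, so something along these lines - the proxy name and port are made up - is the usual way to point it at one:)

export http_proxy=http://proxy.example.com:8080/
wget --proxy-user=USER --proxy-passwd=PASS http://some.external.host/page.html

or, equivalently, adding -e http_proxy=http://proxy.example.com:8080/ to the wget command line.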
RE: wget 1.11 beta 1 released
Does this happen to resolve the issue I asked about a few days ago (no response yet) where DNS doesn't resolve in the presence of an authenticated proxy? > -Original Message- > From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] > Sent: Tuesday, August 22, 2006 8:01 AM > To: wget@sunsite.dk > Subject: wget 1.11 beta 1 released > > > hi to everybody, > > i've just released wget 1.11 beta 1: > > ftp://alpha.gnu.org/pub/pub/gnu/wget/wget-1.11-beta-1.tar.gz > > you're very welcome to try it and report every bug you might > encounter.
.listing files and ftp_proxy
Hi, I've looked, but been unable to find the answer to this rather simple question. (It's been asked before, but I can't see an answer.) wget --passive-ftp --dont-remove-listing -d "ftp://ftp.ebi.ac.uk/" gives me a .listing file, but: wget -e ftp_proxy=http://proxy:1234 --passive-ftp --dont-remove-listing -d "ftp://ftp.ebi.ac.uk/" just gives me the index.html file and no .listing file. Using alternate ways of specifying the proxy server doesn't make any difference. Is there any easy fix for this, or is it the same as: http://www.mail-archive.com/wget@sunsite.dk/msg08572.html Thanks in advance for any advice, David
Re: .listing files and ftp_proxy
I realise that I may not have provided enough information to get an answer to this... I've tried this using the latest version (1.10.2) on Debian Linux 3.1 However, I've also tried with a variety of earlier versions on other platforms and it looks as though it has never worked on any platform. If anybody knows if this is a bug or something that just can't/won't be fixed, I'd be very, very grateful for an answer. We use wget a lot, and it's just perfect for our needs. However, some of our customers are stuck behind a proxy and can't use the scripts we've developed that use wget because of this problem. Thanks, David David Creasy wrote: Hi, I've looked, but been unable to find the answer to this rather simple question. (It's been asked before, but I can't see an answer.) wget --passive-ftp --dont-remove-listing -d "ftp://ftp.ebi.ac.uk/"; gives me a .listing file, but: wget -e ftp_proxy=http://proxy:1234 --passive-ftp --dont-remove-listing -d "ftp://ftp.ebi.ac.uk/"; just gives me the index.html file and no .listing file. Using alternate ways of specifying the proxy server doesn't make any difference. Is there any easy fix for this, or is it the same as: http://www.mail-archive.com/wget@sunsite.dk/msg08572.html Thanks in advance for any advice, David -- David Creasy
Windows WGET 1.10.2 - two bugs
Re: Windows Wget 1.10.2 - two bugs Bug 1) Wget's manual says as shown below, but Windows Wget does not generate file.1 and file.2 - it just overwrites. To reproduce the problem: WGet -S -N http://www.pjm.com/pub/account/lmpgen/lmppost.html Wget will keep overwriting the local file each time the web page's timestamp updates, rather than creating numbered versions. From the manual: -nc --no-clobber If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved. When running Wget without -N, -nc, or -r, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode--it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented. When running Wget with -r, but without -N or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored. When running Wget with -N, with or without -r, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file (see Time-Stamping.). -nc may not be specified at the same time as -N. Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web. --- Bug 2) (Sort of a bug - or a feature request) Normal Windows protocol is to have an ERRORLEVEL returned, whose value indicates program success or failure. WGET is not doing this - in the above example, the ERRORLEVEL is the same for the two cases of retrieving an updated page and not retrieving an updated page (i.e. page is unchanged). This makes it impossible to build a batch file whose behavior is conditional on a new file being downloaded. --- Otherwise a great program. Very useful. David MacMillan
Re: .1, .2 before suffix rather than after
> i totally agree with hrvoje here. also note that changing wget > unique-name-finding algorithm can potentially break lots of wget-based > scripts out there. i think we should leave these kind of changes for wget2 > - or wget-on-steroids or however you want to call it ;-) So can I ask: is a wget2 actually being developed?
Re: wget2
On Friday 30 November 2007 00:02:25 Micah Cowan wrote: > Alan Thomas wrote: > > What is wget2? Any plans to move to Java? (Of course, the latter > > will not be controversial. :) > > Java is not likely. The most likely language is probably still C, > especially as that's where our scant human resource assets are > specialized currently. I have toyed with thoughts of C++ or Python, > however - especially as the use of higher-level languages could allow > more rapid development, which is nice, given our (again) scant assets. I'd vote for Python :-) > :) The truth is, it's too early to say, given that work hasn't even > > begun to have... begun. :D > > C still remains by far the most portable language (though of course, > writing it portably is tricky ;) ). But that's a bigger issue for the > existing Wget's purposes probably, than "new-fangled Wget 2". > > For information on what is planned for "Wget 2", check out the "Next > Generation" and "Unofficially Supported" sections of this page: > http://wget.addictivecode.org/FeatureSpecifications, and particularly, > this thread: http://www.mail-archive.com/wget%40sunsite.dk/index.html#10511 Thanks for the links:-) I really liked this idea - "An API for developers to write their own dynamically-loaded plugins" What I'm looking at wget for is saving streamed mp3 from a radio station, crazy but true.. such is life.
Re: Wget for MP3 streams
On Friday 30 November 2007 01:03:06 Micah Cowan wrote: > David Ginger wrote: > > What I'm looking at wget for is saving streamed mp3 from a radio station, > > crazy but true.. such is life. > > Isn't that already possible now? Provided that the transport is HTTP, > that is? Yes and No . . . Yes, I can save a stream. But not everything works as expected; some of wget's features kick in. Like, I can't get the quota to work no matter how much I fiddle and tinker.
Re: Wget for MP3 streams
On Friday 30 November 2007 03:38:54 Micah Cowan wrote: > David Ginger wrote: > > On Friday 30 November 2007 01:03:06 Micah Cowan wrote: > >> David Ginger wrote: > >>> What I'm looking at wget for is saving streamed mp3 from a radio > >>> station, crazy but true.. such is life. > >> > >> Isn't that already possible now? Provided that the transport is HTTP, > >> that is? > > > > Yes and No . . . > > > > Yes I can save a stream, > > > > But, not everything works as expected, some of wget's features kick in. > > > > Like, I cant get the quota to work no matter how much I fiddle and > > tinker. > Not too surprising, since the documentation points out that the quota > never affects the downloading of a single file. :\ So I downloaded the source code . . . and subscribed to the mailing list to find out why :-)
Re: wget2
On Friday 30 November 2007 13:45:08 Mauro Tortonesi wrote: > On Friday 30 November 2007 11:59:45 Hrvoje Niksic wrote: > > Mauro Tortonesi <[EMAIL PROTECTED]> writes: > > >> I vote we stick with C. Java is slower and more prone to environmental > > >> problems. > > > > > > not really. because of its JIT compiler, Java is often as fast as > > > C/C++, and sometimes even significantly faster. > > > > Not if you count startup time, which is crucial for a program like > > Wget. Memory use is also incomparable. > > right. i was not suggesting to implement wget2 in Java, anyway ;-) > > but we could definitely make good use of dynamic languages such as Ruby (my > personal favorite) or Python, at least for rapid prototyping purposes. both > Ruby and Python support event-driven I/O (http://rubyeventmachine.com for > Ruby, and http://code.google.com/p/pyevent/ for Python) and asynch DNS > (http://cares.rubyforge.org/ for Ruby and > http://code.google.com/p/adns-python/ for Python) and both are relatively > easy to interface with C code. > writing a small prototype for wget2 in Ruby or Python at first, and then > incrementally rewrite it in C would save us a lot of development time, > IMVHO. > what do you think? Python.
Re: Work on your computer! Register Key: QD5V56G5
On Friday 07 December 2007 12:35:32 Jerrold Massey wrote: > JOB IN OUR COMPANY Dating Team company: So which switch option makes wget a hot date then ? --babe ?
Hello, All and bug #21793
Hello everyone, I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum. Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try to tackle bug #21793 <https://savannah.gnu.org/bugs/?21793>. -David A Coon
Spam
Is anyone else getting this junk, with the wget servers as the intermediary? Return-Path: <[EMAIL PROTECTED]> Received: from sunsite.auc.dk (sunsite.dk [130.225.51.30]) by www.cedar.net (8.9.3/SCO5.0.4) with SMTP id XAA21229 for <[EMAIL PROTECTED]>; Tue, 9 Jan 2001 23:19:41 GMT Received: (qmail 29057 invoked by alias); 9 Jan 2001 23:19:06 - Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm Precedence: bulk Delivered-To: mailing list [EMAIL PROTECTED] Received: (qmail 29051 invoked from network); 9 Jan 2001 23:19:05 - From: [EMAIL PROTECTED] Subject: Incredible Home e-Business Opportunity! Message-ID: <[EMAIL PROTECTED]> Date: Sat, 06 Jan 2001 16:43:40 -0500 To: [EMAIL PROTECTED] Content-Type: text/plain; charset="iso-8859-1" Reply-To: [EMAIL PROTECTED] Dear Friend, Perhaps this might be of interest to you. If not, please disregard. -- Where's dave? http://www.findu.com/cgi-bin/find.cgi?kc6ete-9
images with absolute references
In the page: www.objectmentor.com/publications/articlesbysubject.html there are images that have absolute URLs (i.e. http://www.objectmentor.com...) that are not downloaded when the -p option is specified. I had understood that this is what the -p and -k options do. If I have misunderstood the -p and -k options, or misconfigured something, please excuse me. Thanks for your time. PS. The wget command line I used was: wget -H www.objectmentor.com/publications/articlesbysubject.html and my .wgetrc is:

# WGet RC file to implement the command line parameters:
# wget -P webcache -p -nc -l1 -k -A gif,jpg,png
# Accept the following file types
accept = gif,jpg,png
# Convert links locally
convert_links = on
# Use FTP
follow_ftp = off
# Preserve existing files
noclobber = on
# Ignore the sometimes incorrect content length header.
ignore_length = off
# Get the requisites for each page
page_requisites = on
# Use recursive retrieval
recursive = off
# Levels to recurse
reclevel = 1
# Timestamp files
timestamping = off
# Build the directory tree locally
dirstruct = on
# Top of the local directory tree
dir_prefix = webcache
# --EOF
--
~|~\ /~\ | | |~~ Dave Killick [EMAIL PROTECTED]
| | |_| | | |-- +44 (0)1225 475235
_|_/ | | \_/ |__ IPL Information Processing Limited
images with absolute references (more info)
Sorry, following on from my earlier message, I forgot to mention that my wget version is: GNU Wget 1.6 and if it is of any consequence, the output from 'uname -srvmpi' is SunOS 5.7 Generic_106542-05 i86pc i386 i86pc Thanks. --- Forwarded message follows --- From: David Killick <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Subject: images with absolute references Date sent: Tue, 22 May 2001 11:57:21 +0100 In the page: www.objectmentor.com/publications/articlesbysubject.html there are images that have absolute URLs (i.e. http://www.objectmentor.com...) that are not downloaded when the -p option is specified. I had understood that this is what the -p and -k options do. If I have misunderstood the -p and -k options, or misconfigured something, please excuse me. Thanks for your time.. PS. The wget command line I used was: wget -H www.objectmentor.com/publications/articlesbysubject.html and my .wgetrc is:

# WGet RC file to implement the command line parameters:
# wget -P webcache -p -nc -l1 -k -A gif,jpg,png
# Accept the following file types
accept = gif,jpg,png
# Convert links locally
convert_links = on
# Use FTP
follow_ftp = off
# Preserve existing files
noclobber = on
# Ignore the sometimes incorrect content length header.
ignore_length = off
# Get the requisites for each page
page_requisites = on
# Use recursive retrieval
recursive = off
# Levels to recurse
reclevel = 1
# Timestamp files
timestamping = off
# Build the directory tree locally
dirstruct = on
# Top of the local directory tree
dir_prefix = webcache
# --EOF
--- End of forwarded message ---
~|~\ /~\ | | |~~ Dave Killick [EMAIL PROTECTED]
| | |_| | | |-- +44 (0)1225 475235
_|_/ | | \_/ |__ IPL Information Processing Limited
tags
We have been using wget with the -p option to retrieve page requisites. We have noticed that it does not appear to work when tag is encountered in the requested page. The tag and its href are copied verbatim, and required images etc. are not retrieved and mapped locally. By way of example, one of the pages in question is: http://www.howstuffworks.com/ethernet2.htm ~|~\ /~\ | | |~~Dave Killick[EMAIL PROTECTED] | | |_| | | |--+44 (0)1225 475235 _|_/ | | \_/ |__IPL Information Processing Limited
Re: wget timestamping (-N) bug/feature?
At 07:11 PM 8/4/01 -0500, Mengmeng Zhang wrote: > > Say, I have a index.html which is not changed, but some of the pages > > linked from this page might be changed. When I use -N option to retrieve > > index.html recursively, wget will quit after find out that index.html is > > not changed, without following the url in index.html, and thus missed the > > fact that some other pages being linked by index.html might have been > > changed. > >Every version of wget I've used will indeed process index.html correctly >and follow all the links. Can you give an example of the command line >you're using where the links are not followed? > >MZ Any chance you guys could either do this ON the wget group, or OFF it? CCing the group creates messages to everyone on the group that aren't filterable. At least, Eudora can't filter on CC fields. -- Dave's Engineering Page: http://www.dvanhorn.org I would have a link to http://www.findu.com/cgi-bin/find.cgi?KC6ETE-9 here in my signature line, but due to the inability of sysadmins at TELOCITY to differentiate a signature line from the text of an email, I am forbidden to have it.
Redirection spans hosts unconditionally
I am seeing some anomalous behavior with wget with respect to mirroring (-m) a site and trying to keep that mirror local to the source domain. There are a couple of CGI scripts that inevitably get called that end up issuing redirects off-site. These redirects are followed even though --span-hosts is not supplied, and even if the destination domains are added via the --exclude-domains option. A test case is up at http://fastolfe.net/misc/wget-bug/. Spidering http://fastolfe.net/misc/wget-bug/normal will correctly ignore the *link* to www.example.com, but spidering http://fastolfe.net/misc/wget-bug/redirected ends up following a local link that results in a redirection. This redirection is followed unconditionally. In this case, www.example.com doesn't exist, but if this were a normal domain, wget would still fetch the page and store it locally (creating a www.example.com directory, etc.). I am using GNU Wget 1.7 installed via RPM as wget-1.7-3mdk on Linux 2.4.12 i686. Thanks! -- == David Nesting WL7RO Fastolfe [EMAIL PROTECTED] http://fastolfe.net/ == fastolfe.net/me/pgp-key A054 47B1 6D4C E97A D882 C41F 3065 57D9 832F AB01
Differences between "wget" and "cURL"?
I've noticed a tool recently called "cURL" that seems to be in the same "space" as "wget". Could someone give me a basic overview of how these two things are different?
Re: Unsubscribing
> >Hi David, > >please present us the following fact: > >Where did you send your request to unsubscribe (exact E-mail address)? [EMAIL PROTECTED] -- Dave's Engineering Page: http://www.dvanhorn.org Got a need to read Bar codes? http://www.barcodechip.com Bi-directional read of UPC-A, UPC-E, EAN-8, EAN-13, JAN, and Bookland, with two or five digit supplemental codes, in an 8 pin chip, with NO external parts.
Re: Unsubscribing
At 05:11 AM 11/24/01 +, Byran wrote: >THIS list clogging up your email account? Not exactly, but I tried several times to unsubscribe recently, to no avail. -- Dave's Engineering Page: http://www.dvanhorn.org Got a need to read Bar codes? http://www.barcodechip.com Bi-directional read of UPC-A, UPC-E, EAN-8, EAN-13, JAN, and Bookland, with two or five digit supplemental codes, in an 8 pin chip, with NO external parts.
Re: Unsubscribing
At 10:47 PM 11/23/01 +, Neil Osborne wrote: >Hello All, > >I want to unsubscribe from this mail list - however despite several mails >with unsubscribe in both subject and body, I still keep receiving mail, and >it's clogging up my mail account. Can anyone help please ? > >Thanks I'm in the same condition, I've unsubscribed. Please release me, let me gooo.. :) -- Dave's Engineering Page: http://www.dvanhorn.org Got a need to read Bar codes? http://www.barcodechip.com Bi-directional read of UPC-A, UPC-E, EAN-8, EAN-13, JAN, and Bookland, with two or five digit supplemental codes, in an 8 pin chip, with NO external parts.
wget segfault on ppc
Hello. I have a patch to fix a problem with wget segfaulting on the powerpc platform. It happens in the logvprintf routine, due to differences in the handling of va_lists on ppc vs. x86. The problem was that it was reusing a va_list after it had already been exhausted, and the following fix should be portable on at least any platform using gcc. I'm afraid the patch may be malformed wrt whitespace, but it is small enough that you shouldn't have a problem applying it by hand.

--- wget-1.7.old/src/log.c  Sun May 27 12:35:05 2001
+++ wget-1.7/src/log.c      Fri Sep 28 09:29:48 2001
@@ -280,9 +280,12 @@
 static void
 logvprintf (enum log_options o, const char *fmt, va_list args)
 {
+  va_list all_the_args;
+
   CHECK_VERBOSE (o);
   CANONICALIZE_LOGFP_OR_RETURN;
 
+  __va_copy(all_the_args,args);
   /* Originally, we first used vfprintf(), and then checked whether
      the message needs to be stored with vsprintf().  However, Watcom
      C didn't like ARGS being used twice, so now we first vsprintf()
@@ -310,7 +313,9 @@
      the systems where vsnprintf() is not available, we use the
      implementation from snprintf.c which does return the correct
      value.  */
-  int numwritten = vsnprintf (write_ptr, available_size, fmt, args);
+  int numwritten;
+  __va_copy(args,all_the_args);
+  numwritten = vsnprintf (write_ptr, available_size, fmt, args);
   /* vsnprintf() will not step over the limit given by available_size.
      If it fails, it will return either -1

-- David Roundy http://civet.berkeley.edu/droundy/
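As an aside (not part of David's patch): on compilers with C99 support the same idiom can be written with the standard va_copy/va_end pair instead of GCC's __va_copy. A small self-contained sketch of the pattern:

#include <stdarg.h>
#include <stdio.h>

/* Format the same arguments twice without exhausting the original
   va_list: take a copy before the first traversal. */
static void
print_twice (const char *fmt, va_list args)
{
  va_list args_copy;
  va_copy (args_copy, args);    /* C99 spelling of __va_copy */
  vprintf (fmt, args);          /* first pass consumes args */
  vprintf (fmt, args_copy);     /* second pass uses the copy */
  va_end (args_copy);
}

static void
print_both (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  print_twice (fmt, ap);
  va_end (ap);
}

int
main (void)
{
  print_both ("%d %s\n", 42, "hello");
  return 0;
}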
html-parse.c
Hello, I had to do the following to get wget to compile on ppc-apple-darwin:

diff src/html-parse.c ../wget-1.7.fixed/src/html-parse.c
435c435
< assert (ch == '\'' || ch == '"');
---
> assert (ch == '\'' || ch == '\"');

Regards, Dave
wget reject lists
Hello, I am not subscribed to this list, so please CC me on your answers. I am having a hell of a time getting the reg-ex stuff to work with the -A or -R options. If I supply this option to my wget command:
-R 1*
everything works as expected. Same with this:
-R 2*
Now, if I do this:
-R 1*,2*
I get all the files beginning with 1. If I do this:
-R 2*,1*
I get all the files beginning with 2. No combination of quoting makes any difference whatsoever. Anybody have any clues, before I give up and look for another tool?? BTW, wget is 1.8.1 (same thing with 1.8) compiled from source on Solaris 8, using the gcc 2.95.3 package from sunfreeware.com -- David McCabe, Senior Systems Analyst Network and Communications Services, McGill University Montreal, Quebec, Canada [EMAIL PROTECTED] If you stop having sex, drinking and smoking, You don't live longer... It just seems like it
inconsistency between man page and --help
Hello! In version 1.8.1 of GNU Wget... I found that in the --help there is: --limit-rate=RATE limit download rate to RATE. But no reference is made to it in the man page. I checked and made sure the man page was for the same version ;-) So, please fix! And, uh, is there anything I need to know about --limit-rate, or should I assume that if I use 50k it'll work? cheers, dAVE
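(For what it's worth, the option accepts suffixed amounts, so a command along these lines - the URL is made up - caps the transfer at roughly 50 KB/s:)

wget --limit-rate=50k http://www.example.com/some-large-file.iso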
Redirection cycle detected using wget 1.8.2
I got the message 'Redirection cycle detected' when I tried to download a file. The download aborted. I have looked for a solution and have not found one. Any help will be greatly appreciated. Please 'CC' me on reply as I am not currently subscribed. Thanks again, David
Trouble with Yahoo
Hi, I'm trying to build a locally browsable mirror of Yahoo's PDA-friendly portal - http://wap.oa.yahoo.com Sadly, the result is a set of local pages with non-working inter-page links. I've tried combinations of -k, -F, -E. But what happens is that the links don't match up with the actual stored documents. Running wget with '-l 1 -r' will take 20 seconds, and provide a good example of the problem. When you load the main page, your links will 404. Can someone please advise if there are any wget options which can remedy the problem - or is this a bug, or is Yahoo's site structure beyond the scope of wget? Cheers David
error fetching some files
Hello, This isn't a bug in wget per se, but wget's current behaviour may result in not being able to download some files. I am using wget 1.8.2. Some FTP servers have set up permissions so that you cannot do an 'ls', or a 'cd' into a directory. You can fetch the file directly from the root directory ('/'), but cannot view the directory contents. You cannot even go into the directories even if you know their names beforehand. An example of this is an FTP server which holds the GNU/Linux version of the "Return to Castle Wolfenstein" server. The original link is [1], but it is redirected to [2]. Would it be possible to add a command-line option to tell wget, after logging in, to issue the RETR command using the full pathname instead of issuing the CWD command? Thank you for your time. [1] http://www.wolfensteinx.com/dl.php?file=wolflinux&download=true [2] ftp://dl:[EMAIL PROTECTED]/wolfx/demos/linux/wolfmptest-dedicated.x86.run -- David Magda Because the innovator has for enemies all those who have done well under the old conditions, and lukewarm defenders in those who may do well under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI
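To illustrate the request (this exchange is my own sketch, not a capture; the path is taken from URL [2] above): instead of something like

CWD /wolfx/demos/linux
RETR wolfmptest-dedicated.x86.run

wget would skip the directory change and send the full path in a single command,

RETR /wolfx/demos/linux/wolfmptest-dedicated.x86.run

which is what works on servers that allow fetching files by path but forbid listing or entering the directories.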
large file
When large files (size > 2 GB) are downloaded, the wget 1.8.2 release crashes. Is it possible to compile the latest release with a large-file support option? @@@ Allouche David Tel: +33 (0)5 61 28 52 77 Fax: +33 (0)5 61 28 53 35 -- GENOPOLE TOULOUSE -- BIA chem. de Borde Rouge BP 27 - 31326 Castanet Tolosan Cedex, France : e-mail : [EMAIL PROTECTED] @@
Not 100% RFC 1738 compliance for FTP URLs => bug
Hi! I noticed that wget (1.8.2) does not conform 100% to RFC 1738 when handling FTP URLs:

wget ftp://user1:[EMAIL PROTECTED]/x/y/foo

does this:

USER user1
PASS secret1
SYST
PWD ( let's say this returns "/home/user1" )
TYPE I
CWD /home/user1/x/y
PORT 11,22,33,44,3,239
RETR foo

Why does it prepend the current working directory to the path? wget does "CWD /home/user1/x/y", while RFC 1738 suggests:

CWD x
CWD y

This _usually_ gives the same results, except:

- ftp://user1:[EMAIL PROTECTED]//x/y/foo
  wget: CWD /x/y
  rfc:
    CWD    # empty parameter! this usually puts one in the $HOME directory
    CWD x
    CWD y
  So wget will try to fetch the file /x/y/foo, while an RFC-compliant program would fetch $HOME/x/y/foo

- non-unix and other "weird" systems. Example:
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT"
  does not work. Also the following variations don't work either:
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT"
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT"
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT"

Using a regular ftp client, the following works: open connection & log in:

- first possibility:
  get DAD4:[perl5]FREEWARE_README.TXT
- second:
  cd DAD4:[perl5]
  get FREEWARE_README.TXT

Another example with more directory levels:

get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE
or
cd DAD4:[MTOOLS.AXP_EXE]
get MTOOLS.EXE
or
cd DAD4:[MTOOLS]
cd AXP_EXE
get MTOOLS.EXE

I recommend removing the "cool&smart" code and sticking to the RFCs :-) -- David Balazic -- "Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore Logan - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Not 100% RFC 1738 compliance for FTP URLs => bug
As I got no response on [EMAIL PROTECTED], I am resending my report here.

--

Hi! I noticed that wget (1.8.2) does not conform 100% to RFC 1738 when handling FTP URLs:

wget ftp://user1:[EMAIL PROTECTED]/x/y/foo

does this:

USER user1
PASS secret1
SYST
PWD ( let's say this returns "/home/user1" )
TYPE I
CWD /home/user1/x/y
PORT 11,22,33,44,3,239
RETR foo

Why does it prepend the current working directory to the path? wget does "CWD /home/user1/x/y", while RFC 1738 suggests:

CWD x
CWD y

This _usually_ gives the same results, except:

- ftp://user1:[EMAIL PROTECTED]//x/y/foo
  wget: CWD /x/y
  rfc:
    CWD    # empty parameter! this usually puts one in the $HOME directory
    CWD x
    CWD y
  So wget will try to fetch the file /x/y/foo, while an RFC-compliant program would fetch $HOME/x/y/foo

- non-unix and other "weird" systems. Example:
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT"
  does not work. Also the following variations don't work either:
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT"
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT"
  wget "ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT"

Using a regular ftp client, the following works: open connection & log in:

- first possibility:
  get DAD4:[perl5]FREEWARE_README.TXT
- second:
  cd DAD4:[perl5]
  get FREEWARE_README.TXT

Another example with more directory levels:

get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE
or
cd DAD4:[MTOOLS.AXP_EXE]
get MTOOLS.EXE
or
cd DAD4:[MTOOLS]
cd AXP_EXE
get MTOOLS.EXE

I recommend removing the "cool&smart" code and sticking to the RFCs :-) -- David Balazic -- "Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore Logan - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Re: Not 100% RFC 1738 compliance for FTP URLs => bug
Max Bowsher wrote: > > David Balazic wrote: > > As I got no response on [EMAIL PROTECTED], I am resending my report > > here. > > One forwards to the other. The problem is that the wget maintainer is > absent, and likely to continue to be so for several more months. > > As a result, wget development is effectively stalled. So it is "do it yourself" , huh ? :-) > Max. > > > > > -- > > > > Hi! > > > > I noticed that wget ( 1.8.2 ) does not conform 100% to RFC 1738 when > > handling FTP URLs : > > > > wget ftp://user1:[EMAIL PROTECTED]/x/y/foo > > > > does this : > > > > USER user1 > > PASS secret1 > > SYST > > PWD ( let's say this returns "/home/user1" ) > > TYPE I > > CWD /home/user1/x/y > > PORT 11,22,33,44,3,239 > > RETR foo > > > > Why does it prepend the current working directory to the path ? > > > > wget does "CWD /home/user1/x/y" , while RFC 1738 suggests : > > CWD x > > CWD y > > > > ? > > > > This _usually_ results in the same results, except : > > > > - ftp://user1:[EMAIL PROTECTED]//x/y/foo > > wget : CWD /x/y > > rfc : > >CWD # empty parameter ! this usually puts one in the $HOME > > directory > >CWD x > >CWD y > > > > So wget will try to fetch the file /x/y/foo , while an RFC > > compliant > > program would fetch $HOME/x/y/foo > > > > - non unix and other "weird" systems. Example : > > > > wget > > > "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5D/FREEWARE_README.TXT" > > > > does not work. Also the following variations don't work either : > > > > wget > > "ftp://user1:[EMAIL PROTECTED]/DAD4:[perl5]FREEWARE_README.TXT" > > wget > > > "ftp://user1:[EMAIL PROTECTED]/DAD4%3A%5Bperl5%5DFREEWARE_README.TXT" > > wget > > "ftp://user1:[EMAIL PROTECTED]/DAD4:/perl5/FREEWARE_README.TXT" > > > > Using a regular ftp client , the follwoing works : > > > > open connection & log in : > > > > - first possibility : > > > > get DAD4:[perl5]FREEWARE_README.TXT > > > > - second : > > > > cd DAD4:[perl5] > > get FREEWARE_README.TXT > > > > Another example with more directory levels : > > > > get DAD4:[MTOOLS.AXP_EXE]MTOOLS.EXE > > or > > cd DAD4:[MTOOLS.AXP_EXE] > > get MTOOLS.EXE > > or > > cd DAD4:[MTOOLS] > > cd AXP_EXE > > get MTOOLS.EXE > > > > > > I recommend removing the "cool&smart" code and stick to RFCs :-) -- David Balazic -- "Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore Logan - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
unreasonable not to doc ascii vs. binary in the --help text
When I look at the long help for wget, there's no mention of how to arrange for ascii vs. binary download. It should be under FTP options. I've used FTP for almost 20 years, and the ASCII or BINARY commands are the two most common commands outside of get and put. I think it's pretty unreasonable not to mention "binary" or "ascii" in the --help output at all. For simple file downloads on the command line, which is when you need this help desk to help you, the way to specify binary is crucial. For your reference, the full help text (& version info) follows the end of this message. Thanks for your consideration, Mark David

wget --help
GNU Wget 1.8.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE     log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debug                print debug output.
  -q,  --quiet                quiet (no output).
  -v,  --verbose              be verbose (this is the default).
  -nv, --non-verbose          turn off verboseness, without being quiet.
  -i,  --input-file=FILE      download URLs found in FILE.
  -F,  --force-html           treat input file as HTML.
  -B,  --base=URL             prepends URL to relative links in -F -i file.
       --sslcertfile=FILE     optional client certificate.
       --sslcertkey=KEYFILE   optional keyfile for this certificate.
       --egd-file=FILE        file name of the EGD socket.

Download:
       --bind-address=ADDRESS   bind to ADDRESS (hostname or IP) on local host.
  -t,  --tries=NUMBER           set number of retries to NUMBER (0 unlimits).
  -O   --output-document=FILE   write documents to FILE.
  -nc, --no-clobber             don't clobber existing files or use .# suffixes.
  -c,  --continue               resume getting a partially-downloaded file.
       --progress=TYPE          select progress gauge type.
  -N,  --timestamping           don't re-retrieve files unless newer than local.
  -S,  --server-response        print server response.
       --spider                 don't download anything.
  -T,  --timeout=SECONDS        set the read timeout to SECONDS.
  -w,  --wait=SECONDS           wait SECONDS between retrievals.
       --waitretry=SECONDS      wait 1...SECONDS between retries of a retrieval.
       --random-wait            wait from 0...2*WAIT secs between retrievals.
  -Y,  --proxy=on/off           turn proxy on or off.
  -Q,  --quota=NUMBER           set retrieval quota to NUMBER.
       --limit-rate=RATE        limit download rate to RATE.

Directories:
  -nd  --no-directories            don't create directories.
  -x,  --force-directories         force creation of directories.
  -nH, --no-host-directories       don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER        set http user to USER.
       --http-passwd=PASS      set http password to PASS.
  -C,  --cache=on/off          (dis)allow server-cached data (normally allowed).
  -E,  --html-extension        save all text/html documents with .html extension.
       --ignore-length         ignore `Content-Length' header field.
       --header=STRING         insert STRING among the headers.
       --proxy-user=USER       set USER as proxy username.
       --proxy-passwd=PASS     set PASS as proxy password.
       --referer=URL           include `Referer: URL' header in HTTP request.
  -s,  --save-headers          save the HTTP headers to file.
  -U,  --user-agent=AGENT      identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive    disable HTTP keep-alive (persistent connections).
       --cookies=off           don't use cookies.
       --load-cookies=FILE     load cookies from FILE before session.
       --save-cookies=FILE     save cookies to FILE after session.

FTP options:
  -nr, --dont-remove-listing   don't remove `.listing' files.
  -g,  --glob=on/off           turn file name globbing on or off.
       --passive-ftp           use the "passive" transfer mode.
       --retr-symlinks         when recursing, get linked-to files (not dirs).

Recursive retrieval:
  -r,  --recursive          recursive web-suck -- use with care!
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
       --delete-after       delete files locally after downloading them.
  -k,  --convert-links      convert non-relative links to relative.
  -K,  --backup-converted   before converting file X, back
RE: unreasonable not to doc ascii vs. binary in the --help text
You said: The type selection is rarely needed ... This is untrue. I just tried this out using wget on Windows. If you don't tack ;type=a onto the end when transferring a text file from unix to Windows, the file's line endings will not be converted from unix (LF) to Windows (CRLF) conventions. If you look at the file in applications that just follow Windows conventions, e.g., Notepad, the lines will not be broken in the display. Some applications (e.g., web browsers) follow a liberal interpretation of line endings, which helps overcome this problem, but many do not, including programs that read ascii (text) files as data, and will silently but fatally malfunction if the CR is not there in front of the LF. So, with unix to Windows transfer of text files being obviously an extremely common case, this clearly deserves a few lines in your --help documentation. It can hardly violate any length limit for that text -- there seems to be none, since the text goes on and on and documents such seldom-needed options as passive mode: --passive-ftp use the "passive" transfer mode. And many others that don't deserve as much attention as ascii vs. binary transfer. Thanks, Mark -----Original Message----- From: Maciej W. Rozycki [mailto:[EMAIL PROTECTED] Sent: Mon, August 18, 2003 11:22 AM To: Mark David Cc: '[EMAIL PROTECTED]' Subject: Re: unreasonable not to doc ascii vs. binary in the --help text On Mon, 18 Aug 2003, Mark David wrote: > When I look at the long help for wget, there's no mention of how to arrange > for ascii vs. binary download. It should be under FTP options. I've used > FTP for almost 20 years, and the ASCII or BINARY commands are the two most > common commands outside of get and put. I think it's pretty unreasonable > not to mention "binary" or "ascii" in the --help output at all. For > simple file downloads on the command line, which is when you need this help > desk to help you, the way to specify binary is crucial. The default download type wget uses is binary. If you want another type, then ";type=X" ("X" denotes the desired type; e.g. "i" is binary and "a" is ASCII) can be appended to a URL. It's all documented within the wget's info pages. The type selection is rarely needed -- typically for downloading a text file from an EBCDIC host -- so including it with the short help reference would seem to be an overkill. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--+ +e-mail: [EMAIL PROTECTED], PGP key available+
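As a concrete illustration (the host and path here are made up), the type selector is simply appended to the FTP URL, quoted so the semicolon is not interpreted by some shells:

wget "ftp://ftp.example.com/pub/notes/README.txt;type=a"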
RE: Content-Disposition Take 3
"Hrvoje Niksic" <mailto:[EMAIL PROTECTED]> writes: > "Newman, David" <[EMAIL PROTECTED]> writes: > > > This is my third attempt at a Content-Disposition patch and if it > > isn't acceptable yet, I'm sure it is pretty close. > > Thanks. Note that I and other (co-)maintainers have been away for > some time, so if your previous attempts have been ignore, it might not > have been for lack of quality in your contribution. Actually my last attempt was early last year and it was a total hack. :-) > > However, with the --content-disposition option wget will > > instead process the header > > > > Content-Disposition: attachment; filename="joemama.txt" > > > > and change the local filename to "joemama.txt" > > The thing that worries me about this patch is that in some places > Wget's actions depend on transforming the URL to the output file > name. I'm having in mind options like `-c' and `-nc'. Won't your > patch break those? I did have in the back of my mind the consequences of other options that may have been given. I ended up conceding that if the user specified --content-disposition that they really wanted the filename specified within the header. Of course, I now concede that I had not considered the effects of -nc or the lack of -nc. Hmmm. I can easily change the patch such that if the file specified in Content-Disposition: already exists that a numerical extension is added to the name in the absence of -nc. However, if -nc is present, that would imply that if the file exists wget should not download the content. But the filename isn't known until the web server has already been contacted. So I would have to ask how would I abort the current transfer? As far as --continue is concerned I don't know if that option is valid in this context. Meaning, Content-Disposition is usually used with generated content (I think). Like in my example, test.php generates all the content. And the only other place I've seen it is when downloading Solaris patches the URL is something like patchDownload.pl?target=\&method=h and it sets the name of the zip file it gives you in the header, i.e. patchid.zip. I just tried it and wget fails to continue the download of a patch but refuses to truncate the existing file. Should I just track down the code that handles this case and duplicate it? -Dave
Error in wget-1.9-b5.zip
Error in wget-1.9-b5.zip

--17:46:21--  http://www.digitalplayground.com/freepage.php?tgpid=008d&refid=393627
           => `/tmp2/www.digitalplayground.com/[EMAIL PROTECTED]&refid=393627'
Resolving www.digitalplayground.com... 64.38.205.100
Connecting to www.digitalplayground.com[64.38.205.100]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.yourworstenemy.com?tgpid=008d&refid=393627 [following]
--17:46:23--  http://www.yourworstenemy.com/?tgpid=008d&refid=393627
           => `/tmp2/www.yourworstenemy.com/[EMAIL PROTECTED]&refid=393627'
Resolving www.yourworstenemy.com... failed: Unknown error.

FINISHED --17:46:36--
Downloaded: 0 bytes in 0 files
Converted 0 files in 0.00 seconds.
wget can't get the following site
Hi, all Please CC me when you reply. I'm not subscribed to this list. I'm new to wget. When I tried getting the following using wget, wget http://quicktake.morningstar.com/Stock/Income10.asp?Country=USA&Symbol=JNJ&stocktab=finance I got the errors below:

--22:58:29-- http://quicktake.morningstar.com:80/Stock/Income10.asp?Country=USA
           => [EMAIL PROTECTED]'
Connecting to quicktake.morningstar.com:80... connected!
HTTP request sent, awaiting response... 302 Object moved
Location: http://quote.morningstar.com/switch.html?ticker= [following]
--22:58:30-- http://quote.morningstar.com:80/switch.html?ticker=
           => [EMAIL PROTECTED]'
Connecting to quote.morningstar.com:80... connected!
HTTP request sent, awaiting response... 302 Object moved
Location: TickerNotFound.html [following]
TickerNotFound.html: Unknown/unsupported protocol.
'Symbol' is not recognized as an internal or external command, operable program or batch file.
'stocktab' is not recognized as an internal or external command, operable program or batch file.

Is this a bug in wget? Or is there something I can do so that wget can get the site? Please help! Thanks in advance.
Calling wget in C++
Hi, all Please CC me when you reply. I'm not subscribed to this list. I have two questions: 1) I am writing a C++ program that calls wget using execv. After wget gets the requested page, it does not return to my program to execute the rest of my program after the call. Here's what my code looks like:

char* arg_list[] = {"wget", args, NULL};
int result = execv("wget.exe", arg_list);
// rest of my code ...

As noted above, after execv runs wget, it does not return to execute the rest of my program. Does anyone know how to fix this? 2) When I use wget to get the following url, it sometimes (not all the time!) gives me the error below:

C:\>wget -O test.html "http://screen.yahoo.com/b?dvy=2/100&pe=0/200&b=1&z=dvy&db=stocks&vw=1"
--23:24:57-- http://screen.yahoo.com:80/b?dvy=2/100&pe=0/200&b=1&z=dvy&db=stocks&vw=1
           => `test.html'
Connecting to screen.yahoo.com:80... connected!
HTTP request sent, awaiting response... End of file while parsing headers.
Giving up.

The page I requested is not downloaded. But sometimes it works. Any ideas how to fix this? Thanks in advance! David
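(An aside, not from the original thread: execv is documented to replace the calling process image on success, so by design it never returns to the caller. A minimal sketch of the usual alternative on Windows - assumed here because the example invokes wget.exe - is the _spawnv family, which runs the child and then comes back; on POSIX systems the equivalent pattern is fork() + execv() + waitpid(). The URL is just the one from the question.)

#include <process.h>   /* _spawnv, _P_WAIT (Windows C runtime) */
#include <stdio.h>

int main (void)
{
  /* argument vector: program name first, NULL-terminated */
  const char *args[] = { "wget", "-O", "test.html",
                         "http://screen.yahoo.com/b?dvy=2/100&pe=0/200&b=1&z=dvy&db=stocks&vw=1",
                         NULL };

  /* Unlike execv, _spawnv with _P_WAIT runs wget.exe as a child process
     and returns its exit status, so the code after this call still runs. */
  int result = (int) _spawnv (_P_WAIT, "wget.exe", (const char * const *) args);
  printf ("wget exited with status %d\n", result);
  return 0;
}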
Re: [PATCH] implementation of determine_screen_width() for Windows
Herold Heiko wrote: From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] .. Yes. Specifically, Unix's SIGWINCH simply sets a flag that means "window size might have changed, please check it out". That is because checking window size on each refresh would perform an unnecessary ioctl. One thing we could do for Windows is check for window size every second or so. I agree, but I have no idea how taxing those GetStdHandle() and GetConsoleScreenBufferInfo() are. Maybe David can shed more light on this, or even profile a bit. Possibly the handle could be cached, saving at least the GetStdHandle() bit. Heiko

Yes, GetStdHandle() would only need to be called once unless the handle were to change during execution (fork_to_background()?). I haven't done any exhaustive profiling but the attached patch doesn't seem to affect performance. It calls determine_screen_width() every time the progress bar is updated (~5 times per second?). Note: I'm not suggesting we use the patch as-is, it's just a test. It might be possible to implement something similar to SIGWINCH using WinEvents, but that's not really what they were designed for. They were designed to be used by "accessibility" software (screen readers, etc.), and it may not be available on older versions of Windows. How often do people change the size of the screen buffer while a command is running?

Index: progress.c
===================================================================
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.43
diff -u -r1.43 progress.c
--- progress.c  2004/01/28 01:02:26  1.43
+++ progress.c  2004/01/28 19:37:50
@@ -579,6 +579,22 @@
     /* Don't update more often than five times per second. */
     return;
 
+#ifdef WINDOWS
+{
+  int old_width = screen_width;
+  screen_width = determine_screen_width ();
+  if (!screen_width)
+    screen_width = DEFAULT_SCREEN_WIDTH;
+  else if (screen_width < MINIMUM_SCREEN_WIDTH)
+    screen_width = MINIMUM_SCREEN_WIDTH;
+  if (screen_width != old_width)
+    {
+      bp->width = screen_width - 1;
+      bp->buffer = xrealloc (bp->buffer, bp->width + 1);
+    }
+}
+#endif
+
   create_image (bp, dltime);
   display_image (bp->buffer);
   bp->last_screen_update = dltime;
Re: [PATCH] implementation of determine_screen_width() for Windows
Hrvoje Niksic wrote: This patch should fix both problems. Great, thanks
[PATCH] periodic screen width check under Windows
Herold Heiko wrote: How often do people change the size of the screen buffer while a command is running? Rarely I think, for example when you notice a huge file is being downloaded slowly and you enlarge the window in order to have a better granularity on the progress bar. Probably instead of risking a performance drawback on some (slow) machines a better way would be call it rarely (every 5 seconds or so would still be enough I think). Heiko

Right. The previous patch was kind of a worst-case test. Attached is a patch that checks the screen width approximately every two seconds in the Windows build. I don't know if this is what Hrvoje had in mind. And of course the interval can be tweaked. Cheers

Index: progress.c
===================================================================
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.43
diff -u -r1.43 progress.c
--- progress.c  2004/01/28 01:02:26  1.43
+++ progress.c  2004/01/29 20:20:35
@@ -452,6 +452,11 @@
   double last_screen_update;      /* time of the last screen update,
                                      measured since the beginning of
                                      download. */
+#ifdef WINDOWS
+  double last_screen_width_check; /* time of the last screen width
+                                     check, measured since the
+                                     beginning of download. */
+#endif /* WINDOWS */
 
   int width;                      /* screen width we're using at the
                                      time the progress gauge was
@@ -555,6 +560,15 @@
       bp->total_length = bp->initial_length + bp->count;
 
   update_speed_ring (bp, howmuch, dltime);
+
+#ifdef WINDOWS
+  /* Under Windows, check to see if the screen width has changed no more
+     than once every two seconds. */
+  if (dltime - bp->last_screen_width_check > 2000) {
+    received_sigwinch = 1;
+    bp->last_screen_width_check = dltime;
+  }
+#endif /* WINDOWS */
 
   /* If SIGWINCH (the window size change signal) been received,
      determine the new screen size and update the screen. */
Re: Startup delay on Windows
Petr Kadlec wrote: > I have traced the problem down to search_netrc() in netrc.c, where the > program is trying to find the file using stat(). But as home_dir() > returns "C:\" on Windows, the filename constructed looks like > "C:\/.netrc", which is then probably interpreted by Windows as a name of > a remote file, so Windows are trying to look around on the network, and > continue only after some timeout. I'm curious as to what operating system and compiler you are using. I tried briefly to reproduce this under Windows 2000 with MSVC 7.1 and could not. I would regard this as a bug in the implementation of stat(), not Wget. BTW, this has come up before: http://www.mail-archive.com/[EMAIL PROTECTED]/msg04440.html Hrvoje Niksic wrote: Thanks tracing this one. It would never have occurred to me that the file name "c:\/foo" could cause such a problem. It really shouldn't; it seems perfectly valid (albeit strange) to me. Though, I guess, it behooves us to work around compiler/library bugs. I see two different bugs here: 1. The routine that "merges" the .netrc file name with the directory name should be made aware of Windows, so that it doesn't append another backslash if a backslash is already present at the end of directory name returned by home_dir. (In fact, the same logic could be applied to slashes following Unix directory names.) *AFAIK*, Window should only treat two consecutive slashes specially if they are at the beginning of a file name. (Windows might not like more than one slash between a machine and share name, but that's not really relevant.) Otherwise, they should be equivalent to a single slash. All this irrespective of whether the slashes are forward (/) or backward (\). 2. home_dir() should really be fixed to return something better than `c:\' unconditionally, as is currently the case. The comment in the source says: home = "C:\\"; /* Maybe I should grab home_dir from registry, but the best that I could get from there is user's Start menu. It sucks! */ This comment was not written by me, but by (I think) Darko Budor, who wrote the original Windows support. Under Windows 2000 and XP, there have to be better choices of home directory. For instance, Cygwin considers `c:\Documents and Settings\USERNAME' to be the home directory. From Cygwin's /etc/profile: # Here is how HOME is set, in order of priority, when starting from Windows # 1) From existing HOME in the Windows environment, translated to a Posix path # 2) from /etc/passwd, if there is an entry with a non empty directory field # 3) from HOMEDRIVE/HOMEPATH # 4) / (root) If things were installed normally, Cygwin will consider /home/username to be the users home directory. Under Cygwin / is usually mounted on C:\cygwin, or wherever Cygwin was installed. But Cygwin is very much it's own environment. Already, two of the above methods are unavailable to us (2 and 4). I wonder if that is reachable through registry... Does anyone have an idea what we should consider the home dir under Windows, and how to find it? I imagine there are a number of ways to go about this. As it stands now, if I understand correctly, Wget works like this: When processing .wgetrc under Windows, Wget does the following: If Wget was built with MSVC, it looks for a file called "wgetrc" in the current directory. This is mildly evil. A comment in init.c includes the following sentence: "SYSTEM_WGETRC should not be defined under WINDOWS." Nonetheless, the MSVC Makefile defines SYSTEM_WGETRC as "wgetrc". 
AFAICT, Wget won't do this if it was built with one of the other Windows Makefiles. Wget then processes the users .wgetrc. Under Windows, Wget ignores $HOME and looks for a file called wget.ini in the directory of the Wget binary. Under Windows, if $HOME is defined home_dir() will return that, otherwise it returns `C:\'. Wget uses the directory returned by home_dir() when looking for .netrc and when resolving ~. So currently Wget's behavior is inconsistent, both with its behavior on other platforms, and with itself (the handling of .wgetrc and .netrc). If we wanted to do things the NT way, we could, essentially, treat "C:\Documents and Settings\username\Application Data\Wget" as HOME and "C:\Documents and Settings\All Users\Application Data\Wget" as /etc. The above directories are just examples of typical locations; we would, of course, resolve the directories correctly. But then what would we do if $HOME *is* defined? Ignore it? That would seem the `Windows' thing to do. The directories themselves could be resolved using SHGetSpecialFolderPath() or its like. The entry points would have to resolved dynamically as they may not be available on ancient Windows installations. We could then fall-back to the registry or the environment or something else. The user could always define $WGETRC and put .wgetrc anywhere he/she pleased. But what about .netrc? And doe
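As a rough illustration of the SHGetSpecialFolderPath() idea mentioned above (illustrative only, not Wget code; in a real build the entry point would be resolved dynamically with LoadLibrary()/GetProcAddress() since it may be missing on ancient Windows installations):

  #include <windows.h>
  #include <shlobj.h>

  /* Return the current user's Application Data directory, e.g.
     "C:\Documents and Settings\username\Application Data", or NULL on
     failure. */
  static char *
  user_appdata_dir (void)
  {
    static char path[MAX_PATH];

    if (!SHGetSpecialFolderPathA (NULL, path, CSIDL_APPDATA, FALSE))
      return NULL;
    return path;
  }

A per-machine equivalent (the "All Users" profile) could presumably be obtained the same way with CSIDL_COMMON_APPDATA.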
[PATCH] MSVC Makefiles
Attached is a patch for the MSVC Makefiles. I have tested it with MSVC 6.0sp5 for 80x86 and MSVC 7.1 for 80x86 under Windows 2000. One change of note: I changed the Makefile to use batch rules. This significantly decreases the time required to build. It might not be supported by ancient versions of nmake.exe, I don't know. I'm hoping others can test these changes, especially with older versions of MSVC. Cheers, David Fritz 2004-02-09 David Fritz <[EMAIL PROTECTED]> * configure.bat.in: Don't clear the screen. * windows/README: Add introductory paragraph. Re-word a few sentences. Correct minor typographical errors. Use consistent capitalization of Wget, SSL, and OpenSSL. Refer to Microsoft Visual C++ as MSVC instead of VC++. Mention the --msvc option to configure.bat. Reflow paragraphs. * windows/Makefile.top: Use tabs instead of spaces. Ignore errors in clean rules. Use lowercase filenames when building distribution .zip archive. * windows/Makefile.doc: Use tabs instead of spaces. Ignore errors in clean rules. * windows/Makefile.src: Clean-up clean rules. Use tabs instead of spaces. Link against gdi32.lib. Don't define SYSTEM_WGETRC. Remove unused macros. Remove anachronistic and superfluous linker flags. Don't rename wget.exe to all upper-case. Add `preprocessor' conditionals for SSL and newer MSVC options. Use batch rules. Don't suppress all warnings. Index: configure.bat.in === RCS file: /pack/anoncvs/wget/configure.bat.in,v retrieving revision 1.4 diff -u -r1.4 configure.bat.in --- configure.bat.in2003/10/26 00:19:04 1.4 +++ configure.bat.in2004/02/09 05:29:50 @@ -26,7 +26,6 @@ rem file, but you are not obligated to do so. If you do not wish to do rem so, delete this exception statement from your version. -cls if .%1 == .--borland goto :borland if .%1 == .--mingw goto :mingw if .%1 == .--msvc goto :msvc Index: windows/Makefile.doc === RCS file: /pack/anoncvs/wget/windows/Makefile.doc,v retrieving revision 1.4 diff -u -r1.4 Makefile.doc --- windows/Makefile.doc2002/05/18 02:16:35 1.4 +++ windows/Makefile.doc2004/02/09 05:29:51 @@ -28,22 +28,22 @@ # You probably need makeinfo and perl, see the README in the main # windows directory. -RM = del -CP = copy -ATTRIB = attrib - -MAKEINFO = makeinfo.exe -TEXI2POD = texi2pod.pl -POD2MAN = pod2man - -SAMPLERCTEXI = sample.wgetrc.munged_for_texi_inclusion -WGETHLP = wget.hlp -WGETINFO = wget.info -WGETTEXI = wget.texi -WGETHTML = wget.html -WGETPOD = wget.pod -manext = 1 -MAN = wget.$(manext) +RM = -del +CP = copy +ATTRIB = attrib + +MAKEINFO = makeinfo.exe +TEXI2POD = texi2pod.pl +POD2MAN= pod2man + +SAMPLERCTEXI = sample.wgetrc.munged_for_texi_inclusion +WGETHLP= wget.hlp +WGETINFO = wget.info +WGETTEXI = wget.texi +WGETHTML = wget.html +WGETPOD= wget.pod +manext = 1 +MAN= wget.$(manext) all: $(WGETHLP) $(WGETINFO) $(WGETHTML) @@ -76,10 +76,10 @@ hcrtf -xn wget.hpj clean: -$(RM) *.bak -$(RM) *.hpj -$(RM) *.rtf -$(RM) *.ph + $(RM) *.bak + $(RM) *.hpj + $(RM) *.rtf + $(RM) *.ph $(RM) $(SAMPLERCTEXI) $(RM) $(MAN) $(RM) $(TEXI2POD) Index: windows/Makefile.src === RCS file: /pack/anoncvs/wget/windows/Makefile.src,v retrieving revision 1.21 diff -u -r1.21 Makefile.src --- windows/Makefile.src2003/11/21 08:48:45 1.21 +++ windows/Makefile.src2004/02/09 05:29:51 @@ -1,4 +1,4 @@ -# Makefile for `wget' utility for MSVC 4.0 +# Makefile for `wget' utility for MSVC # Copyright (C) 1995, 1996, 1997 Free Software Foundation, Inc. # This program is free software; you can redistribute it and/or modify @@ -25,44 +25,49 @@ # file, but you are not obligated to do so. 
If you do not wish to do # so, delete this exception statement from your version. -# -# Version: 1.4.4 -# - -#Comment these if you don't have openssl available - however https -#won't work. -SSLDEFS= /DHAVE_SSL -SSLLIBS= libeay32.lib ssleay32.lib -SSLSRC = gen_sslfunc.c -SSLOBJ = gen_sslfunc$o - -SHELL = command - -VPATH = . -o = .obj -OUTDIR = . - -CC = cl -LD = link - -CFLAGS = /nologo /MT /W0 /O2 -#DEBUGCF = /DENABLE_DEBUG /Zi /Od #/Fd /FR -CPPFLAGS = -DEFS = /DWINDOWS /D_CONSOLE /DHAVE_CONFIG_H /DSYSTEM_WGETRC=\"wgetrc\" -LDFLAGS = /subsystem:console /incremental:no /warn:3 -#DEBUGLF = /pdb:wget.pdb
Re: Startup delay on Windows
I'd be content with the following logic: Don't process a `system' wgetrc. If $HOME is not defined, use the directory the Wget executable is in as $HOME (what home_dir() returns). If $HOME/.wgetrc exists, use that; otherwise look for wget.ini in the directory the executable is in, regardless of $HOME. We would retain wget.ini support for backward compatibility, and support .wgetrc for consistency with other platforms and with the handling of .netrc. This would only break things if people had $HOME defined and it contained a .wgetrc and they expected the Windows port to ignore it. As a side-effect, this would also resolve the above issue. I went ahead and implemented this. I figure at least it will work as an interim solution. 2004-02-16 David Fritz <[EMAIL PROTECTED]> * init.c (home_dir): Use aprintf() instead of xmalloc()/sprintf(). Under Windows, if $HOME is not defined, use the directory that contains the Wget binary instead of hard-coded `C:\'. (wgetrc_file_name): Under Windows, look for $HOME/.wgetrc then, if not found, look for wget.ini in the directory of the Wget binary. * mswindows.c (ws_mypath): Employ slightly more robust methodology. Strip trailing path separator. Index: src/init.c === RCS file: /pack/anoncvs/wget/src/init.c,v retrieving revision 1.91 diff -u -r1.91 init.c --- src/init.c 2003/12/14 13:35:27 1.91 +++ src/init.c 2004/02/16 15:58:36 @@ -1,5 +1,5 @@ /* Reading/parsing the initialization file. - Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001, 2003 + Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. This file is part of GNU Wget. @@ -314,9 +314,9 @@ return NULL; home = pwd->pw_dir; #else /* WINDOWS */ - home = "C:\\"; - /* Maybe I should grab home_dir from registry, but the best -that I could get from there is user's Start menu. It sucks! */ + /* Under Windows, if $HOME isn't defined, use the directory where + `wget.exe' resides. */ + home = ws_mypath (); #endif /* WINDOWS */ } @@ -347,27 +347,24 @@ return xstrdup (env); } -#ifndef WINDOWS /* If that failed, try $HOME/.wgetrc. */ home = home_dir (); if (home) -{ - file = (char *)xmalloc (strlen (home) + 1 + strlen (".wgetrc") + 1); - sprintf (file, "%s/.wgetrc", home); -} +file = aprintf ("%s/.wgetrc", home); xfree_null (home); -#else /* WINDOWS */ - /* Under Windows, "home" is (for the purposes of this function) the - directory where `wget.exe' resides, and `wget.ini' will be used - as file name. SYSTEM_WGETRC should not be defined under WINDOWS. - - It is not as trivial as I assumed, because on 95 argv[0] is full - path, but on NT you get what you typed in command line. --dbudor */ - home = ws_mypath (); - if (home) + +#ifdef WINDOWS + /* Under Windows, if we still haven't found .wgetrc, look for the file + `wget.ini' in the directory where `wget.exe' resides; we do this for + backward compatibility with previous versions of Wget. + SYSTEM_WGETRC should not be defined under WINDOWS. 
*/ + if (!file || !file_exists_p (file)) { - file = (char *)xmalloc (strlen (home) + strlen ("wget.ini") + 1); - sprintf (file, "%swget.ini", home); + xfree_null (file); + file = NULL; + home = ws_mypath (); + if (home) + file = aprintf ("%s/wget.ini", home); } #endif /* WINDOWS */ Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.22 diff -u -r1.22 mswindows.c --- src/mswindows.c 2003/11/03 21:57:03 1.22 +++ src/mswindows.c 2004/02/16 15:58:37 @@ -1,5 +1,5 @@ /* mswindows.c -- Windows-specific support - Copyright (C) 1995, 1996, 1997, 1998 Free Software Foundation, Inc. + Copyright (C) 1995, 1996, 1997, 1998, 2004 Free Software Foundation, Inc. This file is part of GNU Wget. @@ -199,22 +199,25 @@ ws_mypath (void) { static char *wspathsave = NULL; - char buffer[MAX_PATH]; - char *ptr; - if (wspathsave) + if (!wspathsave) { - return wspathsave; -} + char buf[MAX_PATH + 1]; + char *p; + DWORD len; + + len = GetModuleFileName (GetModuleHandle (NULL), buf, sizeof (buf)); + if (!len || (len >= sizeof (buf))) +return NULL; + + p = strrchr (buf, PATH_SEPARATOR); + if (!p) +return NULL; - if (GetModuleFileName (NULL, buffer, MAX_PATH) && - (ptr = strrchr (buffer, PATH_SEPARATOR)) != NULL) -{ - *(ptr + 1) = '\0'; - wspathsave = xstrdup (buffer); + *p = '\0'; + wspathsave = xstrdup (buf); } - else -wspathsave = NULL; + return wspathsave; }
[PATCH] Don't launch the Windows help file in response to --help
Attached is a patch that removes the ws_help() function from mswindows.[ch] and the call to it from print_help() in main.c. Also attached is an alternate patch that will fix ws_help(), which I neglected to update when I changed ws_mypath(). I find this behavior inconsistent with the vast majority of other command line tools. It's something akin to popping-up a web browser with the HTML version of the docs in response to `wget -–help' when running in a terminal under X. Feedback from users would be appreciated. Note: The previous change to ws_mypath() broke ws_help(), so one of the attached patches should be applied. 2004-02-20 David Fritz <[EMAIL PROTECTED]> * main.c (print_help): Remove call to ws_help(). * mswindows.c (ws_help): Remove. * mswindows.h (ws_help): Remove. Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.23 diff -u -r1.23 mswindows.c --- src/mswindows.c 2004/02/17 15:37:31 1.23 +++ src/mswindows.c 2004/02/20 16:17:34 @@ -229,8 +229,8 @@ if (mypath) { struct stat sbuf; - char *buf = (char *)alloca (strlen (mypath) + strlen (name) + 4 + 1); - sprintf (buf, "%s%s.HLP", mypath, name); + char *buf = (char *)alloca (strlen (mypath) + strlen (name) + 5 + 1); + sprintf (buf, "%s/%s.HLP", mypath, name); if (stat (buf, &sbuf) == 0) { printf (_("Starting WinHelp %s\n"), buf); Index: src/main.c === RCS file: /pack/anoncvs/wget/src/main.c,v retrieving revision 1.110 diff -u -r1.110 main.c --- src/main.c 2003/12/14 13:35:27 1.110 +++ src/main.c 2004/02/20 16:25:03 @@ -621,9 +621,6 @@ for (i = 0; i < countof (help); i++) fputs (_(help[i]), stdout); -#ifdef WINDOWS - ws_help (exec_name); -#endif exit (0); } Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.23 diff -u -r1.23 mswindows.c --- src/mswindows.c 2004/02/17 15:37:31 1.23 +++ src/mswindows.c 2004/02/20 16:25:04 @@ -222,28 +222,6 @@ } void -ws_help (const char *name) -{ - char *mypath = ws_mypath (); - - if (mypath) -{ - struct stat sbuf; - char *buf = (char *)alloca (strlen (mypath) + strlen (name) + 4 + 1); - sprintf (buf, "%s%s.HLP", mypath, name); - if (stat (buf, &sbuf) == 0) - { - printf (_("Starting WinHelp %s\n"), buf); - WinHelp (NULL, buf, HELP_INDEX, 0); -} - else -{ - printf ("%s: %s\n", buf, strerror (errno)); -} -} -} - -void ws_startup (void) { WORD requested; Index: src/mswindows.h === RCS file: /pack/anoncvs/wget/src/mswindows.h,v retrieving revision 1.13 diff -u -r1.13 mswindows.h --- src/mswindows.h 2003/11/06 01:12:02 1.13 +++ src/mswindows.h 2004/02/20 16:25:05 @@ -159,7 +159,6 @@ void ws_changetitle (const char*, int); void ws_percenttitle (double); char *ws_mypath (void); -void ws_help (const char *); void windows_main_junk (int *, char **, char **); /* Things needed for IPv6; missing in . */
Re: wget: Option -O not working in version 1.9 ?
Michael Bingel wrote: Hi there, I was looking for a tool to retrieve web pages and print them to standard out. As a Windows user I tried wget from Cygwin, but it created a file and I could not find the option to redirect output to standard out. Then I browsed through the online documentation and found the -O option in the manual (http://www.gnu.org/software/wget/manual/wget-1.8.1/html_mono/wget.html#IDX131). I thought great, problem solved, but Cygwin wget version 1.9 does not accept "-O", although the NEWS file does not mention removal of this feature. Your official site (http://www.gnu.org/software/wget/wget.html) states 1.8 as the latest version, with the -O option in the manual, so how can Cygwin have version 1.9 that does not support -O? Kind regards, Mike It might help if you would post the invocations you tried and also the output Wget produced. But I'd guess you probably had a non-option argument before -O. For a while now, the version of getopt_long() included with Cygwin has had argument permutation disabled by default. (It's recently been re-enabled in CVS, but not yet in a released version.) So, under Cygwin you'll need to place all option arguments before any non-option arguments. Like this, for instance: $ wget -O - http://www.gnu.org/ HTH, Cheers
Re: wget: Option -O not working in version 1.9 ?
Hrvoje Niksic wrote: David Fritz writes: But, I'd guess you probably had a non-option argument before -O. For a while now, the version of getopt_long() included with Cygwin has had argument permutation disabled by default. What on Earth were they thinking?! :) Well, ultimately, I can only speculate, but I recently grep'd through the archive of Cygwin mailing lists trying to understand just why this change was made. It seems the Cygwin RCM (see http://cygwin.com/acronyms/) found that argument permutation was causing problems with one of the utilities distributed with Cygwin that takes another command as its argument. He felt it was a good idea to hard-code POSIXLY_CORRECT into the getopt* code. Whatever the case for getopt(), which POSIX standardizes without argument permutation, getopt_long() is a GNU invention, and disabling argument permutation for it is just weird. I tried to point this out a few months ago and was ignored. Recently, a MinGW maintainer reverted the change to their copy of getopt* (which some parts of Cygwin use) and that broke things again. This time they seem to have fixed things the right way, and argument permutation is enabled for getopt_long() again. Hopefully it will stay that way. I've never considered the possibility that someone would be stupid enough to do that. Maybe Wget should revert to simply using its own implementation of getopt_long unconditionally. That's the solution some of the Cygwin package maintainers have used. But hopefully by the time the next version of Wget is released it won't be a problem anymore. (It's recently been re-enabled in CVS, but not yet in a released version.) So, under Cygwin you'll need to place all option arguments before any non-option arguments. I never thought I'd see those particular instructions applied to Wget. I've always hated the traditional Unix requirement for all options to precede all non-options. If Cygwin thinks it knows better how to handle command-line options, they should have the decency to name the function something other than getopt_long. Agreed. Cheers
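For readers who haven't run into this before, here is a small stand-alone illustration (not Wget code) of GNU-style argument permutation with getopt_long(). With the GNU implementation, `prog file -v' is normally treated like `prog -v file'; setting POSIXLY_CORRECT in the environment (or starting the option string with `+') makes parsing stop at the first non-option, which is effectively the behavior the Cygwin change hard-coded:

  #include <stdio.h>
  #include <getopt.h>

  int
  main (int argc, char **argv)
  {
    static const struct option longopts[] = {
      { "verbose", no_argument, NULL, 'v' },
      { NULL, 0, NULL, 0 }
    };
    int c;

    /* With permutation, -v/--verbose is recognized even when it comes
       after non-option arguments. */
    while ((c = getopt_long (argc, argv, "v", longopts, NULL)) != -1)
      if (c == 'v')
        puts ("verbose");

    /* Whatever getopt_long() did not consume. */
    for (; optind < argc; optind++)
      printf ("arg: %s\n", argv[optind]);
    return 0;
  }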
Re: Windows titlebar fix
Gisle Vanem wrote: ws_percenttitle() should not be called in quiet mode since ws_changetitle() AFAICS is only called in verbose mode. That caused an assert in mswindows.c. An easy patch: --- CVS-latest\src\retr.c Sun Dec 14 14:35:27 2003 +++ src\retr.c Tue Mar 02 21:18:55 2004 @@ -311,7 +311,7 @@ if (progress) progress_update (progress, ret, wtimer_read (timer)); #ifdef WINDOWS - if (toread > 0) + if (toread > 0 && !opt.quiet) ws_percenttitle (100.0 * (startpos + sum_read) / (startpos + toread)); #endif --gv This is because of a patch I recently submitted. The code in ws_percenttitle() used to just return if the relevant pointers were null. I replaced that with the assert()s. I failed to notice that ws_changetitle() is only called when opt.verbose is non-zero. (After I removed the call to it from main().) Sorry about that. We could also fix this by calling ws_changetitle() unconditionally. Should the title bar be affected by verbosity? One minor nit regarding the above patch: It should use opt.verbose instead of !opt.quiet so Wget won't assert when -nv is used.
Re: Windows titlebar fix
The attached patch will cause ws_percenttitle() to do nothing if the relevant variables have not been initialized. This is what the code did before I changed it. I changed it because it seemed that the code was making allowances for things that should not happen and I thought an assert would be more appropriate. Though, I guess doing thing this way will make the code more robust against future changes (and thus make the Windows port less of a maintenance burden). The patch also clamps the percentage value instead of just returning when it's out-of-range. This is so it will update the title to display the percentage as 100 in the arcane case when the previous percentage was < 100 and the new percentage is > 100. It also includes Gisle Vanem's fix for retr.c. [I know my assignment is pending but hopefully this patch is small enough to squeak-by until it's been processed.] 2004-03-02 David Fritz <[EMAIL PROTECTED]> * retr.c (fd_read_body): Under Windows, only call ws_percenttitle() if verbose. Fix by Gisle Vanem. * mswindows.c (ws_percenttitle): Guard against future changes by doing nothing if the proper variables have not been initialized. Clamp percentage value. Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.27 diff -u -r1.27 mswindows.c --- src/mswindows.c 2004/02/26 14:34:17 1.27 +++ src/mswindows.c 2004/03/03 03:21:27 @@ -180,19 +180,24 @@ void ws_percenttitle (double percentage_float) { - int percentage = (int) percentage_float; + int percentage; - /* Only update the title when the percentage has changed. */ - if (percentage == old_percentage) + if (!title_buf || !curr_url) return; - old_percentage = percentage; + percentage = (int) percentage_float; + /* Clamp percentage value. */ + if (percentage < 0) +percentage = 0; if (percentage > 100) +percentage = 100; + + /* Only update the title when the percentage has changed. */ + if (percentage == old_percentage) return; - assert (title_buf != NULL); - assert (curr_url != NULL); + old_percentage = percentage; sprintf (title_buf, "Wget [%d%%] %s", percentage, curr_url); SetConsoleTitle (title_buf); Index: src/retr.c === RCS file: /pack/anoncvs/wget/src/retr.c,v retrieving revision 1.84 diff -u -r1.84 retr.c --- src/retr.c 2003/12/14 13:35:27 1.84 +++ src/retr.c 2004/03/03 03:21:31 @@ -311,7 +311,7 @@ if (progress) progress_update (progress, ret, wtimer_read (timer)); #ifdef WINDOWS - if (toread > 0) + if (opt.verbose && toread > 0) ws_percenttitle (100.0 * (startpos + sum_read) / (startpos + toread)); #endif
Suggestion to add a switch on timestamps
Suggestion to add a switch on timestamps Dear Sir/Madam: WGET is popular FTP software for UNIX. But after files are downloaded for the first time, WGET always uses a date and time matching those on the remote server for the downloaded files. If WGET is executed in a temporary directory in which files are deleted according to their dates, a file whose remote timestamp is seven days old will be deleted automatically as soon as it has finished downloading. I suggest that an option on timestamps be added to WGET so that users can use the current date and time for newly downloaded files. Thank you for your kind attention.
[PATCH] A working implementation of fork_to_background() under Windows - please test
Attached is an implementation of fork_to_background() for Windows that (I hope) has the desired effect under both 9x and NT. _This is a preliminary patch and needs to be tested._ The patch is dependant upon the fact that the only time fork_to_background() is called is on start-up when –b is specified. Windows of course does not support the fork() call, so it must be simulated. This can be done by creating a new process and using some form of inter-process communication to transfer the state of the old process to the new one. This requires the parent and child to cooperate and when done in a general way (such as by Cygwin) requires a lot of work. However, with Wget since we have a priori knowledge of what could have changed in the parent by the time we call fork(), we could implement a special purpose fork() that only passes to the child the things that we know could have changed. (The initialization done by the C run-time library, etc. would be performed anew in the child, but hold on a minute.) The only real work done by Wget before calling fork() is the reading of wgetrc files and the processing of command-line arguments. Passing this information directly to the child would be possible, but the implementation would be complex and fragile. It would need to be updated as changes are made to the main code. It would be much simpler to simply perform the initialization (reading of config files, processing of args, etc.) again in the child. This would have a small performance impact and introduce some race-conditions, but I think the advantages (having –b work) outweigh the disadvantages. The implementation is, I hope, fairly straightforward. I have attempted to explain it in moderate detail in an attached README. I'm hoping others can test it with various operating systems and compilers. Also, any feedback regarding the design or implementation would be welcome. Do you feel this is the right way to go about this? Cheers, David Fritz 2004-03-19 David Fritz <[EMAIL PROTECTED]> * mswindows.c (make_section_name, fake_fork, fake_fork_child): New functions. (fork_to_backgorund): Replace with new implementation. Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.29 diff -u -r1.29 mswindows.c --- src/mswindows.c 2004/03/19 23:54:27 1.29 +++ src/mswindows.c 2004/03/20 01:34:15 @@ -131,10 +131,240 @@ FreeConsole (); } +/* Construct the name for a named section (a.k.a `file mapping') object. + The returned string is dynamically allocated and needs to be xfree()'d. */ +static char * +make_section_name (DWORD pid) +{ +return aprintf("gnu_wget_fake_fork_%lu", pid); +} + +/* This structure is used to hold all the data that is exchanged between + parent and child. */ +struct fake_fork_info +{ + HANDLE event; + int changedp; + char lfilename[MAX_PATH + 1]; +}; + +/* Determines if we are the child and if so performs the child logic. + Return values: + < 0 error + 0parent + > 0 child +*/ +static int +fake_fork_child (void) +{ + HANDLE section, event; + struct fake_fork_info *info; + char *name; + DWORD le; + + name = make_section_name (GetCurrentProcessId ()); + section = OpenFileMapping (FILE_MAP_WRITE, FALSE, name); + le = GetLastError (); + xfree (name); + if (!section) +{ + if (le == ERROR_FILE_NOT_FOUND) +return 0; /* Section object does not exist; we are the parent. 
*/ + else +return -1; +} + + info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0); + if (!info) +{ + CloseHandle (section); + return -1; +} + + event = info->event; + + if (!opt.lfilename) +{ + opt.lfilename = unique_name (DEFAULT_LOGFILE, 0); + info->changedp = 1; + strncpy (info->lfilename, opt.lfilename, sizeof (info->lfilename)); + info->lfilename[sizeof (info->lfilename) - 1] = '\0'; +} + else +info->changedp = 0; + + UnmapViewOfFile (info); + CloseHandle (section); + + /* Inform the parent that we've done our part. */ + if (!SetEvent (event)) + return -1; + + CloseHandle (event); + return 1; /* We are the child. */ +} + + +static void +fake_fork (void) +{ + char *cmdline, *args; + char exe[MAX_PATH + 1]; + DWORD exe_len, le; + SECURITY_ATTRIBUTES sa; + HANDLE section, event, h[2]; + STARTUPINFO si; + PROCESS_INFORMATION pi; + struct fake_fork_info *info; + char *name; + BOOL rv; + + event = section = pi.hProcess = pi.hThread = NULL; + + /* Get command line arguments to pass to the child process. + We need to skip the name of the command (what amounts to argv[0]). */ + cmdline = GetCommandLine (); + if (*cmdline == '"') +{ + args = strchr (cmdline + 1, '"'); + if (a
Re: [PATCH] A working implementation of fork_to_background() under Windows - please test
Herold Heiko wrote: MSVC binary at http://xoomer.virgilio.it/hherold/ for public testing. I performed only basic tests on NT4 sp6a, everything performed fine as expected. Thank you much for testing and hosting the binary. Some ideas on this thing: I'll respond to your points out-of-order. In verbose mode the child should probably acknowledge in the log file the fact it was invocated as child. The current patch attempts to emulate the behavior of the Unix version. AFAICT, this and the following suggestion apply equally well to the existing (Unix) code. In quiet mode the parent log (child pid, "log on wget-log" or whatever) probably should not be printed. Also, perhaps in quiet mode the child should not automatically set a log file if none was specified. IIUC, the log file would always be empty. In debug mode the client should probably also log the name of the section object and any information retrieved from it (currently the flag only). Sure, I could add a number of debug prints. A possible fix for the wgetrc race condition could be caching the content of the whole wgetrc in the parent and transmit it in the section object to the child, a bit messy I must admit but a possible solution if that race condition is considered a Bad Thing. That would work, but would require making changes to the main code and would require performing the child detection logic much earlier (even before we know if –b was specified). We could also exploit Windows file-sharing semantics or file locking features to guarantee the config files can't change. I'm unsure such complexity is necessary. About the only scenario I could think of is where you have a script creating a custom wgetrc, run wget, then change the wgetrc: introduce -b and the script could change the wgetrc after running wget but before the parsing on client side a rather remote but possible scenario. In this scenario, the script would have to wait for the parent to terminate to avoid a race, even with the Unix version. With this patch the child would have necessarily finished reading any wgetrc files before the parent terminates. So there shouldn't be a problem. Thanks again, David Fritz
Re: [PATCH] A working implementation of fork_to_background() under Windows - please test
Hrvoje Niksic wrote: For now I'd start with applying David's patch, so that people can test its functionality. It is easy to fix the behavior of `wget -q -b' later. David, can I apply your patch now? Sure. The attached patch corrects a few minor formatting details but is otherwise identical to the previous one. Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.29 diff -u -r1.29 mswindows.c --- src/mswindows.c 2004/03/19 23:54:27 1.29 +++ src/mswindows.c 2004/03/24 17:50:32 @@ -131,10 +131,240 @@ FreeConsole (); } +/* Construct the name for a named section (a.k.a. `file mapping') object. + The returned string is dynamically allocated and needs to be xfree()'d. */ +static char * +make_section_name (DWORD pid) +{ + return aprintf ("gnu_wget_fake_fork_%lu", pid); +} + +/* This structure is used to hold all the data that is exchanged between + parent and child. */ +struct fake_fork_info +{ + HANDLE event; + int changedp; + char lfilename[MAX_PATH + 1]; +}; + +/* Determines if we are the child and if so performs the child logic. + Return values: + < 0 error + 0 parent + > 0 child +*/ +static int +fake_fork_child (void) +{ + HANDLE section, event; + struct fake_fork_info *info; + char *name; + DWORD le; + + name = make_section_name (GetCurrentProcessId ()); + section = OpenFileMapping (FILE_MAP_WRITE, FALSE, name); + le = GetLastError (); + xfree (name); + if (!section) +{ + if (le == ERROR_FILE_NOT_FOUND) +return 0; /* Section object does not exist; we are the parent. */ + else +return -1; +} + + info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0); + if (!info) +{ + CloseHandle (section); + return -1; +} + + event = info->event; + + if (!opt.lfilename) +{ + opt.lfilename = unique_name (DEFAULT_LOGFILE, 0); + info->changedp = 1; + strncpy (info->lfilename, opt.lfilename, sizeof (info->lfilename)); + info->lfilename[sizeof (info->lfilename) - 1] = '\0'; +} + else +info->changedp = 0; + + UnmapViewOfFile (info); + CloseHandle (section); + + /* Inform the parent that we've done our part. */ + if (!SetEvent (event)) +return -1; + + CloseHandle (event); + return 1; /* We are the child. */ +} + + +static void +fake_fork (void) +{ + char *cmdline, *args; + char exe[MAX_PATH + 1]; + DWORD exe_len, le; + SECURITY_ATTRIBUTES sa; + HANDLE section, event, h[2]; + STARTUPINFO si; + PROCESS_INFORMATION pi; + struct fake_fork_info *info; + char *name; + BOOL rv; + + event = section = pi.hProcess = pi.hThread = NULL; + + /* Get command line arguments to pass to the child process. + We need to skip the name of the command (what amounts to argv[0]). */ + cmdline = GetCommandLine (); + if (*cmdline == '"') +{ + args = strchr (cmdline + 1, '"'); + if (args) +++args; +} + else +args = strchr (cmdline, ' '); + + /* It's ok if args is NULL, that would mean there were no arguments + after the command name. As it is now though, we would never get here + if that were true. */ + + /* Get the fully qualified name of our executable. This is more reliable + than using argv[0]. */ + exe_len = GetModuleFileName (GetModuleHandle (NULL), exe, sizeof (exe)); + if (!exe_len || (exe_len >= sizeof (exe))) +return; + + sa.nLength = sizeof (sa); + sa.lpSecurityDescriptor = NULL; + sa.bInheritHandle = TRUE; + + /* Create an anonymous inheritable event object that starts out + non-signaled. */ + event = CreateEvent (&sa, FALSE, FALSE, NULL); + if (!event) +return; + + /* Creat the child process detached form the current console and in a + suspended state. 
*/ + memset (&si, 0, sizeof (si)); + si.cb = sizeof (si); + rv = CreateProcess (exe, args, NULL, NULL, TRUE, CREATE_SUSPENDED | + DETACHED_PROCESS, NULL, NULL, &si, &pi); + if (!rv) +goto cleanup; + + /* Create a named section object with a name based on the process id of + the child. */ + name = make_section_name (pi.dwProcessId); + section = + CreateFileMapping (INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, + sizeof (struct fake_fork_info), name); + le = GetLastError(); + xfree (name); + /* Fail if the section object already exists (should not happen). */ + if (!section || (le == ERROR_ALREADY_EXISTS)) +{ + rv = FALSE; + goto cleanup; +} + + /* Copy the event handle into the section object. */ + info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0); + if (!info) +{ + rv = FALSE; + goto cleanup; +} + + info->event = event; + + UnmapViewOfFile (info); + + /* S
Re: [PATCH] A working implementation of fork_to_background() under Windows - please test
Hrvoje Niksic wrote: Thanks for the patch, I've now applied it to CVS. You might want to add a comment in front of fake_fork() explaining what it does, and why. The comment doesn't have to be long, only several sentences so that someone reading the code later understands what the heck a "fake fork" is and why we're performing it. Ok, I'll submit a patch latter tonight. Do you think it would be a good idea to include README.fork in windows/ (the directory with the Windows Makefiles, etc. in it)? (If so, I'd like to tweak it a little first, though.) Thanks
Re: [PATCH] A working implementation of fork_to_background() under Windows - please test
Hrvoje Niksic wrote: Thanks for the patch, I've now applied it to CVS. You might want to add a comment in front of fake_fork() explaining what it does, and why. The comment doesn't have to be long, only several sentences so that someone reading the code later understands what the heck a "fake fork" is and why we're performing it. Ok, I hope this is sufficient. Cheers Index: src/mswindows.c === RCS file: /pack/anoncvs/wget/src/mswindows.c,v retrieving revision 1.30 diff -u -r1.30 mswindows.c --- src/mswindows.c 2004/03/24 19:16:08 1.30 +++ src/mswindows.c 2004/03/24 23:52:59 @@ -202,7 +202,14 @@ return 1; /* We are the child. */ } - +/* Windows doesn't support the fork() call; so we fake it by invoking + another copy of Wget with the same arguments with which we were + invoked. The child copy of Wget should perform the same initialization + sequence as the parent; so we should have two processes that are + essentially identical. We create a specially named section object that + allows the child to distinguish itself from the parent and is used to + exchange information between the two processes. We use an event object + for synchronization. */ static void fake_fork (void) { @@ -343,6 +350,8 @@ /* We failed, return. */ } +/* This is the corresponding Windows implementation of the + fork_to_background() function in utils.c. */ void fork_to_background (void) {
Re: wget: strdup: Not enough memory
Axel Pettinger wrote: Hrvoje Niksic wrote: This patch should fix the problem. Please let me know if it works for you: I would like to check it out, but I'm afraid I'm not able to compile it. Why not? What error are you getting? I have not that much experience with compiling source code ... When I try to build WGET.EXE (w/o SSL) using MinGW then I get many warnings and errors in "utils.c" and "log.c", i.e.: -- gcc -DWINDOWS -DHAVE_CONFIG_H -O3 -Wall -I. -c -o log.o log.c log.c:498:26: macro "va_start" requires 2 arguments, but only 1 given log.c: In function `logprintf': log.c:498: `va_start' undeclared (first use in this function) log.c:498: (Each undeclared identifier is reported only once log.c:498: for each function it appears in.) log.c:524:30: macro "va_start" requires 2 arguments, but only 1 given log.c: In function `debug_logprintf': log.c:524: `va_start' undeclared (first use in this function) mingw32-make: *** [log.o] Error 1 -- Regards, Axel Pettinger I just posted a patch to wget-patches that, hopefully, will fix the mingw build. In the meantime try adding the following lines to config.h.mingw: #define WGET_USE_STDARG #define HAVE_SIG_ATOMIC_T HTH, Cheers
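For what it's worth, the va_start error above is what you would expect if the build fell back to the old <varargs.h>-style variadic code, whose va_start() takes a single argument; defining WGET_USE_STDARG presumably selects the <stdarg.h> path instead, which looks roughly like this (illustrative sketch, not the actual log.c):

  #include <stdarg.h>
  #include <stdio.h>

  /* stdarg-style variadic logging: note the two-argument va_start(),
     which is the form <stdarg.h> provides. */
  static void
  logprintf_sketch (const char *fmt, ...)
  {
    va_list args;

    va_start (args, fmt);
    vfprintf (stderr, fmt, args);
    va_end (args);
  }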
Re: wget: strdup: Not enough memory
Axel Pettinger wrote: David Fritz wrote: Axel Pettinger wrote: I have not that much experience with compiling source code ... When I try to build WGET.EXE (w/o SSL) using MinGW then I get many Forgot to mention that the source is 1.9+cvs-dev-200404081407 ... warnings and errors in "utils.c" and "log.c", i.e.: [snip] I just posted a patch to wget-patches that, hopefully, will fix the mingw build. In the meantime try adding the following lines to config.h.mingw: #define WGET_USE_STDARG #define HAVE_SIG_ATOMIC_T "log.c" seems to be ok now, but there's still an error in "utils.c": -- gcc -DWINDOWS -DHAVE_CONFIG_H -O3 -Wall -I. -c -o utils.o utils.c utils.c:53:20: utime.h: No such file or directory utils.c: In function `unique_name_1': utils.c:411: warning: implicit declaration of function `alloca' utils.c: In function `number_to_string': utils.c:1271: warning: suggest parentheses around + or - inside shift mingw32-make: *** [utils.o] Error 1 -- Regards, Axel Pettinger Hmm, you might try upgrading to a newer version of mingw (see http://www.mingw.org/). Alternatively, you could try to comment-out the #define HAVE_UTIME_H 1 line in config.h.mingw or add a utime.h to your mingw include directory that consists of the following line: #include HTH, Cheers
Re: Large Files Support for Wget
IIUC, GNU coreutils uses uintmax_t to store large numbers relating to the file system and prints them with something like this: char buf[INT_BUFSIZE_BOUND (uintmax_t)]; printf (_("The file is %s octets long.\n"), umaxtostr (size, buf)); where umaxtostr() has the following prototype: char *umaxtostr (uintmax_t, char *); and it returns its second argument (the address of the buffer provided by the caller) so it can be used easily as an argument in printf calls.
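A minimal sketch of such a helper (simplified; the real gnulib umaxtostr() and INT_BUFSIZE_BOUND differ in detail, e.g. they also cover signed types) might look like this:

  #include <limits.h>
  #include <stdint.h>

  /* Crude upper bound: room for the decimal digits of type T plus a
     terminating NUL. */
  #define INT_BUFSIZE_BOUND(t) (sizeof (t) * CHAR_BIT / 3 + 3)

  /* Format N into the caller-supplied buffer BUF and return a pointer
     to the resulting string, so the call can sit directly inside a
     printf() argument list. */
  char *
  umaxtostr_sketch (uintmax_t n, char *buf)
  {
    char *p = buf + INT_BUFSIZE_BOUND (uintmax_t) - 1;

    *p = '\0';
    do
      *--p = '0' + (int) (n % 10);
    while ((n /= 10) != 0);
    return p;
  }

The point of the pattern is that the caller owns the buffer, so the function stays reentrant and needs no dynamic allocation.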
Re: Input string size limitations
[redirecting this thread to the general discussion list [EMAIL PROTECTED] Laura Sanders wrote: I am using wget to pass order information, which includes item numbers, addresses, etc. I have run into a size limitation on the string I send into wget. [...] How are you `sending' the string to Wget? Under what OS? If you're running into command-line length limitations you can simply put the URL(s) in a file (one per line) and use -i. If you use `wget -i -', Wget will read the list of URLs from stdin; this can be useful in avoiding the need for temporary files. HTH, Cheers
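For example (the URLs and variable names here are hypothetical), a script that builds the order URLs could pipe them straight in with something like printf '%s\n' "$url1" "$url2" | wget -i - and thereby avoid both the command-line length limit and a temporary file.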
Output error stream if response code != 200
When testing posting to web services, if the service returns a SOAP fault, it will set the response code to 500. However, the information in the SOAP fault is still useful. When wget gets a 500 response code, it doesn't try to output the "error stream" (as opposed to the "input stream"), where this information would be provided. It might be useful to add a command-line option that tells wget to emit the error stream on error.
--continue breakage and changes
Because of the way the always_rest logic has been restructured, if a non-fatal error occurs in an initial attempt, subsequent retries will forget about always_rest and clobber the existing file. Ouch. Also, the behavior of -c when downloading from a server that does not support ranges has changed since 1.9.1. (Or seems to have, from looking at the code; I haven't actually tested.) Previously, Wget would bail in such situations: Continued download failed on this file, which conflicts with `-c'. Refusing to truncate existing file `%s' Now, it will re-download the whole file and discard bytes until it gets to the right position. I think this change deserves explicit mention in the NEWS file. There's an entry about the new logic when Range requests fail, but I don't think it's obvious that this affects -c. Also, I think the old behavior was useful in some situations. If you're short on bandwidth it might not be worth it to re-get the whole file, especially when it's a popular file and there's likely to be another mirror that does support Range. What would you think of an option to disallow start-over retries?
RE: Output error stream if response code != 200
> -Original Message- > From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] > > "Karr, David" <[EMAIL PROTECTED]> writes: > > > When testing of posting to web services, if the service > returns a SOAP > > fault, it will set the response code to 500. However, the > information > > in the SOAP fault is still useful. When wget gets a 500 response > > code, it doesn't try to output the "error stream" (as > opposed to the > > "input stream"), where this information would be provided. > It might > > be useful to add a command-line option that specifies to emit the > > error stream on error. > > I'm not quite sure what you mean by "error stream" here. The > thing is, web errors are usually in HTML, which is next to > useless if dumped on stdout by Wget. But perhaps Wget should > show that output anyway when -S is used? When I speak of the "error stream" as opposed to the "input stream", I refer to the terms used in the Java API. The latter is meaningful when the response code is 2xx, and the error stream is meaningful when the response code is not in that range. When HTTP applications are HTML-based, both the input stream and the error stream are likely to be HTML, so it doesn't make sense to exclude one, but not the other. However, when the HTTP application is XML-based, the error stream will often be meaningful (and interpretable). I don't know whether it's reasonable to make the error stream show up by default, or turned on by an option. It depends a bit on whether changing that functionality would affect any existing uses.
Headers/resume -s/-c conflict
Hi, If I specify -s and -c, then the resultant file is corrupted if a resume occurs, because the resume sticks the headers partway through the file. Additionally, the resume doesn't do a full grab, because it miscounts the size by ignoring the header bytes. Is this on anyone's to-do list? David
Re: Headers/resume -s/-c conflict
David Greaves wrote: Hi, If I specify -s and -c, then the resultant file is corrupted if a resume occurs, because the resume sticks the headers partway through the file. Additionally, the resume doesn't do a full grab, because it miscounts the size by ignoring the header bytes. Is this on anyone's to-do list? David FYI, I've fixed this and sent in a patch for the CVS version. I also have a patch for 1.9.1 if anyone wants it. It's particularly useful for apt-cacher, which trips over this bug a lot. David Mail me at davidatdgreavesdotcom
2 giga file size limit ?
Hi all, I'm trying to get around this kind of message on I*86 linux boxes with wget 1.9.1 --11:12:08-- ftp://ftp.ensembl.org/pub/current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz => `current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz' ==> CWD not required. ==> PASV ... done. ==> RETR RefSNP.txt.table.gz ... done. Length: -1,212,203,102 The file is actually more than 3giga, since my main goal is to mirror the whole thing @ensembl-org, It would be very fine if mirroring could be used with huge files too There is no trouble though on Tru64 machines. here is what the .listing file says for this file on the linux boxes : total 3753518 -rw-rw-r-- 1 0 0 97960 Jul 21 17:05 Assay.txt.table.gz -rw-rw-r-- 1 0 0 279 Jul 21 19:29 CHECKSUMS.gz -rw-rw-r-- 1 0 0 153157540 Jul 21 17:08 ContigHit.txt.table.gz -rw-rw-r-- 1 0 0 32 Jul 21 17:08 DataSource.txt.table.gz -rw-rw-r-- 1 0 0 18359087 Jul 21 17:09 Freq.txt.table.gz -rw-rw-r-- 1 0 0 46848 Jul 21 17:09 GTInd.txt.table.gz -rw-rw-r-- 1 0 0 185265599 Jul 21 17:13 Hit.txt.table.gz -rw-rw-r-- 1 0 0 35914149 Jul 21 17:14 Locus.txt.table.gz -rw-rw-r-- 1 0 0 20 Jul 21 17:14 Pop.txt.table.gz -rw-rw-r-- 1 0 0 3082764194 Jul 21 19:21 RefSNP.txt.table.gz -rw-rw-r-- 1 0 0 195 Jul 21 19:21 Resource.txt.table.gz -rw-rw-r-- 1 0 0 72306055 Jul 21 19:23 Strain.txt.table.gz -rw-rw-r-- 1 0 0 9480171 Jul 21 19:23 SubPop.txt.table.gz -rw-rw-r-- 1 0 0 286116716 Jul 21 19:27 SubSNP.txt.table.gz -rw-rw-r-- 1 0 0 49095 Jul 21 19:23 Submitter.txt.table.gz -rw-rw-r-- 1 0 0 1697 Jul 21 19:27 homo_sapiens_snp_23_34e.sql.gz You can see that the file is appropriately listed , though once the ftp session is started it reports a negative size.. any solution ?
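For what it's worth, the negative length is exactly what you get when that 3,082,764,194-byte size is squeezed into a 32-bit signed integer. A tiny stand-alone demonstration (not Wget code):

  #include <stdio.h>
  #include <stdint.h>

  int
  main (void)
  {
    uint64_t real_size = 3082764194ULL;       /* RefSNP.txt.table.gz */
    int32_t  wrapped   = (int32_t) real_size; /* what a 32-bit long sees;
                                                 strictly implementation-
                                                 defined, but it wraps on
                                                 common ABIs */

    printf ("%llu -> %ld\n", (unsigned long long) real_size, (long) wrapped);
    /* Prints: 3082764194 -> -1212203102 */
    return 0;
  }

So the fix is not a protocol issue but the use of a 64-bit type (off_t with large-file support enabled, or similar) wherever Wget stores and prints file sizes.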
Re: 2 giga file size limit ?
Yep sorry to being a pain , I've seen a bit later that the issue raised quite a lot of times in the past, My point is though that wget compiled on Tru64 OS does work with huge files. Jonathan Stewart wrote: Wget doesn't support >2GB files. It is a known issue that is brought up a lot. Please patch if you're able, so far no fix has been forthcoming. Cheers, Jonathan - Original Message - From: david coornaert <[EMAIL PROTECTED]> Date: Thu, 09 Sep 2004 12:41:31 +0200 Subject: 2 giga file size limit ? To: [EMAIL PROTECTED] Hi all, I'm trying to get around this kind of message on I*86 linux boxes with wget 1.9.1 --11:12:08-- ftp://ftp.ensembl.org/pub/current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz => `current_human/data/mysql/homo_sapiens_snp_23_34e/RefSNP.txt.table.gz' ==> CWD not required. ==> PASV ... done.==> RETR RefSNP.txt.table.gz ... done. Length: -1,212,203,102 The file is actually more than 3giga, since my main goal is to mirror the whole thing @ensembl-org, It would be very fine if mirroring could be used with huge files too There is no trouble though on Tru64 machines. here is what the .listing file says for this file on the linux boxes : total 3753518 -rw-rw-r-- 1 00 97960 Jul 21 17:05 Assay.txt.table.gz -rw-rw-r-- 1 00 279 Jul 21 19:29 CHECKSUMS.gz -rw-rw-r-- 1 00 153157540 Jul 21 17:08 ContigHit.txt.table.gz -rw-rw-r-- 1 0032 Jul 21 17:08 DataSource.txt.table.gz -rw-rw-r-- 1 00 18359087 Jul 21 17:09 Freq.txt.table.gz -rw-rw-r-- 1 00 46848 Jul 21 17:09 GTInd.txt.table.gz -rw-rw-r-- 1 00 185265599 Jul 21 17:13 Hit.txt.table.gz -rw-rw-r-- 1 00 35914149 Jul 21 17:14 Locus.txt.table.gz -rw-rw-r-- 1 0020 Jul 21 17:14 Pop.txt.table.gz -rw-rw-r-- 1 003082764194 Jul 21 19:21 RefSNP.txt.table.gz -rw-rw-r-- 1 00 195 Jul 21 19:21 Resource.txt.table.gz -rw-rw-r-- 1 00 72306055 Jul 21 19:23 Strain.txt.table.gz -rw-rw-r-- 1 00 9480171 Jul 21 19:23 SubPop.txt.table.gz -rw-rw-r-- 1 00 286116716 Jul 21 19:27 SubSNP.txt.table.gz -rw-rw-r-- 1 00 49095 Jul 21 19:23 Submitter.txt.table.gz -rw-rw-r-- 1 00 1697 Jul 21 19:27 homo_sapiens_snp_23_34e.sql.gz You can see that the file is appropriately listed , though once the ftp session is started it reports a negative size.. any solution ?
maybe wget bug
Hello, I am using wget to invoke a CGI script call, while passing it several variables. For example: wget -O myfile.txt "http://user:[EMAIL PROTECTED]/myscript.cgi?COLOR=blue&SHAPE=circle" where myscript.cgi, say, makes an image based on the parameters "COLOR" and "SHAPE". The problem I am having is when I need to pass a key/value pair where the value contains the "&" character, such as: wget -O myfile.txt "http://user:[EMAIL PROTECTED]/myscript.cgi?COLOR=blue & red&SHAPE=circle" I have tried encoding the "&" as %26, but that does not seem to work (spaces as %20 work fine). The error log for the web server shows that the URL requested does not say %26, but rather "&". It does not appear to me that wget is sending the %26 as %26; it seems to be "fixing" it to "&". I am using GNU wget v1.5.3 with Red Hat 7.0 Thanks! -- David Christopher Asher
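For reference, here is what encoding the value is supposed to achieve on the wire; a small illustrative helper (not Wget code) that percent-encodes a single query-string value so a literal '&' is sent as %26 rather than being taken as a parameter separator:

  #include <stdio.h>
  #include <string.h>

  /* Percent-encode everything outside a conservative "safe" set. */
  static void
  encode_query_value (const char *in, char *out)
  {
    static const char safe[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "abcdefghijklmnopqrstuvwxyz"
      "0123456789-_.~";

    for (; *in; in++)
      {
        if (strchr (safe, *in))
          *out++ = *in;
        else
          out += sprintf (out, "%%%02X", (unsigned char) *in);
      }
    *out = '\0';
  }

  /* encode_query_value ("blue & red", buf) yields "blue%20%26%20red". */

The complaint above is precisely that wget 1.5.3 appears to decode the %26 back into a bare '&' before sending the request, which defeats this encoding.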
Is there a way to override wgetrc options on command line?
Hello, I have several cronjobs using wget and the wgetrc file turns on passive-ftp by default. I have one site where strangely enough passive ftp does not work but active does work. I'd rather leave the passive ftp default set and just change the one cronjob that requires active ftp. Is there any way to tell wget to either disregard the wgetrc file or to override one or more of its options? Thanks.
RE: Is there a way to override wgetrc options on command line?
Thanks! That worked. --Dave -Original Message- From: Hack Kampbjørn [mailto:[EMAIL PROTECTED]] Sent: Friday, June 01, 2001 2:25 AM To: Humes, David G. Cc: '[EMAIL PROTECTED]' Subject: Re: Is there a way to override wgetrc options on command line? "Humes, David G." wrote: > > Hello, > > I have several cronjobs using wget and the wgetrc file turns on passive-ftp > by default. I have one site where strangely enough passive ftp does not > work but active does work. I'd rather leave the passive ftp default set and > just change the one cronjob that requires active ftp. Is there any way to > tell wget to either disregard the wgetrc file or to override one or more of > its options? > > Thanks. What about --execute=COMMAND ? $ wget --help GNU Wget 1.7-pre1, a non-interactive network retriever. Usage: wget [OPTION]... [URL]... Mandatory arguments to long options are mandatory for short options too. Startup: -V, --version display the version of Wget and exit. -h, --help print this help. -b, --backgroundgo to background after startup. -e, --execute=COMMAND execute a `.wgetrc'-style command. [...] -- Med venlig hilsen / Kind regards Hack Kampbjørn [EMAIL PROTECTED] HackLine +45 2031 7799
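For the record, the resulting cron-job invocation would presumably look something like wget -e passive_ftp=off ftp://host/path/file (host and path here are placeholders): -e accepts any .wgetrc-style command, and commands given this way are executed after the wgetrc is read, so they override the global default for that run only.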
Unsubscribe help please
Hello, I have tried to unsubscribe several times by sending emails to [EMAIL PROTECTED], but the wget emails keep coming. I hate to bother everyone on the list, but could someone please give me a way to unsubscribe that works. Thanks. --Dave
wget and tag searching
Hi, I am using the wget functionality in one of my projects to search through web content. However, I note that when I try to recurse on a link found in the page that differs only by a query string such as ?tag=pag&st=15, wget seems to ignore everything after the question mark, thus returning the same content as before. I was wondering if this is a known issue and whether you might have any suggestions on how I might work around it so that wget can get the web page in question. Thanks, --Dave
RE: parameters in the URL
Hey, I remember this feature was in WGETWIN 1.5.3.1 It was really useful. But it is missing from WGET 1.8.1 I would like to see this feature added back into WGET because at the moment it is completely broken when the URL contains a question mark '?'. Kind regards, David Robinson -- URL.C (1.5.3.1) char *url_filename (const struct urlinfo *u) { . . . #ifdef WINDOWS { char *p = file; for (p = file; *p; p++) if ( (*p == '%') || (*p == '?') || (*p == '*') ) *p = '@'; } #endif /* WINDOWS */ . . . } -- URL.C (1.8.1) char *url_filename (const struct url *u) { . . . #ifdef WINDOWS { char *p = file; for (p = file; *p; p++) if (*p == '%') *p = '@'; } #endif /* WINDOWS */ . . . } -- From: Herold Heiko Subject: RE: parameters in the URL Date: Tue, 18 Dec 2001 07:07:07 -0800 Older wget versions did some some other character translation, but that had other bigger sideeffects, and has been partially substituted by the current translation table. Some time ago there had been extensive discussions regarding this (at the time of Dan iirc), and there had been some agreement there wasn't any perfect solution working for every case except if wget keeps an external database (say, a file in every directory) where to record exactly which translations have been made, in order to be able to send back the correct urls to web servers when necessary. Heiko -- -- PREVINET S.p.A.[EMAIL PROTECTED] -- Via Ferretto, 1ph x39-041-5907073 -- I-31021 Mogliano V.to (TV) fax x39-041-5907087 -- ITALY >From the wget mailing list archives: http://www.mail-archive.com/wget@sunsite.dk/msg02314.html
RE: Mapping URLs to filenames
Hello The '%' character is valid within Win32 filenames. The '*' and '?' are not valid filename characters. The '%' and '*' are wildcard characters, which is probably why they were excluded in previous versions. There will always be problems mapping strings between namespaces, such as URLs and file systems. WGET could be extended to call an optional shared library provided by the user. This would permit the user to build a URL/Filename mapping table however they chose. In the meantime, however, '?' is problematic for Win32 users. It stops WGET from working properly whenever it is found within a URL. Can we fix it please. Kind regards David Robinson -Original Message- From: Herold Heiko [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 16 January 2002 03:28 To: Wget Development Subject: RE: Mapping URLs to filenames Some comments. a) escape character remapping -> % not best choice ? If I understood correctly how you are proposing to remap the urls to directories and files we'll need to remap the escape character, too, IF that character is a legal char for urls, otherwise it would be not immedeatly obvious if a %xx was part of the url or is a tranlation made by wget. This means IF a url did contain something like somewhat%other we'll have a file like named somewhat%25other (supposing the charset used to generate the hex values contains % at 0x25 like ascii does) but this also means a url fragment some%20what would map to a file some%2520what - not a pretty thing.So possibly the % is not a good choice. b) we're treating mostly html - remap to html entities ? Would it be good to map some characters to things like 'agrave' instead of hex values ? Probably not. Forget it. b) @ on windows I'm not sure if on some dos/win* platforms the % was ever a illegal character. As you stated dos/win batch files could generate real ugliness with files containing, say, %06%20 or something (not that should be ever part of a url but...). Please note, if some other character than % is the escape char (say @), some%thing should not be encoded but some%20thing most definitively on windows should (dangerous three-char combination, a batchfile could later somehow interpret this as "positional parameter number 20"). But see my later point for the "at least on windows" part. c) filename length Why remap only dangerous characters ? There are filenames with file/directory length limitations (minix 14 ? old unixes 14 ? dos 8.3 ? iso9660 8.3 ? iso9660+joliet 63 ? Some other at 254 ? All of these could have some problems with long filenames, urls generated by cgi/jsp/whatever and so on). However, remapping long files/directories to shorter ones creates a BIG problems (IIRC first raised by Dan): collision - say the current file systems is dossish and supports minimal 8.3 filenames. How to remap if we need to save in the same directory both 01234567.htm and 0123456789.htm and lots of similar filenames ? Whatever mapping is done "later" another file in the same directory could need exactly that name - which means the only way to have a complete working mapping between url fragments and filenames is a external table (some file wget maintains in every directory). Note the "every" - if that table would be unique for the whole download, say, in the starting directory, it would not be available anymore if later only some branch of the downloaded directories is used for a successive run, so the table location must be obvious from the directory location itself. 
Having a single, unique master table for the whole download would mean a lot of splicing and joining when changing parts of the local copy before a successive run. Having a different location (not in the directory itself) would mean more difficulty when moving those directories around the local filesystem (you'd need to move the directory and - somewhere else - the table).

e) "presets". As you said, there's always the odd combination (save as vfat from linux, save as iso9660 from whatever os, etc.). Users should not be required to know exactly what the requirements are (at least for the more usual file systems - generic "longnames" unix, vfat, fat, ntfs, iso9660, vms, minix should cover most cases) - they are users, not admins. Besides the possibility of specifying an exact, manual, detailed setup (the command line is probably too complex; a rule file specified from the command line or .wgetrc, I'd say), there should be some presets included for those usual cases mentioned above. Possibly the above+iso9660, too. This could be as easy as some ruleset files included in the sources, mentioned in the docs and installed by default (/usr/local/lib/wget or wherever), or even compiled in, although compiling in any ruleset different from the default is probably not worth it (to avoid binary bloat, we need to be able to load external rules a
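To make the double-escaping issue in point a) concrete, here is a minimal, self-contained sketch (not wget code) of a reversible encoder that uses '%' as the escape character; the set of characters treated as unsafe is purely an assumption for illustration.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical reversible filename encoder: every unsafe byte,
       including the escape character '%' itself, becomes "%XX".
       The set of "unsafe" characters here is just an illustration. */
    static void encode_filename (const char *in, char *out)
    {
      const char *unsafe = "%?*<>:\"|\\";
      for (; *in; in++)
        {
          if (strchr (unsafe, *in))
            out += sprintf (out, "%%%02X", (unsigned char) *in);
          else
            *out++ = *in;
        }
      *out = '\0';
    }

    int main (void)
    {
      char buf[256];

      encode_filename ("somewhat%other", buf);  /* -> somewhat%25other */
      puts (buf);
      encode_filename ("some%20what", buf);     /* -> some%2520what */
      puts (buf);
      return 0;
    }

The second call shows exactly the case Heiko warns about: a URL fragment that already contains %20 ends up as %2520 on disk, which is correct and reversible but not pretty.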
RE: Mapping URLs to filenames
I like this proposal. This would restore the version 1.5.3 behaviour. David. -Original Message- From: Ian Abbott [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 16 January 2002 21:48 To: Wget List Subject: RE: Mapping URLs to filenames On 16 Jan 2002 at 8:02, David Robinson (AU) wrote: > In the meantime, however, '?' is problematic for Win32 users. It stops WGET > from working properly whenever it is found within a URL. Can we fix it > please. My proposal for using escape sequences in filenames for problem characters is up for discussion at the moment, but I'm not sure if they really need to be reversible (except that it helps to reduce the chances of different URLs being saved to the same filename). Would it be sufficient to map all illegal characters to '@'? For Windows, the code already changes '%' to '@' and it could just as easily change '*', '?', etc. to '@' as well.
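As a rough illustration of that suggestion, the mapping could be broadened along the following lines. This is a standalone sketch, not a patch against url_filename(), and the exact character set (the ones Win32 filesystems reject, plus '%') is an assumption, not code from any released wget.

    #include <stdio.h>

    /* Sketch of Ian's suggestion: map '%' plus every character that
       Win32 filesystems reject in filenames to '@', the way 1.5.3.1
       already did for '%', '?' and '*'. */
    static void sanitize_windows_filename (char *file)
    {
      char *p;
      for (p = file; *p; p++)
        if ((*p == '%') || (*p == '?') || (*p == '*')
            || (*p == '<') || (*p == '>') || (*p == ':')
            || (*p == '"') || (*p == '|'))
          *p = '@';
    }

    int main (void)
    {
      char name[] = "q/ta?s=btu&t=6m";   /* a query-style URL tail */
      sanitize_windows_filename (name);
      puts (name);                       /* -> q/ta@s=btu&t=6m */
      return 0;
    }

As Ian notes, this mapping is not reversible, so distinct URLs can still collide on disk; the escape-sequence proposal trades this simplicity for reversibility.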
timestamping
This isn't a bug, but the offer of a new feature. The timestamping feature doesn't quite work for us, as we don't keep just the latest view of a website and we don't want to copy all those files around for each update. So I implemented a --changed-since=mmdd[hhmm] flag to only get files that have changed since then according to the header. It seems to work okay, although your extra check for file-size equality in the timestamping feature makes me wonder if the date isn't always a good measure. One oddity is that if you point wget at a file that's older than the date at the top level, it won't be fetched and there won't be any URLs to recurse on. (We're pointing it at a URL that changes daily.) I tested it under Solaris 7, but there is a dependency on time() and gmtime() that I haven't conditionalized for autoconf, as I am not familiar with that tool. I would like this feature to get carried along with the rest of the codebase; would you like it? -dca
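For readers who want to picture what such a flag does, here is a small, self-contained sketch of the decision it implies. It is not the poster's patch: the option name, the timestamp format and the use of strptime()/mktime() are all assumptions, and a real implementation would have to handle time zones properly (the gmtime() dependency the poster mentions) rather than treat both values as local time.

    #define _XOPEN_SOURCE 700   /* for strptime() */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Fetch only if the server copy changed after the user's cutoff. */
    static int should_fetch (time_t last_modified, time_t changed_since)
    {
      return last_modified > changed_since;
    }

    int main (void)
    {
      struct tm tm;
      time_t cutoff, modified;

      /* Hypothetical cutoff supplied as --changed-since=200112180000. */
      memset (&tm, 0, sizeof tm);
      strptime ("200112180000", "%Y%m%d%H%M", &tm);
      cutoff = mktime (&tm);

      /* Pretend the server sent: Last-Modified: Tue, 18 Dec 2001 07:07:07 GMT
         (the time zone is ignored here for brevity). */
      memset (&tm, 0, sizeof tm);
      strptime ("Tue, 18 Dec 2001 07:07:07", "%a, %d %b %Y %H:%M:%S", &tm);
      modified = mktime (&tm);

      printf ("fetch? %s\n", should_fetch (modified, cutoff) ? "yes" : "no");
      return 0;
    }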
Spanish characters in file names
Hi all, I'm a Spanish guy who is working with this good program, but I'm having problems with some Spanish characters and blanks (only at the beginning) in the file names I've tried to download from an ftp. An example could be: "/tmp/camaras y acción.jpg" If there is anyone who has solved this problem, please tell me; if not, I'm trying to recompile the source with the appropriate modifications, so if anyone could help me in this field I would be very grateful. Thx 4 all. :) MSN Fotos: the easiest way to share and print photos. Click here
--mirror not downloading everything
Hi all, I want to mirror an ftp, but there are some files that it can't download:

ftp> pwd
257 "/Admon/datosvo/empresas" is current directory.
ftp> ls
227 Passive Mode (x)
125 Data connection already open; Transfer starting.
10-02-01  11:05AM       Vehiculos Ocasión
226 Transfer complete.

then

wget --mirror ftp://:[EMAIL PROTECTED]/Admon/datosvo/empresas/*

but

==> CWD /Admon/datosvo/empresas/Vehiculos Ocasión ...
No existe el directorio `Admon/datosvo/empresas/Vehiculos Ocasión'.
(directory `Admon/datosvo/empresas/Vehiculos Ocasión' does not exist)

Thanks in advance. David.
_
Join the world's largest e-mail service: http://www.hotmail.com/es
wget -nd -r doesn't work as documented
Doing wget -nd -r doesn't overwrite a file of the same name, as the documentation claims. Is there any other way to do this? Thanks. Dave
Two cookie bugs and a problem
The challenge is to navigate into an ASP.NET site, which requires a sequence of GET and POST requests and uses multiple session cookies.
1. Session cookies aren't saved in the release version (1.9.1). I downloaded a pre-release version which supports --keep-session-cookies. However...
2. Only one session cookie is saved. The site generates many. Result: showstopper.
3. You can't issue multiple POSTs in a single command.
Which brings us back to (1). Navigating into a site that requires some mixture of GET and POST requests to log in and find the right page is just hard work. You have to use three cookie switches on every command to --load, --save and --keep session cookies. I pray for something better, like IJW (It Just Works). To summarise: all I want is to be able to issue a sequence of GET and POST requests in a specific order prior to specifying the file(s) to download, and to have session cookies automatically do the right thing. Is this too much to ask? David B.
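For reference, the sort of command sequence being described looks roughly like this. The host, paths and form fields are invented, --keep-session-cookies needs the pre-release build mentioned above, and note how the cookie switches must be repeated on every step, which is exactly the verbosity being complained about:

    wget --save-cookies cookies.txt --keep-session-cookies \
         -O /dev/null http://example.com/login.aspx
    wget --load-cookies cookies.txt --save-cookies cookies.txt --keep-session-cookies \
         --post-data 'user=me&pass=secret' -O /dev/null http://example.com/dologin.aspx
    wget --load-cookies cookies.txt --save-cookies cookies.txt --keep-session-cookies \
         http://example.com/reports/report.pdf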
RE: new bug tracking system for GNU wget
> if i don't find any major problem, i am planning to release wget 1.9.2 with > LFS support and a long list of bugfixes before the end of the year. Are you planning to fix session cookies? In the current release version they don't work. In the tip build they nearly work, but I got problems logging in to an ASP.NET site with multiple session cookies (only the first one seemed to work). I don't have a repro case, but I'm hoping you've got some unit test cases on multiple session cookies and it's in that long list somewhere. [I'm now using curl, which handles this just fine.] BTW a compact syntax would be nice, combining the functions of --load-cookies, --save-cookies and --keep-session-cookies when stringing together multiple wget commands in one session. How about -C with a default cookie file name of cookies.txt? DavidB
RE: Wget 1.11.3 - case sensitivity and URLs
Thanks, everyone, for the contributions. Ultimately, our purpose is to process documents from the site into our search database, so probably the most important thing is to limit the number of files being processed. The case of the URLs in the html probably wouldn't cause us much concern, but I could see that it might be useful to "convert" a site for mirroring from a non-case-sensitive (Windows) environment to a case-sensitive (li|u)nix one - this would need to include translation of URLs in content as well as filenames on disk. In the meantime - does anyone know of a proxy server that could translate URLs from mixed case to lower case? I thought that if we downloaded using wget via such a proxy server we might get the appropriate result. The other alternative we were thinking of was to post-process the files with symlinks for all mixed-case versions of files and directories (I think someone already suggested this - great minds and all that...). I assume that wget would correctly follow the symlink to determine the time/date stamp of the file when deciding if it requires updating (or would it use the time/date stamp of the symlink itself?). I also assume that if wget downloaded the file it would overwrite the symlink and we would have to run our "convert files to symlinks" process again. Just to put it in perspective, the actual site is approximately 45 GB (that's what the administrator said) and wget downloaded > 100 GB (463,000 files) when I did the first process. Cheers Allan

-Original Message- From: Micah Cowan [mailto:[EMAIL PROTECTED] Sent: Saturday, 14 June 2008 7:30 AM To: Tony Lewis Cc: Coombe, Allan David (DPS); 'Wget' Subject: Re: Wget 1.11.3 - case sensitivity and URLs

Tony Lewis wrote: > Micah Cowan wrote: > >> Unfortunately, nothing really comes to mind. If you'd like, you could >> file a feature request at >> https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option >> asking Wget to treat URLs case-insensitively. > > To have the effect that Allan seeks, I think the option would have to > convert all URIs to lower case at an appropriate point in the process. > I think you probably want to send the original case to the server > (just in case it really does matter to the server). If you're going to > treat different case URIs as matching then the lower-case version will > have to be stored in the hash. The most important part (from the > perspective that Allan voices) is that the versions written to disk > use lower case characters. Well, that really depends. If it's doing a straight recursive download, without preexisting local files, then all that's really necessary is to do lookups/stores in the blacklist in a case-normalized manner. If preexisting files matter, then yes, your solution would fix it. Another solution would be to scan directory contents for the first name that matches case-insensitively. That's obviously much less efficient, but has the advantage that the file will match at least one of the "real" cases from the server. As Matthias points out, your lower-case normalization solution could be achieved in a more general manner with a hook. Which is something I was planning on introducing perhaps in 1.13 anyway (so you could, say, run sed on the filenames before Wget uses them), so that's probably the approach I'd take. But probably not before 1.13, even if someone provides a patch for it in time for 1.12 (too many other things to focus on, and I'd like to introduce the "external command" hooks as a suite, if possible). 
OTOH, case normalization in the blacklists would still be useful, in addition to that mechanism. Could make another good addition for 1.13 (because it'll be more useful in combination with the rename hooks). -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/
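A minimal illustration of the case-normalized blacklist idea follows. This is not wget source; the real blacklist is a hash set, which is reduced here to a plain string comparison, and the URLs are invented.

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Fold a URL to lower case before it is stored in, or looked up
       from, the downloaded-URL blacklist, so that differently-cased
       spellings of the same path count as one entry. */
    static void lowercase_in_place (char *s)
    {
      for (; *s; s++)
        *s = tolower ((unsigned char) *s);
    }

    int main (void)
    {
      char a[] = "HTTP://Host/Docs/Page.HTM";
      char b[] = "http://host/docs/page.htm";

      lowercase_in_place (a);
      printf ("%s\n", strcmp (a, b) == 0
              ? "same blacklist entry" : "different entries");
      return 0;
    }

The same helper could equally serve as a built-in stand-in for the "run sed on the filenames" style of rename hook mentioned above.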
RE: Wget 1.11.3 - case sensitivity and URLs
OK - now I am confused. I found a Perl-based http proxy (named "HTTP::Proxy", funnily enough) that has filters to change both the request and response headers and data. I modified the response from the web site to lowercase the URLs in the html (actually I lowercased the whole response) and the data that wget put on disk was fully lowercased - problem solved - or so I thought. However, the case of the files on disk is still mixed - so I assume that wget is not using the URL it originally requested (harvested from the HTML?) to create directories and files on disk. So what is it using? An HTTP header (if so, which one?). Any ideas? Cheers Allan
RE: Wget 1.11.3 - case sensitivity and URLs
Sorry, guys - just an ID 10 T error on my part. I think I need to change two things in the proxy server. 1. URLs in the HTML being returned to wget - this works OK. 2. The "Content-Location" header used when the web server reports a "301 Moved Permanently" response - I think this works OK. When I reported that it wasn't working I hadn't done both at the same time. Cheers Allan

-Original Message- From: Micah Cowan [mailto:[EMAIL PROTECTED] Sent: Wednesday, 25 June 2008 6:44 AM To: Tony Lewis Cc: Coombe, Allan David (DPS); 'Wget' Subject: Re: Wget 1.11.3 - case sensitivity and URLs

Tony Lewis wrote: > Coombe, Allan David (DPS) wrote: > >> However, the case of the files on disk is still mixed - so I assume >> that wget is not using the URL it originally requested (harvested >> from the HTML?) to create directories and files on disk. So what is >> it using? A http header (if so, which one??). > > I think wget uses the case from the HTML page(s) for the file name; > your proxy would need to change the URLs in the HTML pages to lower > case too. My understanding from David's post is that he claimed to have been doing just that: > I modified the response from the web site to lowercase the urls in the > html (actually I lowercased the whole response) and the data that wget > put on disk was fully lowercased - problem solved - or so I thought. My suspicion is it's not quite working, though, as otherwise where would Wget be getting the mixed-case URLs? -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/