Addition to MACHINES File
As requested, I am including the output from ./config.guess for my Linux for S/390 system:

# ./config.guess
s390-ibm-linux

Version 1.5.3 works just fine on this system, although I am having problems with 1.6 and 1.7, which I am detailing in a separate email.

Mark Post
Segfault on Linux/390 for wget 1.6 and 1.7
I am having problems with both wget 1.6 and wget 1.7. I have a working wget 1.5.3 that I use quite a lot. When I compile wget 1.6 or 1.7, using either the -O2 (default) or -O1 parameters on gcc 2.95.2, I get segmentation faults as follows:

# wget -m -nd ftp://ftp.slackware.com/pub/slackware/alpha/slackware-current/slakware/n1/wget*
--13:06:27-- ftp://ftp.slackware.com/pub/slackware/alpha/slackware-current/slakware/n1/wget*
           => `.listing'
Connecting to ftp.slackware.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub/slackware/alpha/slackware-current/slakware/n1 ... done.
==> PORT ... done.    ==> LIST ... done.

    0K -> ...

13:06:39 (417.44 B/s) - `.listing' saved [4025]

Segmentation fault

When I compile wget with -O0 to turn off optimization, wget works, but I get some garbage in the output as follows:

# wget -m -nd ftp://ftp.slackware.com/pub/slackware/alpha/slackware-current/slakware/n1/wget*
--13:01:08-- ftp://ftp.slackware.com/pub/slackware/alpha/slackware-current/slakware/n1/wget*
           => `.listing'
Connecting to ftp.slackware.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub/slackware/alpha/slackware-current/slakware/n1 ... done.
==> PORT ... done.    ==> LIST ... done.

    0K -> ...

13:01:11 (3.10 KB/s) - `.listing' saved [4025]

--@woeÿâ8Àt¸EUR@b¸EUR@fOEÿâ8--
           => `'
==> CWD not required.
==> PORT ... done.    ==> RETR wget-1.6-alpha-1.tgz ... done.
Length: 274,854

    0K -> .......... .......... .......... .......... .......... [ 18%]
   50K -> .......... .......... .......... .......... .......... [ 37%]
  100K -> .......... .......... .......... .......... .......... [ 55%]
  150K -> .......... .......... .......... .......... .......... [ 74%]
  200K -> .......... .......... .......... .......... .......... [ 93%]
  250K -> .......... ........                                    [100%]

13:01:22 (25.12 KB/s) - `wget-1.6-alpha-1.tgz' saved [274854]

FINISHED --13:01:22--
Downloaded: 278,879 bytes in 2 files

What other documentation would you need from me on this problem? Please be specific on how to get it, also, since I am very unfamiliar with gdb, etc.

Mark Post
RE: Segfault on Linux/390 for wget 1.6 and 1.7
Jan,

Did you ever make any progress on this?

Mark Post

-----Original Message-----
From: Jan Prikryl [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 19, 2001 1:53 PM
To: Post, Mark K
Cc: Wget mailing list
Subject: Re: Segfault on Linux/390 for wget 1.6 and 1.7

Quoting Post, Mark K ([EMAIL PROTECTED]):

> When I compile wget with -O0 to turn off optimization, wget works,
> but I get some garbage in the output as follows:

Could you please try (1) running wget with the -d parameter to switch on the debugging output, and (2) compiling wget using -O2 -g and having a look at what "gdb wget core" reports? It should be able to provide us with the contents of the call stack at the moment of the crash, which in turn would reveal the place where wget crashes.

Thanks,

-- jan

Dr. Jan Prikryl <[EMAIL PROTECTED]> (icq 83242638)
vr|vis center for virtual reality and visualisation
http://www.vrvis.at
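For anyone else hitting this, Jan's two suggestions spelled out as concrete commands — a sketch only, assuming a GNU-style wget source tree and a shell that allows core dumps; the exact configure/make invocation may differ on Linux/390:

```shell
# 1. Run wget with debugging output enabled.
wget -d -m -nd 'ftp://ftp.slackware.com/pub/slackware/alpha/slackware-current/slakware/n1/wget*'

# 2. Rebuild with symbols at the optimization level that crashes,
#    let the crash write a core file, then ask gdb for a backtrace.
make clean
make CFLAGS='-O2 -g'
ulimit -c unlimited
./src/wget -m -nd 'ftp://ftp.slackware.com/pub/slackware/alpha/slackware-current/slakware/n1/wget*'
gdb ./src/wget core <<'EOF'
bt
quit
EOF
```

The `bt` (backtrace) output is what shows where the crash happened; posting it to the list is usually enough to start narrowing things down.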
RE: wget: ftp through http proxy not working with 1.8.2. It does work with 1.5.3
Hans,

I'm investigating this as a proxy server problem. When I ran some tests, it appeared as though the HEAD command from wget was getting translated into a series of commands to query the size, MDTM of the file, etc., but then I was seeing a STOR command come from the proxy server, which was getting failed by the FTP server. If you'd like to coordinate some tests off-list, we can see if something similar is happening to you. Do you know what proxy server you are using?

Mark Post

-----Original Message-----
From: Hans Deragon (QA/LMC) [mailto:[EMAIL PROTECTED]
Sent: Monday, July 14, 2003 10:21 AM
To: '[EMAIL PROTECTED]'
Subject: RE: wget: ftp through http proxy not working with 1.8.2. It does work with 1.5.3

Hi again. Some people have reported experiencing the same problem, but nobody from the development team has forwarded a comment on this. Can anybody tell us if this is a bug or some config issue?

Regards,
Hans Deragon

-----Original Message-----
From: Hans Deragon (LMC)
Sent: Wednesday, July 02, 2003 2:11 PM
To: '[EMAIL PROTECTED]'
Subject: wget: ftp through http proxy not working with 1.8.2. It does work with 1.5.3

Greetings. I have read many emails in different archives on the net regarding my issue, but never found a solution to my problem. Here is the description:

I am trying to mirror an ftp site through an http proxy. I have the following settings on both my RH machine running wget 1.8.2 and my Solaris machine running wget 1.5.3 (the actual /etc/wgetrc file is the same on both machines, and I do not have ~/.wgetrc on either machine):

http_proxy = http://proxy.hostname.com:80/
ftp_proxy = http://proxy.hostname.com:80/
use_proxy = on

Ok, proxy.hostname.com is not the real URL I use, but believe me, the one I am using is the right one. The port numbers are valid and only port 80 is open, so both http and ftp requests must go through port 80.
Now, when I run, say:

wget --mirror -np --cut-dirs=2 ftp://ftp.ox.ac.uk/pub/wordlists/

on my RH machine running wget 1.8.2, only the index.html file is downloaded. On my Solaris machine running wget 1.5.3, all the files under pub/wordlists get downloaded. Has anybody got a clue what the problem is? Is it that wget 1.8.2 is more compliant with a standard, my proxy is bogus, and I am just lucky it works with version 1.5.3? Is there a configuration parameter missing from my /etc/wgetrc? Or is it a known problem with version 1.8.2? I am a newbie with wget, so the mistake is probably on my side, but I cannot figure out what it is.

Regards,
Hans Deragon
RE: wget: ftp through http proxy not working with 1.8.2. It does work with 1.5.3
Hans,

Based on what I'm seeing on the FTP server side, and the debugging output you sent, this definitely looks like a wget problem. From the FTP server, everything looks normal. No strange command sequences, nothing odd at all. So, I guess you need to keep bugging the wget maintainer (if he's still interested in working on wget).

Mark

-----Original Message-----
From: Hans Deragon (QA/LMC) [mailto:[EMAIL PROTECTED]
Sent: Monday, July 14, 2003 1:05 PM
To: 'Post, Mark K'
Subject: RE: wget: ftp through http proxy not working with 1.8.2. It does work with 1.5.3

wget --debug -m ftp://l015062.zseriespenguins.ihost.com/

2nd time output:
==========
DEBUG output created by Wget 1.8.2 on linux-gnu.

--12:58:00-- ftp://l015062.zseriespenguins.ihost.com/
           => `l015062.zseriespenguins.ihost.com/index.html'
Resolving www-proxy.lmc.ericsson.se... done.
Caching www-proxy.lmc.ericsson.se => 142.133.17.203
Connecting to www-proxy.lmc.ericsson.se[142.133.17.203]:80... connected.
Created socket 3.
Releasing 0x8082060 (new refcount 1).
---request begin---
HEAD ftp://l015062.zseriespenguins.ihost.com/ HTTP/1.0
User-Agent: Wget/1.8.2
Host: l015062.zseriespenguins.ihost.com
Accept: */*

---request end---
Proxy request sent, awaiting response... HTTP/1.0 200 OK
Server: Squid/2.4.STABLE2
Mime-Version: 1.0
Date: Mon, 14 Jul 2003 16:59:11 GMT
Content-Type: text/html
Age: 25
X-Cache: HIT from www-proxy.lmc.ericsson.se
Proxy-Connection: close
Length: unspecified [text/html]
Closing fd 3
Last-modified header missing -- time-stamps turned off.
--12:58:00-- ftp://l015062.zseriespenguins.ihost.com/
           => `l015062.zseriespenguins.ihost.com/index.html'
Found www-proxy.lmc.ericsson.se in host_name_addresses_map (0x8082060)
Connecting to www-proxy.lmc.ericsson.se[142.133.17.203]:80... connected.
Created socket 3.
Releasing 0x8082060 (new refcount 1).
---request begin---
GET ftp://l015062.zseriespenguins.ihost.com/ HTTP/1.0
User-Agent: Wget/1.8.2
Host: l015062.zseriespenguins.ihost.com
Accept: */*

---request end---
Proxy request sent, awaiting response... HTTP/1.0 200 OK
Server: Squid/2.4.STABLE2
Mime-Version: 1.0
Date: Mon, 14 Jul 2003 16:59:47 GMT
Content-Type: text/html
X-Cache: MISS from www-proxy.lmc.ericsson.se
Proxy-Connection: close
Length: unspecified [text/html]

    0K .                                                      163.92 KB/s

Closing fd 3
12:58:11 (163.92 KB/s) - `l015062.zseriespenguins.ihost.com/index.html' saved [1175]

FINISHED --12:58:11--
Downloaded: 1,175 bytes in 1 files

Diff between 1st and 2nd output:
==========
[EMAIL PROTECTED] 2]# diff output ../1/output
3c3
< --12:58:00-- ftp://l015062.zseriespenguins.ihost.com/
---
> --12:56:41-- ftp://l015062.zseriespenguins.ihost.com/
11,36d10
< HEAD ftp://l015062.zseriespenguins.ihost.com/ HTTP/1.0
< User-Agent: Wget/1.8.2
< Host: l015062.zseriespenguins.ihost.com
< Accept: */*
<
< ---request end---
< Proxy request sent, awaiting response... HTTP/1.0 200 OK
< Server: Squid/2.4.STABLE2
< Mime-Version: 1.0
< Date: Mon, 14 Jul 2003 16:59:11 GMT
< Content-Type: text/html
< Age: 25
< X-Cache: HIT from www-proxy.lmc.ericsson.se
< Proxy-Connection: close
< Length: unspecified [text/html]
< Closing fd 3
< Last-modified header missing -- time-stamps turned off.
< --12:58:00-- ftp://l015062.zseriespenguins.ihost.com/
<            => `l015062.zseriespenguins.ihost.com/index.html'
< Found www-proxy.lmc.ericsson.se in host_name_addresses_map (0x8082060)
< Connecting to www-proxy.lmc.ericsson.se[142.133.17.203]:80... connected.
< Created socket 3.
< Releasing 0x8082060 (new refcount 1).
< ---request begin---
46c20
< Date: Mon, 14 Jul 2003 16:59:47 GMT
---
> Date: Mon, 14 Jul 2003 16:58:28 GMT
54c28
<     0K .                                                      163.92 KB/s
---
>     0K .                                                      382.49 KB/s
57c31,32
< 12:58:11 (163.92 KB/s) - `l015062.zseriespenguins.ihost.com/index.html' saved [1175]
---
> Last-modified header missing -- time-stamps turned off.
> 12:56:52 (382.49 KB/s) - `l015062.zseriespenguins.ihost.com/index.html' saved [1175]
60c35
< FINISHED --12:58:11--
---
> FINISHED --12:56:52--
RE: wget and procmail
Does the PATH of procmail contain the directory where wget lives?

Mark Post

-----Original Message-----
From: Michel Lombart [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 29, 2003 6:51 PM
To: [EMAIL PROTECTED]
Subject: wget and procmail

Hello,

I have an issue with wget and procmail. I installed the forum software mailgust (http://mailgust.phpoutsourcing.com/) on a Cobalt/Sun Raq4. In order to use incoming e-mail, I need to install a .procmailrc file that calls wget. When I type the complete command on the console, wget works fine. When wget is called by procmail, it does nothing. I've enabled a verbose logfile for procmail and I see the call of wget in the log, without any error. Any idea?

Thanks for your help,
Michel Lombart
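If PATH is indeed the cause, the usual fix is to set PATH in the rcfile or call wget by absolute path. A hypothetical sketch of such a .procmailrc — the recipe condition, the URL, and the wget location are made-up placeholders, not mailgust's actual interface:

```procmailrc
# procmail inherits a minimal environment; make PATH explicit.
PATH=/usr/local/bin:/usr/bin:/bin
LOGFILE=$HOME/procmail.log
VERBOSE=on

# Hand matching messages to the forum gateway via wget's absolute path.
:0
* ^To:.*forum@
| /usr/local/bin/wget -q -O - http://localhost/mailgust/gateway.php
```

With VERBOSE=on, the logfile will also show the environment procmail actually gave the recipe, which helps confirm or rule out the PATH theory.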
RE: -N option
Other than the --ignore-length option I mentioned previously, no. Sorry.

Mark Post

-----Original Message-----
From: Preston [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 29, 2003 7:01 PM
To: [EMAIL PROTECTED]
Subject: Re: -N option

Aaron S. Hawley wrote:
> On Tue, 29 Jul 2003, Post, Mark K wrote:
>> So, perhaps you need to modify your work practices rather than diddle
>> with the software. Copy the locally updated files to another location
>> so they're not clobbered when the remote version changes.
>
> indeed. consider creating local copies by instead just tracking versions
> of your image files with RCS if it's available for your system (and if
> you aren't already using it): http://www.gnu.org/software/rcs/

To answer the questions asked so far: We are using wget version 1.8.2. I have checked the dates on the local file and the remote file, and the local file date is newer. The reason I thought it was still clobbering despite the newer date on the local file was because of the size difference. I read that in the online manual here: http://www.gnu.org/manual/wget/html_chapter/wget_5.html#SEC22 At the bottom it says, "If the local file does not exist, or the sizes of the files do not match, Wget will download the remote file no matter what the time-stamps say."

I do want newer files on the remote server to replace older files on the local server. Essentially, I want the newest file to remain on the local server. The problem I am having, however, is that if we change/update files on the local server and they are of a different size, the remote copy is downloaded and clobbers the local copy no matter what the dates are. I hope this is clear; sorry if I have not explained the problem well. Let me know if you have any more ideas, and if you need me to try again to explain. Thanks for your help.

Preston
[EMAIL PROTECTED]
RE: Wget 1.8.2 timestamping bug
Angelo,

It works for me:

# wget -N http://www.nic.it/index.html
--13:04:39-- http://www.nic.it/index.html
           => `index.html'
Resolving www.nic.it... done.
Connecting to www.nic.it[193.205.245.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,474 [text/html]

100%[====================================>] 2,474        142.12K/s    ETA 00:00

13:04:44 (142.12 KB/s) - `index.html' saved [2474/2474]

[EMAIL PROTECTED]:/tmp# wget -N http://www.nic.it/index.html
--13:04:49-- http://www.nic.it/index.html
           => `index.html'
Resolving www.nic.it... done.
Connecting to www.nic.it[193.205.245.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,474 [text/html]
Server file no newer than local file `index.html' -- not retrieving.

[EMAIL PROTECTED]:/tmp# wget -V
GNU Wget 1.8.2

Are you perhaps behind a firewall? At my work location, I frequently run into cases where the firewall does not correctly pass date and timestamp information back to wget.

Mark Post

-----Original Message-----
From: Angelo Archie Amoruso [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 05, 2003 6:36 AM
To: [EMAIL PROTECTED]
Subject: Wget 1.8.2 timestamping bug

Hi All,

I'm using Wget 1.8.2 on a Redhat 9.0 box equipped with an Athlon 550 MHz CPU and 128 MB RAM. I've encountered a strange issue, which really seems to be a bug, using the timestamping option. I'm trying to retrieve the http://www.nic.it/index.html page. The HEAD HTTP method reports that the page is 2474 bytes long and was Last-Modified on Wed, 30 Oct 2002.

Using wget I retrieve it (using -N) to /tmp, and I get (ls -l --time-style=long):

-rw-r--r--    1 root     root         2474 2002-10-30 15:53 index.html

Then, running wget again with -N, I get:

The sizes do not match (local 91941) -- retrieving

And on /tmp I get again:

-rw-r--r--    1 root     root         2474 2002-10-30 15:53 index.html

What's happening? Does Wget check the file creation time, which is obviously:

-rw-r--r--    1 root     root         2474 2003-08-05 12:28 index.html

Thanks for your time and cooperation. Please reply by email.
Below you'll find the actual output.

===== HEAD =====
Trying 193.205.245.10...
Connected to www.nic.it.
Escape character is '^]'.
GET /index.html HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 05 Aug 2003 10:11:26 GMT
Server: Apache/2.0.45 (Unix) mod_ssl/2.0.45 OpenSSL/0.9.7a
Last-Modified: Wed, 30 Oct 2002 14:53:58 GMT
ETag: "2bc04-9aa-225d2d80"
Accept-Ranges: bytes
Content-Length: 2474
Connection: close
Content-Type: text/html; charset=ISO-8859-1

I run wget with the following parameters: wget -N -O /tmp/index.html

===== GET OUTPUT =====
[EMAIL PROTECTED] celldataweb]# wget -N -O /tmp/index.html http://www.nic.it/index.html
--12:18:31-- http://www.nic.it/index.html
           => `/tmp/index.html'
Resolving www.nic.it... done.
Connecting to www.nic.it[193.205.245.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,474 [text/html]
The sizes do not match (local 91941) -- retrieving.

--12:18:31-- http://www.nic.it/index.html
           => `/tmp/index.html'
Connecting to www.nic.it[193.205.245.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,474 [text/html]

100%[====================================>] 2,474         38.35K/s    ETA 00:00

12:18:32 (38.35 KB/s) - `/tmp/index.html' saved [2474/2474]

On /tmp:
-rw-r--r--    1 root     root         2474 Oct 30  2002 index.html

When I try again:

===== SECOND GET OUTPUT =====
[EMAIL PROTECTED] celldataweb]# wget -N -O /tmp/index.html http://www.nic.it/index.html
--12:18:31-- http://www.nic.it/index.html
           => `/tmp/index.html'
Resolving www.nic.it... done.
Connecting to www.nic.it[193.205.245.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,474 [text/html]
The sizes do not match (local 91941) -- retrieving.

--12:18:31-- http://www.nic.it/index.html
           => `/tmp/index.html'
Connecting to www.nic.it[193.205.245.10]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,474 [text/html]

100%[====================================>] 2,474         38.35K/s    ETA 00:00

12:18:32 (38.35 KB/s) - `/tmp/index.html' saved [2474/2474]

But on /tmp:
-rw-r--r--    1 root     root         2474 Oct 30  2002 index.html

What is happening?

-- To The Kernel And Beyond!
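One experiment worth trying here — this is my assumption, not a confirmed diagnosis: the "local 91941" suggests the -N size check with -O may be looking at some other file than the one -O writes to. Dropping -O and letting -N manage the file under its natural name, from the target directory, sidesteps that question entirely:

```shell
# Run the timestamped fetch from the target directory instead of
# redirecting the output with -O; -N then checks and writes the
# same path, ./index.html.
cd /tmp
wget -N http://www.nic.it/index.html
```

If this behaves (as in the working transcript above), the interaction of -N with -O is the thing to report.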
RE: wget is mirroring whole internet instead of just my web page!
man wget shows:

-D domain-list
--domains=domain-list
    Set domains to be followed. domain-list is a comma-separated list
    of domains. Note that it does not turn on -H.

Mark Post

-----Original Message-----
From: Andrzej Kasperowicz [mailto:[EMAIL PROTECTED]
Sent: Monday, August 18, 2003 8:38 AM
To: [EMAIL PROTECTED]
Subject: wget is mirroring whole internet instead of just my web page!

When I try to mirror web pages using the command:

wget -m -nv -k -K -nH -t 100 -o logchemfanpl -P public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

wget mirrors not just the domain of the web page but the whole internet... There are robots.txt files, but they should not influence wget to download all available domains, I suppose? So why is this happening, and how do I avoid it?

Regards,
Andrzej.
RE: wget is mirroring whole internet instead of just my web page!
It's always been my experience when specifying -m that wget does follow across domains by default. I've always had to tell it not to do that.

Mark Post

-----Original Message-----
From: Andrzej Kasperowicz [mailto:[EMAIL PROTECTED]
Sent: Monday, August 18, 2003 4:02 PM
To: Post, Mark K; [EMAIL PROTECTED]
Subject: RE: wget is mirroring whole internet instead of just my web page!

On 18 Aug 2003 at 13:49, Post, Mark K wrote:

> man wget shows:
> -D domain-list
> --domains=domain-list
>     Set domains to be followed. domain-list is a comma-separated list
>     of domains. Note that it does not turn on -H.

Right, but by default wget should not follow all domains, so why was it happening in this case? I also tried to mirror another web site from the same server, also containing links to other domains:

wget -m -nv -k -K -nH -t 100 -o logmineraly -P public_html/mirror http://znik.wbc.lublin.pl/Mineraly/

and in this case it was not downloading from other domains. So that's a real mystery. Anyway, if I add -D wbc.lublin.pl, will it run correctly?

wget -m -nv -k -K -nH -t 100 -D wbc.lublin.pl -o logchemfanpl -P public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

ak
RE: wget and 2 users / passwords to get through?
If this is a non-transparent proxy, you do indeed need to use the proxy parameters:

--proxy-user=user
--proxy-passwd=password

as well as set the proxy server environment variables:

ftp_proxy=http://proxy.server.name[:port]   (Note the http:// value. That is correct.)
http_proxy=http://proxy.server.name[:port]

Mark Post

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 19, 2003 8:03 AM
To: [EMAIL PROTECTED]
Subject: wget and 2 users / passwords to get through?

Hi,

I'm trying to use wget on an external ftp server, but I have to pass through a gateway server in the company before I'm on the internet. So I have to specify 2 sets of user/pass: 1 set for the gateway and 1 for the ftp server. This works OK in e.g. Windows Commander, but how do I specify this when calling wget? I have tried several combinations of:

--ftp-user=
--ftp-passwd=

I don't think that the proxy thing should be involved here.

Best Regards / Venlig Hilsen
Lars Rasmussen

-- Rohde & Schwarz Technology Center A/S  Tel.: +45 96 73 88 88  http://www.rohdeschwarz.dk
-- Lars Rasmussen, SW Developer  Tel.: +45 96 73 88 34  mailto:[EMAIL PROTECTED]
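Putting the two credential sets together, a sketch of what the invocation might look like — the host names, users, and passwords below are placeholders, not anything from the original question; --proxy-user/--proxy-passwd carry the gateway login, while the FTP server's login can ride inside the URL:

```shell
# Gateway (proxy) settings; the http:// scheme is correct even for ftp_proxy.
export ftp_proxy=http://proxy.example.com:80/
export http_proxy=http://proxy.example.com:80/

# Gateway credentials on the command line; FTP server credentials
# embedded in the URL itself.
wget --proxy-user=gwuser --proxy-passwd=gwsecret \
     'ftp://ftpuser:ftpsecret@ftp.example.com/pub/file.txt'
```

Single-quoting the URL keeps the shell from touching any special characters in the embedded password.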
RE: rfc2732 patch for wget
Absolutely. I would much rather get an intelligent error message stating that IPv6 addresses are not supported, versus a misleading one about the host not being found. That would save end-users a whole lot of wasted time.

Mark Post

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Friday, September 05, 2003 4:23 PM
To: Mauro Tortonesi
Cc: [EMAIL PROTECTED]
Subject: Re: rfc2732 patch for wget

-snip-
I'm starting to think that Wget should reject all [...] addresses when IPv6 is not compiled in because they, being valid IPv6 addresses, have no chance of ever working. What do you think?
RE: wget -r -p -k -l 5 www.protcast.com doesn't pull some images though they are part of the HREF
No, it won't. The javascript stuff makes sure of that.

Mark Post

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 09, 2003 4:32 PM
To: [EMAIL PROTECTED]
Subject: wget -r -p -k -l 5 www.protcast.com doesn't pull some images though they are part of the HREF

Hi,

I am having some problems with downloading www.protcast.com. I used:

wget -r -p -k -l 5 www.protcast.com

In www.protcast.com/Grafx, the file menu-contact_(off).jpg gets downloaded. However, menu-contact_(on).jpg does not get downloaded, though it lies in the same directory as the menu-contact_(off).jpg file. index.html contains the following HREF:

<A HREF="contact.htm" ONMOUSEOVER="msover1('m-contact','Grafx/menu-contact_(on).jpg');" ONMOUSEOUT="msout1('m-contact','Grafx/menu-contact_(off).jpg');"><IMG SRC="Grafx/menu-contact_(off).jpg" NAME="m-contact" WIDTH="197" HEIGHT="29" BORDER="0"></A>

so wget should be able to see this image, right? Please help/advise.

bye
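The rollover image is referenced only inside the string arguments of the JavaScript handler, not in an HREF/SRC attribute, so wget's HTML parsing never sees it. As a workaround one could scrape those paths out of the saved page and feed them back to wget — a sketch, assuming the msover1(...) pattern of the page quoted above and a wget that accepts `-i -` to read URLs from stdin:

```shell
# Pull the image paths that appear only inside msover1(...) JavaScript
# calls out of the saved page, turn them into absolute URLs, and fetch them.
sed -n "s@.*msover1('[^']*','\([^']*\)').*@http://www.protcast.com/\1@p" \
    www.protcast.com/index.html | wget -x -nH -i -
```

The same sed line with msout1 would catch the mouse-out variants; any site using a different handler name needs the pattern adjusted.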
RE: Small change to print SSL version
Perhaps, but it is kind of nice to get that information from the program itself at the same time you get the version information. For example:

# ssh -V
OpenSSH_3.7p1, SSH protocols 1.5/2.0, OpenSSL 0.9.7b 10 Apr 2003

All the information, from one place.

Mark Post

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 7:15 AM
To: Christopher G. Lewis
Cc: [EMAIL PROTECTED]
Subject: Re: Small change to print SSL version

Christopher G. Lewis <[EMAIL PROTECTED]> writes:

> Here's a small change to print out the OpenSSL version with the -V /
> --help parameters.  [...]

I think that "GNU Wget <something>" should always stand for Wget's version, regardless of the libraries it has been compiled with. But if you want to see the version of libraries, why not make it clearer, e.g.:

GNU Wget x.x.x (compiled with OpenSSL x.x.x)

BTW can't you find out the OpenSSL version by using `ldd'?
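For reference, the `ldd' route Hrvoje mentions would look something like this on Linux — it shows which OpenSSL shared libraries wget is linked against, though not necessarily the exact version string OpenSSL would report about itself:

```shell
# List wget's shared-library dependencies and pick out the OpenSSL ones.
ldd "$(command -v wget)" | grep -E 'libssl|libcrypto'
```

A statically linked wget, or a non-glibc platform without ldd, is exactly the case where printing the library version from the program itself earns its keep.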
RE: Compile and link problems with wget 1.9 beta5
Do you see the missing symbol when you do an "nm -D" command against either libssl.so or libcrypto.so? (It shows up on my Linux system in libcrypto.so.)

Mark Post

-----Original Message-----
From: Robert Poole [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 12, 2003 2:23 PM
To: [EMAIL PROTECTED]
Subject: Compile and link problems with wget 1.9 beta5

After ploughing through the archives of this mailing list, looking for additional clues why wget 1.8.2 wasn't linking correctly, I found that wget 1.9 beta 5 was released recently. I downloaded the source code for wget 1.9 beta 5 and am getting the same link problems I was getting with 1.8.2:

/bin/sh ../libtool --mode=link gcc -O2 -Wall -Wno-implicit -o wget cmpt.o connect.o convert.o cookies.o ftp.o ftp-basic.o ftp-ls.o ftp-opie.o getopt.o hash.o headers.o host.o html-parse.o html-url.o http.o init.o log.o main.o gen-md5.o netrc.o progress.o rbuf.o recur.o res.o retr.o safe-ctype.o snprintf.o gen_sslfunc.o url.o utils.o version.o -lssl -lcrypto
mkdir .libs
gcc -O2 -Wall -Wno-implicit -o wget cmpt.o connect.o convert.o cookies.o ftp.o ftp-basic.o ftp-ls.o ftp-opie.o getopt.o hash.o headers.o host.o html-parse.o html-url.o http.o init.o log.o main.o gen-md5.o netrc.o progress.o rbuf.o recur.o res.o retr.o safe-ctype.o snprintf.o gen_sslfunc.o url.o utils.o version.o -lssl -lcrypto
ld: Undefined symbols:
_OPENSSL_add_all_algorithms_noconf
make[1]: *** [wget] Error 1
make: *** [src] Error 2

I've tried to determine if my OpenSSL installation was built wrong, but as far as I can determine, it's OK. That doesn't mean that there's nothing wrong with OpenSSL on this platform, but so far, this link error has been the only problem I've encountered. The platform is a dual-processor G5 running Mac OS X 10.2.8 with the latest developer tools installed (gcc 3.3 with G5 optimizer settings available, although I haven't used any of the command line switches to turn those optimizations on). Help?

Best Regards,
Rob Poole
[EMAIL PROTECTED]
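Spelled out, Mark's check might look like this — the library path is illustrative, and note that on Mac OS X the library is typically libcrypto.dylib and Apple's nm has no -D flag, so plain `nm` on the dylib is the closer equivalent there:

```shell
# Search the dynamic symbol table of libcrypto for the symbol the
# linker reports as undefined.
nm -D /usr/lib/libcrypto.so | grep OPENSSL_add_all_algorithms
```

If the symbol is absent, the OpenSSL the linker is finding is likely older than the one wget's configure step detected; I believe OPENSSL_add_all_algorithms_noconf first appeared in OpenSSL 0.9.7, while the transcripts above show 0.9.6-era libraries on that OS X release.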
RE: feature request: --second-guess-the-dns
You can do this now:

wget http://216.46.192.85/

Using DNS is just a convenience, after all, not a requirement.

Mark Post

-----Original Message-----
From: Dan Jacobson [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 15, 2003 4:00 PM
To: [EMAIL PROTECTED]
Subject: feature request: --second-guess-the-dns

I see there is:

--bind-address=ADDRESS
    When making client TCP/IP connections, bind() to ADDRESS on the
    local machine. ADDRESS may be specified as a hostname or IP
    address. This option can be useful if your machine is bound to
    multiple IPs.

But I want a --second-guess-the-dns=ADDRESS so I can:

$ wget http://jidanni.org/
Resolving jidanni.org... done.
Connecting to jidanni.org[216.46.203.182]:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable

$ wget --second-guess-the-dns=216.46.192.85 http://jidanni.org/
Connecting to jidanni.org[216.46.192.85]:80... connected...

Even allow different port numbers there, even though we can add them after the url already:

$ wget --second-guess-the-dns=216.46.192.85:66 http://jidanni.org:888/

or whatever. Also, pick a better name than --second-guess-the-dns -- which is just a first guess for a name. Perhaps the user should do all this in the name server or something, but let's say he isn't root, and doesn't want to use netcat etc. either.
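One caveat with plain `wget http://216.46.192.85/`: a name-based virtual host will not know which site is wanted. Supplying the Host header by hand gets closer to what the requested option would do — a sketch; check that your wget version supports --header:

```shell
# Connect to the chosen IP address but present the original host name,
# so a name-based virtual host serves the right site.
wget --header='Host: jidanni.org' http://216.46.192.85/
```

Saved file names will still be derived from the IP-based URL, so some renaming afterwards may be needed.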
RE: how to get mirror just a portion of a website ?
Use the -np or --no-parent option.

Mark Post

-----Original Message-----
From: Josh Brooks [mailto:[EMAIL PROTECTED]
Sent: Sunday, November 16, 2003 11:48 PM
To: [EMAIL PROTECTED]
Subject: how to get mirror just a portion of a website ?

Generally, I mirror an entire web site with:

wget --tries=inf -nH --no-parent --random-wait -r -l inf --convert-links --html-extension www.example.com

But that is if I am mirroring an _entire_ web site - where the URL looks like www.example.com. BUT, how can I mirror a URL that looks like:

http://www.example.com/~user/dir/

and get everything starting with ~user/dir/ and everything underneath it, but nothing above it? For instance, if there was a link back to ~user/otherdir/, I would not want to get that. So basically, I want to mirror ~user/dir/ and below, and follow nothing else - how can I do that?

thanks.
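Concretely, keeping the flags from the full-site command and pointing them at the subdirectory would look like this (www.example.com and ~user/dir/ are the hypothetical names from the question):

```shell
# -np/--no-parent confines the recursion to ~user/dir/ and below;
# links up to ~user/otherdir/ or the site root are not followed.
wget --tries=inf -nH --no-parent --random-wait -r -l inf \
     --convert-links --html-extension http://www.example.com/~user/dir/
```

Note the trailing slash: without it, wget may treat dir as a file and take ~user/ as the parent directory for the -np check.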
RE: problem with LF/CR etc.
That is _really_ ugly, and perhaps immoral. Make it an option, if you must. Certainly don't make it the default behavior. <Shudder>

Mark Post

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2003 4:59 PM
To: Peter GILMAN
Cc: [EMAIL PROTECTED]
Subject: Re: problem with LF/CR etc.

Peter GILMAN <[EMAIL PROTECTED]> writes:

> i have run into a problem while using wget: when viewing a web page
> with html like this:
>
>     <a href="images/IMG_01
>     .jpg"><img src="images/tnIMG_01
>     .jpg"></a>

Eek! Are people really doing that? This is news to me.

> browsers (i tested with mozilla and IE) can handle the line breaks in
> the urls (presumably stripping them out), but wget chokes on the
> linefeeds and carriage returns; it inserts them into the urls, and
> then (naturally) fails with a 404: [...]

So, Wget should squash all newlines? It's not hard to implement, but it feels kind of ... unclean.
RE: SSL over proxy passthrough
I tested the Windows binary against the only SSL-enabled web server outside our firewall that I could think of at the moment, and it worked for me.

Mark Post

-----Original Message-----
From: Herold Heiko [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2003 3:18 AM
To: [EMAIL PROTECTED]
Cc: List Wget (E-mail)
Subject: RE: SSL over proxy passthrough

For whoever wants to test that from Windows, there is an MSVC binary at http://xoomer.virgilio.it/hherold/

Heiko

-- PREVINET S.p.A.      www.previnet.it
-- Heiko Herold         [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2003 3:26 AM
To: [EMAIL PROTECTED]
Subject: SSL over proxy passthrough

This patch implements a first attempt of using the CONNECT method to establish passthrough of SSL communication over non-SSL proxies. This will require testing.

2003-11-28  Hrvoje Niksic  <[EMAIL PROTECTED]>

	* http.c (gethttp): Use the CONNECT handle to establish SSL
	passthrough through non-SSL proxies.

Index: src/http.c
===================================================================
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.125
diff -u -r1.125 http.c
--- src/http.c	2003/11/27 23:29:36	1.125
+++ src/http.c	2003/11/28 02:22:00
@@ -804,7 +804,7 @@
   authenticate_h = NULL;
   auth_tried_already = 0;
 
-  inhibit_keep_alive = !opt.http_keep_alive || proxy != NULL;
+  inhibit_keep_alive = !opt.http_keep_alive;
 
  again:
   /* We need to come back here when the initial attempt to retrieve
@@ -825,21 +825,72 @@
   hs->remote_time = NULL;
   hs->error = NULL;
 
-  /* If we're using a proxy, we will be connecting to the proxy
-     server.  */
-  conn = proxy ? proxy : u;
+  conn = u;
+
+  proxyauth = NULL;
+  if (proxy)
+    {
+      char *proxy_user, *proxy_passwd;
+      /* For normal username and password, URL components override
+         command-line/wgetrc parameters.  With proxy
+         authentication, it's the reverse, because proxy URLs are
+         normally the permanent ones, so command-line args
+         should take precedence.  */
+      if (opt.proxy_user && opt.proxy_passwd)
+        {
+          proxy_user = opt.proxy_user;
+          proxy_passwd = opt.proxy_passwd;
+        }
+      else
+        {
+          proxy_user = proxy->user;
+          proxy_passwd = proxy->passwd;
+        }
+      /* This does not appear right.  Can't the proxy request,
+         say, `Digest' authentication?  */
+      if (proxy_user && proxy_passwd)
+        proxyauth = basic_authentication_encode (proxy_user, proxy_passwd,
+                                                 "Proxy-Authorization");
+
+      /* If we're using a proxy, we will be connecting to the proxy
+         server.  */
+      conn = proxy;
+    }
+
   host_lookup_failed = 0;
+  sock = -1;
 
   /* First: establish the connection.  */
-  if (inhibit_keep_alive
-      || !persistent_available_p (conn->host, conn->port,
+
+  if (!inhibit_keep_alive)
+    {
+      /* Look for a persistent connection to target host, unless a
+         proxy is used.  The exception is when SSL is in use, in which
+         case the proxy is nothing but a passthrough to the target
+         host, registered as a connection to the latter.  */
+      struct url *relevant = conn;
 #ifdef HAVE_SSL
-                                  u->scheme == SCHEME_HTTPS
+      if (u->scheme == SCHEME_HTTPS)
+        relevant = u;
+#endif
+
+      if (persistent_available_p (relevant->host, relevant->port,
+#ifdef HAVE_SSL
+                                  relevant->scheme == SCHEME_HTTPS,
 #else
-                                  0
+                                  0,
 #endif
-                                  , host_lookup_failed))
+                                  host_lookup_failed))
+        {
+          sock = pconn.socket;
+          using_ssl = pconn.ssl;
+          logprintf (LOG_VERBOSE, _("Reusing existing connection to %s:%d.\n"),
+                     pconn.host, pconn.port);
+          DEBUGP (("Reusing fd %d.\n", sock));
+        }
+    }
+
+  if (sock < 0)
     {
       /* In its current implementation, persistent_available_p will
         look up conn->host in some cases.  If that lookup failed, we
@@ -855,28 +906,75 @@
         ? CONERROR : CONIMPOSSIBLE);
 
 #ifdef HAVE_SSL
-  if (conn->scheme == SCHEME_HTTPS)
-    {
-      if (!ssl_connect (sock))
-        {
-          logputs (LOG_VERBOSE, "\n");
-          logprintf (LOG_NOTQUIET,
-                     _("Unable to establish SSL connection.\n"));
-          fd_close (sock);
-          return CONSSLERR;
-        }
-      using_ssl = 1;
-    }
+  if (proxy && u->scheme == SCHEME_HTTPS)
+    {
+      /* When requesting SSL URLs through proxies, use the
+         CONNECT method to request passthrough.  */
+      char *connect =
+        (char *) alloca (64 +
RE: wget can't get the following site
Because the URL has special characters in it, surround it in double quotes:

wget "http://quicktake.morningstar.com/Stock/Income10.asp?Country=USA&Symbol=JNJ&stocktab=finance"

Mark Post

-----Original Message-----
From: David C. [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 2:01 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: wget can't get the following site

Hi, all. Please CC me when you reply; I'm not subscribed to this list. I'm new to wget. When I tried getting the following using wget:

wget http://quicktake.morningstar.com/Stock/Income10.asp?Country=USA&Symbol=JNJ&stocktab=finance

I got the errors below:

--22:58:29-- http://quicktake.morningstar.com:80/Stock/Income10.asp?Country=USA
           => [EMAIL PROTECTED]'
Connecting to quicktake.morningstar.com:80... connected!
HTTP request sent, awaiting response... 302 Object moved
Location: http://quote.morningstar.com/switch.html?ticker= [following]
--22:58:30-- http://quote.morningstar.com:80/switch.html?ticker=
           => [EMAIL PROTECTED]'
Connecting to quote.morningstar.com:80... connected!
HTTP request sent, awaiting response... 302 Object moved
Location: TickerNotFound.html [following]
TickerNotFound.html: Unknown/unsupported protocol.
'Symbol' is not recognized as an internal or external command, operable program or batch file.
'stocktab' is not recognized as an internal or external command, operable program or batch file.

Is this a bug in wget? Or is there something I can do so that wget can get the site? Please help! Thanks in advance.
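The "'Symbol' is not recognized as an internal or external command" errors above are the giveaway: the shell split the command line at each unquoted &, so wget only ever saw the URL up to ?Country=USA. On a Unix shell, single or double quotes both work; the key point is that the URL must reach wget as one argument:

```shell
# Quoted, the ampersands stay inside a single argument, exactly as
# wget must receive them.
url='http://quicktake.morningstar.com/Stock/Income10.asp?Country=USA&Symbol=JNJ&stocktab=finance'
wget "$url"
```

On Windows cmd.exe, double quotes around the URL serve the same purpose, since & is cmd's command separator.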
RE: wget -- ftp with proxy
Yes, it should be. Mark Post -Original Message- From: Cui, Byron [mailto:[EMAIL PROTECTED]] Sent: Tuesday, January 13, 2004 11:57 AM To: [EMAIL PROTECTED] Subject: wget -- ftp with proxy Hi, If using FTP through a proxy, would the passive-ftp option still be valid? Thanks. Byron Cui e-Commerce Infrastructure Support and Information Security IBG Production Support Phone: 416-867-6822 Fax: 416-867-7157
RE: Syntax question ...
Well, that's what you're telling it to do with the -S option, so why are you surprised? man wget, then /-S

Mark Post

-Original Message-
From: Simons, Rick [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 21, 2004 11:09 AM
To: '[EMAIL PROTECTED]'
Subject: RE: Syntax question ...

I got wget compiled with ssl support now, and have a followup question ... I'm getting the local file created but populated with a server response, not the actual contents of the remote file. See example:

wget -d -S https://server/testfile --http-user=user --http-passwd=pass
DEBUG output created by Wget 1.9.1 on linux-gnu.
--10:55:06-- https://server/testfile
           => `testfile'
Resolving server... ip
Caching server => ip
Connecting to server[ip]:443... connected.
Created socket 3.
Releasing 0x81229f0 (new refcount 1).
---request begin---
GET /testfile HTTP/1.0
User-Agent: Wget/1.9.1
Host: server
Accept: */*
Connection: Keep-Alive
Authorization: Basic cmlja3M6cmlja3MyNjI2
---request end---
HTTP request sent, awaiting response... HTTP/1.1 200 OK
Date: Wed, 21 Jan 2004 16:04:01 GMT
 2 Date: Wed, 21 Jan 2004 16:04:01 GMT
Server: Apache/1.3.26 (Unix) mod_ssl/2.8.10 OpenSSL/0.9.6g SecureTransport/4.1.2
 3 Server: Apache/1.3.26 (Unix) mod_ssl/2.8.10 OpenSSL/0.9.6g SecureTransport/4.1.2
Set-Cookie: FDX=ocjoMt028Um+ri2vZQ0L6g==; path=/
 4 Set-Cookie: FDX=ocjoMt028Um+ri2vZQ0L6g==; path=/
Stored cookie filed2 443 / nonpermanent 0 undefined FDX ocjoMt028Um+ri2vZQ0L6g==
Accept-Ranges: bytes
 5 Accept-Ranges: bytes
Expires: Thu, 01 Jan 1970 00:00:00 GMT
 6 Expires: Thu, 01 Jan 1970 00:00:00 GMT
Features: CHPWD;RTCK;STCK;ASC
 7 Features: CHPWD;RTCK;STCK;ASC
Connection: close
 8 Connection: close
Content-Type: text/plain; charset=UTF-8
 9 Content-Type: text/plain; charset=UTF-8

[ <=> ] 30 --.--K/s
Closing fd 3
10:55:07 (292.97 KB/s) - `testfile' saved [30]

cat testfile
Virtual user username logged in.
ssl access log: ip - user [21/Jan/2004:10:04:02 -0600] GET /testfile HTTP/1.0 200 30 ssl error log: [Wed Jan 21 10:04:01 2004] [info] VIRTUAL HTTP LOGIN FROM ip [ip], user (class virt) Further thoughts or suggestions? -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 21, 2004 9:41 AM To: Simons, Rick Cc: '[EMAIL PROTECTED]' Subject: Re: Syntax question ... Simons, Rick [EMAIL PROTECTED] writes: Greetings all. I've posted in the past, but never really have gotten connectivity to a https server I support using the wget application. I've looked in the manual, on the website and searched the Internet but am not getting very far. wget -V GNU Wget 1.9 wget -d -S https://server/file https://server/file: Unsupported scheme. This error message indicates that your version of Wget is compiled without SSL support. I then decided (based on previous instruction from this group) to recompile wget with ssl. This is on a RH9 box, with openssl libs in /usr/include/openssl ./configure --with-ssl=/usr/include/openssl/ compiles Looking for SSL libraries in /usr/include/openssl/ checking for includes... not found ERROR: Failed to find OpenSSL libraries. Try just `./configure', it should find the SSL libraries in the default location. At least it does for me -- I use RH9.
RE: problem with # in path
It's more likely your system/shell that is doing it, if you're using Linux or UNIX. wget -r -l 0 ftp://19.24.24.24/some/datase/C\#Tool/ Mark Post -Original Message- From: Peter Mikeska [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 22, 2004 6:28 PM To: [EMAIL PROTECTED] Subject: problem with # in path Hi, I'm trying to get everything from wget -r -l 0 ftp://19.24.24.24/some/datase/C#Tool/ but I can't get anything, because wget cuts off everything from the # onward; it thinks it's a comment. Please help. Thanks in advance, Miki +---V---+ | Peter Mikeska |[EMAIL PROTECTED] | | A L C A T E L | | System Engineer | phone: +421 44 5206316 | +---+ | IT Services MadaCom | fax: +421 44 5206356 |
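Besides backslash-escaping, percent-encoding sidesteps the '#' problem entirely: RFC 3986 reserves '#' to introduce a URL fragment, so encoding it as %23 keeps it part of the path no matter who parses the URL first. A sketch using the path from the thread:

```shell
# '#' begins a URL fragment, so encode it as %23 in the request path.
url='ftp://19.24.24.24/some/datase/C%23Tool/'
printf '%s\n' "$url"
```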
RE: GNU Wget 1.9.1
It's a known bug. I'm waiting for a fix for it myself. Mark Post -Original Message- From: Lawrance, Mark [mailto:[EMAIL PROTECTED]] Sent: Wednesday, May 12, 2004 9:09 AM To: [EMAIL PROTECTED] Subject: GNU Wget 1.9.1 GNU Wget 1.9.1 The non-interactive download utility Updated for Wget 1.9.1, May 2003 I am unable to get wget to work via a proxy for HTTPS sites. It does work via proxy for HTTP. It does work with HTTPS NOT through proxy. Any ideas? Should this work? Mark Lawrance Senior Wintel Architect, Architecture and Engineering, London Stock Exchange DDE: +44 (0)20 7797 1277 Mobile: +44 (0)7971 032235 [EMAIL PROTECTED]
RE: Preserving file ownership
I don't believe so. You might want to take a look at rsync instead. It does a very nice job of doing just what you need. Mark Post -Original Message- From: Kathryn Moretz [mailto:[EMAIL PROTECTED] Sent: Thursday, April 29, 2004 4:40 PM To: [EMAIL PROTECTED] Cc: Kathryn Moretz Subject: Preserving file ownership I am using wget to mirror multiple directories between 2 servers via FTP. This mirroring process will be running as root in the background continuously. The directories / files are owned by different users and groups. Is there a way to preserve this ownership when files are transferred from the remote host to the local host? As it currently stands, the mirrors are owned by root:system instead of by the original owner. Thank you in advance. Please cc me in any replies to this post, as I do not currently subscribe to this list.
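For reference, a sketch of the rsync form of that mirror job. The host and paths are placeholders, not from the thread; -a (archive mode) implies -o and -g, which preserve owner and group when rsync runs as root on the receiving side.

```shell
# Hypothetical host/paths; echoed rather than executed.
cmd="rsync -a remote.example.com:/export/tree/ /local/mirror/"
echo "$cmd"
```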
RE: wget hangs or downloads end up incomplete in Windows 2000 XP.
Are you behind a firewall or proxy of some kind? If so, you might want to try using passive FTP mode.

Mark Post

-Original Message-
From: Phillip Pi [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 20, 2004 3:08 PM
To: [EMAIL PROTECTED]
Subject: RE: wget hangs or downloads end up incomplete in Windows 2000 XP.

FYI. I noticed if I ctrl-c to get out of the hanging part and try to resume, my FTP seems to be broken and hangs. I tried manually with the ftp.exe command on the command line, and it froze on the dir command:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\phillip_pi>ftp 192.168.14.18
Connected to 192.168.14.18.
220 USSM-CPD Microsoft FTP Service (Version 5.0).
User (192.168.14.18:(none)): domain\username
331 Password required for domain\username.
Password:
230 User domain\username logged in.
ftp> dir
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
[stuck forever until I ctrl-c to break out of it]

I either have to reboot the computer OR wait maybe ten minutes to try again with the FTP connection. -- This is the ant. Treat it with respect. For it may very well be the next dominant lifeform of our planet. --Empire of the Ants movie /\___/\ / /\ /\ \ Phillip Pi (Ant) @ The Ant Farm: http://antfarm.ma.cx | |o o| | E-mail: [EMAIL PROTECTED] or [EMAIL PROTECTED] \ _ /Be sure you removed ANT from e-mail address if you get ( ) a returned e-mail. On Wed, 12 May 2004, Phillip Pi wrote: On Wed, 12 May 2004, Herold Heiko wrote: OK, I did more tests. I noticed -v is already enabled by default since you probably have verbose=on in your wgetrc file. Good idea. Should I delete wgetrc? I doubt that will fix my problem since I tried on two different Windows machines. FYI. I don't have a wgetrc file anywhere. I only have the sample.wgetrc file on the machines I used, so it sounds like verbose is enabled by default regardless of the wgetrc file. 5250K .. .. .. .. ..
The timestamp was from almost an hour ago (I was in a meeting) during the download test. Notice it never timed out to retry or abort! Please What happens if you restart wget again with mirror-like options on the same directory tree ? Does it hang again on the same file ? If yes, what if you Which wget parameter options are they? I never noticed it hangs on the same file for each hang in different tests. Please remember sometimes I have missing files when downloads are complete. It is either hang, finish but incomplete, or perfect. Those are the three results I have seen from many tests. try to download that file only ? If not, could you by any chance run a sniffer on that machine (ethereal is free) ? I do not know how to use this network tool. If you can give me instructions I can try! It would be useful to know if really everything is frozen, or if, for example, for some reason the data is just trickling down at 1 byte/minute or something similar (stuck in retransmission?). I have no idea. Does wget have a bytes/bits per second statistics?
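If passive mode turns out to help, it can be made the default rather than passed on every command line; `passive_ftp` is a documented wgetrc setting:

```
# ~/.wgetrc
passive_ftp = on
```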
RE: Escaping semicolons (actually Ampersands)
Then you haven't looked at enough web sites. Whenever tidydbg (from w3.org) tells me to do that in one of my URLs, I do that. I've got one page of links that has tons of them. They work. Can we stop arguing about this off-topic bit now? Mark Post -Original Message- From: Tony Lewis [mailto:[EMAIL PROTECTED]] Sent: Monday, June 28, 2004 10:17 PM To: Phil Endecott; [EMAIL PROTECTED] Subject: Re: Escaping semicolons (actually Ampersands) Phil Endecott wrote: Tony The stuff between the quotes following HREF is not HTML; it is a Tony URL. Hence, it must follow URL rules not HTML rules. No, it's both a URL and HTML. It must follow both rules. Please see the page that I cited in my previous message: http://www.htmlhelp.com/tools/validator/problems.html#amp I've looked at hundreds of web pages and I've never seen anyone put &amp; into HREF in place of an ampersand. Tony
RE: Metric units
Yeah, you're both right. While we're at it, why don't we just round off the value of pi to be 3.0. Those pesky trailing decimals are just an accident of history anyway. -Original Message- From: Carlos Villegas [mailto:[EMAIL PROTECTED]] Sent: Thursday, December 23, 2004 8:22 PM To: Tony Lewis Cc: wget@sunsite.dk Subject: Re: Metric units On Thu, Dec 23, 2004 at 12:57:18PM -0800, Tony Lewis wrote: John J Foerch wrote: It seems that the system of using the metric prefixes for numbers 2^n is a simple accident of history. Any thoughts on this? I would say that the practice of using powers of 10 for K and M is a response to people who cannot think in binary. I would say that the original poster understands what he is saying, and you clearly don't... http://physics.nist.gov/cuu/Units/binary.html kilo, mega, giga, tera and many others are standard in SI and widely used in physics, chemistry, and engineering with their real meaning (powers of 10). The whole powers-of-2 thing is just because 1024 is close to 1000 and computers work in binary, so it is logical to think in powers of 2 (so yes, a mere accident of 20th century history). Carlos
RE: Metric units
No, but that particular bit of idiocy was the inspiration for my comment. I just took it one decimal point further. -Original Message- From: Tony Lewis [mailto:[EMAIL PROTECTED] Sent: Friday, December 24, 2004 2:22 AM To: wget@sunsite.dk Subject: RE: Metric units Mark Post wrote: While we're at it, why don't we just round off the value of pi to be 3.0 Do you live in Indiana? Actually, Dr. Edwin Goodwin wanted to round off pi to any of several values including 3.2. http://www.agecon.purdue.edu/crd/Localgov/Second%20Level%20pages/Indiana_Pi_ Story.htm Tony
RE: selective recursive downloading
wget -m -np http://url.to.download/something/group-a/want-to-download/ \ http://url.to.download/something/group-b/want-to-download/ \ http://url.to.download/something/group-c/want-to-download/ Mark Post -Original Message- From: Gabor Istvan [mailto:[EMAIL PROTECTED]] Sent: Friday, January 21, 2005 9:16 AM To: wget@sunsite.dk Subject: selective recursive downloading Dear All: I would like to know how I could use wget to selectively download certain subdirectories of a main directory. Here is what I want to do: Let's assume that we have a directory structure like this: http://url.to.download/something/group-a/want-to-download/ http://url.to.download/something/group-a/not-to-download/ http://url.to.download/something/group-b/want-to-download/ http://url.to.download/something/group-b/not-to-download/ http://url.to.download/something/group-c/want-to-download/ http://url.to.download/something/group-c/not-to-download/ I would like to download all of the files from the want-to-download subdirectories of groups a, b and c. I don't want to download anything from the not-to-download subdirectories of groups a, b and c. There are a lot of groups, so it would be very painful to use -I or -X, as these options, as far as I know, require a full path definition, which would be different for each wanted and not-wanted directory because of the different groups. My question is: how could I automate the downloads, and what should I write on the command line? Thanks for your answer. Please send a copy to my email address ([EMAIL PROTECTED]) since I am not subscribed to the wget list. IG
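Since there are a lot of groups, the URL list in that command can be generated instead of typed. A sketch, using the placeholder names from the thread (substitute the real group names):

```shell
base='http://url.to.download/something'
urls=''
for g in group-a group-b group-c; do    # extend with the real groups
  urls="$urls $base/$g/want-to-download/"
done
# Echoed rather than run; drop the echo to perform the download.
echo wget -m -np $urls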
RE: 403 Forbidden Errors with mac.com
Don't know what is happening on your end. I just executed wget http://idisk.mac.com/tombb/Public/tex-edit-plus-X.sit and it downloaded 2,484,062 bytes of something. What does using the -d option show you? Mark Post -Original Message- From: Emily Jackson [mailto:[EMAIL PROTECTED]] Sent: Tuesday, February 08, 2005 6:26 AM To: wget@sunsite.dk Subject: 403 Forbidden Errors with mac.com This produces a 403 Forbidden error: wget http://idisk.mac.com/tombb/Public/tex-edit-plus-X.sit as does this: wget --user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/125.5.6 (KHTML, like Gecko) Safari/125.12" http://idisk.mac.com/tombb/Public/tex-edit-plus-X.sit (all on one line, of course; the user agent specified is for Apple's Safari browser) curl -O works fine, however. What else could I try to be able to use wget to download this file? [wget 1.9.1, Mac OS X 10.3.7] (Please cc any replies directly to me.) Thanks, Emily -- If it seem slow, wait for it; it will surely come, it will not delay. Emily Jackson http://home.hiwaay.net/~emilyj/missjackson.html
RE: bug-wget still useful
I don't know why you say that. I see bug reports and discussion of fixes flowing through here on a fairly regular basis. Mark Post -Original Message- From: Dan Jacobson [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 15, 2005 3:04 PM To: [EMAIL PROTECTED] Subject: bug-wget still useful Is it still useful to mail to [EMAIL PROTECTED] I don't think anybody's home. Shall the address be closed?
RE: links conversion; non-existent index.html
Probably because you're the only one that thinks it is a problem, instead of the way it needs to function? Nah, that couldn't be it. Mark Post -Original Message- From: Andrzej Kasperowicz [mailto:[EMAIL PROTECTED] Sent: Sunday, May 01, 2005 2:54 PM To: Jens Rösner; wget@sunsite.dk Subject: Re: links conversion; non-existent index.html -snip- You expect?? Yes, of course. Why are you so surprised? a.
RE: Switching to subversion for version control
You might want to give Ibiblio a try (www.ibiblio.org). They host my Slack/390 web/FTP site at no cost. They host a _bunch_ of sites at no cost. Mark Post -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: Thursday, May 12, 2005 5:24 AM To: wget@sunsite.dk Subject: Switching to subversion for version control -snip- I'm also interested in information about free svn hosting. sunsite/dotsrc and savannah.gnu.org currently don't seem to be offering subversion hosting. There is www.berlios.de, but I have no experience with them.
RE: Switching to subversion for version control
I really don't know, but they seem very accommodating to people, especially Open Source projects such as wget. It's certainly worth an email to find out. Send your request to help at ibiblio.org. Mark Post -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: Thursday, May 12, 2005 3:46 PM To: Post, Mark K Cc: wget@sunsite.dk Subject: Re: Switching to subversion for version control Post, Mark K [EMAIL PROTECTED] writes: You might want to give Ibiblio a try (www.ibiblio.org). They host my Slack/390 web/FTP site at no cost. They host a _bunch_ of sites at no cost. But do they host subversion? I can't find any mention of it with google.
RE: No more Libtool (long)
I read the entire message, but I probably didn't have to. My experience with libtool in packages that really are building libraries has been pretty painful. Since wget doesn't build any, getting rid of it is one less thing to kill my builds in the future. Congratulations. Mark Post -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: Friday, June 24, 2005 8:11 PM To: wget@sunsite.dk Subject: No more Libtool (long) Thanks to the effort of Mauro Tortonesi and the prior work of Bruno Haible, Wget has been modified to no longer use Libtool for linking in external libraries. If you are interested in why that might be a cause for celebration, read on.
RE: No more Libtool (long)
This is the kind of obnoxious commentary I've learned to expect from glibc's maintainers. It's no more becoming from you (or anyone else). Buzz off. Mark Post -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Maciej W. Rozycki Sent: Monday, June 27, 2005 8:01 AM To: Hrvoje Niksic Cc: wget@sunsite.dk Subject: Re: No more Libtool (long) -snip- Everyone else please either file bug reports (or better yet fix bugs you trip over) or keep silent.
RE: No more Libtool (long)
You already blew that opportunity when you told us to shut up. Blame yourself. Mark Post -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Maciej W. Rozycki Sent: Monday, June 27, 2005 11:15 AM To: Post, Mark K Cc: wget@sunsite.dk Subject: RE: No more Libtool (long) -snip- Let's focus on technical issues rather than making it personal, OK? Maciej
RE: robots.txt takes precedence over -p
I hope that doesn't happen. While respecting robots.txt is not an absolute requirement, it is considered polite. I would not want the default behavior of wget to be considered impolite. Mark Post -Original Message- From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] Sent: Monday, August 08, 2005 7:43 PM To: Tony Lewis Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: robots.txt takes precedence over -p On Sunday 10 July 2005 09:52 am, Tony Lewis wrote: Thomas Boerner wrote: Is this behaviour: robots.txt takes precedence over -p a bug or a feature? It is a feature. If you want to ignore robots.txt, use this command line: wget -p -k www.heise.de/index.html -e robots=off hrvoje was thinking of changing the default behavior of wget to ignore the robots standard in the next releases. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
RE: robots.txt takes precedence over -p
I would say the analogy is closer to a very rabid person operating a web browser. I've never been greatly inconvenienced by having to re-run a download while ignoring the robots.txt file. As I said, respecting robots.txt is not a requirement, but it is polite. I prefer my tools to be polite unless I tell them otherwise. Mark Post -Original Message- From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]] Sent: Monday, August 08, 2005 8:35 PM To: Post, Mark K Cc: [EMAIL PROTECTED] Subject: Re: robots.txt takes precedence over -p On Monday 08 August 2005 07:30 pm, Post, Mark K wrote: I hope that doesn't happen. While respecting robots.txt is not an absolute requirement, it is considered polite. I would not want the default behavior of wget to be considered impolite. IMVHO, hrvoje has a good point when he says that wget behaves like a web browser and, as such, should not be required to respect the robots standard.
RE: wget displays permission error
In the past, I have been confused as to whether the file which was generating the error was on the server, or on my local system. If there is a way to distinguish between the two, and be more explicit, that would be a little more helpful. I don't see any way wget could/should do anything except report the error. Mark Post -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: Thursday, September 01, 2005 6:16 AM Cc: Kentaro Ozawa; [EMAIL PROTECTED] Subject: Re: wget displays permission error Jochen Roderburg [EMAIL PROTECTED] writes: Hmm, this did not actually try to write over 'index.html', did it ;-) Do the same with 'timestamping on' and you get (not surprisingly and with 'all' wget versions I have around) : index.html: Permission denied Cannot write to `index.html' (Permission denied). But what is Wget to do in such a case except report an error?
RE: retr.c:292: calc_rate: Assertion `bytes >= 0' failed.
Odd. It didn't take me long to find this: http://ftp.us.debian.org/debian/pool/main/w/wget/wget_1.10.2-1_i386.deb Mark Post -Original Message- From: Simeon Miteff [mailto:[EMAIL PROTECTED]] Sent: Thursday, November 24, 2005 2:10 AM To: [EMAIL PROTECTED] Subject: retr.c:292: calc_rate: Assertion `bytes >= 0' failed. Hi I don't know if this is a known bug (I could not get any useful results out of the bugzilla), but if it isn't, the server shown in this example is public, so the problem should be reproducible. I realise that 1.10.2 is the latest version, but Debian doesn't seem to think so :-)
RE: retr.c:292: calc_rate: Assertion `bytes >= 0' failed.
Not really. Debian will let you install whatever you want, provided the dependencies are satisfied. If you set up your apt parms properly, you can download and install packages from stable, testing, unstable, etc. If you don't want to do that for everything, you can set them back to just pick up new package versions from the stable channel. Mark Post -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]] Sent: Thursday, November 24, 2005 4:43 PM To: Post, Mark K Cc: Simeon Miteff; [EMAIL PROTECTED] Subject: Re: retr.c:292: calc_rate: Assertion `bytes >= 0' failed. Post, Mark K [EMAIL PROTECTED] writes: Odd. It didn't take me long to find this: http://ftp.us.debian.org/debian/pool/main/w/wget/wget_1.10.2-1_i386.deb It's questionable whether that's installable on stable Debian.
RE: Limit time to run
I think that a combination of --limit-rate and --wait parameters makes this type of enhancement unnecessary, given that his stated purpose was to not hammer a particular site. Mark Post -Original Message- From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 30, 2005 12:02 PM To: Frank McCown Cc: wget@sunsite.dk Subject: Re: Limit time to run Frank McCown wrote: It would be great if wget had a way of limiting the amount of time it took to run so it won't accidentally hammer on someone's web server for an indefinite amount of time. I'm often needing to let a crawler run for a while on an unknown site, and I have to manually kill wget after a few hours if it hasn't finished yet. It would be nice if I could do: wget --limit-time=120 ... to make it stop itself after 120 minutes. Please cc me on any replies. i don't think we need to add this feature to wget, as it can be achieved with a shell script that launches wget in background, sleeps for the given amount of time and then kills the wget process. however, if there is a general consensus about adding this feature to wget, i might consider changing my mind.
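The wrapper Mauro describes takes only a few lines of shell. A sketch, with a 2-second stand-in for the real wget command and the 120-minute limit:

```shell
LIMIT=2                       # seconds; 120 minutes would be 7200
sleep 100 &                   # stand-in for the long-running wget
PID=$!
# Watchdog: after LIMIT seconds, kill the worker if it is still alive.
( sleep "$LIMIT" && kill "$PID" 2>/dev/null ) &
wait "$PID" 2>/dev/null       # returns once the worker finishes or is killed
echo "stopped after ${LIMIT}s"
```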
RE: wget - tracking urls/web crawling
Try using the -np (no parent) parameter. Mark Post -Original Message- From: bruce [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 4:15 PM To: 'Frank McCown'; wget@sunsite.dk Subject: RE: wget - tracking urls/web crawling hi frank... there must be something simple i'm missing... i'm looking to crawl the site http://timetable.doit.wisc.edu/cgi-bin/TTW3.search.cgi?20071 i issue the wget: wget -r -np http://timetable.doit.wisc.edu/cgi-bin/TTW3.search.cgi?20071 i thought that this would simply get everything under the http://...?20071. however, it appears that wget is getting 20062, etc.. which are the other semesters... what i'd really like to do is to simply get 'all depts' for each of the semesters... any thoughts/comments/etc... -bruce
Excluding directories
I'm trying to download parts of the SUSE Linux 10.1 tree. I'm going after things below http://suse.mirrors.tds.net/pub/suse/update/10.1/, but I want to exclude several directories in http://suse.mirrors.tds.net/pub/suse/update/10.1/rpm/ In that directory are the following subdirectories: i586/ i686/ noarch/ ppc/ ppc64/ src/ x86_64/ I only want the i586, i686, and noarch directories. I tried using the -X parameter, but it only seems to work if I specify -X /pub/suse/update/10.1/rpm/ppc,/pub/suse/update/10.1/rpm/ppc64,/pub/suse/ update/10.1/rpm/src,/pub/suse/update/10.1/rpm/x86_64 Is this the only way it's supposed to work? I was hoping to get away with something along the lines of -X rpm/ppc,rpm/src or -X ppc,src and so on. Thanks, Mark Post
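One possibly shorter spelling: the wget manual documents wildcard support in the -I/-X directory lists, so patterns should be able to replace the repeated absolute paths. A sketch, echoed rather than run so as not to hit the mirror (untested against that tree):

```shell
cmd="wget -r -np -X '*/rpm/ppc*,*/rpm/src,*/rpm/x86_64' http://suse.mirrors.tds.net/pub/suse/update/10.1/"
echo "$cmd"
```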
RE: wget 403 forbidden error when no index.html.
The short answer is that you don't get to do it. If your browser can't do it, wget isn't going to be able to do it. Mark Post -Original Message- From: news [mailto:[EMAIL PROTECTED]] On Behalf Of Aditya Joshi Sent: Friday, July 07, 2006 12:15 PM To: wget@sunsite.dk Subject: wget 403 forbidden error when no index.html. I am trying to download the contents of a specific directory of a site and I keep getting a 403 Forbidden error when I run wget. The directory does not have an index.html and, of course, any references to that path result in a 403 page displayed in my browser. Is this why wget is not working? If so, how do I download the contents of such sites?
RE: Wget
You would want to use the -O option, and write a script to create a unique file name to be passed to wget. Mark Post From: John McGill [mailto:[EMAIL PROTECTED] Sent: Thursday, July 13, 2006 4:56 AMTo: wget@sunsite.dkSubject: Wget Hi, I hope you can help with a small problem I am having with the above win32 application. I wish to download a jpeg image from a camera, at the same time every day for the duration of a project. I need to be able to give the downloaded file either a unique file identifier or a time stamp so that I can compile a sequence at the end of the project. Is there a way of telling wget to download the image and increment the file number or add the date/time stamp? I am sure you are a very busy person and I hope you will have the time to answer my rather basic question. Regards John McGill
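A sketch of that wrapper script: build a timestamped name and hand it to wget -O. The camera URL is a placeholder, so the wget command is echoed rather than run.

```shell
stamp=$(date +%Y%m%d-%H%M%S)          # e.g. 20060713-045600
out="camera-$stamp.jpg"
echo wget -O "$out" "http://camera.example/current.jpg"
```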