Package: wget
Version: 1.11.4-2
Severity: minor

Hi,
I can't get wget --follow-tags to work together with the --no-clobber
option.

All I want is to download all the images referenced by an already
*downloaded* HTML file and then have the links converted to local ones
(--convert-links). I believed that --no-clobber was all I needed:

,-----
| ... when -nc is specified, files with the suffixes .html or .htm
| will be loaded from the local disk and parsed as if they had been
| retrieved from the Web.
`-----

So I tried:

  curl http://www.cnn.com/ -o test.htm
  wget --no-directories --no-clobber --convert-links --follow-tags=img http://localhost/test.htm

I expected wget to load test.htm as if it had been retrieved from the
web, parse it, and then follow and download all the img tags. However,
it didn't work (see below). Why is that?

(Note that if I replace the --follow-tags option with --page-requisites,
at least I get something.)

PS.

- If I don't use --no-clobber but --force-html --input-file=test.htm
  instead, --follow-tags=img works fine, but --convert-links does
  nothing, and I want the img links to be converted to local ones.

- Is there a better way for wget to load local files than the fake
  http://localhost/... URL?

- Is it possible not to use --span-hosts in this case?

Thanks

Tong

$ wget -d --no-directories --no-clobber --convert-links --follow-tags=img --span-hosts http://localhost/test.htm
Setting --directories (dirstruct) to 0
Setting --no-clobber (noclobber) to 1
Setting --convert-links (convertlinks) to 1
Setting --follow-tags (followtags) to img
Setting --span-hosts (spanhosts) to 1
DEBUG output created by Wget 1.11.4 on linux-gnu.

File `test.htm' already there; not retrieving.
Scanning test.htm (from http://localhost/test.htm)
Loaded test.htm (size 91418).
test.htm: merge("http://localhost/test.htm", "http://localhost/header_cnn_com_logo.gif") -> http://localhost/header_cnn_com_logo.gif
appending "http://localhost/header_cnn_com_logo.gif" to urlpos.
test.htm: merge("http://localhost/test.htm", "http://localhost/header_google_logo.gif") -> http://localhost/header_google_logo.gif
appending "http://localhost/header_google_logo.gif" to urlpos.
test.htm: merge("http://localhost/test.htm", "http://i.cdn.turner.com/cnn/images/1.gif") -> http://i.cdn.turner.com/cnn/images/1.gif
appending "http://i.cdn.turner.com/cnn/images/1.gif" to urlpos.
[ . . .]
appending "http://metrics.cnn.com/b/ss/cnn2global/1/H.1--NS/0?pageName=No%20Javascript" to urlpos.
no-follow in test.htm: 0
will convert url http://localhost/header_cnn_com_logo.gif to complete
will convert url http://localhost/header_google_logo.gif to complete
will convert url http://i.cdn.turner.com/cnn/images/1.gif to complete
[ . . .]
will convert url http://i.cdn.turner.com/cnn/images/1.gif to complete
will convert url http://i.cdn.turner.com/cnn/images/1.gif to complete
will convert url http://metrics.cnn.com/b/ss/cnn2global/1/H.1--NS/0?pageName=No%20Javascript to complete
Converting test.htm... nothing to do.
Converted 1 files in 0.02 seconds.

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (300, 'testing'), (50, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-grml (SMP w/1 CPU core; PREEMPT)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages wget depends on:
ii  libc6                         2.7-13     GNU C Library: Shared libraries
ii  libssl0.9.8                   0.9.8g-13  SSL shared libraries

wget recommends no packages.

wget suggests no packages.

-- no debconf information
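
For reference, the merge(...) lines in the debug output show each link being resolved against the URL the page was loaded from. The following is a minimal Python sketch of that resolution step, not wget's own code. The sample markup is hypothetical (test.htm is not reproduced in this report), and the relative src in it is an assumption made only to show the effect of the base URL. With http://localhost/test.htm as the base, a relative link lands on localhost, while an absolute link keeps its original host, which would be consistent with --span-hosts being needed in the run above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_urls(html_text, base_url):
    """Return absolute image URLs, resolving each src against base_url."""
    collector = ImgCollector()
    collector.feed(html_text)
    return [urljoin(base_url, src) for src in collector.sources]


# Hypothetical page fragment; the relative src is an assumption.
sample = (
    '<img src="/header_cnn_com_logo.gif">'
    '<img src="http://i.cdn.turner.com/cnn/images/1.gif">'
)

# Base as used in this report (page served from the fake localhost URL).
print(image_urls(sample, "http://localhost/test.htm"))
# Base of the site the page was originally fetched from.
print(image_urls(sample, "http://www.cnn.com/"))
```

Only the base URL differs between the two calls. The absolute link is unchanged in both, and the relative one follows whichever host the page is said to come from.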