Hello, thanks for your report. I am not sure that URL normalisation should collapse multiple consecutive forward slashes; I don't see anything about it in RFC 1808. We can't assume that "foo//bar" is the same as "foo/bar": the server could handle the two differently, for example the extra slash may be part of PATH_INFO.
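To make the point concrete, here is a minimal sketch in Python (not wget's actual code) of relative-path resolution that removes "." and ".." segments along the lines of RFC 3986 section 5.2.4 while leaving empty segments alone. The helper names remove_dot_segments and resolve are made up for illustration, and query strings, fragments and hosts are ignored. Under those assumptions it gives the same paths as the traversal in the report.

```python
def remove_dot_segments(path: str) -> str:
    """Drop "." and ".." segments; empty segments ("//") are kept as-is."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]            # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]            # "/../x" becomes "/x"
            if output:
                output.pop()           # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


def resolve(base_path: str, href: str) -> str:
    """Resolve a path-only href against the path of the referring page."""
    if href.startswith("/"):
        merged = href
    else:
        # Keep everything up to and including the last "/" of the base.
        merged = base_path[: base_path.rfind("/") + 1] + href
    return remove_dot_segments(merged)


if __name__ == "__main__":
    page = "/index.html"
    for _ in range(3):
        loop = resolve(page, "foo/loop.html")
        page = resolve(loop, "..//index.html")
        print(loop, page)
```

Each pass through the two links adds one leading slash, so every URL looks new to the crawler. Whether the client should collapse those slashes, or leave that decision to the server, is the open question.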
AFAICS, Firefox and Chromium don't normalize consecutive forward slashes either.

Cheers,
Giuseppe

Cillian Sharkey <cillian.shar...@heanet.ie> writes:

> Hi,
>
> I've found wget does not always correctly normalise URLs by collapsing
> multiple consecutive forward slashes into a single slash.
>
> This is a problem when recursively mirroring a site, as certain kinds of
> links with multiple consecutive slashes will cause wget to go into an
> infinite loop, limited only by the maximum depth level.
>
> Without complete normalisation, a link with extra slashes is seen as a
> new URL that has not been visited, even if it has already. With each
> traversal an extra slash is cumulatively appended to the URL, causing
> the loop.
>
> Example:
>
> /index.html has href to "foo/loop.html"
> /foo/loop.html has href to "..//index.html"
>
> Results in the following link traversal:
>
> /index.html
> /a/loop.html
> //index.html
> //a/loop.html
> ///index.html
> ///a/loop.html
> [..]
>
> I've tried a combination of URLs with and without consecutive slashes,
> to test wget's behaviour. Results as follows:
>
> /index.html links to:
>
> HREF:                  wget requests:  should be:
>
> /a//../b/10.html       /a/b/10.html    /b/10.html
> /a/../b/11.html        /b/11.html
>
> /a/b/..//../c/20.html  /a/c/20.html    /c/20.html
> /a/b/../../c/21.html   /c/21.html
>
> ..//30.html            //30.html       /30.html
> ../31.html             /31.html
>
> .//40.html             //40.html       /40.html
> ./41.html              /41.html
>
> //50.html              Skipped, not downloaded!
> /51.html               /51.html
>
>
> wget --version
>
> GNU Wget 1.12 built on linux-gnu.
>
> +digest +ipv6 +nls +ntlm +opie +md5/openssl +https -gnutls +openssl
> -iri
>
> Wgetrc:
>     /etc/wgetrc (system)
> Locale: /usr/share/locale
> Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
>     -DLOCALEDIR="/usr/share/locale" -I. -I../lib -g -O2
>     -D_FILE_OFFSET_BITS=64 -O2 -g -Wall
> Link: gcc -g -O2 -D_FILE_OFFSET_BITS=64 -O2 -g -Wall /usr/lib/libssl.so
>     /usr/lib/libcrypto.so -ldl -lrt ftp-opie.o openssl.o http-ntlm.o
>     gen-md5.o ../lib/libgnu.a
>
> Regards,