On Fri, 2015-07-10 at 12:05 +0100, Barak A. Pearlmutter wrote:
> $ cat debian/watch
> version=3
> https://www.fossil-scm.org/download.html \
>   .*/fossil-src-(\d*\.\d*)\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))
[...]
> Note the URL it tries to fetch is
> https://www.fossil-scm.org/download.html/download/fossil-src-1.33.tar.gz
> which does not have the "download.html" stripped out. The URL looks
> fine in a browser, and (as also shown in the above transcript) the
> page source reads href="download/fossil-src-..."
>
> Mystified!

Inspection of the download page reveals:

  <base href="https://www.fossil-scm.org/download.html" />

uscan(1) says:

  If any of the hrefs in the homepage which match the (anchored)
  pattern are relative URLs, they will be taken as being relative to
  the base URL of the homepage (i.e., with everything after the
  trailing slash removed), or relative to the base URL specified in
  the homepage itself with a <base href="..."> tag.

I think the behaviour is arguably slightly broken here, in that
https://www.w3.org/wiki/HTML/Elements/base implies that the last
component shouldn't be included if it's a document rather than a
directory, but I couldn't spot that being explicitly specified in that
URL at least.

In any case, using the documented mangle facilities seems to work okay,
giving:

version=3
opts="downloadurlmangle=s#/download.html##" \
https://www.fossil-scm.org/download.html \
  .*/fossil-src-(\d*\.\d*)\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))

Regards,

Adam
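
For anyone wanting to check the URL arithmetic by hand, here is a minimal
sketch in Python (not uscan's own Perl code). It compares standard
relative-reference resolution, as done by urllib.parse.urljoin, with a
naive join that treats the whole <base href> as a directory. The naive
join is only a hypothetical model chosen because it reproduces the URL
seen in the transcript; naive_join and mangle are illustrative names, and
mangle simply mirrors the s#/download.html## rule from the watch file.

```python
import re
from urllib.parse import urljoin

# Values taken from the thread; 1.33 is the version in the failing URL.
BASE_HREF = "https://www.fossil-scm.org/download.html"
HREF = "download/fossil-src-1.33.tar.gz"


def naive_join(base: str, href: str) -> str:
    """Hypothetical model of the observed behaviour: the whole base is
    treated as a directory and the relative href is appended to it."""
    return base.rstrip("/") + "/" + href


def mangle(url: str) -> str:
    """Python equivalent of the Perl substitution s#/download.html##
    (first match only, '.' left unescaped as in the original rule)."""
    return re.sub(r"/download.html", "", url, count=1)


# Standard resolution drops the last path segment of the base.
resolved = urljoin(BASE_HREF, HREF)
# The naive model keeps "download.html" in the path.
naive = naive_join(BASE_HREF, HREF)
# Applying the mangle rule to the naive URL repairs it.
mangled = mangle(naive)

print(resolved)
print(naive)
print(mangled)
```

Under these assumptions the first and third URLs printed are identical,
which is consistent with the downloadurlmangle workaround producing the
link a browser would follow.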