Hi,

I have updated the manpage patch series.

This includes a build system update for po4a.

This one includes a fix to the pending manpage issue to document the new
filename scheme for http://site/?.  Closes: #573631

I also included a tested version of the pagemangle rule with test code ;-)
 * generic way to mangle the whole web page.
 * s3.amazonaws.com special case code is marked deprecated.
 * address needs for fullsourcemangle.  Closes: #395439
 * text in <a>...</a> is a special case.  Closes: #705989
 * s/data-realurl/href/g is a special case.  Closes: #773390

Quite frankly, I almost feel like asking you to remove the
s3.amazonaws.com specific code.  This kind of thing should not be hard
coded into generic code.  For now, I only made it warn the user.
(I do not think we have a use case for this pattern yet anyway.)

The last patch is for better DEBUG mode.

Please also do not forget to apply
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=796986
 uscan: repacksuffix does not adjust version passed to uupdate
You cannot add the suffix earlier since mk-origtargz behavior is not
deterministic.

Once all these are applied, I will touch the packaging of multitarballs.

Osamu
From b2d8128f499d2be342618243104ac71c7c24d036 Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Tue, 1 Sep 2015 13:19:31 +0000
Subject: [PATCH 1/8] Add POD to uscan.pl

 * Newly formatted manpage by POD.  Closes: #797787
---
 scripts/uscan.pl | 965 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 965 insertions(+)
 mode change 100755 => 100644 scripts/uscan.pl

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
old mode 100755
new mode 100644
index 33f3ad4..20ef97f
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -22,6 +22,971 @@
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <https://www.gnu.org/licenses/>.
 
+=pod
+
+=head1 NAME
+
+uscan - scan/watch upstream sources for new releases of software
+
+=head1 SYNOPSIS
+
+B<uscan> [I<options>] [I<path>]
+
+=head1 DESCRIPTION
+
+For the basic usage, B<uscan> is executed without any arguments from the root
+of the Debianized source tree where you see the F<debian/> directory.
+Then typically the following happens:
+
+=over
+
+=item * B<uscan> downloads the upstream tarball with the highest version from
+the remote URL specified in F<debian/watch>
+
+=item * B<uscan> reads the first entry in F<debian/changelog> to determine the
+source package name I<spkg> and the last upstream version.
+
+=item * B<uscan> saves the downloaded tarball to the parent B<../> directory:
+I<< ../<upkg>-<uversion>.tar.gz >>
+
+=item * B<uscan> invokes B<mk-origtargz> to create the source tarball: I<<
+../<spkg>_<sversion>.orig.tar.gz >>
+
+=item * B<uscan> invokes B<uupdate> to create the Debianized source tree: I<<
+../<spkg>-<sversion>/* >>
+
+=back
+
+Here, B<uscan> allows extensive flexibility.  The upstream tarball in the remote
+URL can be F<http://www.example.org/download/foo-v1.0.tar.gz>, the downloaded
+tarball can be F<../foo-1.0.tar.gz>, the source tarball can be
+F<../bar_2.1.0.orig.tar.gz>, and the Debianized source tree can be located at
+F<../bar-2.1.0>.
+
+Note: For simplicity, the compression method used in examples is B<gzip> with
+B<.gz> suffix.  Other methods such as B<xz>, B<bzip2>, and B<lzma> may also be
+used.
+
+=head1 FORMAT OF THE WATCH FILE
+
+The current version 3 format of F<debian/watch> can be summarized as follows:
+
+=over
+
+=item * Leading spaces and tabs are dropped.
+
+=item * Empty lines are dropped. 
+
+=item * A line starting with B<#> (hash) is a comment line and is dropped.
+
+=item * A single B<\> (back slash) at the end of a line is dropped and the
+next line is concatenated after removing leading spaces and tabs.
+
+=item * The first non-comment line is:
+
+=over
+
+=item B<version=3>
+
+=back
+
+This is required.
+
+=item * The second non-comment line specifies the rule for the candidate
+upstream tarball URL and is typically in the following format:
+
+=over
+
+=item [B<opts="> I<...> B<">] \
+
+=item B<http://>I<URL> \
+
+=item I<matching-pattern> \
+
+=item [B<debian> [B<uupdate>]]
+
+=back
+
+Here, lines are folded with B<\> for readability.  You may write this in a
+single line.  A space before each B<\> is significant.  B<[> and B<]> are there
+to mark optional parts and should not be typed.
+
+The last 2 entries, B<debian> and B<uupdate>, are completely optional.  If they
+are omitted, these values are used as the defaults for the Debian non-native source package.
+
+There are 2 other F<debian/watch> formats which concatenate B<http://>I<URL> and
+I<matching-pattern> with B</>.  One for HTTP and another for FTP.  See
+L<ADVANCED FEATURES>.
+
+=back
+
+=head1 TYPICAL EXECUTION
+
+Let's describe step-by-step how B<uscan> processes this typical
+F<debian/watch>.
+
+B<uscan> reads the per-site options specified in B<opts="> I<...> B<"> to
+customize its behavior.  They are:
+
+=over
+
+=item * The mangling rules which apply the I<rule> to the pertinent I<string>.
+They behave as if a Perl command "I<$target_string> B<=~> I<rule>" is executed.
+
+=over
+
+=item * B<dversionmangle=>I<rule> for the last upstream version string found in F<debian/changelog>
+
+=item * B<uversionmangle=>I<rule> for the candidate upstream version strings
+
+=item * B<versionmangle=>I<rules> as a syntactic shorthand for:
+
+B<uversionmangle=>I<rules>B<,dversionmangle=>I<rules>
+
+=item * B<filenamemangle=>I<rule> for the downloaded tarball filename string
+
+=item * B<downloadurlmangle=>I<rule> for the candidate upstream tarball URL string
+
+=item * B<pgpsigurlmangle=>I<rule> for the candidate upstream signature file URL string
+
+=back
+
+=item * B<pasv> option for the FTP connection to use PASV mode.  This is
+recommended for most cases using the FTP connection.
+
+=item * B<repacksuffix> option for the version suffix used by the repackaged
+source tarball
+
+=back
+
+See L<Details of per-site options> for the details.
+
+B<uscan> reads the first entry in F<debian/changelog> to determine the source
+package name and the last upstream version.
+
+For example, if the first entry of F<debian/changelog> is:
+
+=over
+
+=item I<sourcepackage> (B<3:2.03+dfsg1-4>) unstable; urgency=low
+
+=back
+
+then, the source package name is I<sourcepackage> and the last upstream version
+is determined to be B<2.03+dfsg1> without the epoch and the Debian revision.
+
+If the B<dversionmangle> rule exists, the last upstream version is updated by
+applying this rule to it.  For example, if the last upstream version is
+B<2.03+dfsg1> indicating the source tarball is repackaged, the suffix B<+dfsg1>
+is removed by the string substitution B<s/\+dfsg\d*$//> to make the
+(dversionmangled) last upstream version B<2.03> and it is compared to the
+candidate upstream tarballs such as B<2.03>, B<2.04>, ... .  Thus, set this
+rule as:
+
+=over
+
+=item B<opts="dversionmangle=s/\+dfsg\d*$//">
+
+=back
+
+B<uscan> downloads a web page from B<http://>I<URL> specified in
+F<debian/watch>.
+
+=over
+
+=item * If the directory name part of I<URL> has no parentheses, B<(> and B<)>,
+it is taken verbatim.
+
+=item * If the directory name part of I<URL> has parentheses, B<(> and B<)>,
+then B<uscan> recursively searches all possible directories to find a page for
+the newest version.
+
+=back
+
+For example, this B<http://>I<URL> may be specified as:
+
+=over
+
+=item * B<http://www.example.org/DL(.+)/>
+
+=back
+
+Please note the trailing B</> in the above.
+
+The downloaded web page is scanned for links defined in the B<< <a href=" >>
+I<...> B<< "> >> tag to locate the candidate upstream tarball URLs.  These
+candidate upstream tarball URLs are matched by the Perl regex pattern
+I<matching-pattern> such as B<DL-(?:[\d\.]+?)/upstreamname-(.+)\.tar\.gz> to
+narrow down the candidates.  This pattern match needs to be anchored at the
+beginning and the end.  For example, candidate URLs may be:
+
+=over
+
+=item * B<DL-2.02/upstreamname-2.02.tar.gz>
+
+=item * B<DL-2.03/upstreamname-2.03.tar.gz>
+
+=item * B<DL-2.04/upstreamname-2.04.tar.gz>
+
+=back
+
+Here the matching string of B<(.+)> in I<matching-pattern> is considered as the
+candidate upstream version.  If there are multiple matching strings of
+capturing patterns in I<matching-pattern>, they are all concatenated with B<.>
+(period) to form the candidate upstream version.  Make sure to use the
+non-capturing regex such as B<(?:[\d\.]+?)> instead for the variable text
+matching part unrelated to the version.  
+
+For example, candidate upstream versions may be:
+
+=over
+
+=item * B<2.02>
+
+=item * B<2.03>
+
+=item * B<2.04>
+
+=back
+
+The downloaded tarball filename is set to the same as its filename in the
+remote URL.
+
+If the B<uversionmangle> rule exists, the candidate upstream versions are
+updated by applying this rule to them. (This rule may be useful if the
+upstream tarball needs to be repacked.)
+
+The upstream tarball URL corresponding to the newest (uversionmangled) candidate
+upstream version newer than the (dversionmangled) last upstream version is
+selected to be the candidate upstream tarball URL.
+
+Here, the order of the version is decided by B<dpkg --compare-versions>.
+
+If the B<filenamemangle> rule exists, the downloaded tarball filename is
+updated by applying this rule to it. (This rule may not be significant for
+modern use cases.  B<mk-origtargz> takes care of the proper naming of the source
+tarball based on the source package name in F<debian/changelog> without relying
+on the filename of the remote URL.  B<uupdate> is invoked by B<uscan> with
+B<--no-symlink> option and does not rename the tarball anymore.)
+
+If the candidate upstream tarball URL is a relative URL, it is converted to an
+absolute URL using the base URL of the web page.  If the B<< <base href=" >> I<
+... > B<< "> >> tag exists in the web page, the candidate upstream tarball URL
+is converted to the absolute URL using the specified base URL in the base tag,
+instead.
+
+If the B<downloadurlmangle> rule exists, the candidate upstream tarball URL is
+updated by applying this rule to it. (This is useful for some sites with the
+obfuscated download URL.)
+
+B<uscan> downloads the candidate upstream tarball to the parent B<../>
+directory.  For example, the downloaded file may be:
+
+=over
+
+=item * F<../upstreamname-2.04.tar.gz>
+
+=back
+
+In the following, the B<2.04> of the above example is referred to generically
+as I<version>.
+
+If the B<pgpsigurlmangle> rule exists, the upstream signature file URL is
+generated by applying this rule to the (downloadurlmangled) candidate upstream
+tarball URL, and B<uscan> tries to download the signature file.
+
+Otherwise, B<uscan> tries 4 common upstream signature file URLs, formed by
+appending the suffixes B<.asc>, B<.gpg>, B<.pgp>, and B<.sig> to the
+(downloadurlmangled) candidate upstream tarball URL, and attempts to download
+a signature file from them.
+
+If the downloaded signature file exists, the downloaded upstream tarball is
+checked for its authenticity using the downloaded signature file by using the
+keyring F<debian/upstream/signing-key.pgp> or the armored keyring
+F<debian/upstream/signing-key.asc>. If it is not valid, or not made by one of
+the listed keys, B<uscan> will report an error.
+
+B<uscan> invokes B<mk-origtargz> to create the source tarball properly named
+for the source package with B<.orig.> in its filename.
+
+=over
+
+=item case A: package the upstream tarball as is
+
+B<mk-origtargz> creates a symlink
+B<../>I<sourcename>B<_>I<version>B<.orig.tar.gz> linked to the downloaded local
+upstream tarball. Here, I<sourcename> is the source package name found in
+F<debian/changelog>. For example, if I<sourcename> is B<foo>, the generated
+symlink may be:
+
+=over
+
+=item * F<../foo_2.04.orig.tar.gz> -> F<upstreamname-2.04.tar.gz> (as is)
+
+=back
+
+Usually, there is no need to set up B<opts="dversionmangle=> I<...> B<"> for
+this case.
+
+=item case B: package the upstream tarball after removing non-DFSG files
+
+B<mk-origtargz> checks the filename glob of the B<Files-Excluded> stanza in the
+first section of F<debian/copyright> and removes matching files to create a
+repacked upstream tarball.  Normally, the repacked upstream tarball is renamed
+with I<suffix> to B<../>I<sourcename>B<_>I<version>I<suffix>B<.orig.tar.gz>
+using the B<repacksuffix> option.  Here I<version> is updated to be
+I<version>I<suffix>.
+
+The removal of files is required if files are not DFSG-compliant.  For such a
+case, B<+dfsg1> is used as I<suffix>.
+
+So the combined per-site options are set as
+B<opts="dversionmangle=s/\+dfsg\d*$//,repacksuffix=+dfsg1">, instead.
+
+For example, the repacked upstream tarball may be:
+
+=over
+
+=item * F<../foo_2.04+dfsg1.orig.tar.gz> (repackaged)
+
+=back
+
+=back
+
+B<uscan> normally invokes "B<uupdate> B<--no-symlink --upstream-version>
+I<version> B<../>I<sourcename>B<_>I<version>B<.orig.tar.gz>".
+
+Please note that B<--no-symlink> option is used here since B<mk-origtargz> is
+invoked to make B<*.orig.tar.gz> file ready to be used for the source package.
+B<uscan> picks I<sourcename> from F<debian/changelog> so the tarball filename
+of the actual remote URL does not matter for packaging.
+
+B<uupdate> creates the new upstream source tree under the
+B<../>I<sourcename>B<->I<version> directory and Debianizes it leveraging the
+last package contents.
+
+=head1 ADVANCED FEATURES
+
+B<uscan> has many other enhanced features which are skipped in the above
+section for simplicity.  Let's check their highlights.
+
+B<uscan> actually scans not just the current directory but all its
+subdirectories looking for F<debian/watch> to process them all.
+See the section L<Directory name checking> below.
+
+B<uscan> can be executed with I<path> as its argument to change the starting
+directory of search from the current directory to I<path>.
+
+B<uscan> can be used to assess the health of the package by executing with
+the B<--report> option to output a human readable report without downloading
+the upstream tarball.
+
+B<uscan> can be used to assess the health of the package by executing
+with the B<--dehs> option to output XML data suitable for the DEHS system.
+
+The B<http://>I<URL> part in the second non-comment line can be B<https://>I<URL>
+to use the SSL/TLS protocol instead.
+
+The second non-comment line of F<debian/watch> can be in one of the
+alternative formats.
+
+=over
+
+=item * If I<matching-pattern> does not contain B</>, a shorthand format may
+be used:
+
+=over
+
+=item  [B<opts="> I<...> B<">] \
+
+=item  B<http://>I<URL/matching-pattern> \
+
+=item  [B<debian> [B<uupdate>]]
+
+=back
+
+Here, B<http://>I<URL> is accessed for the web page.
+
+=item * If the FTP protocol is used instead of HTTP/HTTPS, the following format should be used:
+
+=over
+
+=item  [B<opts="> I<...> B<">] \
+
+=item  B<ftp://>I<URL/matching-pattern> \
+
+=item  [B<debian> [B<uupdate>]]
+
+=back
+
+It is a good idea to set B<opts=pasv> here which makes B<uscan> try the passive FTP protocol first.
+
+=back
+
+The optional B<debian> string in the second non-comment line of F<debian/watch>
+means to obtain the last upstream version from F<debian/changelog>.  If this is
+replaced by a specific version number such as B<1.0.2>, B<1.0.2> is used as the
+last upstream version, instead.
+
+The optional B<uupdate> string in the second non-comment line of
+F<debian/watch> means to execute B<uupdate> with options after processing this
+line.  For example, you can customize this by replacing B<uupdate> by
+F<debian/myuupdate> with the following content.
+
+  #!/bin/sh -e
+  # called with '--upstream-version' <version> <file>
+  uupdate "$@" --no-symlink
+  package=`dpkg-parsechangelog | sed -n 's/^Source: //p'`
+  cd ../$package-$2
+  debuild
+
+Then B<uscan> invokes "I<debian/myuupdate> B<--upstream-version> I<version>
+B<../>I<sourcename>B<_>I<version>B<.orig.tar.gz>" instead to perform a fully
+automatic upstream update of Debian binary packages.
+
+Note that we don't call B<dupload> or B<dput> automatically, as the maintainer
+should perform sanity checks on the software before uploading it to Debian.
+
+See L<OPTIONS> and L<CONFIGURATION VARIABLES> for other variations.
+
+=head1 EXAMPLE
+
+Here are the typical F<debian/watch> files.
+
+The existence and non-existence of a space before the trailing B<\> (back slash) are
+significant.
+
+=head2 HTTP site (basic)
+
+For the basic HTTP site:
+
+  http://example.com/~user/release/foo.html \
+  files/foo-([\d\.]*)\.tar\.gz
+
+=head2 HTTP site (flexible)
+
+For the maximum flexibility of upstream tarball formats:
+
+  http://example.com/example-(\d[\d.]*)\.\
+  (?:zip|tgz|tbz2|txz|tar\.(?:gz|bz2|xz))
+
+=head2 HTTP site (recursive directory scanning)
+
+For recursive directory scanning:
+
+  http://tmrc.mit.edu/mirror/twisted/Twisted/(\d\.\d)/ \
+  Twisted-([\d\.]*)\.tar\.bz2
+
+or in the one string style variant:
+
+  http://tmrc.mit.edu/mirror/twisted/\
+  Twisted/(\d\.\d)/Twisted-([\d\.]*)\.tar\.bz2
+
+Here, the website should be able to handle requests to:
+
+  http://tmrc.mit.edu/mirror/twisted/Twisted/
+
+=head2 HTTP site (alternative)
+
+For one string style:
+
+  http://www.cpan.org/modules/by-module/Text/Text-CSV_XS-(.+)\.tar\.gz
+
+This is the same as
+
+  http://www.cpan.org/modules/by-module/Text Text-CSV_XS-(.+)\.tar\.gz
+
+=head2 HTTP site (sf.net)
+
+For SourceForge based projects, qa.debian.org runs a redirector which allows a
+simpler form of URL. The format below will automatically be rewritten to use
+the redirector.
+
+  http://sf.net/audacity/audacity-src-(.+)\.tar\.gz
+
+=head2 HTTP site (github.com)
+
+For GitHub projects, you can use the tags or releases page.  The archive URLs
+use only the version as the filename.  You can rename the downloaded upstream
+tarball into standard I<project>B<->I<version>B<.tar.gz> using
+B<filenamemangle>:
+
+  opts="filenamemangle=\
+  s/(?:.*?)?v?(\d[\d.]*)\.tar\.gz/<project>-$1.tar.gz/" \
+  https://github.com/<user>/<project>/tags \
+  (?:.*?/)?v?(\d[\d.]*)\.tar\.gz
+
+=head2 HTTP site (code.google.com)
+
+For Google Code projects, you should use the downloads page like this:
+
+  https://code.google.com/p/<project>/downloads/list?can=1 \
+  .*/<project>-(\d[\d.]*)\.tar\.gz
+
+=head2 HTTP site (funny version)
+
+For a site which has funny version numbers, the parenthesized groups will be
+joined with B<.> (period) to make a sanitized version number.
+
+  http://www.site.com/pub/foobar/foobar_v(\d+)_(\d+)\.tar\.gz
+
+=head2 HTTP site (DFSG)
+
+The upstream part of the Debian version number can be
+mangled:
+
+  opts="dversionmangle=s/\+dfsg\d*$//,repacksuffix=+dfsg1" \
+  http://some.site.org/some/path/foobar-(.+)\.tar\.gz
+
+=head2 HTTP site (filenamemangle)
+
+The filename is found by taking the last component of the URL and
+removing everything after any '?'.  If this would not make a usable
+filename, use B<filenamemangle>.  For example,
+F<< <A href="http://foo.bar.org/dl/?path=&dl=foo-0.1.1.tar.gz"> >>
+could be handled as:
+
+  opts=filenamemangle=s/.*=(.*)/$1/ \
+  http://foo.bar.org/dl/\?path=&dl=foo-(.+)\.tar\.gz
+
+
+F<< <A href="http://foo.bar.org/dl/?path=&dl_version=0.1.1"> >>
+could be handled as:
+
+  opts=filenamemangle=s/.*=(.*)/foo-$1\.tar\.gz/ \
+  http://foo.bar.org/dl/\?path=&dl_version=(.+)
+
+=head2 HTTP site (downloadurlmangle)
+
+The option B<downloadurlmangle> can be used to mangle the URL of the file
+to download.  This can only be used with B<http://> URLs.  This may be
+necessary if the link given on the web page needs to be transformed in
+some way into one which will work automatically, for example:
+
+  opts=downloadurlmangle=s/prdownload/download/ \
+  http://developer.berlios.de/project/showfiles.php?group_id=2051 \
+  http://prdownload.berlios.de/softdevice/vdr-softdevice-(.+).tgz
+
+=head2 FTP site (basic)
+
+  opts=pasv \
+  ftp://ftp.tex.ac.uk/tex-archive/web/c_cpp/cweb/cweb-(.+)\.tar\.gz
+
+=head2 FTP site (regex special characters)
+
+  opts=pasv \
+  ftp://ftp.worldforge.org/pub/worldforge/libs/\
+  Atlas-C++/transitional/Atlas-C\+\+-(.+)\.tar\.gz
+
+Please note that this URL is joined to become I< ... >B<libs/Atlas-C++/>I< ... >.
+For B<++>, the first one in the directory path is verbatim while the one in
+the filename is escaped by B<\>.
+
+=head2 FTP site (funny version)
+
+This is another way of handling a site with funny version numbers,
+this time using mangling.  (Note that multiple groups will be
+concatenated before mangling is performed, and that mangling will
+only be performed on the basename version number, not any path
+version numbers.)
+
+  opts="uversionmangle=s/^/0.0./" \
+  ftp://ftp.ibiblio.org/pub/Linux/ALPHA/wine/\
+  development/Wine-(.+)\.tar\.gz
+
+
+=head1 OPTIONS
+
+For the basic usage, B<uscan> does not require these options to be set.
+
+=over
+
+=item B<--report>, B<--no-download>
+
+Only report about available newer versions but do not download
+anything.
+
+=item B<--report-status>
+
+Report on the status of all packages, even those which are up-to-date,
+but do not download anything.
+
+=item B<--download>
+
+Report and download. (This is the default behavior.)
+
+=item B<--destdir>
+
+Path of the directory to which to download. If the specified path is not absolute,
+it will be relative to either the current directory or, if directory scanning
+is enabled, the package's
+source directory.
+
+=item B<--force-download>
+
+Download upstream even if up-to-date (will not overwrite local files, however).
+
+=item B<--pasv>
+
+Force PASV mode for FTP connections.
+
+=item B<--no-pasv>
+
+Do not use PASV mode for FTP connections.
+
+=item B<--timeout> I<N>
+
+Set timeout to I<N> seconds (default 20 seconds).
+
+=item B<--no-symlink>
+
+Do not call B<mk-origtargz>.
+
+=item B<--dehs>
+
+Use an XML format for output, as required by the DEHS system.
+
+=item B<--no-dehs>
+
+Use the traditional uscan output format. (This is the default behavior.)
+
+=item B<--package> I<package>
+
+Specify the name of the package to check for rather than examining
+F<debian/changelog>; this requires the B<--upstream-version> (unless a version
+is specified in the F<watch> file) and B<--watchfile> options as well.
+Furthermore, no directory scanning will be done and nothing will be downloaded.
+This option is probably most useful in conjunction with the DEHS system (and
+B<--dehs>).
+
+=item B<--upstream-version> I<upstream-version>
+
+Specify the current upstream version rather than examine F<debian/watch> or
+F<debian/changelog> to determine it. This is ignored if a directory scan is being
+performed and more than one F<debian/watch> file is found.
+
+=item B<--watchfile> I<watchfile>
+
+Specify the I<watchfile> rather than perform a directory scan to
+determine it. If this option is used without B<--package>, then
+B<uscan> must be called from within the Debian package source tree
+(so that F<debian/changelog> can be found simply by stepping up
+through the tree).
+
+=item B<--download-version> I<version>
+
+Specify the I<version> which the upstream release must match in order to be
+considered, rather than using the release with the highest version.
+
+=item B<--download-current-version>
+
+Download the currently packaged version.
+
+=item B<--verbose>
+
+Give verbose output.
+
+=item B<--no-verbose>
+
+Don't give verbose output.  (This is the default behavior.)
+
+=item B<--no-exclusion>
+
+Do not automatically exclude files mentioned in the F<debian/copyright> field B<Files-Excluded>.
+
+=item B<--debug>
+
+Dump the downloaded web pages to stdout for debugging your F<watch> file.
+
+=item B<--check-dirname-level> I<N>
+
+See the below section L<Directory name checking> for an explanation of this option.
+
+=item B<--check-dirname-regex> I<regex>
+
+See the below section L<Directory name checking> for an explanation of this option.
+
+=item B<--user-agent>, B<--useragent>
+
+Override the default user agent header.
+
+=item B<--no-conf>, B<--noconf>
+
+Do not read any configuration files. This can only be used as the first option
+given on the command-line.
+
+=item B<--help>
+
+Give brief usage information.
+
+=item B<--version>
+
+Display version information.
+
+=back
+
+B<uscan> also accepts the following options and passes them to B<mk-origtargz>:
+
+=over
+
+=item B<--symlink>
+
+Make B<orig.tar.gz> (with the appropriate extension) symlink to the downloaded
+files. (This is the default behavior.)
+
+=item B<--copy>
+
+Instead of symlinking as described above, copy the downloaded files.
+
+=item B<--rename>
+
+Instead of symlinking as described above, rename the downloaded files.
+
+=item B<--repack>
+
+After having downloaded an lzma tar, xz tar, bzip tar or zip archive, repack it
+to a gzip tar archive, if required. The unzip package must be installed in
+order to repack .zip archives, the xz-utils package must be installed to repack
+lzma or xz tar archives.
+
+=item B<--compression> [ B<gzip> | B<bzip2> | B<lzma> | B<xz> ]
+
+In the case where the upstream sources are repacked (either because B<--repack>
+option is given or F<debian/copyright> contains the field B<Files-Excluded>), it is
+possible to control the compression method via the parameter (defaults to
+B<gzip>).
+
+=item B<--copyright-file> I<copyright-file>
+
+Exclude files mentioned in B<Files-Excluded> in the given I<copyright-file>.
+This is useful when running B<uscan> not within a source package directory.
+
+=back
+
+=head1 CONFIGURATION VARIABLES
+
+For the basic usage, B<uscan> does not require these configuration
+variables to be set.
+
+The two configuration files F</etc/devscripts.conf> and F<~/.devscripts> are
+sourced by a shell in that order to set configuration variables. These
+may be overridden by command line options. Environment variable settings are
+ignored for this purpose. If the first command line option given is
+B<--noconf>, then these files will not be read. The currently recognized
+variables are:
+
+=over
+
+=item B<USCAN_DOWNLOAD>
+
+If this is set to B<no>, then newer upstream files will not be downloaded; this
+is equivalent to the B<--report> or B<--no-download> options.
+
+=item B<USCAN_PASV>
+
+If this is set to yes or no, this will force FTP connections to use PASV mode
+or not to, respectively. If this is set to default, then B<Net::FTP(3)> makes
+the choice (primarily based on the B<FTP_PASSIVE> environment variable).
+
+=item B<USCAN_TIMEOUT>
+
+If set to a number I<N>, then set the timeout to I<N> seconds. This is
+equivalent to the B<--timeout> option.
+
+=item B<USCAN_SYMLINK>
+
+If this is set to no, then a I<pkg>_I<version>B<.orig.tar.{gz|bz2|lzma|xz}>
+symlink will not be made (equivalent to the B<--no-symlink> option). If it is
+set to B<yes> or B<symlink>, then the symlinks will be made. If it is set to
+rename, then the files are renamed (equivalent to the B<--rename> option).
+
+=item B<USCAN_DEHS_OUTPUT>
+
+If this is set to B<yes>, then DEHS-style output will be used. This is
+equivalent to the B<--dehs> option.
+
+=item B<USCAN_VERBOSE>
+
+If this is set to B<yes>, then verbose output will be given.  This is
+equivalent to the B<--verbose> option.
+
+=item B<USCAN_USER_AGENT>
+
+If set, the specified user agent string will be used in place of the default.
+This is equivalent to the B<--user-agent> option.
+
+=item B<USCAN_DESTDIR>
+
+If set, the downloaded files will be placed in this directory.  This is
+equivalent to the B<--destdir> option.
+
+=item B<USCAN_REPACK>
+
+If this is set to yes, then after having downloaded a bzip tar, lzma tar, xz
+tar, or zip archive, uscan will repack it to a gzip tar. This is equivalent to
+the B<--repack> option.
+
+=item B<USCAN_EXCLUSION>
+
+If this is set to no, files mentioned in the field B<Files-Excluded> of
+F<debian/copyright> will be ignored and no exclusion of files will be tried.
+This is equivalent to the B<--no-exclusion> option.
+
+=back
+
+=head1 EXIT STATUS
+
+The exit status gives some indication of whether a newer version was found or
+not; one is advised to read the output to determine exactly what happened and
+whether there were any warnings to be noted.
+
+=over
+
+=item B<0>
+
+Either B<--help> or B<--version> was used, or for some F<watch> file which was
+examined, a newer upstream version was located.
+
+=item B<1>
+
+No newer upstream versions were located for any of the F<watch> files examined.
+
+=back
+
+=head1 DETAILS
+
+=head2 Details of per-site options
+
+As explained cursorily in the above section L<TYPICAL EXECUTION>,
+B<uscan> reads the per-site option specified in B<opts="> I< ... > B<"> to
+customize its behavior.
+
+Multiple per-site options I<option1>, I<option2>, I<option3>, ... can be set
+as:
+
+B<opts=">I<option1>B<,> I<option2>B<,> I<option3>B<,> I< ... >B<">
+
+The double quotes are necessary if options contain any spaces.
+
+The mangling rules set by the per-site option behave as if a Perl command
+"I<$target_string> B<=~> I<rule>" is executed but there are some notable
+details.
+
+=over
+
+=item * Multiple rules can be specified for a I<rule> by making a concatenated
+string of B<;> (semicolon) separated operations.
+
+=item * I<rule> may only use the B<s>, B<tr> and B<y> operations.
+
+=item * When the B<s> operation is used, only the B<g>, B<i> and B<x> flags are
+available and rule may not contain any expressions which have the potential to
+execute code (i.e. the B<(?{})> and B<(??{})> constructs are not supported).
+
+=item * If the B<s> operation is used, the replacement can contain back references
+to expressions within parentheses in the matching regex, like
+B<s/-alpha(\d*)/.a$1/>. These back references must use the B<$1> syntax, as the
+B<\1> syntax is not supported.
+
+=item * Each operation cannot contain B<;> (semicolon) or B<,> (comma).
+
+=back
+
+=head2 Directory name checking
+
+Similarly to several other scripts in the B<devscripts> package, B<uscan>
+explores the requested directory trees looking for F<debian/changelog> and
+F<debian/watch> files. As a safeguard against stray files causing potential
+problems, and in order to promote efficiency, it will examine the name of the
+parent directory once it finds the F<debian/changelog> file, and check that the
+directory name corresponds to the package name. It will only attempt to
+download newer versions of the package and then perform any requested action if
+the directory name matches the package name. Precisely how it does this is
+controlled by two configuration file variables
+B<DEVSCRIPTS_CHECK_DIRNAME_LEVEL> and B<DEVSCRIPTS_CHECK_DIRNAME_REGEX>, and
+their corresponding command-line options B<--check-dirname-level> and
+B<--check-dirname-regex>.
+
+B<DEVSCRIPTS_CHECK_DIRNAME_LEVEL> can take the following values:
+
+=over
+
+=item B<0>
+
+Never check the directory name.
+
+=item B<1>
+
+Only check the directory name if we have had to change directory in
+our search for F<debian/changelog>, that is, the directory containing
+F<debian/changelog> is not the directory from which B<uscan> was invoked.  This
+is the default behavior.
+
+=item B<2>
+
+Always check the directory name.
+
+=back
+
+The directory name is checked by testing whether the current directory name (as
+determined by pwd(1)) matches the regex given by the configuration file
+option B<DEVSCRIPTS_CHECK_DIRNAME_REGEX> or by the command line option
+B<--check-dirname-regex> I<regex>. Here regex is a Perl regex (see
+perlre(3perl)), which will be anchored at the beginning and the end. If regex
+contains a B</>, then it must match the full directory path. If not, then
+it must match the full directory name. If regex contains the string I<package>,
+this will be replaced by the source package name, as determined from the
+F<debian/changelog>. The default value for the regex is: I<package>B<(-.+)?>, thus matching
+directory names such as I<package> and I<package>-I<version>.
+
+=head1 HISTORY AND UPGRADING
+
+This section briefly describes the backwards-incompatible F<watch> file features
+which have been added in each F<watch> file version, and the first version of the
+B<devscripts> package which understood them.
+
+=over
+
+=item Pre-version 2
+
+The F<watch> file syntax was significantly different in those days. Don't use it.
+If you are upgrading from a pre-version 2 F<watch> file, you are advised to read
+this manpage and to start from scratch.
+
+=item Version 2
+
+B<devscripts> version 2.6.90: The first incarnation of the current style of
+F<watch> files.
+
+=item Version 3
+
+B<devscripts> version 2.8.12: Introduced the following: correct handling of
+regex special characters in the path part, directory/path pattern matching,
+version number in several parts, version number mangling. Later versions
+have also introduced URL mangling.
+
+If you are upgrading from version 2, the key incompatibility is if you have
+multiple groups in the pattern part; whereas only the first one would be used
+in version 2, they will all be used in version 3. To avoid this behavior,
+change the non-version-number groups to be B<(?:> I< ...> B<)> instead of a
+plain B<(> I< ... > B<)> group.
+
+=back
+
+=head1 SEE ALSO
+
+dpkg(1), mk-origtargz(1), perlre(1), uupdate(1), devscripts.conf(5)
+
+=head1 AUTHOR
+
+The original version of uscan was written by Christoph Lameter
+<clame...@debian.org>. Significant improvements, changes and bugfixes were
+made by Julian Gilbey <j...@debian.org>. HTTP support was added by Piotr
+Roszatycki <dex...@debian.org>. The program was rewritten in Perl by Julian
+Gilbey.
+
+=cut
+
 use 5.010;  # defined-or (//)
 use strict;
 use warnings;
-- 
2.1.4

From bf73847a35d79c496b0108830b272248af25bbab Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Thu, 3 Sep 2015 17:00:14 +0000
Subject: [PATCH 2/8] document <pkg>-<version>.download for uscan.pl

 * Document the new filename scheme for http://site/?.
   Closes: #573631
---
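Note for reviewers: the rule documented by this patch can be sketched as
below.  This is only an illustration in Python, not the Perl code in
uscan.pl; the helper names default_filename and apply_filenamemangle are
hypothetical, and applying the mangle rule to the full href string is an
assumption made for the example.

```python
import re


def default_filename(url: str, package: str, version: str) -> str:
    """Hypothetical helper: derive the saved filename as the POD describes."""
    # Take the last component of the URL (everything after the final '/').
    last = url.rsplit("/", 1)[-1]
    # Remove the '?' and everything after it.
    name = last.split("?", 1)[0]
    # If nothing is left, fall back to <pkg>-<version>.download.
    if not name:
        name = f"{package}-{version}.download"
    return name


def apply_filenamemangle(href: str) -> str:
    """Hypothetical helper: Python rendering of the rule s/.*=(.*)/$1/."""
    return re.sub(r".*=(.*)", r"\1", href, count=1)


href = "http://foo.bar.org/dl/?path=&dl=foo-0.1.1.tar.gz"
print(default_filename(href, "foo", "0.1.1"))
print(apply_filenamemangle(href))
```

With the href from the POD example, the first call falls back to the
.download name because nothing precedes the '?', while the mangle rule
recovers the tarball name from the query string.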
 scripts/uscan.pl | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index 20ef97f..730b1be 100644
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -540,16 +540,18 @@ mangled:
 
 =head2 HTTP site (filenamemangle)
 
-The filename is found by taking the last component of the URL and
-removing everything after any '?'.  If this would not make a usable
-filename, use B<filenamemangle>.  For example,
-F<< <A href="http://foo.bar.org/dl/?path=&dl=foo-0.1.1.tar.gz";> >>
-could be handled as:
+The filename is found by taking the last component of the URL and removing
+everything after any '?'.  If that leaves nothing for the filename, B<uscan>
+generates one from the source package name in B<debian/changelog>, the new
+version, and the suffix B<.download>.
+
+If this does not suit you, use B<filenamemangle>.  For example, F<< <A
+href="http://foo.bar.org/dl/?path=&dl=foo-0.1.1.tar.gz";> >> could be handled
+as:
 
   opts=filenamemangle=s/.*=(.*)/$1/ \
   http://foo.bar.org/dl/\?path=&dl=foo-(.+)\.tar\.gz
 
-
 F<< <A href="http://foo.bar.org/dl/?path=&dl_version=0.1.1";> >>
 could be handled as:
 
-- 
2.1.4

From 4d6502a094284d876479c4cfd133263dd5824b24 Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Tue, 1 Sep 2015 14:22:56 +0000
Subject: [PATCH 3/8] Remove duplicate info in comment

The POD now holds the current information for version=3.
---
 scripts/uscan.pl | 49 +------------------------------------------------
 1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index 730b1be..c5503e1 100644
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -1648,54 +1648,7 @@ exit ($found ? 0 : 1);
 # greatest version number (as determined by the (...) group), using the
 # Debian version number comparison algorithm described below.
 #
-# watch_version=3:
-#
-# Correct handling of regex special characters in the path part:
-# ftp://ftp.worldforge.org/pub/worldforge/libs/Atlas-C++/transitional/Atlas-C\+\+-(.+)\.tar\.gz
-#
-# Directory pattern matching:
-# ftp://ftp.nessus.org/pub/nessus/nessus-([\d\.]+)/src/nessus-core-([\d\.]+)\.tar\.gz
-#
-# The pattern in each part may contain several (...) groups and
-# the version number is determined by joining all groups together
-# using "." as separator.  For example:
-#   ftp://site/dir/path/pattern-(\d+)_(\d+)_(\d+)\.tar\.gz
-#
-# This is another way of handling site with funny version numbers,
-# this time using mangling.  (Note that multiple groups will be
-# concatenated before mangling is performed, and that mangling will
-# only be performed on the basename version number, not any path version
-# numbers.)
-# opts=uversionmangle=s/^/0.0./ \
-#   ftp://ftp.ibiblio.org/pub/Linux/ALPHA/wine/development/Wine-(.+)\.tar\.gz
-#
-# Similarly, the upstream part of the Debian version number can be
-# mangled:
-# opts=dversionmangle=s/\.dfsg\.\d+$// \
-#   http://some.site.org/some/path/foobar-(.+)\.tar\.gz
-#
-# The versionmangle=... option is a shorthand for saying uversionmangle=...
-# and dversionmangle=... and applies to both upstream and Debian versions.
-#
-# The option filenamemangle can be used to mangle the name under which
-# the downloaded file will be saved:
-#   href="http://foo.bar.org/download/?path=&amp;download=foo-0.1.1.tar.gz";
-# could be handled as:
-# opts=filenamemangle=s/.*=(.*)/$1/ \
-#     http://foo.bar.org/download/\?path=&amp;download=foo-(.+)\.tar\.gz
-# and
-#   href="http://foo.bar.org/download/?path=&amp;download_version=0.1.1";
-# as:
-# opts=filenamemangle=s/.*=(.*)/foo-$1\.tar\.gz/ \
-#    http://foo.bar.org/download/\?path=&amp;download_version=(.+)
-#
-# The option downloadurlmangle can be used to mangle the URL of the file
-# to download.  This can only be used with http:// URLs.  This may be
-# necessary if the link given on the webpage needs to be transformed in
-# some way into one which will work automatically, for example:
-# opts=downloadurlmangle=s/prdownload/download/ \
-#   http://developer.berlios.de/project/showfiles.php?group_id=2051 \
-#   http://prdownload.berlios.de/softdevice/vdr-softdevice-(.+).tgz
+# watch_version=3: (See the POD at the top of this file)
 
 
 sub process_watchline ($$$$$$)
-- 
2.1.4

From 8f424e621af8c0cf9422b82412fa951726dee836 Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Tue, 1 Sep 2015 13:20:18 +0000
Subject: [PATCH 4/8] Build uscan.1 from uscan.pl

 * remove old uscan.1
 * update build script
 * update po4a spec
---
 po4a/devscripts-po4a.conf |   4 +-
 scripts/Makefile          |   2 +-
 scripts/uscan.1           | 596 ----------------------------------------------
 3 files changed, 3 insertions(+), 599 deletions(-)
 delete mode 100644 scripts/uscan.1

diff --git a/po4a/devscripts-po4a.conf b/po4a/devscripts-po4a.conf
index ff6c490..f732ec8 100644
--- a/po4a/devscripts-po4a.conf
+++ b/po4a/devscripts-po4a.conf
@@ -120,8 +120,8 @@
 	$lang:$lang/tagpending.$lang.pl add_$lang:?add_$lang/translator_pod.add
 [type:pod] ../scripts/transition-check.pl \
 	$lang:$lang/transition-check.$lang.pl add_$lang:?add_$lang/translator_pod.add
-[type:man] ../scripts/uscan.1 \
-	$lang:$lang/uscan.$lang.1 add_$lang:?add_$lang/translator_man.add
+[type:pod] ../scripts/uscan.pl \
+	$lang:$lang/uscan.$lang.pl add_$lang:?add_$lang/translator_pod.add
 [type:man] ../scripts/uupdate.1 \
 	$lang:$lang/uupdate.$lang.1 add_$lang:?add_$lang/translator_man.add
 [type:man] ../doc/what-patch.1 \
diff --git a/scripts/Makefile b/scripts/Makefile
index 797c78f..872f9d2 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -22,7 +22,7 @@ COMPL_FILES := $(wildcard *.bash_completion)
 COMPLETION = $(patsubst %.bash_completion,devscripts.%,$(COMPL_FILES))
 COMPL_DIR := $(shell pkg-config --variable=completionsdir bash-completion)
 
-GEN_MAN1S += devscripts.1 mk-origtargz.1
+GEN_MAN1S += devscripts.1 mk-origtargz.1 uscan.1
 
 all: $(SCRIPTS) $(GEN_MAN1S) $(CWRAPPERS) $(COMPLETION)
 
diff --git a/scripts/uscan.1 b/scripts/uscan.1
deleted file mode 100644
index bd9d2a7..0000000
--- a/scripts/uscan.1
+++ /dev/null
@@ -1,596 +0,0 @@
-.TH USCAN 1 "Debian Utilities" "DEBIAN" \" -*- nroff -*-
-.SH NAME
-uscan \- scan/watch upstream sources for new releases of software
-.SH SYNOPSIS
-\fBuscan\fR [\fIoptions\fR] [\fIpath-to-debian-source-packages\fR ...]
-.SH DESCRIPTION
-\fBuscan\fR scans the given directories (or the current directory if
-none are specified) and all of their subdirectories for packages
-containing a control file \fIdebian/watch\fR.  Parameters are then
-read from those control files and upstream ftp or http sites are
-inspected for newly available updates (as compared with the upstream
-version number retrieved from the \fIdebian/changelog\fR file in the
-same directory).  The newest updates are retrieved (as determined by
-their version numbers) and if specified in the \fIwatch\fR file, a program
-may then be executed on the newly downloaded source.
-.PP
-The traditional \fIdebian/watch\fR files can still be used, but the
-current format offers both simpler and more flexible services.  We do
-not describe the old format here; for their documentation, see the
-source code for \fRuscan\fR.
-
-.SH FORMAT of debian/watch files
-
-The following demonstrates the type of entries which can appear in a
-\fIdebian/watch\fR file.  Obviously, not all of these would appear in
-one such file; usually, one would have one line for the current
-package.
-
-.PP
-.nf
-# format version number, currently 3; this line is compulsory!
-version=3
-
-# Line continuations are performed with \fB\e\fR
-
-# This is the format for an FTP site:
-# Full-site-with-pattern  [Version  [Action]]
-ftp://ftp.tex.ac.uk/tex-archive/web/c_cpp/cweb/cweb-(.+)\e.tar\e.gz \e
-  debian  uupdate
-
-# This is the format for an FTP site with regex special characters in
-# the filename part
-ftp://ftp.worldforge.org/pub/worldforge/libs/Atlas-C++/transitional/Atlas-C\e+\e+-(.+)\e.tar\e.gz
-
-# This is the format for an FTP site with directory pattern matching
-ftp://ftp.nessus.org/pub/nessus/nessus-([\ed\e.]+)/src/nessus-core-([\ed\e.]+)\e.tar\e.gz
-
-# This can be used if you want to override the PASV setting
-# for a specific site
-# opts=pasv ftp://.../...
-
-# This is one format for an HTTP site, which is the same
-# as the FTP format.  \fBuscan\fR starts by downloading the homepage,
-# obtained by removing the last component of the URL; in this case,
-# \fIhttp://www.cpan.org/modules/by-module/Text/\fR
-http://www.cpan.org/modules/by-module/Text/Text-CSV_XS-(.+)\e.tar\e.gz
-
-# This is a variant HTTP format which allows direct specification of
-# the homepage:
-# Homepage  Pattern  [Version  [Action]]
-http://www.dataway.ch/~lukasl/amph/amph.html \e
-  files/amphetamine-([\ed\e.]*).tar.bz2
-
-# This one shows that recursive directory scanning works, in either of
-# two forms, as long as the website can handle requests of the form
-# \fIhttp://site/inter/mediate/dir/\fR
-http://tmrc.mit.edu/mirror/twisted/Twisted/(\ed\e.\ed)/ \e
-  Twisted-([\ed\e.]*)\e.tar\e.bz2
-http://tmrc.mit.edu/mirror/twisted/Twisted/(\ed\e.\ed)/Twisted-([\ed\e.]*)\e.tar\e.bz2
-
-# For maximum flexibility with upstream tarball formats, use this:
-http://example.com/example-(\ed[\ed\.]*)\e.(?:zip|tgz|tbz2|txz|tar\e.(?:gz|bz2|xz))
-
-# qa.debian.org runs a redirector which allows a simpler form of URL
-# for SourceForge based projects. The format below will automatically
-# be rewritten to use the redirector.
-http://sf.net/audacity/audacity-src-(.+)\e.tar\e.gz
-
-# For GitHub projects you can use the tags or releases page.  Since the archive
-# URLs use only the version as the name, it is recommended to use a
-# filenamemangle to adjust the name of the downloaded file:
-opts="filenamemangle=s/(?:.*?\/)?v?(\ed[\ed.]*)\e.tar\e.gz/<project>-$1.tar.gz/" \e
-  https://github.com/<user>/<project>/tags (?:.*?/)?v?(\ed[\ed.]*)\e.tar\e.gz
-
-# For Google Code projects you should use the downloads page like this:
-https://code.google.com/p/<project>/downloads/list?can=1 \e
-  .*/<project>-(\ed[\ed.]*)\e.tar\e.gz
-
-# This is the format for a site which has funny version numbers;
-# the parenthesised groups will be joined with dots to make a
-# sanitised version number
-http://www.site.com/pub/foobar/foobar_v(\ed+)_(\ed+)\e.tar\e.gz
-
-# This is another way of handling site with funny version numbers,
-# this time using mangling.  (Note that multiple groups will be
-# concatenated before mangling is performed, and that mangling will
-# only be performed on the basename version number, not any path
-# version numbers.)
-opts="uversionmangle=s/^/0.0./" \e
-  ftp://ftp.ibiblio.org/pub/Linux/ALPHA/wine/development/Wine-(.+)\e.tar\e.gz
-
-# Similarly, the upstream part of the Debian version number can be
-# mangled:
-opts=dversionmangle=s/\e+dfsg\ed*$// \e
-  http://some.site.org/some/path/foobar-(.+)\e.tar\e.gz
-
-# The filename is found by taking the last component of the URL and
-# removing everything after any '\fB?\fR'.  If this would not make a usable
-# filename, use filenamemangle.  For example,
-# <A href="http://foo.bar.org/download/?path=&download=foo-0.1.1.tar.gz";>
-# could be handled as:
-# opts=filenamemangle=s/.*=(.*)/$1/ \e
-#     http://foo.bar.org/download/\e?path=&download=foo-(.+)\e.tar\e.gz
-#
-# <A href="http://foo.bar.org/download/?path=&download_version=0.1.1";>
-# could be handled as:
-# opts=filenamemangle=s/.*=(.*)/foo-$1\e.tar\e.gz/ \e
-#    http://foo.bar.org/download/\e?path=&download_version=(.+)
-
-# The option downloadurlmangle can be used to mangle the URL of the file
-# to download.  This can only be used with http:// URLs.  This may be
-# necessary if the link given on the web page needs to be transformed in
-# some way into one which will work automatically, for example:
-# opts=downloadurlmangle=s/prdownload/download/ \e
-#   http://developer.berlios.de/project/showfiles.php?group_id=2051 \e
-#   http://prdownload.berlios.de/softdevice/vdr-softdevice-(.+).tgz
-
-.fi
-.PP
-Comment lines may be introduced with a `\fB#\fR' character.  Continuation
-lines may be indicated by terminating a line with a backslash
-character.
-.PP
-The first (non-comment) line of the file must begin `version=3'.  This
-allows for future extensions without having to change the name of the
-file.
-.PP
-There are two possibilities for the syntax of an HTTP \fIwatch\fR file line,
-and only one for an FTP line.  We begin with the common (and simpler)
-format.  We describe the optional opts=... first field below, and
-ignore it in what follows.
-.PP
-The first field gives the full pattern of URLs being searched for.  In
-the case of an FTP site, the directory listing for the requested
-directory will be requested and this will be scanned for files
-matching the basename (everything after the trailing `\fB/\fR').  In the
-case of an HTTP site, the URL obtained by stripping everything after
-the trailing slash will be downloaded and searched for hrefs (links of
-the form <a href=...>) to either the full URL pattern given, or to the
-absolute part (everything without the http://host.name/ part), or to
-the basename (just the part after the final `\fB/\fR').  Everything up to
-the final slash is taken as a verbatim URL, as long as there are no
-parentheses (`\fB(\fR' and '\fB)\fR') in this part of the URL: if it does, the
-directory name will be matched in the same way as the final component
-of the URL as described below.  (Note that regex metacharacters such
-as `\fB+\fR' are regarded literally unless they are in a path component
-containing parentheses; see the Atlas-C++ example above.  Also, the
-parentheses must match within each path component.)
-.PP
-The pattern (after the final slash) is a Perl regexp (see
-\fBperlre\fR(1) for details of these).  You need to make the pattern
-so tight that it matches only the upstream software you are interested
-in and nothing else.  Also, the pattern will be anchored at the
-beginning and at the end, so it must match the full filename.  (Note
-that for HTTP URLs, the href may include the absolute path or full
-site and path and still be accepted.)  The pattern must contain at
-least one Perl group as explained in the next paragraph.
-.PP
-Having got a list of `files' matching the pattern, their version
-numbers are extracted by treating the part matching the Perl regexp
-groups, demarcated by `\fB(...)\fR', joining them with `\fB.\fR' as a separator,
-and using the result as the version number of the file.  The version
-number will then be mangled if required by the uversionmangle option
-described below.  Finally, the file versions are then compared to find
-the one with the greatest version number, as determined by \fBdpkg
-\-\-compare-versions\fR.  Note that if you need Perl groups which are
-not to be used in the version number, either use `\fB(?:...)\fR' or use the
-uversionmangle option to clean up the mess!
-.PP
-The current (upstream) version can be specified as the second
-parameter in the \fIwatch\fR file line.  If this is \fIdebian\fR or absent,
-then the current Debian version (as determined by
-\fIdebian/changelog\fR) is used to determine the current upstream
-version.  The current upstream version may also be specified by the
-command-line option \fB\-\-upstream-version\fR, which specifies the
-upstream version number of the currently installed package (i.e., the
-Debian version number without epoch and Debian revision).  The
-upstream version number will then be mangled using the dversionmangle
-option if one is specified, as described below.  If the newest version
-available is newer than the current version, then it is downloaded
-into the parent directory, unless the \fB\-\-report\fR or
-\fB\-\-report-status\fR option has been used.  Once the file has been
-downloaded, then a symlink to the file is made from
-\fI<package>_<version>.orig.tar.{gz|bz2|lzma|xz}\fR as described by the help
-for the \fB\-\-symlink\fR option.
-.PP
-Finally, if a third parameter (an action) is given in the \fIwatch\fR file
-line, this is taken as the name of a command, and the command
-.nf
-    \fIcommand \fB\-\-upstream-version\fI version filename\fR
-.fi
-is executed, using either the original file or the symlink name.  A
-common such command would be \fBuupdate\fR(1).  (Note that the calling
-syntax was slightly different when using \fIwatch\fR file without a
-`\fBversion=\fR...' line; there the command executed was `\fIcommand filename
-version\fR'.)  If the command is \fBuupdate\fR, then the
-\fB\-\-no\-symlink\fR option is given to \fBuupdate\fR as a first
-option, since any requested symlinking will already be done by
-\fBuscan\fR.
-.PP
-The alternative version of the \fIwatch\fR file syntax for HTTP URLs is as
-follows.  The first field is a homepage which should be downloaded and
-then searched for hrefs matching the pattern given in the second
-field.  (Again, this pattern will be anchored at the beginning and the
-end, so it must match the whole href.  If you want to match just the
-basename of the href, you can use a pattern like
-".*/name-(.+)\e.tar\e.gz" if you know that there is a full URL, or
-better still: "(?:.*/)?name-(.+)\e.tar\e.gz" if there may or may not
-be.  Note the use of (?:...) to avoid making a backreference.)  If any
-of the hrefs in the homepage which match the (anchored) pattern are
-relative URLs, they will be taken as being relative to the base URL of
-the homepage (i.e., with everything after the trailing slash removed),
-or relative to the base URL specified in the homepage itself with a
-<base href="..."> tag.  The third and fourth fields are the version
-number and action fields as before.
-.SH "PER-SITE OPTIONS"
-A \fIwatch\fR file line may be prefixed with `\fBopts=\fIoptions\fR', where
-\fIoptions\fR is a comma-separated list of options.  The whole
-\fIoptions\fR string may be enclosed in double quotes, which is
-necessary if \fIoptions\fR contains any spaces.  The recognised
-options are as follows:
-.TP
-\fBactive\fR and \fBpassive\fR (or \fBpasv\fR)
-If used on an FTP line, these override the choice of whether to use
-PASV mode or not, and force the use of the specified mode for this
-site.
-.TP
-\fBuversionmangle=\fIrules\fR
-This is used to mangle the upstream version number as matched by the
-ftp://... or http:// rules as follows.  First, the \fIrules\fR string
-is split into multiple rules at every `\fB;\fR'.  Then the upstream version
-number is mangled by applying \fIrule\fR to the version, in a similar
-way to executing the Perl command:
-.nf
-    $version =~ \fIrule\fR;
-.fi
-for each rule.  Thus, suitable rules might be `\fBs/^/0./\fR' to prepend
-`\fB0.\fR' to the version number and `\fBs/_/./g\fR' to change underscores into
-periods.  Note that the \fIrule\fR string may not contain commas;
-this should not be a problem.
-
-\fIrule\fR may only use the '\fBs\fR', '\fBtr\fR' and '\fBy\fR' operations.  When the '\fBs\fR'
-operation is used, only the '\fBg\fR', '\fBi\fR' and '\fBx\fR' flags are available and
-\fIrule\fR may not contain any expressions which have the potential to
-execute code (i.e. the (?{}) and (??{}) constructs are not supported).
-
-If the '\fBs\fR' operation is used, the replacement can contain
-backreferences to expressions within parenthesis in the matching regexp,
-like `\fBs/-alpha(\ed*)/.a$1/\fR'. These backreferences must use the
-`\fB$1\fR' syntax, as the `\fB\e1\fR' syntax is not supported.
-.TP
-\fBdversionmangle=\fIrules\fR
-This is used to mangle the Debian version number of the currently
-installed package in the same way as the \fBuversionmangle\fR option.
-Thus, a suitable rule might be `\fBs/\e+dfsg\ed*$//\fR' to remove a
-`\fB+dfsg1\fR' suffix from the Debian version number, or to handle `\fB.pre6\fR'
-type version numbers.  Again, the \fIrules\fR string may not contain
-commas; this should not be a problem.
-.TP
-\fBversionmangle=\fIrules\fR
-This is a syntactic shorthand for
-\fBuversionmangle=\fIrules\fB,dversionmangle=\fIrules\fR, applying the
-same rules to both the upstream and Debian version numbers.
-.TP
-\fBfilenamemangle=\fIrules\fR
-This is used to mangle the filename with which the downloaded file
-will be saved, and is parsed in the same way as the
-\fBuversionmangle\fR option.  Examples of its use are given in the
-examples section above.
-.TP
-\fBdownloadurlmangle=\fIrules\fR
-This is used to mangle the URL to be used for the download.  The URL
-is first computed based on the homepage downloaded and the pattern
-matched, then the version number is determined from this URL.
-Finally, any rules given by this option are applied before the actual
-download attempt is made. An example of its use is given in the
-examples section above.
-.TP
-\fBpgpsigurlmangle=\fIrules\fR
-If present, the supplied rules will be applied to the downloaded URL
-(after any downloadurlmangle rules, if present) to craft a new URL
-that will be used to fetch the detached OpenPGP signature file for the
-upstream tarball.  Some common rules might be `\fBs/$/.asc/\fR' or
-`\fBs/$/.pgp/\fR' or `\fBs/$/.gpg/\fR'.  This signature must be made
-by a key found in the keyring \fBdebian/upstream/signing-key.pgp\fR or
-the armored keyring \fBdebian/upstream/signing-key.asc\fR.  If it is not
-valid, or not made by one of the listed keys, uscan will report an
-error.
-.TP
-\fBrepacksuffix=\fIsuffix\fR
-If the upstream sources are modified because \fIdebian/copyright\fR contains
-the \fBFiles-Excluded\fR field, \fIsuffix\fR will be appended to the upstream
-version of the repacked tar archive.  Common suffixes might be \fB+dfsg1\fR to
-indicate the removal of files that are not DFSG-compliant or \fB+ds1\fR for
-other reasons such as removal of prebuilt files or large embedded code copies.
-.SH "Directory name checking"
-Similarly to several other scripts in the \fBdevscripts\fR package,
-\fBuscan\fR explores the requested directory trees looking for
-\fIdebian/changelog\fR and \fIdebian/watch\fR files.  As a safeguard
-against stray files causing potential problems, and in order to
-promote efficiency, it will examine the name of the parent directory
-once it finds the \fIdebian/changelog\fR file, and check that the
-directory name corresponds to the package name.  It will only attempt
-to download newer versions of the package and then perform any
-requested action if the directory name matches the package name.
-Precisely how it does this is controlled by two configuration file
-variables \fBDEVSCRIPTS_CHECK_DIRNAME_LEVEL\fR and
-\fBDEVSCRIPTS_CHECK_DIRNAME_REGEX\fR, and their corresponding command-line
-options \fB\-\-check-dirname-level\fR and
-\fB\-\-check-dirname-regex\fR.
-.PP
-\fBDEVSCRIPTS_CHECK_DIRNAME_LEVEL\fR can take the following values:
-.TP
-.B 0
-Never check the directory name.
-.TP
-.B 1
-Only check the directory name if we have had to change directory in
-our search for \fIdebian/changelog\fR, that is, the directory
-containing \fIdebian/changelog\fR is not the directory from which
-\fBuscan\fR was invoked.  This is the default behaviour.
-.TP
-.B 2
-Always check the directory name.
-.PP
-The directory name is checked by testing whether the current directory
-name (as determined by \fBpwd\fR(1)) matches the regex given by the
-configuration file option \fBDEVSCRIPTS_CHECK_DIRNAME_REGEX\fR or by the
-command line option \fB\-\-check-dirname-regex\fR \fIregex\fR.  Here
-\fIregex\fR is a Perl regex (see \fBperlre\fR(3perl)), which will be
-anchored at the beginning and the end.  If \fIregex\fR contains a '/',
-then it must match the full directory path.  If not, then it must
-match the full directory name.  If \fIregex\fR contains the string
-\'PACKAGE', this will be replaced by the source package name, as
-determined from the \fIchangelog\fR.  The default value for the regex is:
-\'PACKAGE(-.+)?', thus matching directory names such as PACKAGE and
-PACKAGE-version.
-.SH EXAMPLE
-This script will perform a fully automatic upstream update.
-
-.nf
-#!/bin/sh \-e
-# called with '\-\-upstream-version' <version> <file>
-uupdate "$@"
-package=`dpkg\-parsechangelog | sed \-n 's/^Source: //p'`
-cd ../$package-$2
-debuild
-.fi
-
-Note that we don't call \fBdupload\fR or \fBdput\fR automatically, as
-the maintainer should perform sanity checks on the software before
-uploading it to Debian.
-.SH OPTIONS
-.TP
-.B \-\-report\fP, \fB\-\-no\-download
-Only report about available newer versions but do not download anything.
-.TP
-.B \-\-report\-status
-Report on the status of all packages, even those which are up-to-date,
-but do not download anything.
-.TP
-.B \-\-download
-Report and download.  (This is the default behaviour.)
-.TP
-.B \-\-destdir
-Path of directory to which to download.  If the specified path is not
-absolute, it will be relative to one of the current directory or, if directory
-scanning is enabled, the package's source directory.
-.TP
-.B \-\-force-download
-Download upstream even if up to date (will not overwrite local files, however)
-.TP
-.B \-\-pasv
-Force PASV mode for FTP connections.
-.TP
-.B \-\-no\-pasv
-Do not use PASV mode for FTP connections.
-.TP
-\fB\-\-timeout\fR \fIN\fR
-Set timeout to N seconds (default 20 seconds).
-.TP
-.B \-\-no\-symlink
-Do not call \fBmk\-origtargz\fR.
-.P
-The following options are passed to \fBmk\-origtargz\fR:
-.RS
-.TP
-.B \-\-symlink
-Make \fIorig.tar.gz\fR (with the appropriate extension) symlinks to the
-downloaded files.
-(This is the default behaviour.)
-.TP
-.B \-\-copy
-Instead of symlinking as described above, copy the downloaded files.
-.TP
-.B \-\-rename
-Instead of symlinking as described above, rename the downloaded files.
-.TP
-.B \-\-repack
-After having downloaded an lzma tar, xz tar, bzip tar or zip archive,
-repack it to a gzip tar archive, if required.
-The \fBunzip\fR package must be installed in order to repack .zip archives, the
-\fBxz-utils\fR package must be installed to repack lzma or xz tar archives.
-.TP
-\fB\-\-compression\fR [ \fBgzip\fR | \fBbzip2\fR | \fBlzma\fR | \fBxz\fR ]
-In the case where the upstream sources are repacked (either because
-\fB\-\-repack\fR option is given or \fIdebian/copyright\fR contains the
-field \fBFiles-Excluded\fR), it is possible to control the compression
-method via the parameter (defaults to \fBgzip\fR).
-.TP
-.B \-\-copyright\-file \fIcopyright-file\fR
-Exclude files mentioned in \fBFiles-Excluded\fR in the given copyright file.
-This is useful when running uscan not within a source package directory.
-.RE
-.TP
-.B \-\-dehs
-Use an XML format for output, as required by the DEHS system.
-.TP
-.B \-\-no-dehs
-Use the traditional uscan output format.  (This is the default behaviour.)
-.TP
-\fB\-\-package\fR \fIpackage\fR
-Specify the name of the package to check for rather than examining
-\fIdebian/changelog\fR; this requires the \fB\-\-upstream-version\fR
-(unless a version is specified in the \fIwatch\fR file)
-and \fB\-\-watchfile\fR options as well.  Furthermore, no directory
-scanning will be done and nothing will be downloaded.  This option is
-probably most useful in conjunction with the DEHS system (and
-\fB\-\-dehs\fR).
-.TP
-\fB\-\-upstream-version\fR \fIupstream-version\fR
-Specify the current upstream version rather than examine the \fIwatch\fR file
-or \fIchangelog\fR to determine it.  This is ignored if a directory scan is
-being performed and more than one \fIwatch\fR file is found.
-.TP
-\fB\-\-watchfile\fR \fIwatchfile\fR
-Specify the \fIwatchfile\fR rather than perform a directory scan to
-determine it.  If this option is used without \fB\-\-package\fR, then
-\fBuscan\fR must be called from within the Debian package source tree
-(so that \fIdebian/changelog\fR can be found simply by stepping up
-through the tree).
-.TP
-\fB\-\-download\-version\fR \fIversion\fR
-Specify the version which the upstream release must match in order to be
-considered, rather than using the release with the highest version.
-.TP
-\fB\-\-download\-current\-version\fR
-Download the currently packaged version
-.TP
-.B \-\-verbose
-Give verbose output.
-.TP
-.B \-\-no\-verbose
-Don't give verbose output.  (This is the default behaviour.)
-.TP
-.B \-\-no\-exclusion
-Do not automatically exclude files mentioned in
-\fIdebian/copyright\fR field \fBFiles-Excluded\fR
-.TP
-.B \-\-debug
-Dump the downloaded web pages to stdout for debugging your watch file.
-.TP
-\fB\-\-check-dirname-level\fR \fIN\fR
-See the above section \fBDirectory name checking\fR for an explanation of
-this option.
-.TP
-\fB\-\-check-dirname-regex\fR \fIregex\fR
-See the above section \fBDirectory name checking\fR for an explanation of
-this option.
-.TP
-\fB\-\-user-agent\fR, \fB\-\-useragent\fR
-Override the default user agent header.
-.TP
-\fB\-\-no-conf\fR, \fB\-\-noconf\fR
-Do not read any configuration files.  This can only be used as the
-first option given on the command-line.
-.TP
-.B \-\-help
-Give brief usage information.
-.TP
-.B \-\-version
-Display version information.
-.SH "CONFIGURATION VARIABLES"
-The two configuration files \fI/etc/devscripts.conf\fR and
-\fI~/.devscripts\fR are sourced by a shell in that order to set
-configuration variables.  These may be overridden by command line
-options.  Environment variable settings are ignored for this purpose.
-If the first command line option given is \fB\-\-noconf\fR, then these
-files will not be read.  The currently recognised variables are:
-.TP
-.B USCAN_DOWNLOAD
-If this is set to \fIno\fR, then newer upstream files will not be
-downloaded; this is equivalent to the \fB\-\-report\fR or
-\fB\-\-no\-download\fR options.
-.TP
-.B USCAN_PASV
-If this is set to \fIyes\fR or \fIno\fR, this will force FTP
-connections to use PASV mode or not to, respectively.  If this is set
-to \fIdefault\fR, then \fBNet::FTP\fR(3) makes the choice (primarily based on
-the \fBFTP_PASSIVE\fR environment variable).
-.TP
-.B USCAN_TIMEOUT
-If set to a number \fIN\fR, then set the timeout to \fIN\fR seconds.
-This is equivalent to the \fB\-\-timeout\fR option.
-.TP
-.B USCAN_SYMLINK
-If this is set to \fIno\fR, then a pkg_version.orig.tar.{gz|bz2|lzma|xz}
-symlink will not be made (equivalent to the \fB\-\-no\-symlink\fR
-option).  If it is set to \fIyes\fR or \fIsymlink\fR, then the
-symlinks will be made.  If it is set to \fIrename\fR, then the files
-are renamed (equivalent to the \fB\-\-rename\fR option).
-.TP
-.B USCAN_DEHS_OUTPUT
-If this is set to \fIyes\fR, then DEHS-style output will be used.
-This is equivalent to the \fB\-\-dehs\fR option.
-.TP
-.B USCAN_VERBOSE
-If this is set to \fIyes\fR, then verbose output will be given.  This
-is equivalent to the \fB\-\-verbose\fR option.
-.TP
-.B USCAN_USER_AGENT
-If set, the specified user agent string will be used in place of the
-default.  This is equivalent to the \fB\-\-user-agent\fR option.
-.TP
-.B USCAN_DESTDIR
-If set, the downloaded files will be placed in this directory.  This is
-equivalent to the \fB\-\-destdir\fR option.
-.TP
-.B USCAN_REPACK
-If this is set to \fIyes\fR, then after having downloaded a bzip tar,
-lzma tar, xz tar, or zip archive, \fBuscan\fR will repack it to a gzip tar.
-This is equivalent to the \fB\-\-repack\fR option.
-.TP
-.B USCAN_EXCLUSION
-If this is set to \fIno\fR, files mentioned in the field \fBFiles-Excluded\fR
-of \fIdebian/copyright\fR will be ignored and no exclusion of files will be
-tried.  This is equivalent to the \fB\-\-no-exclusion\fR option.
-.SH "EXIT STATUS"
-The exit status gives some indication of whether a newer version was
-found or not; one is advised to read the output to determine exactly
-what happened and whether there were any warnings to be noted.
-.TP
-0
-Either \fB\-\-help\fR or \fB\-\-version\fR was used, or for some
-\fIwatch\fR file which was examined, a newer upstream version was located.
-.TP
-1
-No newer upstream versions were located for any of the \fIwatch\fR files
-examined.
-.SH "HISTORY AND UPGRADING"
-This section briefly describes the backwards-incompatible \fIwatch\fR file
-features which have been added in each \fIwatch\fR file version, and the
-first version of the \fBdevscripts\fR package which understood them.
-.TP
-.I Pre-version 2
-The \fIwatch\fR file syntax was significantly different in those days.  Don't
-use it.  If you are upgrading from a pre-version 2 \fIwatch\fR file, you are
-advised to read this manpage and to start from scratch.
-.TP
-.I Version 2
-devscripts version 2.6.90: The first incarnation of the current style
-of \fIwatch\fR files.
-.TP
-.I Version 3
-devscripts version 2.8.12: Introduced the following: correct handling
-of regex special characters in the path part, directory/path pattern
-matching, version number in several parts, version number mangling.
-Later versions have also introduced URL mangling.
-
-If you are upgrading from version 2, the key incompatibility is if you
-have multiple groups in the pattern part; whereas only the first one
-would be used in version 2, they will all be used in version 3.  To
-avoid this behaviour, change the non-version-number groups to be
-(?:...) instead of a plain (...) group.
-.SH "SEE ALSO"
-.BR dpkg (1),
-.BR mk\-origtargz (1),
-.BR perlre (1),
-.BR uupdate (1),
-.BR devscripts.conf (5)
-.SH AUTHOR
-The original version of \fBuscan\fR was written by Christoph Lameter
-<clame...@debian.org>.  Significant improvements, changes and bugfixes
-were made by Julian Gilbey <j...@debian.org>.  HTTP support was added
-by Piotr Roszatycki <dex...@debian.org>.  The program was rewritten
-in Perl by Julian Gilbey.
-- 
2.1.4

From 3b8cb94c33530173630d8a77922fbb56ff9a8476 Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Wed, 2 Sep 2015 21:17:08 +0900
Subject: [PATCH 5/8] test_uscan: URL patterns

---
 test/test_uscan | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)

diff --git a/test/test_uscan b/test/test_uscan
index 8637b6e..24c5202 100755
--- a/test/test_uscan
+++ b/test/test_uscan
@@ -328,4 +328,156 @@ END
 
 }
 
+# test watch file URL patterns (directory scans and HTML pages)
+
+testWatchFile() {
+# setup test environment
+    TMPDIR=$(mktemp -d)
+    cd $TMPDIR
+    echo " * WORK DIRECTORY: `pwd`"
+    PKG=foo
+# start HTTP server with its root at $TMPDIR/repo
+    mkdir -p repo
+    spawnHttpServer
+    PORT=$(cat $TMPDIR/repo/port)
+    echo " * WEBSITE:         http://localhost:$PORT";
+    UTARBALL=${PKG}-2.0.tar.gz
+    STARBALL=${PKG}_2.0.orig.tar.gz
+
+# repo has pid and port files (create a dummy upstream tarball)
+    tar -czf keep.tar.gz repo/
+
+#############################################################################
+# create minimum repository for $PKG to start uscan except for debian/watch
+    mkdir -p $PKG/debian/source
+
+# native package
+    cat <<END > $PKG/debian/changelog
+$PKG (1.0) unstable; urgency=low
+
+  * Initial release
+
+ -- Joe Developer <j...@debian.org>  Mon, 02 Nov 2013 22:21:31 -0100
+END
+
+    cat <<END > $PKG/debian/source/format
+3.0 (native)
+END
+    cat <<'END' > $PKG/debian/copyright
+Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+END
+    echo "=== 1 string without index.html, directory scan (native)"
+    mkdir -p $TMPDIR/repo/123/foo/ooo/
+    cp keep.tar.gz repo/123/foo/ooo/$UTARBALL
+    cat <<END > $PKG/debian/watch
+version=3
+http://localhost:$PORT/(\d+)/(.+)/(.+)/$PKG-([\.\d]+).tar.gz
+END
+    (cd $TMPDIR/$PKG ; $COMMAND )
+    WATCH=`grep -e "http:" $PKG/debian/watch`
+    assertTrue "$UTARBALL missing: $WATCH" "[ -f $UTARBALL ]"
+    assertTrue "$STARBALL missing: $WATCH" "[ -f $STARBALL ]"
+    rm $UTARBALL $STARBALL
+
+#############################################################################
+    echo "=== 2 strings without index.html, directory scan (native)"
+    cat <<END > $PKG/debian/watch
+version=3
+http://localhost:$PORT/(\d+)/(.+)/(.+)/ \\
+$PKG-([\d\.]+).tar.gz
+END
+    (cd $TMPDIR/$PKG ; $COMMAND )
+    WATCH=`grep -e "http:" $PKG/debian/watch`
+    assertTrue "$UTARBALL missing: $WATCH" "[ -f $UTARBALL ]"
+    assertTrue "$STARBALL missing: $WATCH" "[ -f $STARBALL ]"
+    rm $UTARBALL $STARBALL
+
+#############################################################################
+    echo "=== 2 strings via a web page (native)"
+    cat <<END > $PKG/debian/watch
+version=3
+http://localhost:$PORT \\
+(?:.*)/$PKG-([\d\.]+).tar.gz
+END
+    cat <<END > repo/index.html
+<html>
+<head>
+  <meta charset="utf-8">
+</head>
+<body>
+<a href="/123/foo/ooo/$PKG-0.0.tar.gz">Very old</a> <br/ >
+<a href="/123/foo/ooo/$PKG-1.0.tar.gz">A bit OLD</a> <br />
+<a href="/123/foo/ooo/$PKG-2.0.tar.gz">Latest</a> <br />
+</body>
+<html>
+END
+    (cd $TMPDIR/$PKG ; $COMMAND )
+    WATCH=`grep -A2 -e "http:" $PKG/debian/watch`
+    assertTrue "$UTARBALL missing: $WATCH" "[ -f $UTARBALL ]"
+    assertTrue "$STARBALL missing: $WATCH" "[ -f $STARBALL ]"
+    rm -f $UTARBALL $STARBALL
+
+#############################################################################
+    echo "=== 2 strings with / via a web page (native)"
+    cat <<END > $PKG/debian/watch
+version=3
+http://localhost:$PORT/ \\
+(?:.*)/$PKG-([\d\.]+).tar.gz \\
+debian uupdate
+END
+# having uupdate for a native package is pointless, but uscan safely accepts it and does nothing bad
+    cat <<END > repo/index.html
+<html>
+<head>
+  <meta charset="utf-8">
+</head>
+<body>
+<a href="/123/foo/ooo/$PKG-0.0.tar.gz">Very old</a> <br/ >
+<a href="/123/foo/ooo/$PKG-1.0.tar.gz">A bit OLD</a> <br />
+<a href="/123/foo/ooo/$PKG-2.0.tar.gz">Latest</a> <br />
+</body>
+<html>
+END
+    (cd $TMPDIR/$PKG ; $COMMAND )
+    WATCH=`grep -A2 -e "http:" $PKG/debian/watch`
+    assertTrue "$UTARBALL missing: $WATCH" "[ -f $UTARBALL ]"
+    assertTrue "$STARBALL missing: $WATCH" "[ -f $STARBALL ]"
+    rm -f $UTARBALL $STARBALL
+#############################################################################
+    echo "=== non-native package uupdate"
+    cat <<END > $PKG/debian/changelog
+$PKG (1.0-1) unstable; urgency=low
+
+  * Initial release
+
+ -- Joe Developer <j...@debian.org>  Mon, 02 Nov 2013 22:21:31 -0100
+END
+    cat <<END > $PKG/debian/rules
+%:
+	dh $@
+END
+    cat <<END > $PKG/debian/source/format
+3.0 (quilt)
+END
+    cat <<END > $PKG/debian/watch
+version=3
+http://localhost:$PORT \\
+(?:.*)/$PKG-(\d+)\.(\d+)\.tar\.gz \\
+debian uupdate
+END
+
+    DTARBALL=${PKG}_1.0-1.debian.tar.xz
+    ( cd $TMPDIR/$PKG ; tar -cJf $TMPDIR/$DTARBALL debian )
+    ( cd $TMPDIR/$PKG ; $COMMAND )
+    STREE="$PKG-2.0/debian/changelog"
+    WATCH=`grep -A3 -e "http:" $PKG/debian/watch`
+    assertTrue "$STREE missing: $WATCH" "[ -f $STREE ]"
+    rm -f $UTARBALL $STARBALL $DTARBALL
+    rm -rf $PKG-2.0 $PKG-2.0.orig
+
+    cd - >/dev/null
+
+    cleanup
+}
+
 . shunit2
-- 
2.1.4

From 9bdf05d0f2774a4284b9ddb484f1a680751caafa Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Thu, 27 Aug 2015 22:35:38 +0900
Subject: [PATCH 6/8] pagemangle rule

 * generic way to mangle the whole web page.
 * s3.amazonaws.com special case code is marked deprecated.
 * address needs for fullsourcemangle.  Closes: #395439
 * text in <a>...</a> is a special case.  Closes: #705989
 * s/data-realurl/href/g is a special case.  Closes: #773390
---
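A rough illustration of what the <Key> rule does to an S3-style listing.
This is only a sketch: it approximates the substitution with sed, whereas
uscan applies the rule in Perl through safe_replace().  The helper name
mangle_keys is made up for this note and is not part of uscan.

```shell
# Hypothetical helper (not part of uscan): wrap each <Key>...</Key> entry
# in an <a href="..."> element so that it looks like an ordinary HTML link.
# sed uses \1 where the Perl rule in the watch file uses $1.
mangle_keys() {
    sed -E 's%<Key>([^<]*)</Key>%<Key><a href="\1">\1</a></Key>%g'
}

# Feed one listing entry through the rule and print the rewritten line.
printf '%s\n' '<Key>/123/foo/ooo/foo-2.0.tar.gz</Key>' | mangle_keys
```

After this rewrite the page contains plain <a href="..."> links, which is
the form the existing link scan looks for.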
 scripts/uscan.pl | 22 ++++++++++++++++++-
 test/test_uscan  | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index c5503e1..8b9bc74 100644
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -1742,6 +1742,9 @@ sub process_watchline ($$$$$$)
 		    @{$options{'uversionmangle'}} = split /;/, $1;
 		    @{$options{'dversionmangle'}} = split /;/, $1;
 		}
+		elsif ($opt =~ /^pagemangle\s*=\s*(.+)/) {
+		    @{$options{'pagemangle'}} = split /;/, $1;
+		}
 		elsif ($opt =~ /^filenamemangle\s*=\s*(.+)/) {
 		    @{$options{'filenamemangle'}} = split /;/, $1;
 		}
@@ -1917,10 +1920,25 @@ sub process_watchline ($$$$$$)
 	print STDERR "$progname debug: received content:\n$content\[End of received content]\n"
 	    if $debug;
 
+	print STDERR "PRE-pagemangle \$content\n$content\n\n" if $debug;
+	# pagemangle: should not abuse this slow operation
+	foreach my $pat (@{$options{'pagemangle'}}) {
+	    print STDERR "opts=\"pagemangle=$pat\"\n" if $debug;
+	    if (! safe_replace(\$content, $pat)) {
+		uscan_warn "$progname: In $watchfile, potentially"
+		  . " unsafe or malformed pagemangle"
+		  . " pattern:\n  '$pat'"
+		  . " found. Skipping watchline\n"
+		  . "  $line\n";
+		return 1;
+	    }
+	}
 	if ($content =~ m%^<[?]xml%i &&
-	    $content =~ m%xmlns="http://s3.amazonaws.com/doc/2006-03-01/"%) {
+	    $content =~ m%xmlns="http://s3.amazonaws.com/doc/2006-03-01/"% &&
+	    $content !~ m%<Key><a\s+href%) {
 	    # this is an S3 bucket listing.  Insert an 'a href' tag
 	    # into the content for each 'Key', so that it looks like html (LP: #798293)
+	    uscan_warn "*** Amazon special case code is deprecated ***\nUse an opts=pagemangle rule instead\n";
 	    print STDERR "$progname debug: fixing s3 listing\n" if $debug;
 	    $content =~ s%<Key>([^<]*)</Key>%<Key><a href="$1">$1</a></Key>%g
 	}
@@ -1941,6 +1959,8 @@ sub process_watchline ($$$$$$)
 	    ($urlbase = $base) =~ s%/[^/]*$%/%;
 	}
 
+	print STDERR "POST-pagemangle \$content\n$content\n\n" if $debug;
+
 	print STDERR "$progname debug: matching pattern(s) @patterns\n" if $debug;
 	my @hrefs;
 	while ($content =~ m/<\s*a\s+[^>]*href\s*=\s*([\"\'])(.*?)\1/sgi) {
diff --git a/test/test_uscan b/test/test_uscan
index 24c5202..1c318a9 100755
--- a/test/test_uscan
+++ b/test/test_uscan
@@ -435,6 +435,56 @@ END
 <a href="/123/foo/ooo/$PKG-0.0.tar.gz">Very old</a> <br/ >
 <a href="/123/foo/ooo/$PKG-1.0.tar.gz">A bit OLD</a> <br />
 <a href="/123/foo/ooo/$PKG-2.0.tar.gz">Latest</a> <br />
+</body>
+<html>
+END
+    (cd $TMPDIR/$PKG ; $COMMAND )
+    WATCH=`grep -A2 -e "http:" $PKG/debian/watch`
+    assertTrue "$UTARBALL missing: $WATCH" "[ -f $UTARBALL ]"
+    assertTrue "$STARBALL missing: $WATCH" "[ -f $STARBALL ]"
+    rm -f $UTARBALL $STARBALL
+#############################################################################
+    echo "=== 2 strings with / via a web page after pagemangle bogus (native)"
+    cat <<END > $PKG/debian/watch
+version=3
+opts="pagemangle=s/bogus/href/g" \\
+http://localhost:$PORT/ \\
+(?:.*)/$PKG-([\d\.]+).tar.gz
+END
+    cat <<END > repo/index.html
+<html>
+<head>
+  <meta charset="utf-8">
+</head>
+<body>
+<a bogus="/123/foo/ooo/$PKG-0.0.tar.gz">Very old</a> <br/ >
+<a bogus="/123/foo/ooo/$PKG-1.0.tar.gz">A bit OLD</a> <br />
+<a bogus="/123/foo/ooo/$PKG-2.0.tar.gz">Latest</a> <br />
+</body>
+<html>
+END
+    (cd $TMPDIR/$PKG ; $COMMAND )
+    WATCH=`grep -A2 -e "http:" $PKG/debian/watch`
+    assertTrue "$UTARBALL missing: $WATCH" "[ -f $UTARBALL ]"
+    assertTrue "$STARBALL missing: $WATCH" "[ -f $STARBALL ]"
+    rm -f $UTARBALL $STARBALL
+#############################################################################
+    echo "=== 2 strings with / via a web page after pagemangle <key> (native)"
+    cat <<END > $PKG/debian/watch
+version=3
+opts="pagemangle=s%<Key>([^<]*)</Key>%<Key><a href="\$1">\$1</a></Key>%g" \\
+http://localhost:$PORT/ \\
+(?:.*)/$PKG-([\d\.]+).tar.gz
+END
+    cat <<END > repo/index.html
+<html>
+<head>
+  <meta charset="utf-8">
+</head>
+<body>
+<Key>/123/foo/ooo/$PKG-0.0.tar.gz</Key> <br/ >
+<Key>/123/foo/ooo/$PKG-1.0.tar.gz</Key> <br />
+<Key>/123/foo/ooo/$PKG-2.0.tar.gz</Key> <br />
 </body>
 <html>
 END
@@ -465,7 +518,18 @@ http://localhost:$PORT \\
 (?:.*)/$PKG-(\d+)\.(\d+)\.tar\.gz \\
 debian uupdate
 END
-
+    cat <<END > repo/index.html
+<html>
+<head>
+  <meta charset="utf-8">
+</head>
+<body>
+<a href="/123/foo/ooo/$PKG-0.0.tar.gz">Very old</a> <br/ >
+<a href="/123/foo/ooo/$PKG-1.0.tar.gz">A bit OLD</a> <br />
+<a href="/123/foo/ooo/$PKG-2.0.tar.gz">Latest</a> <br />
+</body>
+<html>
+END
     DTARBALL=${PKG}_1.0-1.debian.tar.xz
     ( cd $TMPDIR/$PKG ; tar -cJf $TMPDIR/$DTARBALL debian )
     ( cd $TMPDIR/$PKG ; $COMMAND )
-- 
2.1.4

From 5e3bf0a908f712dada255f1a0773dfb2aaafc4af Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Thu, 3 Sep 2015 12:09:18 +0000
Subject: [PATCH 7/8] POD updates for pagemangle

---
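To make the documented flow concrete (mangle the page first, then scan it
for links), here is a small sketch.  It is an approximation using sed and
grep rather than uscan's own Perl code, and the helper name list_links is
hypothetical.

```shell
# Hypothetical helper (not part of uscan): rewrite '<a bogus=' to '<a href='
# as the documented pagemangle example does, then print the target of every
# <a href="..."> link, one per line.
list_links() {
    sed -E 's/<a[[:space:]]+bogus=/<a href=/g' |
        grep -oE '<a[[:space:]]+href="[^"]*"' |
        sed -E 's/^<a[[:space:]]+href="//; s/"$//'
}

# Sample page fragment modelled on the test fixture in this series.
cat <<'END' | list_links
<a bogus="/123/foo/ooo/foo-1.0.tar.gz">A bit OLD</a> <br />
<a bogus="/123/foo/ooo/foo-2.0.tar.gz">Latest</a> <br />
END
```

Without the first sed stage no link is found in the sample input, which is
the situation the pagemangle rule is meant to address.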
 scripts/uscan.pl | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index 8b9bc74..c73fd6c 100644
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -137,6 +137,8 @@ They behave as if a Perl command "I<$target_string> B<~=> I<rule>" is executed.
 
 =item * B<dversionmangle=>I<rule> for the last upstream version string found in F<debian/changelog>
 
+=item * B<pagemangle=>I<rule> for the downloaded web page string
+
 =item * B<uversionmangle=>I<rule> for the candidate upstream version strings
 
 =item * B<versionmangle=>I<rules> as a syntactic shorthand for:
@@ -213,6 +215,11 @@ For example, this B<http://>I<URL> may be specified as:
 
 Please note the trailing B</> in the above.
 
+If the B<pagemangle> rule exists, the whole downloaded web page as a string is
+updated by applying this rule to it.  This is a very powerful tool and needs
+to be used with care.  If other mangling rules can be used to address your
+objective, do not use this rule.
+
 The downloaded web page is scanned for links defined in the B<< <a href=" >>
 I<...> B<< "> >> tag to locate the candidate upstream tarball URLs.  These
 candidate upstream tarball URLs are matched by the Perl regex pattern
@@ -569,6 +576,28 @@ some way into one which will work automatically, for example:
   http://developer.berlios.de/project/showfiles.php?group_id=2051 \
   http://prdownload.berlios.de/softdevice/vdr-softdevice-(.+).tgz
 
+=head2 HTTP site (pagemangle)
+
+The option B<pagemangle> can be used to mangle the downloaded web page before
+applying other rules.  A non-standard web page without proper B<< <a href="
+>> I<< ... >> B<< "> >> entries can be converted.  For example, if F<foo.html>
+uses B<< <a bogus=" >> I<< ... >> B<< "> >>, this can be converted to the
+standard page format with:
+
+  opts="pagemangle=s/<a\s+bogus=/<a href=/g" \
+  http://example.com/release/foo.html \
+  files/foo-([\d\.]*).tar.gz
+
+Please note the use of B<g> here to replace all occurrences.
+
+If F<foo.html> lists files as B<< <Key> >> I<< ... >> B<< </Key> >> entries, as
+in an Amazon S3 bucket listing, this can be converted to the standard page
+format with:
+
+  opts="pagemangle=s%<Key>([^<]*)</Key>%<Key><a href="$1">$1</a></Key>%g" \
+  http://example.com/release/foo.html \
+  (?:.*)/foo-([\d\.]+).tar.gz
+
 =head2 FTP site (basic):
 
   opts=pasv \
-- 
2.1.4

From 1dcfa2f96dae30b57c28082fab7d3cde086e1122 Mon Sep 17 00:00:00 2001
From: Osamu Aoki <os...@debian.org>
Date: Thu, 3 Sep 2015 16:43:46 +0000
Subject: [PATCH 8/8] improve debug output of uscan.pl

 * print mangled data under debug mode. Closes: #350454
---
 scripts/uscan.pl | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index c73fd6c..f1d4413 100644
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -1888,6 +1888,7 @@ sub process_watchline ($$$$$$)
 	    return 1;
 	}
     }
+    print STDERR "$progname debug: dversionmangled last version: $mangled_lastversion\n" if $debug;
     if($opt_download_current_version) {
 	$download_version = $mangled_lastversion;
 	$force_download = 1;
@@ -1946,10 +1947,9 @@ sub process_watchline ($$$$$$)
 	}
 
 	my $content = $response->content;
-	print STDERR "$progname debug: received content:\n$content\[End of received content]\n"
+	print STDERR "$progname debug: received content:\n$content\n[End of received content]\n"
 	    if $debug;
 
-	print STDERR "PRE-pagemangle \$content\n$content\n\n" if $debug;
 	# pagemangle: should not abuse this slow operation
 	foreach my $pat (@{$options{'pagemangle'}}) {
 	    print STDERR "opts=\"pagemangle=$pat\"\n" if $debug;
@@ -1988,7 +1988,8 @@ sub process_watchline ($$$$$$)
 	    ($urlbase = $base) =~ s%/[^/]*$%/%;
 	}
 
-	print STDERR "POST-pagemangle \$content\n$content\n\n" if $debug;
+	print STDERR "$progname debug: pagemangled content:\n$content\n[End of pagemangled content]\n"
+	    if $debug;
 
 	print STDERR "$progname debug: matching pattern(s) @patterns\n" if $debug;
 	my @hrefs;
@@ -2069,7 +2070,7 @@ sub process_watchline ($$$$$$)
 	}
 
 	my $content = $response->content;
-	print STDERR "$progname debug: received content:\n$content\[End of received content]\n"
+	print STDERR "$progname debug: received content:\n$content\n[End of received content]\n"
 	    if $debug;
 
 	# FTP directory listings either look like:
@@ -2164,6 +2165,8 @@ EOF
 	    return 1;
 	}
     }
+    print STDERR "$progname debug: new version $newversion\n" if $debug;
+    print STDERR "$progname debug: new filename $newfile\n" if $debug;
 
     my $newfile_base=basename($newfile);
     if (exists $options{'filenamemangle'}) {
@@ -2187,6 +2190,7 @@ EOF
 	    $newfile_base = "$pkg-$newversion.download";
 	}
     }
+    print STDERR "$progname debug: filenamemangled new filename $newfile_base\n" if $debug;
 
     # So what have we got to report now?
     my $upstream_url;
@@ -2269,6 +2273,7 @@ EOF
 	# FTP site
 	$upstream_url = "$base$newfile";
     }
+    print STDERR "$progname debug: downloadurlmangled upstream URL $upstream_url\n" if $debug;
 
     if (exists $options{'pgpsigurlmangle'}) {
 	$pgpsig_url = $upstream_url;
@@ -2283,6 +2288,7 @@ EOF
 	    }
 	}
     }
+    print STDERR "$progname debug: pgpsigurlmangled upstream URL $pgpsig_url\n" if $debug;
 
     $dehs_tags{'debian-uversion'} = $lastversion;
     $dehs_tags{'debian-mangled-uversion'} = $mangled_lastversion;
@@ -2589,7 +2595,7 @@ sub newest_dir ($$$$$) {
 	}
 
 	my $content = $response->content;
-	print STDERR "$progname debug: received content:\n$content\[End of received content\]\n"
+	print STDERR "$progname debug: received content:\n$content\n[End of received content]\n"
 	    if $debug;
 	# We need this horrid stuff to handle href=foo type
 	# links.  OK, bad HTML, but we have to handle it nonetheless.
@@ -2648,7 +2654,7 @@ sub newest_dir ($$$$$) {
 	}
 
 	my $content = $response->content;
-	print STDERR "$progname debug: received content:\n$content\[End of received content]\n"
+	print STDERR "$progname debug: received content:\n$content\n[End of received content]\n"
 	    if $debug;
 
 	# FTP directory listings either look like:
-- 
2.1.4
