Package: debiandoc-sgml
Version: 1.2.27
Severity: important
Tags: patch
In the kernel-handbook source we have:
Check the <url
id="http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux&amp;src=linux-2.6"
name="current bug list">
And in the HTML output this becomes:
Check the <code><a
href="http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux%5C%7C[amp%20]%5C%7Csrc=linux-2.6">current
bug list</a></code>
I'm attaching a fix. Please upload and ask for a freeze exception, as
this causes real breakage in debian-kernel-handbook.
Ben.
-- System Information:
Debian Release: wheezy/sid
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500,
'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (x86_64)
Foreign Architectures: amd64
Kernel: Linux 3.2.0-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages debiandoc-sgml depends on:
ii libhtml-parser-perl 3.69-2
ii libroman-perl 1.23-1
ii libtext-format-perl 0.56-1
ii perl 5.14.2-12
ii sgml-base 1.26+nmu3
ii sgml-data 2.0.8
ii sgmlspl 1.03ii-32
ii sp 1.3.4-1.2.1-47.1+b1
Versions of packages debiandoc-sgml recommends:
ii ghostscript 9.05~dfsg-6
ii texinfo 4.13a.dfsg.1-10
pn texlive <none>
pn texlive-latex-extra <none>
Versions of packages debiandoc-sgml suggests:
ii debiandoc-sgml-doc 1.1.22
pn latex-cjk-all <none>
pn texlive-lang-all <none>
-- no debconf information
From 9b2a5f95132b499d85eb6c17b57cf8e3a7748ac6 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <[email protected]>
Date: Thu, 16 Aug 2012 04:16:54 +0100
Subject: [PATCH] Fix mangling of '&' in URLs
SGML entities, e.g. '&amp;', are converted on input to SDATA sequences,
e.g. '\|[amp ]\|'. These need to be converted back to literal
characters or entities on output, depending on the format. Currently
we fail to do this because:
1. The driver normalizes URLs by squashing multiple spaces. Since the
spaces are significant in matching of the SDATA sequences, they are
not converted (by any back-end).
Change it to trim leading and trailing space only; URLs should not
normally contain any spaces anyway.
2. The HTML and XML back-ends further normalize URLs using the URL
class. This results in the SDATA sequences being URL-encoded, and so
they are not matched in the subsequent conversion to CDATA.
Swap the order of conversion so that URL-encoding is done last. This
is not theoretically correct: we should convert to literal text, then
URL-encode, then HTML/XML-encode. However we know that '&' and ';'
will not be URL-escaped and therefore the result should be the same.
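The effect of the ordering in point 2 can be sketched in isolation. This is
only an illustration, not the back-end code: the one-entry %sdata table and
its key spelling are assumptions, and url_escape() is a hypothetical
stand-in for the URI class that percent-encodes everything outside a
conservative set of URI-legal characters. Only the substitution in
to_cdata() is taken from the patch.

```perl
use strict;
use warnings;

# Assumed one-entry SDATA table; the real back-ends define their own %sdata.
my %sdata = ( '[amp ]' => '&amp;' );

# Same substitution as in the patch, returning the text instead of printing it.
sub to_cdata {
    my ($s) = @_;
    $s =~ s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
    return $s;
}

# Hypothetical stand-in for URI->new: percent-encode characters outside a
# conservative URI-legal set. '&' and ';' are in the set, so they survive.
my $safe = q{A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%};
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^$safe])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

my $id = 'http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux\|[amp ]\|src=linux-2.6';

# Old order: escaping first hides the SDATA sequence from the substitution.
print to_cdata( url_escape($id) ), "\n";

# Patched order: substitute first, then escape.
print url_escape( to_cdata($id) ), "\n";
```

Under these assumptions the first line printed contains
'%5C%7C[amp%20]%5C%7C', as in the reported href, and the second contains
'&amp;' between the two src parameters.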
---
debian/changelog | 7 +++++++
tools/lib/Format/Driver.pm | 2 +-
tools/lib/Format/HTML.pm | 10 +++++++---
tools/lib/Format/XML.pm | 10 +++++++---
4 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/debian/changelog b/debian/changelog
index 2f58e28..db764f6 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+debiandoc-sgml (1.2.27+nmu1) UNRELEASED; urgency=low
+
+ * Non-maintainer upload.
+ * Fix handling of entities (e.g. &amp;) in URLs.
+
+ -- Ben Hutchings <[email protected]> Thu, 16 Aug 2012 03:55:35 +0100
+
debiandoc-sgml (1.2.27) unstable; urgency=low
* Rebuild with debhelper sgml-base >=1.26+nmu2. Closes: #675474
diff --git a/tools/lib/Format/Driver.pm b/tools/lib/Format/Driver.pm
index 368711f..b707291 100644
--- a/tools/lib/Format/Driver.pm
+++ b/tools/lib/Format/Driver.pm
@@ -918,7 +918,7 @@ sub end_httppath
sub start_url
{
( $element, $event ) = @_;
- my $id = _normalize( _a( 'ID' ) );
+ my $id = _trim( _a( 'ID' ) );
my $name = _a( 'NAME' );
$name = "" if ( $name eq '\|\|' ) || ( $name eq '\|urlname\|' )
|| ( $name eq $id );
diff --git a/tools/lib/Format/HTML.pm b/tools/lib/Format/HTML.pm
index 590bd79..564b420 100644
--- a/tools/lib/Format/HTML.pm
+++ b/tools/lib/Format/HTML.pm
@@ -956,7 +956,7 @@ sub _output_httppath
}
sub _output_url
{
- my $url = URI->new( $_[0] );
+ my $url = URI->new( _to_cdata( $_[0] ) );
$_[1] = $_[0] if $_[1] eq "";
output( "<code><a href=\"$url\">" );
_cdata( $_[1] );
@@ -966,7 +966,7 @@ sub _output_url
## ----------------------------------------------------------------------
## data output subroutines
## ----------------------------------------------------------------------
-sub _cdata
+sub _to_cdata
{
( $_ ) = @_;
@@ -976,7 +976,11 @@ sub _cdata
# SDATA
s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
- output( $_ );
+ return $_;
+}
+sub _cdata
+{
+ output( _to_cdata( $_[0] ) );
}
sub _sdata
{
diff --git a/tools/lib/Format/XML.pm b/tools/lib/Format/XML.pm
index 5e1b807..7d852ef 100644
--- a/tools/lib/Format/XML.pm
+++ b/tools/lib/Format/XML.pm
@@ -769,7 +769,7 @@ sub _output_httppath
}
sub _output_url
{
- my $url = URI->new( $_[0] );
+ my $url = URI->new( _to_cdata( $_[0] ) );
$_[1] = $_[0] if $_[1] eq "";
output( "<ulink url=\"$url\">" );
_cdata( $_[1] );
@@ -779,14 +779,18 @@ sub _output_url
## ----------------------------------------------------------------------
## data output subroutines
## ----------------------------------------------------------------------
-sub _cdata
+sub _to_cdata
{
( $_ ) = @_;
# SDATA
s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
- output( $_ );
+ return $_;
+}
+sub _cdata
+{
+ output( _to_cdata( $_[0] ) );
}
sub _sdata
{