Hi,

Sometime last year I was surprised to see (not on a public list unfortunately)
that bookindex.html is 657kB, with > 200kB just being repetitions of
xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"
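
For anyone wanting to check the overhead on their own build, here is a minimal
sketch in Python. It is not part of the patch; the function name
namespace_overhead is made up for illustration, and it simply counts the two
literal declarations quoted above (including the leading space that precedes
each repetition).

```python
# Declarations quoted in the mail; the leading space is part of what each
# repetition costs inside a start tag.
DECLARATIONS = (
    ' xmlns="http://www.w3.org/1999/xhtml"',
    ' xmlns:xlink="http://www.w3.org/1999/xlink"',
)


def namespace_overhead(html):
    """Return the bytes spent on each namespace declaration in `html`."""
    return {
        decl.strip(): html.count(decl) * len(decl.encode("utf-8"))
        for decl in DECLARATIONS
    }


def total_overhead(html):
    """Sum of the per-declaration byte counts."""
    return sum(namespace_overhead(html).values())
```

Reading the generated file (for example with
open("html/bookindex.html", encoding="utf-8").read(), adjusting the path to
your build directory) and passing the text to total_overhead() gives the number
of bytes taken up by these declarations.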

Reminded of this, due to a proposal to automatically generate docs as part of
cfbot runs (which'd be fairly likely to update bookindex.html), I spent a few
painful hours last night trying to track this down.


The reasons for the two xmlns= are different. The
xmlns="http://www.w3.org/1999/xhtml" is afaict caused by confusion on our
part.

Some of our stylesheets use
xmlns="http://www.w3.org/TR/xhtml1/transitional"
others use
xmlns="http://www.w3.org/1999/xhtml"

It's noteworthy that the docbook xsl stylesheets end up with
<html xmlns="http://www.w3.org/1999/xhtml">
so it's a bit pointless to reference http://www.w3.org/TR/xhtml1/transitional
afaict.
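
As a rough illustration of why mixing the two URIs matters (a sketch, not
taken from the patch, and the nesting below is a made-up example rather than
actual output): to an XML processor the two namespaces produce differently
named elements, so a serializer has to re-declare the default namespace every
time the output switches between them.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
TRANSITIONAL = "http://www.w3.org/TR/xhtml1/transitional"

# Hypothetical nesting: an element in the "transitional" namespace sitting
# between two elements in the namespace the docbook stylesheets use.
doc = (
    f'<html xmlns="{XHTML}">'
    f'<div xmlns="{TRANSITIONAL}">'
    f'<a xmlns="{XHTML}">link</a>'
    "</div></html>"
)

# ElementTree exposes names as {namespace}localname, in document order.
tags = [el.tag for el in ET.fromstring(doc).iter()]
for tag in tags:
    print(tag)
```

The printed names show that the div is a different element type than an XHTML
div, which is why every boundary needs its own xmlns= declaration.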

Adding xmlns="http://www.w3.org/1999/xhtml" to stylesheet-html-common.xsl gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in bookindex specific
content.

Changing stylesheet.xsl from transitional to http://www.w3.org/1999/xhtml gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in navigation/footer.

Of course we should likely change all http://www.w3.org/TR/xhtml1/transitional
references, rather than just the one necessary to get rid of the xmlns= spam.


So far, so easy. It took me way longer to understand what's causing all the
xmlns:xlink= appearances.

For a long time I was misdirected because if I remove the <xsl:template
name="generate-basic-index"> in stylesheet-html-common.xsl, the number of
xmlns:xlink drastically reduces to a handful. Which made me think that their
existence is somehow our fault. And I tried and tried to find the cause.

But it turns out that this originally is caused by a still existing buglet in
the docbook xsl stylesheets, specifically autoidx.xsl. It doesn't list xlink
in exclude-result-prefixes, even though it uses ids etc from xlink.

The reason that we end up with so many more xmlns:xlink is just that without
our customization there ends up being a single
<div xmlns:xlink="http://www.w3.org/1999/xlink" class="index">
and then everything below that doesn't need the xmlns:xlink anymore. But
because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted
for each element that autoidx.xsl has "control" over.

Waiting for docbook to fix this seems a bit futile. I eventually found a
bug report about this, from 2016: https://sourceforge.net/p/docbook/bugs/1384/

But we can easily reduce the "impact" of the issue, by just adding a single
xmlns:xlink to <div class="index">, which is sufficient to convince xsltproc
to not repeat it.
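
To show why declaring the prefix once on the parent is harmless, here is a
small Python sketch. It is only an illustration with made-up index entries,
not output from xsltproc: both documents parse to the same elements, and the
only difference is the number of bytes spent on declarations.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ENTRIES = ["abort", "acldefault", "aclitem"]  # made-up index entries

# Declaration repeated on every child, as in the oversized bookindex.html.
repeated = '<div class="index">' + "".join(
    f'<dt xmlns:xlink="{XLINK}">{name}</dt>' for name in ENTRIES
) + "</div>"

# Declaration hoisted to the enclosing div, as in the proposed change.
hoisted = f'<div class="index" xmlns:xlink="{XLINK}">' + "".join(
    f"<dt>{name}</dt>" for name in ENTRIES
) + "</div>"


def shape(doc):
    """Element names, attributes and text; namespace declarations excluded."""
    return [(el.tag, dict(el.attrib), el.text) for el in ET.fromstring(doc).iter()]


same_content = shape(repeated) == shape(hoisted)
bytes_saved = len(repeated) - len(hoisted)
print(same_content, bytes_saved)
```

With three entries the saving is two declarations' worth of bytes; in the real
index the same effect applies to every element autoidx.xsl emits.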


Before:
-rw-r--r-- 1 andres andres 683139 Feb 13 04:31 html-broken/bookindex.html
After:
-rw-r--r-- 1 andres andres 442923 Feb 13 12:03 html/bookindex.html

While most of the savings are in bookindex, the rest of the files are reduced
by another ~100kB.


WIP patch attached. For now I just adjusted the minimal set of
xmlns="http://www.w3.org/TR/xhtml1/transitional", but I think we should update
all.

Greetings,

Andres Freund
diff --git i/doc/src/sgml/stylesheet-html-common.xsl w/doc/src/sgml/stylesheet-html-common.xsl
index d9961089c65..9f69af40a94 100644
--- i/doc/src/sgml/stylesheet-html-common.xsl
+++ w/doc/src/sgml/stylesheet-html-common.xsl
@@ -4,6 +4,7 @@
 %common.entities;
 ]>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+                xmlns="http://www.w3.org/1999/xhtml"
                 version="1.0">
 
 <!--
@@ -126,7 +127,11 @@ set       toc,title
                                                  &uppercase;),
                                              substring(&primary;, 1, 1)))]"/>
 
-  <div class="index">
+  <!-- pgsql-docs: added xmlns:xlink, autoidx.xsl doesn't include xlink in
+       exclude-result-prefixes. Without our customization that just leads to a
+       single xmlns:xlink in this div, but because we emit it it otherwise
+       gets pushed down to the elements output by autoidx.xsl -->
+  <div class="index" xmlns:xlink="http://www.w3.org/1999/xlink">
     <!-- pgsql-docs: begin added stuff -->
     <p class="indexdiv-quicklinks">
       <a href="#indexdiv-Symbols">
diff --git i/doc/src/sgml/stylesheet.xsl w/doc/src/sgml/stylesheet.xsl
index 0eac594f0cc..24a9481fd49 100644
--- i/doc/src/sgml/stylesheet.xsl
+++ w/doc/src/sgml/stylesheet.xsl
@@ -1,7 +1,7 @@
 <?xml version='1.0'?>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 version='1.0'
-                xmlns="http://www.w3.org/TR/xhtml1/transitional"
+                xmlns="http://www.w3.org/1999/xhtml"
                 exclude-result-prefixes="#default">
 
 <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
