On 15/05/12 22:40, Dieter Plaetinck wrote:
On Mon, 14 May 2012 23:32:35 +0200
Dario Giovannetti<[email protected]> wrote:
I would like to propose using pandoc (http://johnmacfarlane.net/pandoc/)
instead of the make-doc.sh script for converting the installation
guide (Markdown syntax) to the document hosted in the wiki (MediaWiki
syntax).
I've tested it and the result looks pretty good, with only a few minor
manual refinements required (which I volunteer to perform, if needed).
Where the current script produces what is practically an HTML
document, pandoc would give us a well-formed, much neater MediaWiki
document.
This would greatly simplify further improvements to the wikification
of the guide, such as adapting it to ArchWiki's style standards.
Thank you
Dario
If you have a patch that I can apply, I'll try it out.
If the code using pandoc is more elegant and the output is
comparable (or better), I'm up for it.
Dieter
As requested, here's the patch. It's quite radical, partly because the
previous script was creating a header that is no longer used in the
wiki. As you can see, I've rewritten almost everything in Python,
since I'm much more comfortable with regular expressions in that
language; besides, the code is much more readable and flexible that
way.
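To give an idea of the regular-expression work involved, here is a minimal standalone sketch of the link conversion done by wikify_internal_links in the patch below. The pattern and replacement mirror LINK_REGEXP and LINK_REPLACE from the patch; the BASEURL constant name and the sample text are made up for illustration, and raw strings are used here only so that the backslashes reach the regex engine unchanged.

```python
import re

# Mirrors the "baseurl" entry for "en" in LANGFIXES (a regular expression).
# The constant name BASEURL is an assumption made for this sketch.
BASEURL = r"https?://wiki\.archlinux\.org/index\.php/"

# Same pattern and replacement as LINK_REGEXP / LINK_REPLACE in the patch.
LINK_REGEXP = r"\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
LINK_REPLACE = r"[[\g<1>|\g<2>]]"


def wikify_internal_links(text, baseurl=BASEURL):
    """Turn external links that point to the wiki itself into internal links."""
    regexp = LINK_REGEXP.format(baseurl=baseurl)
    return re.sub(regexp, LINK_REPLACE, text)


# Hypothetical sample input, not taken from the guide.
sample = ("See [http://wiki.archlinux.org/index.php/Beginners_Guide the "
          "Beginners' Guide] or [http://www.archlinux.org/download/ downloads].")
converted = wikify_internal_links(sample)
print(converted)
```

Only the first link is rewritten, since the second one does not point to the wiki's own domain.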
NOTE 1: you will need to install the "pandoc" package, currently
in the AUR: http://aur.archlinux.org/packages.php?ID=32490
NOTE 2: the patch has been committed on the "develop" branch.
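Regarding the pandoc requirement in NOTE 1: make-doc.sh only prints a warning when pandoc is missing and then carries on. Should a stricter check be wanted, the conversion could also be driven from Python. What follows is only a sketch and is not part of the patch: the function name markdown_to_mediawiki is hypothetical, and it assumes Python 3.7 or newer for the capture_output and text parameters of subprocess.run. The pandoc options are the same ones make-doc.sh uses.

```python
import shutil
import subprocess


def markdown_to_mediawiki(markdown_text, pandoc="pandoc"):
    """Convert Markdown text to MediaWiki markup by piping it through pandoc.

    Raises RuntimeError if the pandoc executable cannot be found on PATH.
    """
    if shutil.which(pandoc) is None:
        raise RuntimeError("Need {} utility!".format(pandoc))
    # The document is passed on standard input rather than as an argument.
    result = subprocess.run(
        [pandoc, "-f", "markdown", "-t", "mediawiki"],
        input=markdown_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        print(markdown_to_mediawiki("# Introduction\n\nSome *text*.\n"))
    except RuntimeError as error:
        print(error)
```

Feeding the text through standard input also avoids handing the whole document to the script as a single command-line argument, which is what the xargs -0 call in make-doc.sh does.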
From 593842cd1182ae0342efc9356477b16739641455 Mon Sep 17 00:00:00 2001
From: Dario Giovannetti <[email protected]>
Date: Wed, 16 May 2012 23:12:50 +0200
Subject: [PATCH 50/50] revise the automatic procedure for creating the
MediaWiki version of the installation guide
---
README | 3 +-
doc/official_installation_guide_en | 3 +-
make-doc.sh | 49 ++-------------
make_doc_fixes.py | 122 ++++++++++++++++++++++++++++++++++++
4 files changed, 131 insertions(+), 46 deletions(-)
create mode 100755 make_doc_fixes.py
diff --git a/README b/README
index 6981de7..35456cd 100644
--- a/README
+++ b/README
@@ -19,7 +19,8 @@ Homepage: http://github.com/Dieterbe/aif
- libui-sh
- iproute2
Optionally:
- - markdown: to generate the html installation guide
+ - pandoc: to generate the MediaWiki installation guide
+ - python: to generate the MediaWiki installation guide
- cryptsetup: for encryption support
- lvm2: for LVM support
- dhcpd: for dhcp networking support
diff --git a/doc/official_installation_guide_en b/doc/official_installation_guide_en
index 0faea07..5b2e580 100644
--- a/doc/official_installation_guide_en
+++ b/doc/official_installation_guide_en
@@ -2,7 +2,7 @@
General installation documentation for the Arch Linux distribution.
-This guide is only valid for release 2010.05 or newer.
+This guide is only valid for release 2011.08 or newer.
This guide is maintained in [aif git](http://projects.archlinux.org/?p=aif.git)
Git pull requests, patches, comments are welcome on the arch [releng mailing list](http://www.archlinux.org/mailman/listinfo/arch-releng)
@@ -218,6 +218,7 @@ You can find more info on the wiki
[Community contributed documentation](http://wiki.archlinux.org/index.php/Archiso-as-pxe-server)
(this section could be a bit more elaborate)
+
### Client
Configure your system to try network booting (pxe) first.
diff --git a/make-doc.sh b/make-doc.sh
index 4e6c5a2..39914f5 100755
--- a/make-doc.sh
+++ b/make-doc.sh
@@ -1,50 +1,11 @@
#!/bin/sh
-which markdown &>/dev/null || echo "Need markdown utility!" >&2
+which pandoc &>/dev/null || echo "Need pandoc utility!" >&2
+
+echo "generating mediawiki document..."
-echo "generating html..."
for i in doc/official_installation_guide_??
do
echo $i
- # convert markdown to html, convert html links to wiki ones.
- cat $i | markdown | sed 's|<a href="\([^"]*\)"[^>]*>\([^<]*\)</a>|[\1 \2]|g' > $i.html
- # turn code markup into a syntax that mediawiki understands
- sed -i 's#<pre><code>#<pre>#g' $i.html
- sed -i 's#</code></pre>#</pre>#g' $i.html
-
+ # convert markdown to mediawiki and perform further adaptations
+ cat $i | pandoc -f markdown -t mediawiki | xargs -0 ./make_doc_fixes.py $i > $i.mw
done
-
-echo "adding special wiki thingies..."
-
-i=doc/official_installation_guide_en
-echo $i
-
-
-summary_begin='<p><strong>Article summary<\/strong><\/p>'
-summary_end_plus_one='<p><strong>Related articles<\/strong><\/p>'
-related_begin='<p><strong>Related articles<\/strong><\/p>'
-related_end_plus_one='<h1>Introduction<\/h1>'
-
-summary=`sed -n "/$summary_begin/, /$summary_end_plus_one/p;" $i.html | sed "/$summary_begin/d; /$summary_end_plus_one/d"`
-related=`sed -n "/$related_begin/, /$related_end_plus_one/p;" $i.html | sed "/$related_begin/d; /$related_end_plus_one/d"`
-
-# prepare $related for wikiing.
-# note that like this we always keep the absulolute url's even if they are on the same subdomain eg: {{Article summary wiki|http://foo/bar bar}} (note).
-# wiki renders absolute url a bit uglier. always having absolute url's is not needed if the page can be looked up on the same wiki, but like this it was simplest to implement..
-related=`echo "$related"| sed -e 's#<p>\[\(.*\)\] \(.*\)<\/p>#{{Article summary wiki|\1}} \2#'`
-
-# preare $summary for wiiking: replace email address by nice mailto links
-summary=`echo "$summary" | sed 's/\([^"|, ]*@[-A-Za-z0-9_.]*\)/[mailto:\1 \1]/'`
-
-
-echo -e "[[Category:Getting and installing Arch (English)]]\n[[Category:HOWTOs (English)]]
-[[Category:Accessibility (English)]]
-[[Category:Website Resources]]
-{{Article summary start}}\n{{Article summary text| 1=$summary}}\n{{Article summary heading|Available Languages}}\n
-{{i18n_entry|English|Official Arch Linux Install Guide}}\n
-{{Article summary heading|Related articles}}
-$related
-{{Article summary end}}" | cat - $i.html > $i.html.tmp && mv $i.html.tmp $i.html
-
-# remove summary and related articles from actual content
-sed "/$summary_end_plus_one/p; /$summary_begin/, /$summary_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
-sed "/$related_end_plus_one/p; /$related_begin/, /$related_end_plus_one/d" $i.html > $i.html.tmp && mv $i.html.tmp $i.html
diff --git a/make_doc_fixes.py b/make_doc_fixes.py
new file mode 100755
index 0000000..a7200a0
--- /dev/null
+++ b/make_doc_fixes.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python3
+"""
+This script is not meant to be run as a standalone application, instead it is
+called by make-doc.sh to perform further adaptations to the MediaWiki version
+of the installation guide.
+"""
+
+import sys
+import re
+
+FILENAME = sys.argv[1]
+INPUT = sys.argv[2]
+
+# Used in fix_multiline_list_items
+LIST_REGEXP = "^([\:\*#]+.+<br(?: /)?>)( *\n)"
+LIST_REPLACE = "\g<1>"
+
+# Used in wikify_internal_links
+LINK_REGEXP = "\[{baseurl}([^\]\s]+?) ([^\]\n]+?)\]"
+LINK_REPLACE = "[[\g<1>|\g<2>]]"
+
+# If a translation of the guide is added, a proper entry should be added to
+# this dictionary; the key names must be 2-character language tags
+LANGFIXES = {
+    "en": {
+        "baseurl": "https?://wiki\.archlinux\.org/index\.php/", # regexp
+        "header": """\
+[[Category:Getting and installing Arch]]
+[[fr:Guide officiel de l'installation]]
+[[ro:Ghid de instalare oficial]]
+{{i18n|Official Installation Guide}}
+""", # string
+        "intro": """The Official Installation Guide is maintained in [http://projects.archlinux.org/aif.git/ aif.git].
+
+The version included with the latest [http://www.archlinux.org/download/ release] (2011.08.19) can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en?id=13c8c0813328eb8f52b03b3c53a32f1f40558021 here].
+
+The latest version can be found [http://projects.archlinux.org/aif.git/plain/doc/official_installation_guide_en here].
+
+The (unofficial) [[Beginners' Guide]] provides a thorough walkthrough of the installation and configuration process.
+
+""",
+        "summary_heading": None, # must be None only for English
+        "summary": "'''Article summary'''", # string
+        "related": "'''Related articles'''", # string
+        "introduction": "= Introduction =", # string
+    },
+}
+
+
+def fix_multiline_list_items(text):
+    """
+    pandoc doesn't convert multiline list items correctly, so this function
+    compensates for that.
+    """
+    test = ""
+    # It's necessary to run this multiple times because of how the regular
+    # expression is designed
+    while text != test:
+        test = text
+        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
+    return text
+
+
+def wikify_internal_links(text, patches):
+    """
+    Turns external links that point to the local subdomain into proper internal
+    links.
+    """
+    regexp = LINK_REGEXP.format(**patches)
+    text = re.sub(regexp, LINK_REPLACE, text)
+    return text
+
+
+def insert_header(text, patches):
+    """
+    Inserts the standard article header.
+    """
+    text = patches["header"] + text
+    return text
+
+
+def assemble_summary(text, patches):
+    """
+    Converts the article summary and related links into a standard summary
+    """
+    # NOTE: this function requires some fixes if more languages are added
+    part_a = text.partition(patches["summary"])
+    part_b = part_a[2].partition(patches["related"])
+    part_c = part_b[2].partition(patches["introduction"])
+    related_links = part_c[0].strip().split("\n")
+    summary_heading = ("|" + patches["summary_heading"]
+                       if (patches["summary_heading"])
+                       else "")
+    summary_text = part_b[0].strip()
+    related = "\n".join(["{{{{Article summary text|1={}}}}}".format(r)
+                         for r in related_links])
+    summary = """{{{{Article summary start{}}}}}
+{{{{Article summary text|1={}}}}}
+{{{{Article summary heading|Related articles}}}}
+{}
+{{{{Article summary end}}}}
+
+""".format(summary_heading, summary_text, related)
+    text = part_a[0] + summary + patches["intro"] + part_c[1] + part_c[2]
+    return text
+
+
+def main(filename, text):
+    """
+    Main function
+    """
+    language = filename[-2:]
+    text = fix_multiline_list_items(text)
+    if language in LANGFIXES:
+        patches = LANGFIXES[language]
+        text = wikify_internal_links(text, patches)
+        text = insert_header(text, patches)
+        text = assemble_summary(text, patches)
+    return text
+
+if __name__ == "__main__":
+    print(main(FILENAME, INPUT))
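
As a standalone illustration of why fix_multiline_list_items in the patch above repeats the substitution until the text stops changing, here is a minimal sketch. The pattern and replacement mirror LIST_REGEXP and LIST_REPLACE, written as raw strings; the pass counter and the sample input are additions made for this example and are not taken from the guide or from real pandoc output.

```python
import re

# Same pattern and replacement as LIST_REGEXP / LIST_REPLACE in the patch,
# written as raw strings.
LIST_REGEXP = r"^([\:\*#]+.+<br(?: /)?>)( *\n)"
LIST_REPLACE = r"\g<1>"


def fix_multiline_list_items(text):
    """Join list-item lines ending in <br> or <br /> with the line after them.

    Returns the fixed text and the number of substitution passes that were run
    (the counter exists only for this illustration).
    """
    passes = 0
    previous = None
    while text != previous:
        previous = text
        text = re.sub(LIST_REGEXP, LIST_REPLACE, text, flags=re.MULTILINE)
        passes += 1
    return text, passes


# Hypothetical input: one list item spread over three lines.
sample = "* first line<br />\ncontinued<br />\nlast part\n* next item\n"
fixed, passes = fix_multiline_list_items(sample)
print(fixed)
print(passes)
```

A continuation line does not start with a list marker, so its own trailing line break can only be matched once the line has been joined to the list item in an earlier pass. For this sample, two passes change the text and a third confirms that nothing is left to join.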