Re: Bug#283949: allow generation of plain text
On Thursday 09 December 2004 00:00, Nikolai Prokoschenko wrote:
> +<xsl:output method="html" encoding="UTF-8" indent="no"/>
> +

Thanks! Applied and committed. That also makes the 'sed' statement in buildone.sh look nicer.

> BTW, we also should consider generating strict XHTML.. ;)

Hmm. I don't really know what that means and what the consequences and (dis)advantages would be. Care to give some explanation and arguments?

Cheers,
FJP
Re: Bug#283949: allow generation of plain text
On Thu, Dec 09, 2004 at 12:42:20PM +0100, Frans Pop wrote:
> > BTW, we also should consider generating strict XHTML.. ;)
> Hmm. I don't really know what that means and what the consequences and
> (dis)advantages would be. Care to give some explanation and arguments?

Well, it's XHTML ;) But the advantages are actually:

1) XHTML is _the_ current HTML standard.
2) XHTML is XML-based, so it may be easier to parse.
3) XHTML puts the priority on semantics, so the design is CSS-based.
4) It's easy to convert to it using an XSL stylesheet (I've actually seen an example somewhere, that's why I proposed it).

But as it's only marginally different from HTML4, the decision could be postponed until our XSL templates mature and we are able to generate PDFs for all languages ;)

-- 
Nikolai Prokoschenko
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: Bug#283949: allow generation of plain text
On Monday 06 December 2004 10:34, Nikolai Prokoschenko wrote:
> Works for Russian. The special characters are not shown, as they are not
> available in KOI8-R - I'd propose switching to UTF-8 as soon as possible.

Could you have a look at how to set up things so the single HTML file will be generated in UTF-8? You seem to be better at that kind of thing than I am :-P

I guess the final file for Russian could be generated in UTF-8 now (without switching the HTML file to UTF-8). However, is that what Russian users will expect? Will they, in general, know how to recognize, open and read a document that is UTF-8 encoded?
If the systems of most users are set up for KOI8-R, I would suggest leaving the doc in that encoding.

Cheers,
Frans
Re: Bug#283949: allow generation of plain text
On Wed, Dec 08, 2004 at 07:02:44PM +0100, Frans Pop wrote:
> > Works for Russian. The special characters are not shown, as they are not
> > available in KOI8-R - I'd propose switching to UTF-8 as soon as possible.
> Could you have a look at how to set up things so the single HTML file will
> be generated in UTF-8? You seem to be better at that kind of thing than I am :-P

I'll look into it; this should be _somewhere_. :)

> I guess the final file for Russian could be generated in UTF now (without
> the switch of the HTML file). However, is that what Russian users will
> expect: will they, in general, know how to recognize, open and read a
> document that is UTF-8 encoded?

Russian users are a special case. They will be the last ones to switch to Unicode, as they see no advantages and only say things like "KOI8-R works" and "if the 8th bit is cut off, I can still read it", that kind of 1980s talk. When RedHat 9.0 was released, the first thing users asked was how to switch the whole distribution back to KOI8-R. It's plain sick.

My radical opinion is to enforce UTF-8, as most distributions will be UTF-8 based within a year (Debian will switch after the sarge release, AFAIK); RedHat and SuSE are already there. Since the Russian manual translation won't hit the installation CDs anytime soon (there are some extra glitches with xml2po which have to be worked on), we can just use UTF-8. If the manual were going onto the CDs, KOI8-R could be considered.

> If the systems of most users are set up for KOI8-R, I would suggest leaving
> the doc in that encoding.

This is relevant only for the plain text manual, which will not be available on the CDs; the webpage is not important. So it can stay UTF-8.

-- 
Nikolai Prokoschenko
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]
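Keeping UTF-8 as the master encoding does not rule out a KOI8-R copy for users who still need one; it can be derived mechanically from the UTF-8 text. A minimal sketch, assuming the standard `iconv` command-line tool is installed; the function and file names are hypothetical and not part of the build scripts:

```sh
# Hypothetical helper: read UTF-8 text on stdin, write KOI8-R on stdout.
# With GNU iconv, a character that KOI8-R cannot represent (for example
# typographic quotes) makes the command fail instead of passing silently.
utf8_to_koi8r () {
    iconv -f UTF-8 -t KOI8-R
}

# Illustrative use (file names are assumptions):
# utf8_to_koi8r < install.ru.txt > install.ru.koi8-r.txt
```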
Re: Bug#283949: allow generation of plain text
On Wed, Dec 08, 2004 at 07:02:44PM +0100, Frans Pop wrote:
> Could you have a look at how to set up things so the single HTML file will
> be generated in UTF-8? You seem to be better at that kind of thing than I am :-P

Done. It's a small patch to the stylesheet:

Index: style-html-single.xsl
===================================================================
--- style-html-single.xsl (revision 24326)
+++ style-html-single.xsl (working copy)
@@ -5,6 +5,8 @@
 
 <xsl:import href="file:///usr/share/sgml/docbook/stylesheet/xsl/nwalsh/html/docbook.xsl"/>
 
+<xsl:output method="html" encoding="UTF-8" indent="no"/>
+
 <!-- Include our common parameters -->
 <xsl:include href="style-common.xsl"/>
 

BTW, we also should consider generating strict XHTML.. ;)

-- 
Nikolai Prokoschenko
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]
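Once the `xsl:output ... encoding="UTF-8"` line is in place, a quick sanity check is to confirm that the generated single-page HTML really is well-formed UTF-8 before it is passed on. A minimal sketch, assuming GNU iconv (which exits non-zero on an invalid byte sequence); `is_utf8` is a hypothetical name:

```sh
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
# Converting UTF-8 to UTF-8 leaves valid input unchanged; GNU iconv
# reports an error and exits non-zero on an invalid byte sequence.
is_utf8 () {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Illustrative use after the intermediate HTML has been generated:
# is_utf8 < "$tempdir/install.${language}.html" || echo "not valid UTF-8"
```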
Re: Bug#283949: allow generation of plain text
On Sun, Dec 05, 2004 at 03:44:27PM +0100, Frans Pop wrote:
> - Replace some unprintable characters
>   I noticed that some characters were replaced by a ? in the text file. The
>   main problems were quotes and dashes. I managed to replace these using a
>   sed script.
>
> Kenshi, Nikolai, Miroslav: Could you please check if I used the correct
> charset and if the result is acceptable for your languages?

Works for Russian. The special characters are not shown, as they are not available in KOI8-R; I'd propose switching to UTF-8 as soon as possible.

-- 
Nikolai Prokoschenko
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]
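The thread does not show the actual sed script (it lives in SVN), so the following is only an illustrative sketch of that kind of substitution. It assumes the input is UTF-8 and that the offending characters are the usual Unicode typographic quotes and dashes; the function name and the chosen ASCII replacements are assumptions:

```sh
# Hypothetical sketch: replace UTF-8 typographic punctuation with ASCII
# so it survives conversion to a legacy charset. The characters are
# built from octal escapes so the script itself stays pure ASCII.
ascii_punct () {
    ldq=$(printf '\342\200\234')   # U+201C left double quotation mark
    rdq=$(printf '\342\200\235')   # U+201D right double quotation mark
    lsq=$(printf '\342\200\230')   # U+2018 left single quotation mark
    rsq=$(printf '\342\200\231')   # U+2019 right single quotation mark
    ndash=$(printf '\342\200\223') # U+2013 en dash
    mdash=$(printf '\342\200\224') # U+2014 em dash
    # LC_ALL=C makes sed match the multi-byte sequences byte by byte.
    LC_ALL=C sed -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" \
                 -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
                 -e "s/$ndash/-/g" -e "s/$mdash/--/g"
}
```

Mapping the em dash to two hyphens rather than one is a stylistic choice; either keeps the text readable in KOI8-R, ISO-8859-2 or EUC-JP.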
Bug#283949: allow generation of plain text
On Thursday 02 December 2004 15:16, Frank Lichtenheld wrote:
> While investigating how to incorporate the install manual into the websites
> I saw there is no plain text output yet. I've prepared a little patch to
> allow this (xml -> single html -> text).

I have committed this patch to SVN (trunk only) with some additions.

- Variable encoding to support Czech, Russian and Japanese
  I have set the encoding for these languages as follows:
  - cs: ISO-8859-2
  - ja: EUC-JP
  - ru: KOI8-R
  I think it would be better if the intermediate file were UTF-8, but I was unable to quickly find out how to set that.
- Replace some unprintable characters
  I noticed that some characters were replaced by a ? in the text file. The main problems were quotes and dashes. I managed to replace these using a sed script.

See the script now in SVN for details.

Kenshi, Nikolai, Miroslav: Could you please check if I used the correct charset and if the result is acceptable for your languages?

(Leaving the bug open for now as improvements are possible.)
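The per-language encoding choice above amounts to a small lookup from language code to charset. A minimal sketch of such a lookup in the style of buildone.sh; the function name is hypothetical, and the fallback for languages not listed in the message is an assumption:

```sh
# Hypothetical sketch: map a manual language code to the charset used
# for its plain text output. The cs/ja/ru values are those listed in the
# message; the fallback for all other languages is an assumption.
lang_encoding () {
    case "$1" in
        cs) echo ISO-8859-2 ;;
        ja) echo EUC-JP ;;
        ru) echo KOI8-R ;;
        *)  echo ISO-8859-1 ;;  # assumed default, not from the thread
    esac
}
```

The result could be handed to whatever performs the final conversion, for example `iconv -f UTF-8 -t "$(lang_encoding "$language")"` if w3m's output is UTF-8; whether w3m's own output-charset option can be used instead depends on how the installed w3m was built.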
Bug#283949: allow generation of plain text
Hi Frans,

At Sun, 5 Dec 2004 15:44:27 +0100,
Frans Pop wrote:
> - Variable encoding to support Czech, Russian and Japanese
>   I have set the encoding for these languages as follows
>   - cs: ISO-8859-2
>   - ja: EUC-JP
>   - ru: KOI8-R
> See the script now in SVN for details.
> Kenshi, Nikolai, Miroslav: Could you please check if I used the correct
> charset and if the result is acceptable for your languages?

I checked the text file built by 'build/buildone.sh i386 ja txt'.
Yeah, it seems good.

Thanks,
-- 
Kenshi Muto
[EMAIL PROTECTED]
Bug#283949: allow generation of plain text
Package: debian-installer-manual
Severity: normal
Tags: patch

While investigating how to incorporate the install manual into the websites I saw there is no plain text output yet. I've prepared a little patch to allow this (xml -> single html -> text).

The method of using w3m -dump still has some drawbacks. The most important one is that most of the URLs are invisible. None of the text-mode browsers I tested seems to have an option to change this behaviour, though. (Sadly enough, Firefox's text export gets this right, but it has its own errors, besides the fact that we can't use it in batch mode ;) Still working on this.

Regards,
Frank Lichtenheld

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-1-k7-smp
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8

Index: buildone.sh
===================================================================
--- buildone.sh (revision 24138)
+++ buildone.sh (working copy)
@@ -20,6 +20,7 @@
 stylesheet_dir=$build_path/stylesheets
 stylesheet_profile=$stylesheet_dir/style-profile.xsl
 stylesheet_html=$stylesheet_dir/style-html.xsl
+stylesheet_html_single=$stylesheet_dir/style-html-single.xsl
 stylesheet_fo=$stylesheet_dir/style-fo.xsl
 stylesheet_dsssl=$stylesheet_dir/style-print.dsl
 
@@ -101,6 +102,24 @@
     checkresult $?
 }
 
+create_text () {
+
+    create_profiled
+
+    echo "Creating temporary .html file..."
+
+    /usr/bin/xsltproc \
+        --xinclude \
+        --output $tempdir/install.${language}.html \
+        $stylesheet_html_single \
+        $tempdir/install.${language}.profiled.xml
+    checkresult $?
+
+    echo "Creating .txt file..."
+    w3m -dump $tempdir/install.${language}.html \
+        > $destination/install.${language}.txt
+}
+
 create_dvi () {
 
     # Skip this step if the .dvi file already exists
@@ -179,6 +198,7 @@
         html)  create_html;;
         ps)    create_ps;;
         pdf)   create_pdf;;
+        txt)   create_text;;
         *)     echo "Format $format unknown or not yet supported!";;
     esac
Index: build.sh
===================================================================
--- build.sh (revision 24138)
+++ build.sh (working copy)
@@ -16,6 +16,10 @@
     destination=/tmp/manual
 fi
 
+if [ -z "$formats" ]; then
+    formats="html pdf ps txt"
+fi
+
 [ -e $destination ] || mkdir -p $destination
 
 if [ $official_build ]; then
@@ -32,9 +36,15 @@
         else
             destsuffix=${lang}.${arch}
         fi
-        ./buildone.sh $arch $lang html
-        mkdir $destination/$destsuffix
-        mv build.out/html/*.html $destination/$destsuffix
+        ./buildone.sh $arch $lang $formats
+        mkdir -p $destination/$destsuffix
+        for format in $formats; do
+            if [ "$format" = html ]; then
+                mv build.out/html/*.html $destination/$destsuffix
+            else
+                mv build.out/install.$lang.$format $destination/$destsuffix
+            fi
+        done
         ./clear.sh
     done
 done
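One possible workaround for the invisible URLs, not proposed in the thread, is to rewrite external links in the single-page HTML so the URL becomes part of the link text before `w3m -dump` renders it. A minimal sketch; the function name is hypothetical, and it assumes each `<a ...>...</a>` sits on one line with a double-quoted `href` and no nested markup:

```sh
# Hypothetical workaround: turn <a ... href="scheme://...">text</a> into
# "text (scheme://...)" so the URL survives "w3m -dump". Links without a
# scheme, such as internal href="#..." anchors, are left untouched.
show_urls () {
    sed -e 's|<a [^>]*href="\([a-z]*://[^"]*\)"[^>]*>\([^<]*\)</a>|\2 (\1)|g'
}

# Illustrative use inside create_text, before calling w3m:
# show_urls < $tempdir/install.${language}.html > $tempdir/urls.html
```

A line-based sed rewrite like this is fragile against arbitrary HTML; a more robust variant would do the same rewrite in the XSL stylesheet instead.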
Bug#283949: allow generation of plain text
On Thursday 02 December 2004 15:16, Frank Lichtenheld wrote:
> While investigating how to incorporate the install manual into the websites
> I saw there is no plain text output yet. I've prepared a little patch to
> allow this (xml -> single html -> text).

Thanks, I'll try to incorporate them into (the new version of) the build scripts this weekend.

Cheers,
FJP