Re: Bug#283949: allow generation of plain text

2004-12-09 Thread Frans Pop
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 09 December 2004 00:00, Nikolai Prokoschenko wrote:
> +<xsl:output method="html" encoding="UTF-8" indent="no"/>
> +

Thanks! Applied and committed.
That also makes the 'sed' statement in buildone.sh look nicer.

> BTW, we also should consider generating strict XHTML.. ;)

Hmm. I don't really know what that means and what the consequences and 
(dis)advantages would be. Care to give some explanation and arguments?

Cheers,
FJP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBuDocgm/Kwh6ICoQRAlzQAKCnX/1tqEh6I9RDRaV/RKiJu9pDwACguCs1
2+ZQuaHY1sTs42SMhyRjjP4=
=UO4s
-----END PGP SIGNATURE-----



Re: Bug#283949: allow generation of plain text

2004-12-09 Thread Nikolai Prokoschenko
On Thu, Dec 09, 2004 at 12:42:20PM +0100, Frans Pop wrote:

> > BTW, we also should consider generating strict XHTML.. ;)
> Hmm. I don't really know what that means and what the consequences and 
> (dis)advantages would be. Care to give some explanation and arguments?

Well, it's XHTML ;)

But the advantages are actually:

1) XHTML is _the_ current HTML standard.
2) XHTML is XML-based, so it might be easier to parse.
3) XHTML puts the priority on semantics, so the presentation is done with
CSS.
4) It's easy to convert to it using an XSL stylesheet (I've actually seen an
example somewhere; that's why I proposed it).

But since it's only marginally different from HTML 4, the decision could be
postponed until our XSL templates have matured and we are able to generate
PDFs for all languages ;)

-- 
Nikolai Prokoschenko 
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Bug#283949: allow generation of plain text

2004-12-08 Thread Frans Pop
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 06 December 2004 10:34, Nikolai Prokoschenko wrote:
> Works for Russian. The special characters are not shown, as they are
> not available in KOI8-R - I'd propose switching to UTF-8 as soon as
> possible.

Could you have a look at how to set up things so the single HTML file will 
be generated in UTF-8? You seem to be better at that kind of thing than I 
am :-P

I guess the final file for Russian could be generated in UTF-8 now (without 
the switch of the HTML file).
However, is that what Russian users will expect: will they, in general, 
know how to recognize, open and read a document that is UTF-8 encoded?

If the systems of most users are set up for KOI8-R, I would suggest 
leaving the doc in that encoding.

Cheers,
Frans
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBt0HEgm/Kwh6ICoQRAvrtAJ0f/oI1p1P7ZtrWIe0PCrjm44OboACeKtXz
jYNrDl7xegXSuHVUYc8m/NA=
=wATd
-----END PGP SIGNATURE-----



Re: Bug#283949: allow generation of plain text

2004-12-08 Thread Nikolai Prokoschenko
On Wed, Dec 08, 2004 at 07:02:44PM +0100, Frans Pop wrote:
> > Works for Russian. The special characters are not shown, as they are
> > not available in KOI8-R - I'd propose switching to UTF-8 as soon as
> > possible.
> Could you have a look at how to set up things so the single HTML file will 
> be generated in UTF-8? You seem to be better at that kind of thing than I 
> am :-P

I'll look into it; this should be _somewhere_. :)

> I guess the final file for Russian could be generated in UTF now
> (without the switch of the HTML file).  However, is that what Russian
> users will expect: will they, in general, know how to recognize, open
> and read a document that is UTF-8 encoded?

Russian users are a special case. They will be the last ones to switch
to Unicode, as they don't see any advantages and only say things like
"KOI8-R works" and "if the 8th bit is cut off, I can still read it", the
kind of talk you'd expect from the 1980s. When RedHat 9.0 was released,
the first thing users asked was how to switch the whole distribution back
to KOI8-R. It's plain sick.

My radical opinion is to enforce UTF-8, as most distributions will be
UTF-8 based within a year (Debian will switch after the sarge release,
AFAIK), and RedHat and SuSE are already there. Since the Russian manual
translation won't hit the installation CDs anytime soon (there are some
extra glitches with xml2po which have to be worked on), we can just use
UTF-8. If the manual had to go onto the CDs, KOI8-R could have been
considered.

> If the systems of most users are set up for KOI8-R, I would suggest
> leaving the doc in that encoding.

It's relevant only for the plain text manual, which will not be available
on the CDs; the web page is not important here. So it can stay UTF-8.

-- 
Nikolai Prokoschenko 
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]





Re: Bug#283949: allow generation of plain text

2004-12-08 Thread Nikolai Prokoschenko
On Wed, Dec 08, 2004 at 07:02:44PM +0100, Frans Pop wrote:

> Could you have a look at how to set up things so the single HTML file will 
> be generated in UTF-8? You seem to be better at that kind of thing than I 
> am :-P

Done. It's a small patch to the stylesheet:

Index: style-html-single.xsl
===================================================================
--- style-html-single.xsl   (revision 24326)
+++ style-html-single.xsl   (working copy)
@@ -5,6 +5,8 @@
 <xsl:import
 href="file:///usr/share/sgml/docbook/stylesheet/xsl/nwalsh/html/docbook.xsl"/>
 
+<xsl:output method="html" encoding="UTF-8" indent="no"/>
+
 <!-- Include our common parameters -->
 <xsl:include href="style-common.xsl"/>
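To confirm that the patched stylesheet really produces UTF-8, one could look at the charset declared in the generated single HTML file. This is a minimal sketch, assuming xsltproc writes a Content-Type meta element for method="html" output; `html_charset` is a hypothetical helper and the path in the comment is only an example.

```shell
#!/bin/sh
# Sketch: html_charset is a hypothetical helper, not part of buildone.sh.
# It assumes the charset is declared as
#   <meta ... content="text/html; charset=...">
# somewhere in the file.

# Print the first charset declared in an HTML file.
html_charset () {
    sed -n 's/.*charset=\([A-Za-z0-9_-]*\).*/\1/p' "$1" | head -n 1
}

# Example (hypothetical path):
#   html_charset /tmp/install.ru.html    # should print UTF-8 after the patch
```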

BTW, we also should consider generating strict XHTML.. ;)

-- 
Nikolai Prokoschenko 
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]





Re: Bug#283949: allow generation of plain text

2004-12-06 Thread Nikolai Prokoschenko
On Sun, Dec 05, 2004 at 03:44:27PM +0100, Frans Pop wrote:

> - Replace some unprintable characters
>   I noticed that some characters were replaced by a ? in the text file.
>   The main problems were quotes and dashes. I managed to replace these
>   using a sed script.
> 
> Kenshi, Nikolai, Miroslav:
> Could you please check if I used the correct charset and if the result is 
> acceptable for your languages?

Works for Russian. The special characters are not shown, as they are not
available in KOI8-R - I'd propose switching to UTF-8 as soon as possible.
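The point that the special characters simply do not exist in KOI8-R can be checked with iconv(1). This sketch assumes glibc's iconv, which knows KOI8-R and exits with an error on characters it cannot convert; the helper names are hypothetical. Cyrillic text survives a round trip, while a typographic quote (U+201C) has no KOI8-R code point.

```shell
#!/bin/sh
# Sketch, assuming glibc's iconv(1) with KOI8-R support.
# utf8_to_koi8r and koi8r_to_utf8 are hypothetical helper names.

utf8_to_koi8r () {
    iconv -f UTF-8 -t KOI8-R
}

koi8r_to_utf8 () {
    iconv -f KOI8-R -t UTF-8
}

# Each Cyrillic letter takes one byte in KOI8-R (6 bytes here).
printf '%s' 'Привет' | utf8_to_koi8r | wc -c

# U+201C (left double quotation mark) has no KOI8-R equivalent,
# so glibc's iconv reports an error and exits non-zero.
if ! printf '\342\200\234' | utf8_to_koi8r >/dev/null 2>&1; then
    echo "U+201C cannot be encoded in KOI8-R"
fi
```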

-- 
Nikolai Prokoschenko 
[EMAIL PROTECTED] / Jabber: [EMAIL PROTECTED]





Bug#283949: allow generation of plain text

2004-12-05 Thread Frans Pop
On Thursday 02 December 2004 15:16, Frank Lichtenheld wrote:
> While investigating how to incorporate the install manual into
> the websites I saw there is no plain text output yet. I've prepared
> a little patch to allow this (xml->single html->text).

I have committed this patch to SVN (trunk only) with some additions.

- Variable encoding to support Czech, Russian and Japanese
  I have set the encoding for these languages as follows:
  - cs: ISO-8859-2
  - ja: EUC-JP
  - ru: KOI8-R

  I think it would be better if the intermediate file were UTF-8, but I
  was unable to quickly find out how to set that.

- Replace some unprintable characters
  I noticed that some characters were replaced by a ? in the text file.
  The main problems were quotes and dashes. I managed to replace these
  using a sed script.

See the script now in SVN for details.
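The two changes listed above (a charset per language and ASCII replacements for quotes and dashes) could be approximated by the following POSIX shell sketch. It is not the script committed to SVN: `text_charset` and `fix_punctuation` are hypothetical helpers, the ISO-8859-1 fallback for other languages is an assumption, and so are the ASCII look-alikes chosen for each character. It assumes the intermediate text is UTF-8.

```shell
#!/bin/sh
# Sketch only: text_charset and fix_punctuation are hypothetical helpers,
# not the code committed to SVN.

# Print the charset used for the plain text manual of a language.
# cs, ja and ru follow the mail; the ISO-8859-1 fallback is an assumption.
text_charset () {
    case "$1" in
        cs) echo ISO-8859-2 ;;
        ja) echo EUC-JP ;;
        ru) echo KOI8-R ;;
        *)  echo ISO-8859-1 ;;
    esac
}

# Typographic quotes and dashes as UTF-8 byte sequences
# (octal escapes keep this script plain ASCII).
ldquo=$(printf '\342\200\234')   # U+201C left double quotation mark
rdquo=$(printf '\342\200\235')   # U+201D right double quotation mark
lsquo=$(printf '\342\200\230')   # U+2018 left single quotation mark
rsquo=$(printf '\342\200\231')   # U+2019 right single quotation mark
ndash=$(printf '\342\200\223')   # U+2013 en dash
mdash=$(printf '\342\200\224')   # U+2014 em dash

# Replace characters that would otherwise come out as "?" in a legacy
# charset with ASCII look-alikes (the replacements are an assumption).
fix_punctuation () {
    sed -e "s/$ldquo/\"/g" -e "s/$rdquo/\"/g" \
        -e "s/$lsquo/'/g" -e "s/$rsquo/'/g" \
        -e "s/$ndash/-/g" -e "s/$mdash/--/g"
}

# Example (hypothetical file names):
#   fix_punctuation < install.ru.utf8.txt |
#       iconv -f UTF-8 -t "$(text_charset ru)" > install.ru.txt
```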

Kenshi, Nikolai, Miroslav:
Could you please check if I used the correct charset and if the result is 
acceptable for your languages?

(Leaving the bug open for now as improvements are possible.)





Bug#283949: allow generation of plain text

2004-12-05 Thread Kenshi Muto
Hi Frans,

At Sun, 5 Dec 2004 15:44:27 +0100,
Frans Pop wrote:
> - Variable encoding to support Czech, Russian and Japanese
>   I have set the encoding for these languages as follows
>   - cs: ISO-8859-2
>   - ja: EUC-JP
>   - ru: KOI8-R

> See the script now in SVN for details.
> 
> Kenshi, Nikolai, Miroslav:
> Could you please check if I used the correct charset and if the result is 
> acceptable for your languages?

I checked the text file built by 'build/buildone.sh i386 ja txt'.
Yeah, it seems good.

Thanks,
-- 
Kenshi Muto
[EMAIL PROTECTED]





Bug#283949: allow generation of plain text

2004-12-02 Thread Frank Lichtenheld
Package: debian-installer-manual
Severity: normal
Tags: patch

While investigating how to incorporate the install manual into
the websites I saw there is no plain text output yet. I've prepared
a little patch to allow this (xml->single html->text).

The method of using w3m -dump still has some drawbacks. The most
important one is that most of the URLs are invisible. None of the
text-mode browsers I tested seems to have an option to change this
behaviour, though. Sadly enough, the text export of Firefox gets this one
right, but it has its own errors, besides the fact that we can't use it
in batch mode ;) Still working on this.
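One possible workaround for the invisible URLs, not something proposed in this thread, is to pull the link targets out of the single HTML file and append them to the dump. The helper below is hypothetical; it assumes `grep -o` (GNU and BSD grep have it, POSIX does not) and double-quoted `href` attributes, and it skips in-page anchors.

```shell
#!/bin/sh
# Hypothetical helper: list the link targets of an HTML file, one per line,
# sorted and without duplicates. Assumes grep -o and href="..." attributes;
# in-page anchors (href="#...") are skipped.
list_urls () {
    grep -o 'href="[^"#][^"]*"' "$1" |
        sed -e 's/^href="//' -e 's/"$//' |
        sort -u
}

# Possible use next to the w3m step (hypothetical paths):
#   { w3m -dump install.en.html
#     echo
#     echo "URLs:"
#     list_urls install.en.html
#   } > install.en.txt
```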

Gruesse,
Frank Lichtenheld

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-1-k7-smp
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8
Index: buildone.sh
===================================================================
--- buildone.sh    (revision 24138)
+++ buildone.sh    (working copy)
@@ -20,6 +20,7 @@
 stylesheet_dir=$build_path/stylesheets
 stylesheet_profile=$stylesheet_dir/style-profile.xsl
 stylesheet_html=$stylesheet_dir/style-html.xsl
+stylesheet_html_single=$stylesheet_dir/style-html-single.xsl
 stylesheet_fo=$stylesheet_dir/style-fo.xsl
 stylesheet_dsssl=$stylesheet_dir/style-print.dsl
 
@@ -101,6 +102,24 @@
     checkresult $?
 }
 
+create_text () {
+
+    create_profiled
+
+    echo "Creating temporary .html file..."
+
+    /usr/bin/xsltproc \
+        --xinclude \
+        --output $tempdir/install.${language}.html \
+        $stylesheet_html_single \
+        $tempdir/install.${language}.profiled.xml
+    checkresult $?
+
+    echo "Creating .txt file..."
+    w3m -dump $tempdir/install.${language}.html \
+        >$destination/install.${language}.txt
+}
+
 create_dvi () {
 
     # Skip this step if the .dvi file already exists
@@ -179,6 +198,7 @@
     html)  create_html;;
     ps)    create_ps;;
     pdf)   create_pdf;;
+    txt)   create_text;;
     *)     echo "Format $format unknown or not yet supported!";;
 
 esac
Index: build.sh
===================================================================
--- build.sh    (revision 24138)
+++ build.sh    (working copy)
@@ -16,6 +16,10 @@
     destination=/tmp/manual
 fi
 
+if [ -z "$format" ]; then
+    formats="html pdf ps txt"
+fi
+
 [ -e $destination ] || mkdir -p $destination
 
 if [ $official_build ]; then
@@ -32,9 +36,15 @@
         else
             destsuffix=${lang}.${arch}
         fi
-        ./buildone.sh $arch $lang html
-        mkdir $destination/$destsuffix
-        mv build.out/html/*.html $destination/$destsuffix
+        ./buildone.sh $arch $lang $formats
+        mkdir -p $destination/$destsuffix
+        for format in $formats; do
+            if [ $format = html ]; then
+                mv build.out/html/*.html $destination/$destsuffix
+            else
+                mv build.out/install.$lang.$format $destination/$destsuffix
+            fi
+        done
         ./clear.sh
     done
 done
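The defaulting and dispatch logic of the build.sh part of the patch can be exercised on its own. In this sketch `default_formats` and `output_source` are hypothetical helpers that only mirror what the patch does; nothing is built or moved. It uses the POSIX `${formats:-...}` expansion, which falls back to the full list when `formats` is unset or empty.

```shell
#!/bin/sh
# Sketch only: default_formats and output_source are hypothetical helpers
# mirroring the build.sh patch; nothing is actually built or moved.

# Use the caller's $formats if it is set and non-empty, otherwise
# build every supported format.
default_formats () {
    echo "${formats:-html pdf ps txt}"
}

# Print where buildone.sh leaves the result of one format ($1) for one
# language ($2): HTML output is a set of files, every other format is a
# single install.<lang>.<format> file.
output_source () {
    if [ "$1" = html ]; then
        echo "build.out/html/*.html"
    else
        echo "build.out/install.$2.$1"
    fi
}

# Dry run for Russian: print the mv commands instead of running them
# (the destination directory is a hypothetical example).
for format in $(default_formats); do
    echo "mv $(output_source "$format" ru) /tmp/manual/ru.i386/"
done
```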


Bug#283949: allow generation of plain text

2004-12-02 Thread Frans Pop
On Thursday 02 December 2004 15:16, Frank Lichtenheld wrote:
> While investigating how to incorporate the install manual into
> the websites I saw there is no plain text output yet. I've prepared
> a little patch to allow this (xml->single html->text).

Thanks, I'll try to incorporate it into (the new version of) the build 
scripts this weekend.

Cheers,
FJP

