Re: [docbook] How to get a proper UTF-8 HTML with umlaut

Bob Stayton Mon, 03 Jun 2013 09:43:47 -0700

Hi,

Further, I think the source is not being read correctly.
The change from ü to Ã¼ may be caused by
reading the source as ASCII, not as UTF-8.

Indeed, the source is not being read correctly. It is being read not as ASCII but as ISO-8859-1 (Latin-1). The lowercase ü character is encoded in UTF-8 as the hex byte sequence C3 BC. The bytes C3 and BC do not exist in the ASCII encoding, but they do in ISO-8859-1, as A-tilde and fraction 1/4. If the source is interpreted as ISO-8859-1, that byte sequence is read as those two characters, not one. In the ISO named entities, C3 is &Atilde; (Ã) and BC is &frac14; (¼), which is what you are seeing in your output for ü. (You can see these entity declarations in the files in the "ent" directory of the DocBook 4.5 DTD distribution.)
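
The byte arithmetic above can be checked with a few lines of Python, used here purely as an illustration (nothing in this thread depends on it). The sketch encodes ü as UTF-8, misreads the two bytes as ISO-8859-1, compares the result with the &Atilde;&frac14; entity pair, and shows that an ASCII decode fails outright:

```python
import html

original = "\u00fc"                        # "ü", U+00FC LATIN SMALL LETTER U WITH DIAERESIS
utf8_bytes = original.encode("utf-8")      # how it is stored in a UTF-8 file: two bytes
print(utf8_bytes.hex(" ").upper())         # C3 BC

# Decoding those bytes as ISO-8859-1 maps each byte to its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread, len(misread))               # Ã¼ 2

# The same two characters, spelled as the ISO named entities.
print(html.unescape("&Atilde;&frac14;") == misread)  # True

# Decoding as ASCII raises instead, because both bytes are above 0x7F.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as err:
    print("ASCII:", err.reason)            # ordinal not in range(128)
```

Note that the ISO-8859-1 decode never raises: every byte value maps to some character, so a wrong encoding assumption shows up only as garbled text, not as a parser error.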

I'm not able to duplicate your output using xsltproc with any combination of encodings or xsltproc options. I did not think xsltproc could even output named entities like &Atilde;, but I could be wrong.

Something is going wrong with the parser reading your files. I would examine your xsltproc setup, try xsltproc on another system that is independent of the first, and try Saxon 6 as an alternative processor.
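
As one part of examining the setup, it may help to confirm what the source file itself declares. The sketch below is a hypothetical helper (check_xml_encoding is not part of xsltproc or DocBook): it reports the encoding named in the XML declaration, if any, and whether the file's bytes are valid UTF-8. A file whose bytes are UTF-8 but whose declaration says ISO-8859-1 would produce exactly the Ã¼ pattern, with no error from the parser.

```python
import re
import sys

# An XML declaration at the very start of the file (after an optional UTF-8 BOM).
XML_DECL = re.compile(
    rb'^(?:\xef\xbb\xbf)?<\?xml[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def check_xml_encoding(path):
    """Return (declared_encoding, is_valid_utf8) for an XML file.

    declared_encoding is None when there is no encoding declaration; an XML
    parser then assumes UTF-8 (or UTF-16 if the file starts with a UTF-16 BOM).
    """
    with open(path, "rb") as fh:
        data = fh.read()
    match = XML_DECL.match(data)
    declared = match.group(1).decode("ascii") if match else None
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return declared, valid_utf8

if __name__ == "__main__":
    for name in sys.argv[1:]:
        declared, ok = check_xml_encoding(name)
        print(f"{name}: declared={declared or '(none, UTF-8 default)'}, valid UTF-8={ok}")
```

For example, running it with refdbtest.xml as the argument would show whether the declaration and the actual bytes agree; a result like declared=ISO-8859-1 with valid UTF-8=True points at the mismatch described above.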


Bob Stayton
Sagehill Enterprises
b...@sagehill.net

--------------------------------------------------
From: <markus.sticker.e...@zf.com>
Sent: Monday, June 03, 2013 7:12 AM
To: <docbook@lists.oasis-open.org>

Subject: AW: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML with umlaut

Hi Markus,

I have already set this parameter
(see my earlier mails).

So there must be some switch for
turning the entity translation off.

As you can see, all special characters are
translated to HTML entities.

Further, I think the source is not being read correctly.
The change from ü to Ã¼ may be caused by
reading the source as ASCII, not as UTF-8.

BR
Markus


-----Original Message-----
From: Markus Hoenicka [mailto:markus.hoeni...@mhoenicka.de]
Sent: Monday, June 3, 2013 15:52
To: docbook@lists.oasis-open.org
Subject: Re: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML with umlaut
On 2013-06-03 15:39, markus.sticker.e...@zf.com wrote:
Hi Markus,

This result is the same as in DocBook 5 ... your output is ISO-8859-1 :-( That's the default in DocBook.

BR
Markus
I'm sorry, I was too quick with this test.
I've now processed the document with the following command line, using chunked output and UTF-8 as you requested:
xsltproc --output output/ --stringparam chunker.output.encoding UTF-8 /usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl refdbtest.xml
The result is the same for me, except that everything is UTF-8 now. The umlauts are there in the HTML source and they're displayed OK in a web browser. See the attached HTML output.
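
To confirm a result like this without relying only on how a browser renders it, the generated pages can be scanned for the telltale two-character pattern. This is a heuristic sketch with hypothetical helper names (declared_charset, find_mojibake); it assumes the chunked output is UTF-8, as produced by the command above, and it may miss or misflag unusual text.

```python
import re
import sys

# A charset=... value, as found in an HTML <meta> tag (if the page has one).
CHARSET = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)

# A Latin-1 character in C2..F4 (a UTF-8 lead byte) followed by characters in
# 80..BF (UTF-8 continuation bytes): the shape "ü" takes when misread as "Ã¼".
MOJIBAKE = re.compile("[\u00c2-\u00f4][\u0080-\u00bf]+")

def declared_charset(html_text):
    """Return the first charset=... value in the markup, or None."""
    m = CHARSET.search(html_text)
    return m.group(1) if m else None

def find_mojibake(text):
    """Return (garbled, repaired) pairs for runs that round-trip as UTF-8."""
    hits = []
    for m in MOJIBAKE.finditer(text):
        try:
            # Undo the misreading: back to the original bytes, then decode as UTF-8.
            repaired = m.group().encode("latin-1").decode("utf-8")
        except UnicodeDecodeError:
            continue  # not a valid UTF-8 sequence, probably genuine text
        hits.append((m.group(), repaired))
    return hits

if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        print(f"{name}: charset={declared_charset(text)}, suspicious={find_mojibake(text)}")
```

Each candidate is kept only if re-encoding it as Latin-1 and decoding it as UTF-8 succeeds, which filters out most genuine accented text. Passing the files in output/ as arguments would list, per file, the declared charset and any suspicious runs; an empty list is what a clean UTF-8 build should show.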
regards
Markus

--
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38

---------------------------------------------------------------------
To unsubscribe, e-mail: docbook-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-h...@lists.oasis-open.org
