Hi,
Further, I think the source is not read well.
The mutation from ü to &Atilde;&frac14; is maybe caused by
reading the source as ASCII, not as UTF-8.
Indeed, the source is not being read correctly. It is being read not as
ASCII but as ISO-8859-1 (Latin-1). The lowercase ü character is encoded in
UTF-8 as the two-byte hex sequence C3 BC. The byte values C3 and BC do not
exist in ASCII, but they do in ISO-8859-1, as A-tilde and the fraction
one-quarter. If the source is interpreted as ISO-8859-1, that byte sequence
is read as those two characters, not one. In the ISO named entities, C3
is &Atilde; and BC is &frac14;, which is what you are seeing in your output
for ü. (You can see these entity declarations in the DocBook 4.5 DTD
distribution in the "ent" directory files.)
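
For anyone who wants to check the byte arithmetic above, here is a minimal Python sketch. It is only an illustration of the decoding mismatch and has nothing to do with how xsltproc works internally; the variable names are my own.

```python
import unicodedata
from html.entities import codepoint2name

# The lowercase u-umlaut, encoded as UTF-8, is the two bytes C3 BC.
utf8_bytes = "\u00fc".encode("utf-8")

# Decoding those same bytes as ISO-8859-1 yields two characters, not one.
misread = utf8_bytes.decode("iso-8859-1")
names = [unicodedata.name(ch) for ch in misread]

# The HTML named entities for those two code points.
entities = [codepoint2name[ord(ch)] for ch in misread]

# A strict ASCII decoder cannot produce this result: it rejects both bytes.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(utf8_bytes.hex())  # c3bc
print(names)
print(entities)          # ['Atilde', 'frac14']
print(ascii_ok)          # False
```

The two names reported are LATIN CAPITAL LETTER A WITH TILDE and VULGAR FRACTION ONE QUARTER, which matches the &Atilde;&frac14; pair in the output.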
I'm not able to duplicate your output using xsltproc and any combination of
encodings or xsltproc options. I did not think xsltproc could even output
named entities like &Atilde;, but I could be wrong.
Something is going wrong with the parser reading your files. I would
examine your xsltproc setup, try xsltproc on another system that is
independent of the first, and try Saxon 6 as an alternative processor.
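
One thing that may be worth ruling out while examining the setup is a mismatch between the encoding the XML declaration names and the bytes actually in the file. That is only a guess at a possible cause, not a diagnosis. A rough Python sketch of such a check follows; the function name inspect_xml_encoding and the sample document are made up for illustration.

```python
import re

def inspect_xml_encoding(data: bytes) -> dict:
    """Report the declared encoding and whether the bytes are valid UTF-8."""
    has_bom = data.startswith(b"\xef\xbb\xbf")
    body = data[3:] if has_bom else data

    # Look for encoding="..." inside an XML declaration at the very start.
    match = re.match(rb'<\?xml[^>]*?encoding=["\']([^"\']+)["\']', body)
    declared = match.group(1).decode("ascii") if match else None

    # Check whether the whole file decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {"declared": declared, "valid_utf8": valid_utf8, "has_bom": has_bom}

# Hypothetical sample: bytes are UTF-8, but the declaration says Latin-1.
sample = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<para>f\u00fcr</para>'.encode("utf-8")
report = inspect_xml_encoding(sample)
print(report)
```

A file that reports a declared encoding of ISO-8859-1 while containing non-ASCII text that is valid UTF-8 would be read by a conforming parser as Latin-1, which would give exactly the two-character result described above. To check a real file, pass it the result of open(path, "rb").read().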
Bob Stayton
Sagehill Enterprises
b...@sagehill.net
--------------------------------------------------
From: <markus.sticker.e...@zf.com>
Sent: Monday, June 03, 2013 7:12 AM
To: <docbook@lists.oasis-open.org>
Subject: AW: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML
with umlaut
Hi Markus,
I have already set this parameter
(see my earlier mails),
so there must be some switch for
turning the entity translation off.
As you can see, all special characters are
translated to HTML entities.
Further, I think the source is not read well.
The mutation from ü to &Atilde;&frac14; is maybe caused by
reading the source as ASCII, not as UTF-8.
BR
Markus
-----Original Message-----
From: Markus Hoenicka [mailto:markus.hoeni...@mhoenicka.de]
Sent: Monday, June 3, 2013 15:52
To: docbook@lists.oasis-open.org
Subject: Re: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML
with umlaut
On 2013-06-03 15:39, markus.sticker.e...@zf.com wrote:
Hi Markus,
This result is the same as in docbook 5 ... your output is ISO-8859-1
:-( That's the default in docbook
BR
Markus
I'm sorry, I was too quick with this test.
I've now processed the document with the following command line, using
chunked output and UTF-8 as you requested:
xsltproc --output output/ --stringparam chunker.output.encoding UTF-8
/usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl refdbtest.xml
The result is the same for me, except that everything is UTF-8 now. The
umlauts are there in the html source and they're displayed ok in a web
browser. See attached html output.
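
To compare outputs between systems without relying on how a browser or editor renders them, one could look at the raw bytes of the generated HTML. The following Python sketch is only a rough heuristic for the single character ü; the function name classify_umlaut and its category labels are invented for this example.

```python
def classify_umlaut(html_bytes: bytes) -> str:
    """Guess how a lowercase u-umlaut is represented in raw HTML bytes."""
    # UTF-8 bytes misread as Latin-1, written out as entities or re-encoded.
    if b"&Atilde;&frac14;" in html_bytes or b"\xc3\x83\xc2\xbc" in html_bytes:
        return "mojibake"
    # A correct character written as a named or numeric reference.
    if (b"&uuml;" in html_bytes or b"&#252;" in html_bytes
            or b"&#xfc;" in html_bytes.lower()):
        return "entity"
    # The literal character as UTF-8 (C3 BC).
    if b"\xc3\xbc" in html_bytes:
        return "utf-8"
    # The literal character as a single Latin-1 byte (FC).
    if b"\xfc" in html_bytes:
        return "latin-1"
    return "absent"

print(classify_umlaut("<p>f\u00fcr</p>".encode("utf-8")))    # utf-8
print(classify_umlaut("<p>f\u00fcr</p>".encode("latin-1")))  # latin-1
print(classify_umlaut(b"<p>f&Atilde;&frac14;r</p>"))         # mojibake
```

Run against the bytes of a generated chunk (read with open(path, "rb")), a result of "utf-8" would correspond to the output described here, and "mojibake" to the problem output reported earlier in the thread.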
regards
Markus
--
Markus Hoenicka
http://www.mhoenicka.de
---------------------------------------------------------------------
To unsubscribe, e-mail: docbook-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-h...@lists.oasis-open.org