Re: [docbook-apps] Re: JAXP and docbook-xsl Stylesheets
Don Adams wrote:
> A HUGE thank you! Yes, after spending hours trying to figure this out, all I needed to do was change this:
>
>   transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
>
> to this:
>
>   transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

I think the best approach is to omit this call completely. The transformer will then use the encoding specified in the stylesheet by <xsl:output encoding="..."/>, and you can be sure that there will be no encoding mismatch.

-- 
Jirka Kosek     e-mail: [EMAIL PROTECTED]     http://www.kosek.cz
Professional training and consulting in XML technologies.
Take a look at our newly launched website http://DocBook.cz
Detailed overview of training courses: http://xmlguru.cz/skoleni/
Upcoming training dates:
** XSLT 23-26 Oct 2006 ** XML Schemas 13-15 Nov 2006 **
** DocBook 11-13 Dec 2006 ** XSL-FO 11-12 Dec 2006 **
http://xmlguru.cz     Blog mostly about XML for English readers
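Jirka's suggestion can be sketched with JAXP as follows. This is a minimal illustration under stated assumptions, not DocBook code: the inline stylesheet and the class name StylesheetEncoding are hypothetical stand-ins, and ISO-8859-1 is chosen in xsl:output only to make the effect visible. No ENCODING output property is set, so the processor takes the encoding from xsl:output; the result is written to a byte stream rather than a Writer, so the serializer itself produces the bytes and they cannot disagree with the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StylesheetEncoding {

    // Hypothetical stand-in for a DocBook stylesheet: it declares its own output
    // encoding and emits non-breaking spaces through the reference &#160;.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' encoding='ISO-8859-1'/>"
      + "<xsl:template match='/'><title>Table&#160;1.&#160;x</title></xsl:template>"
      + "</xsl:stylesheet>";

    static byte[] transform() throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        // Deliberately no setOutputProperty(OutputKeys.ENCODING, ...):
        // the encoding comes from <xsl:output> in the stylesheet.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A byte stream (not a Writer), so the serializer does the encoding.
        t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
        return out.toByteArray();
    }

    // Parse the bytes back; the parser honours the encoding in the declaration.
    static String reparsedTitle(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml))
            .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = transform();
        System.out.println(new String(xml, StandardCharsets.ISO_8859_1));
        System.out.println(reparsedTitle(xml).equals("Table\u00A01.\u00A0x"));
    }
}
```

If the stylesheet's xsl:output is later changed to UTF-8, the same Java code keeps working without modification, which is the point of leaving the encoding to the stylesheet.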
JAXP and docbook-xsl Stylesheets
I am trying to use the JAXP Transformer in my Java code to transform valid DocBook article XML into valid XHTML and FO XML. I am having a major problem with the non-breaking space reference &#160; used in the docbook-xsl stylesheets. The cause of the problem is perfectly described under "5. Be careful with nonbreaking spaces" on this web page:

http://www.oreillynet.com/pub/a/oreilly/java/news/javaxslt_0801.html

To summarize, when the output method is xml, special characters in the docbook-xsl stylesheets are written to the output of the transformation as the actual characters. So, for example, when transforming into FO XML, a table title in the output contains "Table 1. x"; however, the spaces after the word "Table" and after "1." are not the ASCII space character, they are the single character with code 160 (a non-breaking space), which is not valid in an XML file.

From everything I've read, it seems that this is the correct behavior, and that other transformers such as xsltproc replace these single-character codes with ASCII characters on their own as a post-process.

For XHTML, I set the output method of the Transformer to html instead of xml. This generated &nbsp; in the output instead of the single character 160. This is acceptable for me because browsers will display the output of the transformation even though it is not technically valid XHTML.

For FO XML, I don't see a workaround other than hand-modifying the docbook-xsl stylesheets either to replace all the special characters with a valid ASCII character, or to use the disable-output-escaping XSL attribute (which looks very difficult, and support for it is not a requirement of XSLT processors).

Does anyone have any comments or suggestions?

Thanks,
Don

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
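Don's observation about the html and xml output methods can be reproduced with a small sketch. The stylesheet and the class name OutputMethodDemo are hypothetical. With the JDK's built-in processor the html method is expected to write U+00A0 as the entity &nbsp;, while the xml method writes the character itself; another JAXP processor may pick a different but equivalent form (for example &#160;).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputMethodDemo {

    // Hypothetical minimal stylesheet: one paragraph containing &#160;.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><html><body><p>Table&#160;1.</p></body></html></xsl:template>"
      + "</xsl:stylesheet>";

    // Run the stylesheet with an explicit output method ("html" or "xml").
    static String run(String method) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.METHOD, method);
        // A StringWriter collects characters, so no byte encoding is involved here.
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("html: " + run("html"));
        System.out.println("xml:  " + run("xml"));
    }
}
```

Both outputs carry the same non-breaking space; they differ only in how it is spelled, which is why the xml method is not actually wrong for FO output.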
Re: JAXP and docbook-xsl Stylesheets
On Wed, Sep 13, 2006 at 11:06:36AM -0500, Don Adams wrote:
> I am trying to use the JAXP Transformer in my Java code to translate valid docbook article XML into valid XHTML and FO XML. I am having a major problem with the use of the non-breaking space code &#160; in the docbook-xsl stylesheets. The cause of the problem is perfectly described under "5. Be careful with nonbreaking spaces" on this web page:
> http://www.oreillynet.com/pub/a/oreilly/java/news/javaxslt_0801.html
>
> To summarize, special characters in the docbook-xsl stylesheets are transformed into the actual special characters in the output of the transformation when the transformation method is selected as xml. So, for example, when a transformation is done into FO XML, a table title in the FO XML output contains "Table 1. x"; however, the spaces after the word "Table" and after "1." are not the ASCII space character, they are a single character code 160 (a non-breaking space) which is not valid in an XML file.

This sounds like an encoding problem. When your XML file has Latin-1 encoding (ISO-8859-1), a non-breaking space is a single character of value A0. When your XML file has UTF-8 encoding, a non-breaking space consists of two bytes. If you mix both, that is, when the XML file declares UTF-8 encoding but the non-breaking space is written in the Latin-1 manner as a single byte A0, your XML file is not valid.

Regards,
Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu
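Simon's explanation can be checked at the byte level with nothing but the Java standard library. The class name NbspBytes is hypothetical. The sketch encodes U+00A0 both ways and then strictly decodes the lone Latin-1 byte as UTF-8; that decode fails, which is the same kind of error an XML parser reports when a file declared as UTF-8 contains a bare A0 byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class NbspBytes {

    static final String NBSP = "\u00A0";

    static byte[] latin1() { return NBSP.getBytes(StandardCharsets.ISO_8859_1); } // one byte: A0
    static byte[] utf8()   { return NBSP.getBytes(StandardCharsets.UTF_8); }      // two bytes: C2 A0

    // Strict UTF-8 decode: malformed input raises an exception instead of
    // being silently replaced with U+FFFD.
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + hex(latin1()));                      // A0
        System.out.println("UTF-8:      " + hex(utf8()));                        // C2 A0
        System.out.println("A0 read as UTF-8 ok? " + isWellFormedUtf8(latin1())); // false
    }
}
```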
Re: [docbook-apps] JAXP and docbook-xsl Stylesheets
Don Adams wrote:
> into valid XHTML and FO XML. I am having a major problem with the use of the non-breaking space code &#160; in the docbook-xsl stylesheets. The cause of the problem is perfectly described under "5. Be careful with nonbreaking spaces" on this web page:
> http://www.oreillynet.com/pub/a/oreilly/java/news/javaxslt_0801.html

This is not a perfect description, but a complete mess. If you have problems displaying files containing the Unicode character with code 160 (U+00A0), then your browser is unable to infer the encoding used correctly. This can be due to a misconfigured HTTP server, but other things could have gone wrong as well.

> To summarize, special characters in the docbook-xsl stylesheets are transformed into the actual special characters in the output of the transformation when the transformation method is selected as xml. So, for example, when a transformation is done into FO XML, a table title in the FO XML output contains "Table 1. x"; however, the spaces after the word "Table" and after "1." are not the ASCII space character, they are a single character code 160 (a non-breaking space) which is not valid in an XML file.

This character is perfectly valid in an XML file. An XML file can contain any Unicode character in element and attribute content.

> From everything I've read, it seems like this is the correct behavior

It seems that you have read the wrong resources.

> Does anyone have any comments or suggestions?

You didn't show us your JAXP code. Could it be that you are using your own stream for writing the output of the transformation, and you are setting an incorrect encoding on this stream?

Jirka
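Jirka's two points can be reproduced together in a small sketch, assuming an identity transformer stands in for the DocBook transformation (StreamMismatch and its methods are hypothetical names). A document containing U+00A0 serializes and reparses without complaint when the transformer writes to a byte stream. Routing the same output through a Writer that encodes as ISO-8859-1, while the XML declaration still says UTF-8, produces exactly the broken file Simon describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamMismatch {

    // U+00A0 is an ordinary XML character; nothing special is needed to store it.
    static final String INPUT = "<title>Table\u00A01.</title>";

    static Transformer utf8Identity() throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity copy
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");                // declaration says UTF-8
        return t;
    }

    // Correct: the transformer gets a byte stream and encodes as it declares.
    static byte[] viaOutputStream() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        utf8Identity().transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
        return out.toByteArray();
    }

    // Broken: our own Writer encodes as Latin-1, but the declaration still says UTF-8.
    static byte[] viaLatin1Writer() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            utf8Identity().transform(new StreamSource(new StringReader(INPUT)), new StreamResult(w));
        }
        return out.toByteArray();
    }

    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (Exception e) { // the stray A0 byte is rejected as malformed UTF-8
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("byte stream parses:    " + parses(viaOutputStream()));
        System.out.println("Latin-1 Writer parses: " + parses(viaLatin1Writer()));
    }
}
```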
Re: JAXP and docbook-xsl Stylesheets
Simon,

A HUGE thank you! Yes, after spending hours trying to figure this out, all I needed to do was change this:

  transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

to this:

  transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

--
Don

> This sounds like an encoding problem. When your XML file has Latin-1 encoding (ISO-8859-1), a non-breaking space is a single character of value A0. When your XML file has UTF-8 encoding, a non-breaking space consists of two bytes. If you mix both, that is, when the XML file declares UTF-8 encoding but the non-breaking space is written in the Latin-1 manner as a single byte A0, your XML file is not valid.
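Don did not post his code, so why switching the declared encoding to ISO-8859-1 helped is an assumption: a plausible cause is that his output Writer was already encoding as Latin-1 (for example, through a platform-default writer), and the declaration now happened to match it. If a Writer really is needed, one way to keep the two in step is to build it from the transformer's own ENCODING property. The sketch below assumes that pattern; MatchedWriter, serialize and reparse are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MatchedWriter {

    // Serialize through a Writer whose charset is read back from the transformer,
    // so the bytes always agree with the encoding named in the XML declaration.
    static byte[] serialize(String xml, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity copy
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        Charset cs = Charset.forName(t.getOutputProperty(OutputKeys.ENCODING));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        }
        return out.toByteArray();
    }

    static String reparse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(bytes))
            .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String in = "<title>Table\u00A01.</title>";
        for (String enc : new String[] {"UTF-8", "ISO-8859-1"}) {
            System.out.println(enc + " round-trips: " + reparse(serialize(in, enc)).equals("Table\u00A01."));
        }
    }
}
```

With this arrangement either encoding round-trips the non-breaking space; passing an OutputStream and leaving the encoding to the stylesheet, as Jirka suggests, avoids the Writer altogether.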