Re: [docbook-apps] Re: JAXP and docbook-xsl Stylesheets

2006-09-14 Thread Jirka Kosek

Don Adams wrote:


A HUGE thank you!  Yes, after spending hours
trying to figure this out, all I needed to
do was change this:

transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

to this:

transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");


I think the best option is to omit this call completely. The
transformer will then use the encoding specified in the stylesheet by
<xsl:output encoding="..."/>, and you can be sure that there will be no
encoding mismatch.
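
A minimal sketch of this advice, not code from the thread: the class name and the tiny inline stylesheet are hypothetical stand-ins for the real docbook-xsl setup. The Java code never calls setOutputProperty, so the encoding comes from xsl:output alone, and the result is written to an OutputStream so the serializer chooses the bytes itself.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;

public class StylesheetEncodingDemo {
    // Hypothetical stand-in for a docbook-xsl stylesheet: it declares its own
    // output encoding and emits a title containing a non-breaking space.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' encoding='ISO-8859-1'/>"
      + "<xsl:template match='/'><title>Table&#160;1.</title></xsl:template>"
      + "</xsl:stylesheet>";

    static Transformer newTransformer() throws Exception {
        // Note: no setOutputProperty(OutputKeys.ENCODING, ...) call anywhere.
        return TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
    }

    // Reports the encoding the transformer picked up from xsl:output.
    static String effectiveEncoding() throws Exception {
        return newTransformer().getOutputProperty(OutputKeys.ENCODING);
    }

    // Runs the transformation into a byte stream; the serializer does the encoding.
    static byte[] transform() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        newTransformer().transform(
            new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
        return out.toByteArray();
    }

    // Parses the produced bytes again and returns the text of the root element.
    static String roundTrip(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml))
            .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("encoding from stylesheet: " + effectiveEncoding());
        System.out.println("NBSP survived round trip: "
            + "Table\u00A01.".equals(roundTrip(transform())));
    }
}
```

Because the declaration in the output and the bytes actually written both come from the same serializer setting, a parser reading the result back should see the non-breaking space intact.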


--
  Jirka Kosek     e-mail: [EMAIL PROTECTED]     http://www.kosek.cz
--
  Professional training and consulting in XML technologies.
  Take a look at our newly launched website http://DocBook.cz
  Detailed overview of training courses http://xmlguru.cz/skoleni/
--
  Upcoming training dates:
  ** XSLT 23.-26.10.2006 ** XML Schemas 13.-15.11.2006 **
  ** DocBook 11.-13.12.2006 ** XSL-FO 11.-12.12.2006 **
--
  http://xmlguru.cz    Blog mostly about XML for English readers
--





JAXP and docbook-xsl Stylesheets

2006-09-13 Thread Don Adams

I am trying to use the JAXP Transformer in my Java code
to translate valid docbook article XML 
into valid XHTML and FO XML. I am having a major problem with
the use of the non-breaking space code &#160; in
the docbook-xsl stylesheets. The cause of the problem 
is perfectly described under 5. Be careful with 
nonbreaking spaces on this web page:

http://www.oreillynet.com/pub/a/oreilly/java/news/javaxslt_0801.html

To summarize, special characters in the docbook-xsl
stylesheets are transformed into the actual special characters
in the output of the transformation when the transformation
method is selected as xml.  So, for example, when
a transformation is done into FO XML, a table title in the
FO XML output contains Table 1. x; however, the spaces
after the word Table and after 1. are not the ASCII space 
character, they are a single character code 160
(a non-breaking space) which is not valid in an XML file.
From everything I've read, it seems like this is the
correct behavior and other transformers such as xsltproc
replace the single character codes with ASCII characters
on their own as a post process.

For XHTML, I set the output method to html instead of
xml for the Transformer.  This generated &nbsp; in
the output instead of the single character code 160. This is
acceptable for me because browsers will display the output
of the transformation even though it is not technically valid XHTML.

For FO XML, I don't see a work-around for the problem
other than hand-modifying the docbook-xsl stylesheets
to either replace all the special characters with a
valid ASCII character, or use the disable-output-escaping
XSL attribute (which looks very difficult and support for
this attribute is not a requirement of XSLT processors).

Does anyone have any comments or suggestions?

Thanks,
Don

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: JAXP and docbook-xsl Stylesheets

2006-09-13 Thread Simon Pepping
On Wed, Sep 13, 2006 at 11:06:36AM -0500, Don Adams wrote:
 
 I am trying to use the JAXP Transformer in my Java code
 to translate valid docbook article XML 
 into valid XHTML and FO XML. I am having a major problem with
 the use of the non-breaking space code &#160; in
 the docbook-xsl stylesheets. The cause of the problem 
 is perfectly described under 5. Be careful with 
 nonbreaking spaces on this web page:
 
 http://www.oreillynet.com/pub/a/oreilly/java/news/javaxslt_0801.html
 
 To summarize, special characters in the docbook-xsl
 stylesheets are transformed into the actual special characters
 in the output of the transformation when the transformation
 method is selected as xml.  So, for example, when
 a transformation is done into FO XML, a table title in the
 FO XML output contains Table 1. x; however, the spaces
 after the word Table and after 1. are not the ASCII space 
 character, they are a single character code 160
 (a non-breaking space) which is not valid in an XML file.

This sounds like an encoding problem. When your XML file has Latin-1
encoding (ISO-8859-1), a non-breaking space is a single byte with the
value A0. When your XML file has UTF-8 encoding, a non-breaking space
consists of two bytes. If you mix both, that is, when the XML file
declares UTF-8 encoding but the non-breaking space is written in the
Latin-1 manner as a single byte A0, your XML file is not well-formed.
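
The byte counts described above can be checked with a few lines of standard Java. This is only an illustrative sketch (the class and helper names are made up for the example); it prints the bytes of U+00A0 in both encodings and shows what happens when a lone A0 byte is decoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class NbspBytes {
    // Formats a byte array as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Decodes a single Latin-1 style A0 byte as if it were UTF-8.
    static String loneA0AsUtf8() {
        return new String(new byte[] {(byte) 0xA0}, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0";
        // One byte in ISO-8859-1.
        System.out.println(hex(nbsp.getBytes(StandardCharsets.ISO_8859_1))); // A0
        // Two bytes in UTF-8.
        System.out.println(hex(nbsp.getBytes(StandardCharsets.UTF_8)));      // C2 A0
        // A lone A0 is not a legal UTF-8 sequence; Java's String decoder
        // substitutes U+FFFD, whereas an XML parser rejects the document.
        System.out.println(loneA0AsUtf8().equals("\uFFFD"));                 // true
    }
}
```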

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu




Re: [docbook-apps] JAXP and docbook-xsl Stylesheets

2006-09-13 Thread Jirka Kosek

Don Adams wrote:


into valid XHTML and FO XML. I am having a major problem with
the use of the non-breaking space code &#160; in
the docbook-xsl stylesheets. The cause of the problem 
is perfectly described under 5. Be careful with 
nonbreaking spaces on this web page:


http://www.oreillynet.com/pub/a/oreilly/java/news/javaxslt_0801.html


This is not a perfect description but a complete mess. If you have
problems displaying files that contain the Unicode character with code
160 (U+00A0), then your browser is unable to infer the encoding
correctly. This can be due to a misconfigured HTTP server, but other
things could have gone wrong as well.



To summarize, special characters in the docbook-xsl
stylesheets are transformed into the actual special characters
in the output of the transformation when the transformation
method is selected as xml.  So, for example, when
a transformation is done into FO XML, a table title in the
FO XML output contains Table 1. x; however, the spaces
after the word Table and after 1. are not the ASCII space 
character, they are a single character code 160

(a non-breaking space) which is not valid in an XML file.


This character is perfectly valid in an XML file. An XML file can
contain any Unicode character in element and attribute content.



From everything I've read, it seems like this is the
correct behavior 


It seems that you have read the wrong resources.


Does anyone have any comments or suggestions?


You didn't show us the code that deals with JAXP. Could it be that you
are using your own stream for writing the output of the transformation
and you are setting an incorrect encoding on this stream?
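
To illustrate the kind of mismatch suspected here, the following sketch (not the poster's code, which was never shown; class and method names are hypothetical) runs the JAXP identity transform twice with the output encoding set to UTF-8. One run writes to an OutputStream; the other wraps the stream in a Writer that uses ISO-8859-1, so the Writer, not the serializer, decides the bytes. With the processor bundled in the JDK the second result is expected to declare UTF-8 while containing a lone A0 byte, but that behavior is an assumption about the implementation, so the program only reports what it observes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;

public class OutputStreamDemo {
    static final String INPUT = "<t>Table&#160;1.</t>";

    static Transformer newTransformer() throws Exception {
        // Identity transform; the declared output encoding is UTF-8.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        return t;
    }

    // Safe variant: the serializer receives a byte stream and encodes itself.
    static byte[] viaOutputStream() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        newTransformer().transform(
            new StreamSource(new StringReader(INPUT)), new StreamResult(out));
        return out.toByteArray();
    }

    // Risky variant: the Writer's charset disagrees with the declared encoding.
    static byte[] viaMismatchedWriter() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, "ISO-8859-1")) {
            newTransformer().transform(
                new StreamSource(new StringReader(INPUT)), new StreamResult(w));
        }
        return out.toByteArray();
    }

    // Returns the root element's text, or null if the bytes do not parse.
    static String textOrNull(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("OutputStream result parses: "
            + (textOrNull(viaOutputStream()) != null));
        System.out.println("Mismatched Writer result parses: "
            + (textOrNull(viaMismatchedWriter()) != null));
    }
}
```

Passing an OutputStream (or a File) to StreamResult keeps the declared encoding and the written bytes under the control of one component, which is why it is usually the less error-prone choice.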


Jirka






Re: JAXP and docbook-xsl Stylesheets

2006-09-13 Thread Don Adams

Simon,

A HUGE thank you!  Yes, after spending hours
trying to figure this out, all I needed to
do was change this:

transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

to this:

transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

--
Don


 This sounds like an encoding problem. When your xml 
 file has latin1 encoding (iso-8859-1), non-breaking 
 space is a single character of value A0. When your xml 
 file has utf-8 encoding, non-breaking space consists of 
 two bytes. If you mix both, that is, when the xml file
 declares utf-8 encoding but non-breaking space is written 
 in the latin-1 manner as a single byte A0, your XML file 
 is not valid.


