Hervé Girod wrote:

I tried to set up the OutputStream with UTF-8, but it doesn't change
anything. It is really a minor problem, because the output SVG file can be
opened and shows what it should, but it is slightly different from the
original, at least when Unicode / non-ASCII characters are used...

Ah, I understand: the source document used escaped text (hex character references), and the output uses multi-byte sequences in UTF-8. In general there is no way to resolve this issue. The escaped text is converted to UTF-16 (Java Strings) before Batik ever gets to look at it. So while your solution may work in your very restricted case, because you know that these 'wide' characters are always written as hex escapes, in general a source document could use either mechanism, and there is essentially no way for Batik to know which characters should be written which way.
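To make the point concrete, here is a small sketch that uses only the JDK's DOM parser (javax.xml.parsers), not Batik. It parses a glyph whose unicode attribute is written as a hex character reference and one written with the literal character, and shows that the parser hands back identical Strings. The class and method names (CharRefDemo, unicodeAttr) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {
    // Hypothetical helper: parses a one-element XML document and returns the
    // root element's "unicode" attribute exactly as the parser reports it.
    static String unicodeAttr(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("unicode");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // U+0627 (ARABIC LETTER ALEF) written as a hex character reference...
        String fromEscape = unicodeAttr("<glyph unicode=\"&#x627;\"/>");
        // ...and written directly as the literal character.
        String fromLiteral = unicodeAttr("<glyph unicode=\"\u0627\"/>");
        // Both arrive as the same one-char Java String; the original spelling is gone.
        System.out.println(fromEscape.equals(fromLiteral)); // true
        System.out.println(fromEscape.length());            // 1
    }
}
```

Once the DOM is built, nothing records which of the two spellings the source used, which is why a serializer cannot reproduce it.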

   So my basic response is that if you are happy with the 'fix' then
feel free to continue to use it (you may want to rename the class
and/or move it into a package you control), but I don't think it is
appropriate for the Batik package to incorporate it.

I enclose a Java file showing the result (it can be tested with the fontArabic
example in the samples directory). I used it with version 1.5 of Batik.
The first parameter to set in the test is the input file (for example,
fontArabic); the second is the directory for the output files. Two
SVG files are generated: one using UTF-8 encoding, the other using
US-ASCII.

I think the two classes involved in this process are XmlWriter (used in
SVG generation) and OutputManager (used in SVG transcoding).


I embed my fonts in the SVG file, but as some characters are not in the
ASCII range, I use Unicode character references as in the
fontArabic.svg sample. However, when I write the corresponding DOM
document to a File, the Unicode hexadecimal definition "&#x..." is
converted to a String (the literal character is written instead).


I think that when the content is written, these characters should
be encoded using UTF-8/16 depending on the output stream. The problem
is that the decision to encode with hex is a bit more complex than
just checking whether the char is > 0xFF (for UTF-8 the break point is actually 0x7F).
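A rough sketch of a more general rule, assuming the writer knows its target Charset: ask the encoder whether each code point can be represented, and fall back to a hex character reference only when it cannot. This is not Batik code; CharRefEscaper and escapeFor are hypothetical names, and the helper deliberately ignores markup characters such as '<' and '&', which a real XML writer must also escape.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharRefEscaper {
    // Hypothetical helper: replaces every code point the target charset cannot
    // encode with an XML hex character reference (&#x...;). Markup characters
    // ('<', '&', quotes) are NOT handled here.
    static String escapeFor(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);                 // handles surrogate pairs
            String unit = new String(Character.toChars(cp));
            if (encoder.canEncode(unit)) {
                out.append(unit);                         // representable: keep as-is
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "A\u00e9\u0627"; // 'A', e-acute (U+00E9), Arabic alef (U+0627)
        System.out.println(escapeFor(s, StandardCharsets.US_ASCII));   // A&#xe9;&#x627;
        System.out.println(escapeFor(s, StandardCharsets.ISO_8859_1)); // Aé&#x627;
        System.out.println(escapeFor(s, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```

The break point then falls out of the charset itself (0x7F for US-ASCII, 0xFF for ISO-8859-1, none for UTF-8), instead of being hard-coded.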


  You may need to set up the output stream differently to get the
desired behavior. Take a look at OutputStreamWriter.
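For reference, a minimal JDK-only sketch of configuring OutputStreamWriter with an explicit charset (EncodedWriterDemo and encode are hypothetical names). It also shows why choosing the encoding alone does not bring the escapes back: with US-ASCII, OutputStreamWriter replaces an unmappable character with the charset's replacement byte ('?') rather than writing a character reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedWriterDemo {
    // Hypothetical helper: writes text through an OutputStreamWriter configured
    // with the given charset and returns the bytes that reach the stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String alef = "\u0627"; // ARABIC LETTER ALEF
        // UTF-8 encodes U+0627 as the two-byte sequence D8 A7.
        System.out.println(encode(alef, StandardCharsets.UTF_8).length); // 2
        // US-ASCII cannot represent it; the writer substitutes '?'.
        System.out.println(new String(encode(alef, StandardCharsets.US_ASCII),
                StandardCharsets.US_ASCII)); // ?
    }
}
```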



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]