Jasper Krauter created BATIK-1328:
-------------------------------------
Summary: No support for Unicode characters in the U+10000 - U+10FFFF range
Key: BATIK-1328
URL: https://issues.apache.org/jira/browse/BATIK-1328
Project: Batik
Issue Type: Bug
Components: SVG DOM
Affects Versions: 1.13
Reporter: Jasper Krauter
The SVG Transcoder checks for valid XML characters but does not account for
characters that Java strings represent as two chars (UTF-16 surrogate pairs).
Since neither of those chars is a valid XML character on its own, the
transcoder fails, even though the
[XML1.0 specification|https://www.w3.org/TR/xml/#charsets] allows such
characters.
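To illustrate the cause, the following minimal sketch (not part of Batik; the class name {{SurrogatePairDemo}} and the constant {{WAVE}} are made up for this example) shows how the emoji from the reproduction case is stored in a Java String. It uses only standard {{java.lang.String}} and {{java.lang.Character}} methods.

```java
// Illustrative only: shows how U+1F44B is stored in a Java String.
final class SurrogatePairDemo {
    // U+1F44B (waving hand), written as its UTF-16 surrogate pair.
    static final String WAVE = "\uD83D\uDC4B";

    public static void main(String[] args) {
        // One user-visible character, but two Java chars.
        System.out.println(WAVE.length());                             // 2
        System.out.println(WAVE.codePointCount(0, WAVE.length()));     // 1

        // Each char on its own is only half of the pair.
        System.out.println(Character.isHighSurrogate(WAVE.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(WAVE.charAt(1)));  // true

        // Reading by code point recovers the real character.
        System.out.printf("U+%X%n", WAVE.codePointAt(0));              // U+1F44B
    }
}
```

The surrogate range U+D800 - U+DFFF is excluded from the XML 1.0 {{Char}} production, so a check applied to each char separately rejects the string, while the combined code point falls in the permitted U+10000 - U+10FFFF range.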
In {{org.apache.batik.dom.util.DOMUtilities#contentToString}},
{{String#codePointAt}} should be used instead of {{String#charAt}} to extract
individual characters. With {{StringBuffer#appendCodePoint}}, the code points
can then be appended correctly to the output string. The methods that check for
character validity already account for code points.
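The proposed change could look roughly like the sketch below. This is not the actual Batik code: {{CodePointCopy}}, {{copyValidated}} and {{isXmlChar}} are hypothetical names, and {{isXmlChar}} is a stand-in that follows the XML 1.0 {{Char}} production rather than Batik's own validity check. {{StringBuilder}} is used here; {{StringBuffer}} offers the same {{appendCodePoint}} method.

```java
// Sketch of a code-point-aware copy loop; names are hypothetical, not Batik's.
final class CodePointCopy {

    // Stand-in validity check following the XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Copies s code point by code point, rejecting invalid XML characters.
    static String copyValidated(String s) {
        StringBuilder result = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // Reads a full surrogate pair as one code point where present.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException("Invalid character at index " + i);
            }
            result.appendCodePoint(cp);
            // Advance by 1 or 2 chars depending on the code point.
            i += Character.charCount(cp);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Accepted: the emoji is read as a single code point, U+1F44B.
        System.out.println(copyValidated("Hello, world! \uD83D\uDC4B"));
    }
}
```

An unpaired surrogate is still rejected in this sketch, because {{codePointAt}} returns the lone surrogate value, which lies outside the allowed ranges.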
Code example to reproduce the issue:
{code:java}
String svgNS = SVGDOMImplementation.SVG_NAMESPACE_URI;
Document doc =
    SVGDOMImplementation.getDOMImplementation().createDocument(svgNS, "svg", null);
Element text = doc.createElementNS(svgNS, "text");
text.setTextContent("Hello, world! 👋");
doc.getDocumentElement().appendChild(text);
var transcoder = new SVGTranscoder();
TranscoderOutput out = new TranscoderOutput(new OutputStreamWriter(System.out));
TranscoderInput in = new TranscoderInput(doc);
transcoder.transcode(in, out);{code}
throws
{code:java}
Exception in thread "main" java.lang.RuntimeException: IO:Invalid character
  at [email protected]/org.apache.batik.transcoder.svg2svg.SVGTranscoder.transcode(SVGTranscoder.java:179){code}