Hello,
I'm a developer working on a project to convert PDF files to SVG, using pdfbox
and batik. Pdfbox already contains methods to draw a PDF onto a Graphics2D
object, intended for exporting to JPG, GIF, etc. However, when I supply an
SVGGraphics2D object, I observe two problems with the text (probably due to the
same issue):
- the text is distorted - badly positioned, wrong size, etc.
- certain characters don't appear, for example the euro symbol. This
particular character actually causes SVGGraphics2D::drawString to put an
invalid character (0x02) in the XML. The string passed to drawString contains
a single byte, 0x02; however, in the PDF this character is mapped to a Type 1
font, which (I think) describes how to draw it.
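To make the "invalid character" point concrete: XML 1.0 only allows tab, line
feed and carriage return below 0x20, so 0x02 can never appear in a well-formed
SVG file. Below is a minimal sketch of a check that could be run on a string
before it reaches drawString. The class and method names (XmlCharCheck,
isValidXmlChar, firstInvalidIndex) are hypothetical helpers, not part of pdfbox
or batik, and this only detects the problem; it does not fix the glyph mapping.

```java
// Hypothetical helper, not part of pdfbox or batik.
final class XmlCharCheck {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first character XML 1.0 forbids, or -1 if there is none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair when needed.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // The single-byte string described above (0x02 in place of the euro sign).
        System.out.println(firstInvalidIndex("\u0002"));  // prints 0
        // A real euro sign is a legal XML character.
        System.out.println(firstInvalidIndex("\u20AC"));  // prints -1
    }
}
```

A non-negative result for a string coming out of the PDF would confirm that the
raw character code, rather than a Unicode character, is being handed to the
Graphics2D object.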
Example code:
    doc = PDDocument.load(url);
    DOMImplementation domImpl = GenericDOMImplementation.getDOMImplementation();
    Document document = domImpl.createDocument("http://www.w3.org/2000/svg", "svg", null);
    SVGGraphics2D graphics = new SVGGraphics2D(document);
    ....
    PageDrawer.drawPage(graphics, page, pageDimension);
    File outFile = new File("out.svg");
    Writer out = new OutputStreamWriter(new FileOutputStream(outFile), "UTF-8");
    graphics.stream(out);
    out.close();
I realize that the problem may lie with pdfbox; however, the
output is fine when exporting to, say, JPG (in which case the graphics object
is a SunGraphics2D). I looked at the source code but I'm afraid it's a bit
over my head; the biggest thing I can see is that the algorithms are completely
different. :) The output is correct when I manually set
generatorCtx.svgFont=true, but of course this makes the output file bigger
(10MB instead of 8MB).
Any help on this issue would be greatly appreciated. If needed, I can send a
PDF to duplicate the problem.
Thank you for your time,
Kelsey Rider