[
https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107826#comment-14107826
]
John Hewson commented on PDFBOX-2262:
-------------------------------------
My latest commits tackle the multi-byte CMap problem, which wasn't handled
correctly in PDFBox previously and which, combined with my previous changes,
had left us with bad behaviour because the new code was correct but existing
code relied on it being buggy. As I'd already planned to do this as part of
PDFBOX-2149, I took the time to finally refactor CMaps and Encodings, in
particular CMaps with variable-length character codes.
Hopefully you'll find the new code very easy to understand (there are 457 fewer
lines :)), where once we had:
{code}
int codeLength;
for (int i = 0; i < string.length; i += codeLength)
{
    // Decode the value to a Unicode character
    codeLength = 1;
    String unicode = font.encode(string, i, codeLength);
    int[] charCodes;
    if (unicode == null && i + 1 < string.length)
    {
        // maybe a multibyte encoding
        codeLength++;
        unicode = font.encode(string, i, codeLength);
        charCodes = new int[] { font.getCodeFromArray(string, i, codeLength) };
    }
    else
    {
        charCodes = new int[] { font.getCodeFromArray(string, i, codeLength) };
    }
    ...
{code}
We now have:
{code}
InputStream in = new ByteArrayInputStream(string);
while (in.available() > 0)
{
    int code = font.readCode(in);
    String unicode = font.toUnicode(code);
    ...
{code}
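To illustrate why reading from a stream suits variable-length codes, here is a minimal standalone sketch of how a readCode-style method might decide how many bytes to consume by matching against codespace ranges. It is not the PDFBox implementation: the class VariableLengthCodeReader, its CodespaceRange helper, and the addRange and decode methods are hypothetical names invented for this example, and the two ranges in main are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the PDFBox CMap class.
class VariableLengthCodeReader
{
    // A codespace range: all codes of low.length bytes whose bytes lie between low and high.
    private static final class CodespaceRange
    {
        final int[] low;
        final int[] high;

        CodespaceRange(int[] low, int[] high)
        {
            this.low = low;
            this.high = high;
        }

        boolean matches(int[] bytes, int length)
        {
            if (length != low.length)
            {
                return false;
            }
            for (int i = 0; i < length; i++)
            {
                if (bytes[i] < low[i] || bytes[i] > high[i])
                {
                    return false;
                }
            }
            return true;
        }
    }

    private final List<CodespaceRange> ranges = new ArrayList<>();
    private int maxLength = 1;

    void addRange(int[] low, int[] high)
    {
        ranges.add(new CodespaceRange(low, high));
        maxLength = Math.max(maxLength, low.length);
    }

    // Reads one byte at a time until the bytes read so far fall inside a codespace range.
    int readCode(InputStream in) throws IOException
    {
        int[] bytes = new int[maxLength];
        int code = 0;
        for (int length = 1; length <= maxLength; length++)
        {
            int b = in.read();
            if (b == -1)
            {
                throw new IOException("Unexpected end of string");
            }
            bytes[length - 1] = b;
            code = (code << 8) | b;
            for (CodespaceRange range : ranges)
            {
                if (range.matches(bytes, length))
                {
                    return code;
                }
            }
        }
        throw new IOException("No codespace range matches the input");
    }

    // Decodes a whole string into character codes, mirroring the loop shown above.
    List<Integer> decode(byte[] string)
    {
        List<Integer> codes = new ArrayList<>();
        InputStream in = new ByteArrayInputStream(string);
        try
        {
            while (in.available() > 0)
            {
                codes.add(readCode(in));
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return codes;
    }

    public static void main(String[] args)
    {
        VariableLengthCodeReader reader = new VariableLengthCodeReader();
        reader.addRange(new int[] { 0x00 }, new int[] { 0x80 });             // made-up one-byte range
        reader.addRange(new int[] { 0x81, 0x40 }, new int[] { 0x9F, 0xFC }); // made-up two-byte range
        for (int code : reader.decode(new byte[] { 0x41, (byte) 0x81, 0x40, 0x42 }))
        {
            System.out.println(Integer.toHexString(code));
        }
    }
}
```

The point of the design is that the caller never needs to guess a code length and retry: the reader consumes exactly as many bytes as the matching range requires, so the calling loop stays as simple as the one above.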
Hopefully I didn't break too much in the process. The exceptions on the
following files should now be fixed:
PDFBOX-1283.pdf <== still has rendering issues
PDFBOX-1421.pdf <== still has rendering issues
PDFBOX-1422.pdf
FOP-2252.pdf
freesanstest.pdf
None of the other test files with rendering issues are affected; they're still
buggy, and I'll take a look at them soon.
> Remove usage of AWT fonts
> -------------------------
>
> Key: PDFBOX-2262
> URL: https://issues.apache.org/jira/browse/PDFBOX-2262
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel, Rendering
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
> Attachments: ELVIA-Reiserucktritt-Vollschutz.pdf-1.png,
> FreeSansTest.pdf, PDFBOX-1094-094730.pdf-1.png, PDFBOX-1770.pdf-1.png,
> bugzilla886049.pdf, bugzilla886049.pdf-1.png
>
>
> We're still using AWT fonts to render the "standard 14" built-in fonts, which
> causes rendering problems and encoding issues (see PDFBOX-2140). We're also
> using AWT for some fallback fonts.
> Removal of these AWT fonts isn't too difficult: we need to load the fonts
> using the existing PDFFontManager mechanism, which has recently been added.
> All missing TrueType fonts loaded from disk have been using SystemFontManager
> for a number of weeks now.
> We should ship some sensible default fonts with PDFBox, such as the
> Liberation fonts (see PDFBOX-2169, PDFBOX-2263), in case PDFFontManager can't
> find anything suitable, rather than falling back to the default TTF font, but
> by default we'll probe the system for suitable fonts.