Tilman Hausherr created TIKA-3246:
-------------------------------------

             Summary: IllegalArgumentException when generation of appearances fails
                 Key: TIKA-3246
                 URL: https://issues.apache.org/jira/browse/TIKA-3246
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.25
            Reporter: Tilman Hausherr
         Attachments: REDHAT-1301016-0.pdf
{noformat}
java.lang.IllegalArgumentException: No glyph for U+0041 (A) in font BZZZZZ+Aladin-Regular
	at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:372)
	at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:422)
	at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:332)
	at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:363)
	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.calculateFontSize(AppearanceGeneratorHelper.java:859)
	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:494)
	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:422)
	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:232)
	at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:264)
	at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:327)
	at org.apache.pdfbox.pdmodel.fixup.processor.AcroFormGenerateAppearancesProcessor.process(AcroFormGenerateAppearancesProcessor.java:54)
	at org.apache.pdfbox.pdmodel.fixup.AcroFormDefaultFixup.apply(AcroFormDefaultFixup.java:56)
	at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:132)
	at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:113)
	at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:267)
{noformat}

This is related to a change in PDFBox in {{PDDocumentCatalog.getAcroForm()}}: we now try to "fix" fields that exist as annotations but not as fields. As part of that fixup, the field appearances are regenerated, and that is where the exception is thrown, because the embedded subset font has no glyph for "A". I wonder whether this fixup is needed at all.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)