[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated TIKA-3246:
----------------------------------
    Attachment: TIKA-3246.patch

> IllegalArgumentException when generation of appearances fails
> -------------------------------------------------------------
>
>                 Key: TIKA-3246
>                 URL: https://issues.apache.org/jira/browse/TIKA-3246
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.25
>            Reporter: Tilman Hausherr
>            Priority: Major
>         Attachments: TIKA-3246.patch
>
>
> {noformat}
> java.lang.IllegalArgumentException: No glyph for U+0041 (A) in font BZZZZZ+Aladin-Regular
>     at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:372)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:422)
>     at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:332)
>     at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:363)
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.calculateFontSize(AppearanceGeneratorHelper.java:859)
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:494)
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:422)
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:232)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:264)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:327)
>     at org.apache.pdfbox.pdmodel.fixup.processor.AcroFormGenerateAppearancesProcessor.process(AcroFormGenerateAppearancesProcessor.java:54)
>     at org.apache.pdfbox.pdmodel.fixup.AcroFormDefaultFixup.apply(AcroFormDefaultFixup.java:56)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:132)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:113)
>     at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:267)
> {noformat}
> This is related to a change in PDFBox in {{PDDocumentCatalog.getAcroForm()}},
> we try to "fix" fields when they exist as annotations but not as fields. I
> wonder if this is needed at all.
> It happens with several files, among them the two AML files of PDFBOX-4086.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)