[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated PDFBOX-2377:
--------------------------------
    Description:
On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6.

I ran pdfbox's app.jar with ExtractText

{noformat}
764929.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}

  was:
On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6.

I ran pdfbox's app.jar with ExtractText

{noformat}
764949.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}


> Apparent regression in character mapping in a few files from govdocs1
> ---------------------------------------------------------------------
>
>                 Key: PDFBOX-2377
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2377
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.7
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>              Labels: regression
>         Attachments: 290991-6.txt, 290991-7.txt, 290991-8.txt, 290991.pdf,
> 312888.pdf, 357094-1.8.6.txt, 357094-1.8.8.txt, 357094.pdf, 764929.pdf,
> PDFBOX2247-701542.pdf
>
>
> On a small number of test files in a 50k sample of pdfs from govdocs1, it
> appears that some characters are no longer being extracted correctly in 1.8.7
> when compared to 1.8.6.
> I ran pdfbox's app.jar with ExtractText
> {noformat}
> 764929.pdf
> 1.8.6: Lang, Astrophysical Data: Planets and Stars
> 1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
> {noformat}
> and
> {noformat}
> 312888.pdf
> 1.8.6: Self-Assessment & Capability Description
> 1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)