[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-2377:
---
    Affects Version/s: (was: 2.0.0)

Apparent regression in character mapping in a few files from govdocs1

    Key: PDFBOX-2377
    URL: https://issues.apache.org/jira/browse/PDFBOX-2377
    Project: PDFBox
    Issue Type: Bug
    Components: Text extraction
    Affects Versions: 1.8.7
    Reporter: Tim Allison
    Assignee: Andreas Lehmkühler
    Priority: Minor
    Labels: regression
    Fix For: 1.8.8
    Attachments: 290991-6.txt, 290991-7.txt, 290991-8.txt, 290991.pdf, 312888.pdf, 357094-1.8.6.txt, 357094-1.8.8.txt, 357094.pdf, 764929.pdf, PDFBOX2247-701542.pdf

On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6.

I ran pdfbox's app.jar with ExtractText
{noformat}
764929.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
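The 1.8.6 and 1.8.7 lines quoted in the description can be compared position by position to see which characters were swapped. The following is only an illustrative sketch, not part of the issue or of PDFBox: the helper name `infer_substitutions` is hypothetical, and it assumes the two strings line up character for character, which holds for the 764929.pdf sample apart from one extra trailing character.

```python
from __future__ import annotations


def infer_substitutions(expected: str, actual: str) -> dict[str, set[str]]:
    """Collect, per expected character, the characters extracted in its place.

    Assumes both strings are aligned position by position (hypothetical helper,
    not a PDFBox API).
    """
    table: dict[str, set[str]] = {}
    # zip() stops at the shorter string, so unmatched trailing text is ignored.
    for want, got in zip(expected, actual):
        if want != got:
            table.setdefault(want, set()).add(got)
    return table


# Sample lines copied from the issue description (764929.pdf).
good = "Lang, Astrophysical Data: Planets and Stars"   # 1.8.6 output
bad = "Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,"   # 1.8.7 output

subs = infer_substitutions(good, bad)
for ch in sorted(subs):
    print(f"{ch!r} -> {sorted(subs[ch])}")
```

On this sample, wherever a letter is wrong it is wrong in the same way each time (for example `s` becomes `e`, `t` becomes `f`, `a` becomes `S`), while the leading `Lang,` is extracted intact.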
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-2377:
---
    Fix Version/s: (was: 2.0.0)
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-2377:
---
    Fix Version/s: 2.0.0
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-2377:
---
    Fix Version/s: 1.8.8
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-2377:
---
    Affects Version/s: 2.0.0
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated PDFBOX-2377:
    Description:
On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6.

I ran pdfbox's app.jar with ExtractText
{noformat}
764929.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}

    was:
On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6.

I ran pdfbox's app.jar with ExtractText
{noformat}
764949.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-2377:
---
    Attachment: PDFBOX2247-701542.pdf

The file from PDFBOX-2247 as the origin link is broken
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2377:
    Attachment: 357094-1.8.8.txt
                357094-1.8.6.txt
                357094.pdf

Same problem for 357094.pdf
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2377:
    Attachment: 290991-8.txt
                290991-7.txt
                290991-6.txt
                290991.pdf

290991.pdf is almost good again, except for ":" where there is a "."
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated PDFBOX-2377:
    Description:
On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6.

I ran pdfbox's app.jar with ExtractText
{noformat}
764949.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}

    was:
On a small number of test files in a 50k sample of pdfs from govdocs1, it appears that some characters are no longer being extracted correctly.

I ran pdfbox's app.jar with ExtractText
{noformat}
764949.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated PDFBOX-2377:
    Attachment: 312888.pdf
                764929.pdf
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated PDFBOX-2377:
    Affects Version/s: 1.8.7
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated PDFBOX-2377:
    Priority: Minor (was: Major)
[jira] [Updated] (PDFBOX-2377) Apparent regression in character mapping in a few files from govdocs1
[ https://issues.apache.org/jira/browse/PDFBOX-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated PDFBOX-2377:
    Component/s: Text extraction