[jira] [Commented] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

Jira Thu, 13 Jun 2024 22:45:04 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854934#comment-17854934 ]


Andreas Lehmkühler commented on PDFBOX-5838:
--------------------------------------------

Unfortunately, this is again one of those either/or cases :-(

I tend to keep the current implementation, because not only does the golden 
master, Adobe, follow the same rules as we do, but other tools do the same. I've 
checked pdftotext and evince. Both use poppler and produce the same output as 
Adobe and PDFBox.

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5838
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5838
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.32, 3.0.3 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>         Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, PDFBOX-5838-0024320-reduced.pdf
>
>
> Discovered in 2.0.32 regression tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

