[ 
https://issues.apache.org/jira/browse/PDFBOX-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016194#comment-16016194
 ] 

Tilman Hausherr commented on PDFBOX-3796:
-----------------------------------------

Did you try the sort option (-sort on the ExtractText command line, or setSortByPosition(true) on PDFTextStripper)?
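
For readers unfamiliar with the sort option: without it, text is emitted in the order it appears in the page's content stream, which need not match the visual layout of a table. The sketch below is only an illustration of that idea, not PDFBox's implementation. The Fragment class, the coordinates and the cell labels are all hypothetical, and the comparison is simplified to exact y values, then x. Whether sorting actually fixes the concatenation reported in this issue is not established here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortByPositionSketch {

    /** Hypothetical text fragment: position from the top-left corner, plus its text. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Made-up fragments of one table row, listed in content-stream order. */
    static List<Fragment> demoInput() {
        return Arrays.asList(
                new Fragment(300, 100, "cell C"),
                new Fragment(50, 100, "cell A"),
                new Fragment(150, 100, "cell B"));
    }

    /** Returns a copy ordered top to bottom, then left to right. */
    static List<Fragment> sortByPosition(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator
                .comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return sorted;
    }

    /** Joins the fragment texts with a single space, in list order. */
    static String join(List<Fragment> fragments) {
        return fragments.stream()
                .map(f -> f.text)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<Fragment> input = demoInput();
        System.out.println("stream order: " + join(input));
        System.out.println("sorted order: " + join(sortByPosition(input)));
    }
}
```

With the made-up input, the unsorted output follows the stream order (C, A, B), while the sorted output follows the visual row (A, B, C).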

> Content of different table cells concatenated on text extraction in some cases
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3796
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3796
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7, 3.0.0
>            Reporter: Yauheni Salopiy
>              Labels: table, text_extraction
>         Attachments: fdl_relpub_foi_dailyre0313172017_2.0.6.txt, 
> fdl_relpub_foi_dailyre0313172017_3.0.txt, fdl_relpub_foi_dailyre0313172017.pdf
>
>
> The content of different table cells is concatenated during text extraction 
> in some cases.
> Please see the attachments for one of the problematic PDF files and the 
> plain-text files extracted by PDFBox 2.0.6 and 3.0.0 (trunk).
> Snippet from the extracted text showing the concatenated content of 
> different cells:
>  INDIVIDUAL REC{color:#d04437}SJ{color}eanette 
> Bleckle{color:#d04437}y0{color}3/17/2017/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

