[ 
https://issues.apache.org/jira/browse/PDFBOX-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yauheni Salopiy updated PDFBOX-3796:
------------------------------------
    Description: 
Content of different table cells is concatenated during text extraction in some cases.

Please see the attachments for one of the problematic PDF files and the plain-text 
files extracted by PDFBox 2.0.6 and 3.0.0 (trunk).
Snippet from the extracted text showing the concatenated content of 
different cells:

 INDIVIDUAL REC{color:#d04437}SJ{color}eanette 
Bleckle{color:#d04437}y0{color}3/17/2017/


  was:
Content of different table cells is concatenated during text extraction in some cases.

Please see the attachments for one of the problematic PDF files and the plain-text 
files extracted by PDFBox 2.0.6 and 3.0.0


> Content of different table cells concatenated on text extraction in some cases
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3796
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3796
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7, 3.0.0
>            Reporter: Yauheni Salopiy
>              Labels: table, text_extraction
>         Attachments: fdl_relpub_foi_dailyre0313172017_2.0.6.txt, 
> fdl_relpub_foi_dailyre0313172017_3.0.txt, fdl_relpub_foi_dailyre0313172017.pdf
>
>
> Content of different table cells is concatenated during text extraction in some 
> cases.
> Please see the attachments for one of the problematic PDF files and the plain-text 
> files extracted by PDFBox 2.0.6 and 3.0.0 (trunk).
> Snippet from the extracted text showing the concatenated content of 
> different cells:
>  INDIVIDUAL REC{color:#d04437}SJ{color}eanette 
> Bleckle{color:#d04437}y0{color}3/17/2017/
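
To make the highlighted junctions in the snippet easier to read, here is a minimal Python sketch. It assumes that each `{color:#d04437}..{color}` pair marks the last character of one cell followed by the first character of the next; the report does not state this, so treat it as an interpretation. The helper names `raw_text` and `probable_cells` are hypothetical, and the e-mail line wrap in the snippet has been joined into a single line.

```python
import re

# Snippet as quoted in the issue, with the e-mail line wrap joined back together.
snippet = (
    "INDIVIDUAL REC{color:#d04437}SJ{color}eanette "
    "Bleckle{color:#d04437}y0{color}3/17/2017/"
)

# Matches one highlighted two-character pair in JIRA color markup.
# Assumption: character 1 ends one cell, character 2 starts the next.
PAIR = re.compile(r"\{color:#[0-9a-fA-F]{6}\}(.)(.)\{color\}")


def raw_text(marked):
    """Drop the color markup, leaving the text as the extractor emitted it."""
    return PAIR.sub(lambda m: m.group(1) + m.group(2), marked)


def probable_cells(marked):
    """Split in the middle of each highlighted pair (assumed cell boundary)."""
    separated = PAIR.sub(lambda m: m.group(1) + "\n" + m.group(2), marked)
    return separated.split("\n")


if __name__ == "__main__":
    print(raw_text(snippet))
    print(probable_cells(snippet))
```

Under that assumption, the run splits into three values that look like a record type, a name, and a date, which is consistent with the separators between neighbouring cells being lost.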



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
