[ https://issues.apache.org/jira/browse/PDFBOX-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lee van Hooff updated PDFBOX-3435:
----------------------------------
    Attachment: text-extraction-issues.pdf

> Text extraction - words on same line detection failing in 2.x
> -------------------------------------------------------------
>
>                 Key: PDFBOX-3435
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3435
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>            Reporter: Lee van Hooff
>         Attachments: text-extraction-issues.pdf
>
>
> Extracting a line of text as it appears in the PDF no longer works in the
> 2.x versions of PDFBox.
> java -jar pdfbox-app-1.8.4.jar ExtractText -console -sort ~/Desktop/text-extraction-issues.pdf
> results in:
> {noformat}
> . . .
> Your Code        Our Code                            Description              
>                                 Qty    Price Ex   Total Ex  
> 11SP             100129630       IRWIN VICE-GRIP 11 C-CLAMP SWIVEL PAD        
>    4         00.00      000.00
> IR-0352          100094584       IRWIN 600MM TOOL BAG                         
>    1         00.00       00.00
> EM81.9           100088913       EMPIRE TORPEDO LEVEL ALUMINIUM               
>    1         00.00       00.00
> 20566-618R       100023443       LENOX RECIPRO BLADE 150X20X0.9MM 18TPI 5P    
>     3          0.00       00.00
> . . .
> {noformat}
> while
> java -jar pdfbox-app-2.0.2.jar ExtractText -console -sort ~/Desktop/text-extraction-issues.pdf
> results in:
> {noformat}
> . . .
> Your Code        Our Code                            Description              
>                                 Qty    Price Ex   Total Ex  
> IRWIN VICE-GRIP 11 C-CLAMP SWIVEL PAD    
> 11SP             100129630              4         00.00      000.00
> IRWIN 600MM TOOL BAG                     
> IR-0352          100094584              1         00.00       00.00
> EMPIRE TORPEDO LEVEL ALUMINIUM           
> EM81.9           100088913              1         00.00       00.00
> LENOX RECIPRO BLADE 150X20X0.9MM 18TPI 5P
> 20566-618R       100023443              3          0.00       00.00
> . . .
> {noformat}
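The report is about how "-sort" decides which text fragments share a line. In the 2.x API, the "-sort" option of ExtractText corresponds to PDFTextStripper.setSortByPosition(true). The following is a hypothetical, simplified Java sketch (Java 16+, because it uses a record) that contrasts two ways of grouping fragments into lines. It is not PDFBox's TextPositionComparator or its line-building code, and the Fragment type and all coordinates are invented for illustration. Grouping by exact top coordinate splits a row whose description is set slightly higher and places the description first, which resembles the 2.0.2 output above. Grouping by vertical overlap keeps the row together, which resembles the 1.8.4 output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class SameLineSketch {

    /** Hypothetical text fragment: text, left x, vertical extent (y grows downward, top < bottom). */
    record Fragment(String text, double x, double top, double bottom) {}

    /** Strict grouping: only fragments with exactly the same top y share a line. */
    static List<List<Fragment>> groupByExactTop(List<Fragment> fragments) {
        TreeMap<Double, List<Fragment>> byTop = new TreeMap<>();
        for (Fragment f : fragments) {
            byTop.computeIfAbsent(f.top(), k -> new ArrayList<>()).add(f);
        }
        List<List<Fragment>> lines = new ArrayList<>(byTop.values());
        for (List<Fragment> line : lines) line.sort(Comparator.comparingDouble(Fragment::x));
        return lines;
    }

    /** Tolerant grouping: a fragment joins the current line if it overlaps the line's vertical extent. */
    static List<List<Fragment>> groupIntoLines(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble(Fragment::top)); // stable sort, top to bottom
        List<List<Fragment>> lines = new ArrayList<>();
        List<Fragment> current = new ArrayList<>();
        double lineBottom = 0;
        for (Fragment f : sorted) {
            // Input is sorted by top, so f starts at or below the line's top;
            // it overlaps the line iff it starts above the line's running bottom.
            if (!current.isEmpty() && f.top() < lineBottom) {
                current.add(f);
                lineBottom = Math.max(lineBottom, f.bottom());
            } else {
                if (!current.isEmpty()) lines.add(current);
                current = new ArrayList<>();
                current.add(f);
                lineBottom = f.bottom();
            }
        }
        if (!current.isEmpty()) lines.add(current);
        for (List<Fragment> line : lines) line.sort(Comparator.comparingDouble(Fragment::x));
        return lines;
    }

    /** Joins each line's fragments with single spaces, one line per row. */
    static String render(List<List<Fragment>> lines) {
        StringBuilder sb = new StringBuilder();
        for (List<Fragment> line : lines) {
            List<String> texts = new ArrayList<>();
            for (Fragment f : line) texts.add(f.text());
            sb.append(String.join(" ", texts)).append('\n');
        }
        return sb.toString();
    }

    /** Invented coordinates: the description sits 0.5 units higher than the other cells. */
    static List<Fragment> sampleRow() {
        return List.of(
                new Fragment("11SP", 10, 100, 110),
                new Fragment("100129630", 80, 100, 110),
                new Fragment("IRWIN VICE-GRIP 11 C-CLAMP SWIVEL PAD", 160, 99.5, 109),
                new Fragment("4", 400, 100, 110));
    }

    public static void main(String[] args) {
        List<Fragment> row = sampleRow();
        System.out.print("exact-top grouping:\n" + render(groupByExactTop(row)));
        System.out.print("overlap grouping:\n" + render(groupIntoLines(row)));
    }
}
```

With the sample row, exact-top grouping yields the description on its own line followed by "11SP 100129630 4", while overlap grouping yields a single line with all four cells in left-to-right order. A real implementation also has to cope with rotated text, superscripts, and rows that touch each other, which a plain overlap test does not address.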



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
