[jira] Resolved: (PDFBOX-571) Dubious handling of word spacing (Tw)

JIRA Sat, 28 Nov 2009 03:38:47 -0800

     [ https://issues.apache.org/jira/browse/PDFBOX-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andreas Lehmkühler resolved PDFBOX-571.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0

Villu's explanation seems reasonable to me, so I've tested the patch and it
works fine:

- the rendering of the attached sample PDF is more accurate (not perfect, but
better)
- the extracted text of the attached sample PDF is more accurate too
- the other test cases are working like before

Thanks to Villu for his contribution.

> Dubious handling of word spacing (Tw)
> -------------------------------------
>
>                 Key: PDFBOX-571
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-571
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction, Utilities
>    Affects Versions: 0.8.0-incubator
>            Reporter: Villu Ruusmann
>             Fix For: 1.0.0
>
>         Attachments: PDFStreamEngine.patch, pg_0005.pdf, pg_0005_selectall.png
>
>
> Wanted to provide a contrary case for the current handling of word spacing.
> The sample page (pg_0005.pdf) uses a Type1C font for text rendering. The 
> problem is that this Type1C font uses a custom encoding where the code values 
> are assigned sequentially starting from the code value of 1. Thus the code 
> value 32 is assigned to a digit "3", not to a space character " " as one 
> would expect.
> The PDF producer software has (mis-)used word spacing to break up longer 
> character sequences. For example, on table line 3, the character sequence 
> "0.831.05" is broken into two cells "0.83" and "1.05". Other uses of this 
> "optimization" can be seen when the sample page is opened in Acrobat Reader 
> (tested on version 7.0) and the "Select all" operation is performed. I've 
> attached the screenshot of Acrobat Reader (pg_0005_selectall.png) for your 
> convenience.
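
The PDF specification applies word spacing (Tw) to every occurrence of the single-byte character code 32 in a string shown with a simple font, regardless of which glyph that code selects. The Java sketch below illustrates why that matters for the sample page. It is not PDFBox code and not the attached PDFStreamEngine.patch. It contrasts applying Tw by raw code 32 with applying it only to glyphs that decode to a space. The encoding excerpt, the uniform 500-unit glyph width, the Tw value of 20 and the 1.5-glyph gap threshold used to split words are all hypothetical simplifications; character spacing (Tc) and horizontal scaling are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WordSpacingDemo {

    // Hypothetical excerpt of a sequential custom encoding like the one
    // described above: code 32 selects the digit '3', not a space.
    static final Map<Integer, Character> ENCODING = Map.of(
            14, '.', 17, '0', 18, '1', 22, '5', 25, '8', 32, '3');

    // Assumed uniform glyph width of 500/1000 em (a simplification).
    static final double GLYPH_WIDTH_EM = 0.5;

    /**
     * Returns the x position of each glyph in a single-byte string.
     * If tw32ByCode is true, Tw is added after every byte equal to 32
     * (the spec's rule for simple fonts); otherwise it is added only
     * after glyphs whose decoded character is ' '.
     */
    static double[] layout(int[] codes, double fontSize, double wordSpacing,
                           boolean tw32ByCode) {
        double[] xs = new double[codes.length];
        double x = 0;
        for (int i = 0; i < codes.length; i++) {
            xs[i] = x;
            boolean applyTw = tw32ByCode
                    ? codes[i] == 32
                    : ENCODING.get(codes[i]).charValue() == ' ';
            x += GLYPH_WIDTH_EM * fontSize + (applyTw ? wordSpacing : 0);
        }
        return xs;
    }

    /** Starts a new word wherever the gap exceeds 1.5 glyph widths (arbitrary threshold). */
    static List<String> extractWords(int[] codes, double[] xs, double fontSize) {
        double glyphWidth = GLYPH_WIDTH_EM * fontSize;
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < codes.length; i++) {
            if (i > 0 && xs[i] - xs[i - 1] > 1.5 * glyphWidth) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(ENCODING.get(codes[i]).charValue());
        }
        words.add(current.toString());
        return words;
    }

    public static void main(String[] args) {
        // "0.831.05" encoded with the hypothetical codes above, 10 pt, Tw = 20.
        int[] codes = {17, 14, 25, 32, 18, 14, 17, 22};
        double fontSize = 10, wordSpacing = 20;

        System.out.println(extractWords(codes, layout(codes, fontSize, wordSpacing, true), fontSize));
        // prints [0.83, 1.05]
        System.out.println(extractWords(codes, layout(codes, fontSize, wordSpacing, false), fontSize));
        // prints [0.831.05]
    }
}
```

Under the code-32 rule the extra displacement lands after the "3", which is what separates "0.83" from "1.05"; a rule keyed on decoded space characters never fires for this font, so the two table cells run together.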

-- 
This message is automatically generated by JIRA.
