[ 
https://issues.apache.org/jira/browse/PDFBOX-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-759.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4.0
         Assignee: Andreas Lehmkühler

I attached the extracted text. It looks good to me. In particular, the 
mentioned page 80 looks a lot better than the Adobe Reader copy-and-paste result.

> Special characters not extracted
> --------------------------------
>
>                 Key: PDFBOX-759
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-759
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0, 1.2.0
>         Environment: all
>            Reporter: Sebastian Freuck
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.4.0
>
>         Attachments: Mathematik_Stochastik.pdf, 
> PDFBOX759-Mathematik_Stochastik.txt
>
>
> When trying to extract characters for mathematical formulas, there appear to be 
> lots of characters that don't seem to have any meaning.
> Take, for example, the last formula on page 80, the one with the binomial 
> coefficient. When extracted using Foxit Reader or Adobe Reader, the opening 
> bracket comes out as a character with the int value 18 and the closing bracket 
> as one with the int value 19. Now when I look at the TextPosition objects using 
> PDFBox, there is one character to the left of the 5, and that one has the glyph 
> name spacehackarabic/space and the int value 32. 
> The next problem is that there seems to be a character at the same position 
> as the 5, a 'controlLF'. What does it do at the same position as that number? 
> Now, after the character 2, there are 3 other characters: another 'controlLF' 
> and two 'spacehackarabic/space'. There is no indication whatsoever of the 
> bracket. What do those extra characters mean? And why doesn't it show the 
> character for the bracket that I am able to extract using the other PDF 
> readers?
> The PDF can be downloaded from 
> http://upload.wikimedia.org/wikibooks/de/f/f6/Mathematik_Stochastik.pdf
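
The int values quoted in the report (18, 19, 32) can be checked on any 
extracted string with plain Java, independent of PDFBox. The following is a 
minimal sketch, not part of the original report: the class name CodePointDump, 
the method describe, and the sample string are hypothetical, and it assumes 
that the glyph name 'controlLF' corresponds to a line feed (int value 10).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: lists the int value of every
// character in an already extracted string and classifies it.
class CodePointDump {

    // Returns one entry per code point, e.g. "18 control" or "53 printable".
    static List<String> describe(String extracted) {
        List<String> out = new ArrayList<>();
        extracted.codePoints().forEach(cp -> {
            String kind = Character.isISOControl(cp) ? "control"
                    : Character.isWhitespace(cp) ? "whitespace"
                    : "printable";
            out.add(cp + " " + kind);
        });
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample mimicking the report: int value 18, then "5", a line
        // feed (assumed meaning of 'controlLF'), "2", a space, and int value 19.
        String sample = (char) 18 + "5\n2 " + (char) 19;
        for (String line : describe(sample)) {
            System.out.println(line);
        }
    }
}
```

Characters classified as "control" here, such as 18 and 19, have no printable 
form, which may explain why they look meaningless in copy-and-paste output.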

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
