[ 
https://issues.apache.org/jira/browse/PDFBOX-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalie Bureanu updated PDFBOX-1553:
------------------------------------

    Description: 
Hello,

Preamble: We are glad to use PDFBox, and I am personally grateful to all the
developers who sustain this project. It is good work, guys!

We have one problem. For our application we extract text from PDF files
character by character, with the respective coordinates of each character
(see the attached Parser.java). We then group the characters into words. We
noticed that for some PDF documents the extracted rectangle coordinates have
a strange "offset" (see the attached screenshots).
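
To make the grouping step concrete, here is a minimal, self-contained sketch
of one way to merge per-character boxes into word boxes. It is not the
attached Parser.java and it does not call PDFBox. CharBox, Word,
groupIntoWords and the maxGap tolerance are hypothetical names chosen for
illustration, and the sketch assumes the characters arrive in reading order
with x growing to the right.

```java
import java.util.ArrayList;
import java.util.List;

public class WordGrouper {

    /** One extracted character and its bounding box (hypothetical type). */
    static final class CharBox {
        final String text;
        final double x, y, width, height;

        CharBox(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** A word: concatenated characters plus the union of their boxes. */
    static final class Word {
        final StringBuilder text = new StringBuilder();
        double x, y, width, height;

        Word(CharBox first) {
            text.append(first.text);
            x = first.x;
            y = first.y;
            width = first.width;
            height = first.height;
        }

        /** Appends a character and grows the box to cover it. */
        void add(CharBox c) {
            text.append(c.text);
            double right = Math.max(x + width, c.x + c.width);
            double bottom = Math.max(y + height, c.y + c.height);
            x = Math.min(x, c.x);
            y = Math.min(y, c.y);
            width = right - x;
            height = bottom - y;
        }
    }

    /**
     * Groups characters into words. A character joins the current word when
     * it sits on roughly the same line as the previous character and the
     * horizontal gap between them is at most maxGap. Whitespace characters
     * end the current word and are not emitted.
     */
    static List<Word> groupIntoWords(List<CharBox> chars, double maxGap) {
        List<Word> words = new ArrayList<>();
        Word current = null;
        CharBox prev = null;
        for (CharBox c : chars) {
            if (c.text.trim().isEmpty()) {
                current = null;
                prev = null;
                continue;
            }
            boolean adjacent = false;
            if (current != null && prev != null) {
                double gap = c.x - (prev.x + prev.width);
                boolean sameLine = Math.abs(c.y - prev.y) <= prev.height * 0.5;
                // Allow slight overlap (kerning), but not a jump back to the left.
                adjacent = sameLine && gap <= maxGap && gap >= -prev.width;
            }
            if (adjacent) {
                current.add(c);
            } else {
                current = new Word(c);
                words.add(current);
            }
            prev = c;
        }
        return words;
    }
}
```

Because each word box is just the union of its character boxes, any offset in
the per-character coordinates carries over unchanged to the word rectangles.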

The offset seems to be incremental (not sure): at the top-left corner of the
document the extracted coordinates are close to the real character positions,
but at the bottom-right corner they are off by about 0.5 cm.
If I select the text in Adobe Reader, everything looks fine.

I attached two PDF files that show the offset to this issue.
If you want to see the offset "in action", you can use our service at
http://pdf2data.cloudforpeople.com/ (please do not consider this advertising).

Could you please test these files and tell me whether this is really a bug?
How can we resolve it?

Thanks,
Vitalie


> Offset of extracted coordinates
> -------------------------------
>
>                 Key: PDFBOX-1553
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1553
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>         Environment: Linux Ubuntu 64 bit, Java
>            Reporter: Vitalie Bureanu
>            Priority: Minor
>              Labels: offset
>         Attachments: EnSt10_offset.pdf, EnSt11_offset.pdf, Extracted 
> coordinates of rects.jpg, Parser.java, Selection in Adobe Reader.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
