[ https://issues.apache.org/jira/browse/PDFBOX-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615638#comment-13615638 ]

Andreas Lehmkühler commented on PDFBOX-1553:
--------------------------------------------

Maybe your issue is related to PDFBOX-1547, as your PDF has a cropbox too. Can you check this?

> Offset of extracted coordinates
> -------------------------------
>
>                 Key: PDFBOX-1553
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1553
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>         Environment: Linux Ubuntu 64 bit, Java
>            Reporter: Vitalie Bureanu
>            Priority: Minor
>              Labels: offset
>         Attachments: EnSt10_offset.pdf, EnSt11_offset.pdf, Extracted coordinates of rects.jpg, Parser.java, Selection in Adobe Reader.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Hello,
> Preamble: We are glad to use PDFBox, and I am personally grateful to all the developers who sustain this project. It is good work, guys!
> We have one problem. For our application we extract text from PDFs character by character, with the respective coordinates for each character (see the attached Parser.java). After this we group the characters into words. We noticed that for some PDF documents the extracted rectangle coordinates have a strange offset (see the attached screenshots).
> The offset seems to be incremental (not sure): near the top-left corner of the document the extracted coordinates are close to the real character positions, but at the bottom-right corner the offset is close to 0.5 cm.
> If I make a selection in Adobe Reader, everything seems OK.
> I attached two PDF files that show the offset to this post.
> If you want to see the offset "in action", you can use our service at http://pdf2data.cloudforpeople.com/ (please do not consider this advertising).
> Could you please test these files and tell me whether this is really a bug?
> How can we resolve it?
> Thanks,
> Vitalie

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira