[ https://issues.apache.org/jira/browse/PDFBOX-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983152#comment-14983152 ]
Daniel Persson commented on PDFBOX-3073:
----------------------------------------

Yes, but if you use all the information from the PDF with the local coordinates and your function in PDFTextStreamEngine.java, then all the data ends up in the wrong place whenever the media box and crop box differ in size. I've run about 500 examples and get the wrong text placement every time. But if I change this to use the media box and then recalculate the data to the crop box after the data has been extracted, I get the correct positions.

> Change to use media box for page size instead of cropbox.
> ---------------------------------------------------------
>
>                 Key: PDFBOX-3073
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3073
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Daniel Persson
>            Priority: Minor
>              Labels: github-import
>             Fix For: 2.0.0
>
>         Attachments: mediabox_for_content.patch
>
>
> For PDF documents where the media box is larger or smaller than the crop box, the
> content gets squeezed or stretched.
> For PDF content the media box should be used as the page size.
> More information about this at
> http://www.prepressure.com/pdf/basics/page-boxes
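
The recalculation mentioned in the comment (extract positions against the media box, then map them to the crop box afterwards) can be sketched roughly as follows. This is only an illustration, not the attached patch and not PDFBox code: the `BoxRemap` class, its `Box` type and both methods are hypothetical names, and the sketch assumes an unrotated page with both boxes expressed in the same bottom-left-origin PDF user space. In PDFBox itself the two rectangles would presumably come from `PDPage.getMediaBox()` and `PDPage.getCropBox()`.

```java
// Hypothetical helper, not part of PDFBox: maps a point measured from the
// media box's lower-left corner to one measured from the crop box's
// lower-left corner. Assumes no page rotation and a bottom-left origin.
final class BoxRemap {

    /** Axis-aligned rectangle in PDF user space (lower-left and upper-right corners). */
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        double width()  { return urx - llx; }
        double height() { return ury - lly; }
    }

    /** Shifts a media-box-relative point by the offset between the two box origins. */
    static double[] mediaToCrop(double x, double y, Box media, Box crop) {
        double dx = crop.llx - media.llx;
        double dy = crop.lly - media.lly;
        return new double[] { x - dx, y - dy };
    }

    /** True if a crop-box-relative point lies within the crop box. */
    static boolean insideCrop(double x, double y, Box crop) {
        return x >= 0 && x <= crop.width() && y >= 0 && y <= crop.height();
    }

    public static void main(String[] args) {
        Box media = new Box(0, 0, 612, 792);    // US Letter media box
        Box crop  = new Box(50, 100, 562, 692); // smaller crop box inside it

        double[] p = mediaToCrop(100, 150, media, crop);
        System.out.println(p[0] + ", " + p[1]);         // prints "50.0, 50.0"
        System.out.println(insideCrop(p[0], p[1], crop)); // prints "true"
    }
}
```

The point of the sketch is that the mapping is a pure translation: nothing is scaled, which matches the observation that using the crop box as the page size is what squeezes or stretches the content. Points that fall outside the crop box after the shift would not be visible on the displayed page.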