[ https://issues.apache.org/jira/browse/PDFBOX-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adina Toma updated PDFBOX-1508:
-------------------------------
    Attachment: files.zip

Attached are the compressed and uncompressed PDF files, as well as the PDFs that result from extracting the first page from each. I also modified the uncompressed first page by replacing "mediabox" with spaces; this cuts the text, just like the compressed first page.

> Extracting page causes incorrect clipping
> -----------------------------------------
>
>                 Key: PDFBOX-1508
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1508
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, PDFReader
>    Affects Versions: 1.7.1
>         Environment: Windows 7, Windows XP, Windows Server 2008
>            Reporter: Adina Toma
>         Attachments: files.zip
>
>
> I have a compressed PDF from which I extract pages (each page becomes an
> individual PDF file). The extracted pages are clipped incorrectly (text is
> cut), whereas the original PDF is not clipped. I traced this down to a
> missing mediabox attribute in the extracted pages; in the original file the
> attribute is present on every page. Using the same file, but uncompressed,
> the extracted pages are not cut and the mediabox attribute is present.
> The main code (without initializations and checks) used to load and extract
> pages is the following:
>     temp = new File("e:/temp.tmp");
>     rand = new RandomAccessFile(temp,"rw");
>     doc = PDDocument.loadNonSeq(file,rand);
>     PDPage page = (PDPage) doc.getPrintable(pageIndex);
>     PDDocument newDoc = new PDDocument();
>     newDoc.importPage(page);
>     newDoc.close();
>     doc.close();
>     rand.close();
>     temp.delete();
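
The report above says the clipping was traced to a missing mediabox attribute in the extracted pages. One rough way to repeat that check on a saved file is to count occurrences of the `/MediaBox` name in its raw bytes. The sketch below uses only the Java standard library; the class `MediaBoxScan` and its methods are hypothetical helpers, not part of PDFBox. It also assumes the page dictionaries are stored as plain objects: in a file that keeps them inside compressed object streams the name will not appear in the raw bytes, so a count of zero is only meaningful for uncompressed output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of PDFBox.
class MediaBoxScan {

    // Counts non-overlapping occurrences of an ASCII token in the raw bytes.
    static int countToken(byte[] data, String token) {
        byte[] needle = token.getBytes(StandardCharsets.US_ASCII);
        if (needle.length == 0) {
            return 0;
        }
        int count = 0;
        int i = 0;
        while (i <= data.length - needle.length) {
            if (matchesAt(data, needle, i)) {
                count++;
                i += needle.length; // skip past this match
            } else {
                i++;
            }
        }
        return count;
    }

    // True if needle occurs in data starting exactly at offset.
    static boolean matchesAt(byte[] data, byte[] needle, int offset) {
        for (int j = 0; j < needle.length; j++) {
            if (data[offset + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MediaBoxScan <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // PDF names are case-sensitive, so only the exact spelling is counted.
        System.out.println("/MediaBox occurrences: " + countToken(data, "/MediaBox"));
    }
}
```

Running it against the page extracted from the compressed file and the page extracted from the uncompressed file should show whether the two outputs differ in the way the report describes.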