[ https://issues.apache.org/jira/browse/PDFBOX-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975334#action_12975334 ]
Adam Nichols commented on PDFBOX-905:
-------------------------------------

The core issue seems to be that getRGBImage returned NULL and that there is an unsupported/disabled operation. I tried to duplicate this and found an issue which I had to resolve before I could get the same problem and stacktrace as Jan.

First, the only value in the Map returned by doc.getPageMap() was the Integer 1. This is because doc.getPageMap() calls PDDocument::generatePageMap(), which calls processListOfPageReferences(). That method correctly detects that it has a page and not an array of pages, but then ignores the actual PDPage object and uses only the reference (object ID, revision number). parseCatalogObject() then puts new Integer(1) into the PageMap as the value (the key being the object ID and revision number). I do not understand why it puts the new Integer in the PageMap. I am not certain, but I think this is incorrect. I'll attach a patch which shows how I believe it should work.

This is not the same issue that Jan reported, but I was not able to duplicate Jan's issue until I applied my patch. Once the patch was in place, I was able to duplicate it without any problem.

Here are the error messages I found in the logs when I tested:

Dec 27, 2010 7:37:53 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Dec 27, 2010 7:37:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Dec 27, 2010 7:37:57 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:76)
    at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57)
    at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50)
    at org.apache.pdfbox.pdmodel.font.PDType1CFont.prepareFontMetric(PDType1CFont.java:529)
    at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:404)
    at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:123)
    at org.apache.pdfbox.pdmodel.font.PDType1Font.getawtFont(PDType1Font.java:214)
    at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:97)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.drawString(PDType0Font.java:68)
    at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:190)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:494)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
    at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:693)
    at org.apache.pdfbox.util.PDFMergerUtilityTest.testPdfBox905(PDFMergerUtilityTest.java:570)

The first error comes from Invoke.java line 77, but the actual problem comes from PDCcitt.java line 119, where it calls "ImageIO.read(tiff)" and THAT returns null. ImageIO is in the javax.imageio package, so this will probably be tedious to debug, and if the bug is actually in ImageIO, PDFBox will not be able to fix it. If that's the case, each JVM provider will need to fix this (which also means that it may work using JVMs other than the one provided by Sun/Oracle). The more likely issue is that there's a problem with the tiff object which is being passed to this function, either due to a bug in PDFBox or due to a problem in the tiff which was included in the PDF.

I'm not going to be able to look into these in the foreseeable future, as I haven't really worked with image processing code at all in Java, let alone in PDFBox, and I do not have the time to learn this right now. I hope these notes help someone familiar with image processing in tracking down the issue and resolving it.

The next warning was about an unsupported operator: "i". The "i" operator is defined in Table A.1 of the PDF specification (ISO 32000-1:2008, page 644), which references table 57 (page 127). I don't have enough time to get into image processing to implement this, but it is likely part of the problem Jan is facing with this PDF. If someone is interested in implementing this, you'll want to look at PageDrawer.properties to see how operators are implemented, so you'll understand what needs to be done. I'm no expert on flatness tolerance, but it looks like the result will be either that some curves are rendered as a series of straight lines, or that a large amount of processing power is used to make sure the lines are very fluid (see section 10.6.2, page 316, for details). This doesn't seem to be a huge concern.
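
On the ImageIO.read(tiff) call returning null: the javax.imageio documentation says ImageIO.read() returns null when no registered ImageReader claims it can decode the input, so a first step might be to check whether the JVM in use has a TIFF reader at all. As far as I know, the JDK itself only ships a TIFF plugin from Java 9 on, and older JVMs need an additional plugin; treat that as an assumption to verify. The sketch below is not PDFBox code; the class and method names are made up for illustration, and it only uses standard javax.imageio calls.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.imageio.ImageIO;

public class ImageIoNullCheck {

    /** Returns true if a registered ImageIO reader plugin claims the given format name. */
    public static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    /** Decodes the bytes; the result is null when no registered reader recognises them. */
    public static BufferedImage decode(byte[] data) {
        try {
            return ImageIO.read(new ByteArrayInputStream(data));
        } catch (IOException e) {
            // An I/O failure is a different situation from "no reader found".
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // If this prints false, ImageIO.read() cannot decode TIFF data on this JVM.
        System.out.println("TIFF reader registered: " + hasReader("tiff"));
        System.out.println("Reader formats: " + Arrays.toString(ImageIO.getReaderFormatNames()));

        // Bytes that are not an image in any known format: read() returns null, it does not throw.
        byte[] notAnImage = "not an image".getBytes(StandardCharsets.US_ASCII);
        System.out.println("decode(notAnImage) == null: " + (decode(notAnImage) == null));
    }
}
```

If a TIFF reader is registered and the call still returns null, that would point towards the tiff data handed over by PDFBox rather than towards ImageIO itself.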

Finally, the NPE originates from fontbox, and the problem is that getBounds(metrics) returns null in printFontMetrics(). This is because metrics is an empty List. This boils down to CFFFont.java skipping anything where fontCharset.getName() returns null (CFFFont.java line 139). There are many entries in fontEncoding.getEntries() in this PDF, but none of them have a name. fontEncoding is not an instance of CFFParser.EmbeddedEncoding in this case, and fontCharset.getEntries() is empty. This is why there are no mappings, and thus no bounds.

I do not know how to solve this issue, because I don't understand whether this is a problem with the PDF (a violation of the PDF spec), and if so what we should do about it (if anything), or whether it is a bug/limitation of PDFBox which could be fixed or improved.

Just as a test, I tried putting a try/catch around printFont(font, output); in format(CFFFont font), which solves the NPE issue, but not the root problem. As such, I'm not surprised that doing this resulted in an OutOfMemoryError.

I'll upload the uncompressed version of the PDF as well, since that still demonstrates the problems and it is easier to work with.
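
To make the failure mode concrete, here is a minimal sketch of how an empty metrics list can lead to a null bounding box, together with an explicit guard that reports the cause instead of failing with a bare NPE. This is NOT the fontbox code: BoundsSketch, its getBounds() and fontBBoxLine() are hypothetical stand-ins, glyph boxes are modelled as plain Rectangle2D objects, and the assumption that the bounds end up in an AFM "FontBBox" line is mine.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class BoundsSketch {

    /** Hypothetical helper: union of all glyph boxes, or null when the list is empty. */
    public static Rectangle2D getBounds(List<Rectangle2D> metrics) {
        Rectangle2D bounds = null;
        for (Rectangle2D box : metrics) {
            if (bounds == null) {
                bounds = (Rectangle2D) box.clone();
            } else {
                Rectangle2D.union(bounds, box, bounds);
            }
        }
        return bounds; // stays null if no glyph contributed a box
    }

    /** Hypothetical caller: fails with a descriptive message instead of dereferencing null. */
    public static String fontBBoxLine(List<Rectangle2D> metrics) {
        Rectangle2D b = getBounds(metrics);
        if (b == null) {
            throw new IllegalStateException("font has no named glyphs, cannot compute a bounding box");
        }
        return "FontBBox " + (int) b.getMinX() + " " + (int) b.getMinY()
                + " " + (int) b.getMaxX() + " " + (int) b.getMaxY();
    }

    public static void main(String[] args) {
        List<Rectangle2D> metrics = new ArrayList<Rectangle2D>();
        metrics.add(new Rectangle2D.Double(0, 0, 10, 10));
        metrics.add(new Rectangle2D.Double(5, -5, 10, 10));
        System.out.println(fontBBoxLine(metrics));

        try {
            fontBBoxLine(new ArrayList<Rectangle2D>()); // mirrors the empty list seen with this PDF
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

A guard like this only makes the symptom easier to read. Whether the font should have produced named glyph entries in the first place is still the open question.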

Here's the actual code I used (in a JUnit test case):

    import java.util.Map;
    import java.util.Set;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    // ... inside a JUnit test method:
    String inputpath = "C:\\Temp\\PDFBOX-905\\nullpointer_pdfToImage.unc2.pdf";
    PDDocument doc = null;
    try {
        doc = PDDocument.load(inputpath);
        assertEquals(1, doc.getNumberOfPages());

        // Walk every entry of the page map and render each page.
        Map pageMap = doc.getPageMap();
        Set<Object> keys = pageMap.keySet();
        for (Object key : keys) {
            Object pageObj = pageMap.get(key);
            if (pageObj instanceof PDPage) {
                PDPage page = (PDPage) pageObj;
                page.convertToImage(); // this should not throw a NPE
            } else {
                // Without my patch the value is an Integer, not a PDPage.
                throw new Exception("pageObj = " + pageObj.toString());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
        fail("Threw exception!");
    } finally {
        if (doc != null) {
            try {
                doc.close();
            } catch (Exception e) {
                // ignore failures while closing the document
            }
        }
    }

I hope these comments help someone who is more familiar with fontbox and/or image processing to resolve any bugs and explain what's happening here.

> NullPointerException when writing pdf to image
> ----------------------------------------------
>
>                 Key: PDFBOX-905
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-905
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.3.1
>            Reporter: Stevo Slavic
>         Attachments: nullpointer pdfToImage.pdf
>
>
> java.lang.NullPointerException: null
>     at org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:76) ~[fontbox-1.3.1.jar:na]
>     at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57) ~[fontbox-1.3.1.jar:na]
>     at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50) ~[fontbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.font.PDType1CFont.prepareFontMetric(PDType1CFont.java:529) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:404) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:123) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.font.PDType1Font.getawtFont(PDType1Font.java:214) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:97) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.drawString(PDType0Font.java:68) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:190) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:472) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45) ~[pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:529) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722) [pdfbox-1.3.1.jar:na]
>     at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:135) [pdfbox-1.3.1.jar:na]
> Oddly, even though this exception gets thrown, image file gets written and
> seems to be ok.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.