[ 
https://issues.apache.org/jira/browse/PDFBOX-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975334#action_12975334
 ] 

Adam Nichols commented on PDFBOX-905:
-------------------------------------

The core issue seems to be that getRGBImage returned NULL and there is an 
unsupported/disabled operation.

I tried to duplicate this and found an issue which I had to resolve before I 
could get the same problem and stacktrace as Jan.  First, the only value in the 
Map returned by doc.getPageMap() was the Integer 1.  This is because 
doc.getPageMap() calls PDDocument::generatePageMap(), which calls 
processListOfPageReferences().  That method correctly detects that it has a 
page and not an array of pages, but then ignores the actual PDPage object and 
uses only the reference (object ID, revision number), and parseCatalogObject() 
puts new Integer(1) into the PageMap as the value (the key being the object ID 
and revision number).  I do not understand why it puts the new Integer in the 
PageMap.  I am not certain, but I think this is incorrect.  I'll attach a patch 
which shows how I believe it should work.  This is not the same issue that Jan 
reported, but I was not able to duplicate the reported issue until I applied my 
patch.  Once the patch was in place, I was able to duplicate it without any 
problem.

Here are the error messages I found in the logs when I tested:
Dec 27, 2010 7:37:53 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Dec 27, 2010 7:37:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Dec 27, 2010 7:37:57 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:76)
        at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57)
        at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.prepareFontMetric(PDType1CFont.java:529)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:404)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:123)
        at org.apache.pdfbox.pdmodel.font.PDType1Font.getawtFont(PDType1Font.java:214)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:97)
        at org.apache.pdfbox.pdmodel.font.PDType0Font.drawString(PDType0Font.java:68)
        at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:190)
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:494)
        at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
        at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:693)
        at org.apache.pdfbox.util.PDFMergerUtilityTest.testPdfBox905(PDFMergerUtilityTest.java:570)

The first error comes from Invoke.java line 77, but the actual problem comes 
from PDCcitt.java line 119 where it calls "ImageIO.read(tiff)" and THAT returns 
null.  ImageIO is in the javax.imageio package, so this will probably be 
tedious to debug and if the bug is actually in ImageIO, this will not be 
something that PDFBox will be able to fix.  If that's the case, each JVM 
provider will need to fix this (which also means that it may work using JVMs 
other than the one provided by Sun/Oracle).  The more likely issue is that 
there's a problem with the tiff object which is being passed to this function, 
either due to a bug in PDFBox, or a problem in the tiff which was included in 
the PDF.  I'm not going to be able to look into these in the foreseeable future 
as I haven't really worked with image processing code at all in Java, let alone 
in PDFBox, and I do not have the time to learn this right now.  I hope these 
notes help someone familiar with image processing in tracking down the issue 
and resolving it.
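
For anyone picking this up, a minimal sketch of the ImageIO behaviour involved: 
ImageIO.read() returns null, without throwing, when no registered ImageReader 
claims to be able to decode the input.  The class and method names below are 
hypothetical and this is not PDFBox code.  Whether a TIFF reader is registered 
depends on the JVM and on any installed plugins, so the second check is only a 
diagnostic and its result will vary.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.ImageIO;

public class ImageIoNullCheck {

    /** Returns true only if some registered ImageReader decoded the bytes. */
    static boolean canDecode(byte[] data) {
        try {
            // ImageIO.read returns null (no exception) when no reader claims the data.
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
            return image != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports whether this JVM has any reader registered for the "tiff" format name. */
    static boolean hasTiffReader() {
        return ImageIO.getImageReadersByFormatName("tiff").hasNext();
    }

    public static void main(String[] args) {
        byte[] notAnImage = "not an image".getBytes(StandardCharsets.US_ASCII);
        System.out.println("decodable: " + canDecode(notAnImage));
        System.out.println("TIFF reader registered: " + hasTiffReader());
    }
}
```

If hasTiffReader() reports false on the JVM in question, the null result would 
be explained by a missing reader rather than by a malformed tiff object.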

The next warning was about an unsupported operator: "i".  The "i" operator is 
defined in Table A.1 of the PDF specification (ISO 32000-1:2008, page 644) 
which references table 57 (page 127).  I don't have enough time to get into 
image processing to implement this, but this is likely part of the problem 
Jan is facing with this PDF.  If someone is interested in implementing this, 
you'll want to look at PageDrawer.properties to see how operators are 
implemented so you'll understand what needs to be done.  I'm no expert on 
flatness tolerance, but it looks like ignoring it means either that some curves 
may be rendered as a series of visibly straight lines, or that a large amount 
of processing power will be used to make the curves very smooth (see section 
10.6.2, page 316, for details).  This doesn't seem to be a huge concern.
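
To illustrate the trade-off, here is a minimal sketch using the flattening 
iterator in java.awt.geom, which takes a flatness parameter in a similar spirit 
to the PDF "i" operator.  It is only an analogy under that assumption, not how 
PDFBox would implement the operator, and the class name and curve coordinates 
are made up.  A smaller flatness value approximates the curve with more line 
segments, which costs more work.

```java
import java.awt.geom.CubicCurve2D;
import java.awt.geom.PathIterator;

public class FlatnessDemo {

    /** Counts the straight line segments used to approximate one fixed Bezier curve. */
    static int countSegments(double flatness) {
        // Arbitrary example curve; the control points are illustrative only.
        CubicCurve2D curve = new CubicCurve2D.Double(0, 0, 0, 100, 100, 100, 100, 0);
        // Passing a flatness makes the iterator return only MOVETO/LINETO segments.
        PathIterator it = curve.getPathIterator(null, flatness);
        double[] coords = new double[6];
        int segments = 0;
        while (!it.isDone()) {
            if (it.currentSegment(coords) == PathIterator.SEG_LINETO) {
                segments++;
            }
            it.next();
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println("flatness 0.1 -> " + countSegments(0.1) + " segments");
        System.out.println("flatness 10  -> " + countSegments(10.0) + " segments");
    }
}
```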

Finally, the NPE error originates from fontbox, and the problem is that 
getBounds(metrics) returns null in printFontMetrics().  This is because metrics 
is an empty List.  This boils down to CFFFont.java skipping anything where the 
fontCharset.getName() returns null (CFFFont.java line 139).  There are many 
entries in fontEncoding.getEntries() in this PDF, but none of them have a name. 
 fontEncoding is not an instance of CFFParser.EmbeddedEncoding in this case, 
and fontCharset.getEntries() is empty.  This is why there are no mappings, and 
thus no bounds.  I do not know how to solve this issue as I don't understand if 
this is a problem with the PDF (violation of PDF spec) or not, and if so what 
should we do about it (if anything), or if this is a bug/limitation of PDFBox 
which could be fixed/improved, etc.
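
The failure mode itself can be sketched in isolation.  The code below is not 
the fontbox source; unionOf() and describe() are hypothetical stand-ins, 
assuming a bounds helper that returns null for an empty list.  A caller that 
dereferences that result without checking would throw the same kind of NPE.  
The null check only avoids the crash and does nothing about the missing 
mappings.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class EmptyBoundsDemo {

    /** Hypothetical bounds helper: union of all boxes, or null when the list is empty. */
    static Rectangle2D unionOf(List<Rectangle2D> boxes) {
        Rectangle2D result = null;
        for (Rectangle2D box : boxes) {
            if (result == null) {
                result = (Rectangle2D) box.clone();
            } else {
                result.add(box);
            }
        }
        return result;
    }

    /** Formats the bounds, guarding against the null returned for an empty list. */
    static String describe(List<Rectangle2D> boxes) {
        Rectangle2D bounds = unionOf(boxes);
        if (bounds == null) {
            return "no glyph metrics";
        }
        return bounds.getMinX() + " " + bounds.getMinY() + " "
                + bounds.getMaxX() + " " + bounds.getMaxY();
    }

    public static void main(String[] args) {
        List<Rectangle2D> metrics = new ArrayList<Rectangle2D>();
        System.out.println(describe(metrics));
        metrics.add(new Rectangle2D.Double(0, 0, 10, 20));
        System.out.println(describe(metrics));
    }
}
```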

As a test, I tried putting a try/catch around the printFont(font, output); 
call in format(CFFFont font), which avoids the NPE but does not fix the root 
problem.  As such, I'm not surprised that doing this resulted in an 
OutOfMemoryError.  I'll upload the uncompressed version of the PDF as well, 
since it still demonstrates the problems and is easier to work with.

Here's the actual code I used (in a JUnit test case):
// Needs java.util.Map, org.apache.pdfbox.pdmodel.PDDocument and
// org.apache.pdfbox.pdmodel.PDPage, plus the JUnit assertEquals/fail methods.
String inputpath = "C:\\Temp\\PDFBOX-905\\nullpointer_pdfToImage.unc2.pdf";
PDDocument doc = null;
try {
    doc = PDDocument.load(inputpath);
    assertEquals(1, doc.getNumberOfPages());
    Map pageMap = doc.getPageMap();
    // With my patch applied, every value in the page map should be a PDPage.
    for (Object pageObj : pageMap.values()) {
        if (pageObj instanceof PDPage) {
            PDPage page = (PDPage) pageObj;
            page.convertToImage(); // this should not throw a NPE
        } else {
            throw new Exception("pageObj = " + pageObj);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
    fail("Threw exception!");
} finally {
    if (doc != null) {
        // Errors on close are deliberately ignored; they are not what is being tested.
        try { doc.close(); } catch (Exception e) {}
    }
}


I hope these comments help someone who is more familiar with fontbox and/or 
image processing to resolve any bugs and explain what's happening here.

> NullPointerException when writing pdf to image
> ----------------------------------------------
>
>                 Key: PDFBOX-905
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-905
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.3.1
>            Reporter: Stevo Slavic
>         Attachments: nullpointer pdfToImage.pdf
>
>
> java.lang.NullPointerException: null
>       at 
> org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:76) 
> ~[fontbox-1.3.1.jar:na]
>       at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57) 
> ~[fontbox-1.3.1.jar:na]
>       at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50) 
> ~[fontbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdmodel.font.PDType1CFont.prepareFontMetric(PDType1CFont.java:529)
>  ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:404) 
> ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:123) 
> ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdmodel.font.PDType1Font.getawtFont(PDType1Font.java:214) 
> ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:97) 
> ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdmodel.font.PDType0Font.drawString(PDType0Font.java:68) 
> ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:190)
>  [pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:472)
>  [pdfbox-1.3.1.jar:na]
>       at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45) 
> ~[pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:529)
>  [pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
>  [pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>  [pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>  [pdfbox-1.3.1.jar:na]
>       at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107) 
> [pdfbox-1.3.1.jar:na]
>       at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722) 
> [pdfbox-1.3.1.jar:na]
>       at 
> org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:135) 
> [pdfbox-1.3.1.jar:na]
> Oddly, even though this exception gets thrown, image file gets written and 
> seems to be ok.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
