[ https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014083#comment-14014083 ]
Tilman Hausherr commented on PDFBOX-2101:
-----------------------------------------
Sorry, but there's a rendering problem with the 2nd page of PDFBOX-2103:
{code}
Start rendering page 2
30.05.2014 20:39:20.854 WARN [main] org.apache.pdfbox.util.PDFStreamEngine:557 - java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.cos.COSArray.getObject(COSArray.java:188)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:63)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:72)
    at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:209)
    at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:615)
    at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:53)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:544)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:214)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:147)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:96)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:414)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:208)
30.05.2014 20:39:20.866 WARN [main] org.apache.pdfbox.util.PDFStreamEngine:356 - java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:352)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:43)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:544)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:214)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:147)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:96)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:414)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:208)
{code}
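The first trace reports "Index: 0, Size: 0" from ArrayList.get, reached through COSArray.getObject in the PDType0Font constructor, so element 0 of an empty array was requested. A minimal sketch of that failure mode and one defensive alternative follows. It uses only the JDK; FirstElementGuard and firstOrNull are hypothetical names for illustration, not PDFBox API and not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: shows get(0) failing on an empty
// list and a guarded lookup that returns null instead of throwing.
final class FirstElementGuard {
    private FirstElementGuard() {}

    /** Returns the first element, or null when the list is null or empty. */
    static <T> T firstOrNull(List<T> items) {
        if (items == null || items.isEmpty()) {
            return null;
        }
        return items.get(0);
    }

    public static void main(String[] args) {
        // Stands in for an array that unexpectedly has no entries.
        List<String> entries = new ArrayList<String>();
        try {
            entries.get(0); // same call shape as in the trace
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded access failed: " + e.getClass().getSimpleName());
        }
        System.out.println("guarded, empty list: " + firstOrNull(entries));
        System.out.println("guarded, one entry: " + firstOrNull(Arrays.asList("font0")));
    }
}
```

A guard like this only moves the problem: the caller still has to decide what to do when no element is available, for example skip the font or substitute a fallback.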
> Surprising memory consumption when extracting images
> ----------------------------------------------------
>
> Key: PDFBOX-2101
> URL: https://issues.apache.org/jira/browse/PDFBOX-2101
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.8.5
> Environment: Windows 7
> java version "1.7.0_55"
> Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
> Reporter: Tim Allison
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg,
> PDFBOX-2101-714-poor.jpg, java.hprof.zip
>
>
> ExtractImages seems to fail to release memory resources on some files in both
> PDFBox 1.8.5 and trunk.
> On this 4MB file
> [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], when
> extracting every image on every page (and there are many, many duplicate
> images), there is an OOM with -Xmx1g. If no -Xmx is set and more than 2.5g is
> available, ExtractImages will work.
> With some experimentation, the triggers seem to be JPEG images that have
> masks. I'm not sure, though, whether the issue is with PDFBox or Java.
> Command lines:
> 1.8.5:
> java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 239665.pdf
> 2.0_SNAPSHOT:
> java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf
> Results:
> 1.8.5: 906 files before OOM
> {noformat}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>     at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
>     at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
>     at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
>     at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
> {noformat}
> 2.0_SNAPSHOT: 428 files before OOM
> {noformat}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>     at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
>     at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
>     at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
>     at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
>     at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
>     at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:69)
> {noformat}
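Both OOM traces in the quoted report end in ByteArrayOutputStream.grow, that is, while an entire stream was being collected into one in-memory byte array. For comparison, here is a minimal JDK-only sketch of copying a stream through a fixed-size buffer, so that memory use is bounded by the buffer rather than by the stream length. ChunkedCopy and its copy method are hypothetical names, and the traces alone do not establish whether the image decoding path in PDFBox could be restructured this way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration, not PDFBox code: stream-to-stream copy that
// never holds more than bufferSize bytes of payload at once.
final class ChunkedCopy {
    private ChunkedCopy() {}

    /** Copies all bytes from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Demo input; a real caller would pass a file or decoded stream.
        InputStream in = new ByteArrayInputStream(new byte[100000]);
        // Sink that discards its input, so the demo itself buffers nothing.
        OutputStream discard = new OutputStream() {
            @Override
            public void write(int b) {
                // intentionally empty
            }
        };
        System.out.println("copied " + copy(in, discard, 4096) + " bytes");
    }
}
```

This pattern only helps when the consumer can work on the data incrementally, for example when writing to a file; a decoder that needs the full raster in memory would still have to allocate it.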