[ https://issues.apache.org/jira/browse/PDFBOX-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133166#comment-14133166 ]

Andreas Lehmkühler commented on PDFBOX-1586:
--------------------------------------------

I've overhauled the scratch file usage for 2.0.0, so that the scratch file isn't exposed anymore, see PDFBOX-2301.

[~jgreen] I've checked your testcase using the versions 1.8.3, 1.8.6 and the current 1.8 branch and everything works fine. Can you confirm that?

> IndexOutOfBoundsException when saving a document (at random)
> ------------------------------------------------------------
>
>                 Key: PDFBOX-1586
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1586
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 1.8.1
>            Reporter: James Green
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: TestBuildNewDocumentFromMultipleSources.java
>
>
> Getting the following stacktrace:
> org.apache.pdfbox.exceptions.COSVisitorException: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1245)
>     at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:201)
>     at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:206)
>     at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:524)
>     at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:434)
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1056)
>     at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:496)
>     at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1392)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1157)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1138)
>     ...
> Caused by: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:604)
>     at java.util.ArrayList.get(ArrayList.java:382)
>     at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
>     at org.apache.pdfbox.io.RandomAccessFileInputStream.read(RandomAccessFileInputStream.java:96)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1232)
> I'll add some context. We have a "data pipeline" in which a Windows Print
> Monitor sends postscript into a servlet which then uses GhostScript 9.05 to
> convert in-memory to PDF. This PDF is then loaded into PDFBox using
> PDDocument.load().
> At this point we split the original PDF into multiple smaller ones each of
> which is saved to a ByteArrayOutputStream. At the point of save() we are
> having serious reliability issues.
> Taking an original PDF from Ghostscript we have saved this into a unit test
> to replicate the problem without success. If we attempt to re-execute the
> pipeline to take the original PDF and split it, we get apparently random
> percentages of saved documents.
> For instance, on a 990 page document (text, no images), to be split into 990
> 1-page documents using Tomcat 7 with -Xmx=512m:
> Pass 1: 50% were saved, 50% ended with stack traces
> Pass 2: 100% were saved
> Pass 3: 100% were saved
> The same test with -Xmx=128m ended several times with just 1 document saved,
> the rest were stack traces.
> We have also seen this randomly hit a sample document consisting of four
> pages to be split into two two-page documents so it does not appear to be
> memory related.
> We also added code to catch the IndexOutOfBoundsException and
> make up to ten attempts to repeat, but it seems the save() either works the
> first time or not at all.
> We're thinking there are environmental factors here but we're now focused on
> getting this nailed. Any advice or assistance will be welcomed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)