[ https://issues.apache.org/jira/browse/PDFBOX-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646467#comment-13646467 ]
James Green commented on PDFBOX-1586: ------------------------------------- RandomAccessBuffer.seek() is called on an instance after it's close() is called. Now to understand why. > IndexOutOfBoundsException when saving a document (at random) > ------------------------------------------------------------ > > Key: PDFBOX-1586 > URL: https://issues.apache.org/jira/browse/PDFBOX-1586 > Project: PDFBox > Issue Type: Bug > Affects Versions: 1.8.1 > Reporter: James Green > Priority: Critical > > Getting the following stacktrace: > org.apache.pdfbox.exceptions.COSVisitorException: > java.lang.IndexOutOfBoundsException: Index: 28, Size: 0 > at > org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1245) > at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:201) > at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:206) > at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:524) > at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:434) > at > org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1056) > at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:496) > at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1392) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1157) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1138) > ... > Caused by: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:604) > at java.util.ArrayList.get(ArrayList.java:382) > at > org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84) > at > org.apache.pdfbox.io.RandomAccessFileInputStream.read(RandomAccessFileInputStream.java:96) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at > org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1232) > I'll add some context. We have a "data pipeline" in which a Windows Print > Monitor sends postscript into a servlet which then uses GhostScript 9.05 to > convert in-memory to PDF. This PDF is then loaded into PDFBox using > PDDocument.load(). > At this point we split the original PDF into multiple smaller ones each of > which is saved to a ByteArrayOutputStream. At the point of save() we are > having serious reliability issues. > Taking an original PDF from Ghostscript we have saved this into a unit test > to replicate the problem without success. If we attempt to re-execute the > pipeline to take the original PDF and split it, we get apparently random > percentages of saved documents. > For instance, on a 990 page document (text, no images), to be split into 990 > 1-page documents using Tomcat 7 with -Xmx=512m: > Pass 1: 50% were saved, 50% ended with stack traces > Pass 2: 100% were saved > Pass 3: 100% were saved > The same test with -Xmx=128m ended several times with just 1 document saved, > the rest were stack traces. > We have also seen this randomly hit a sample document consisting of four > pages to be split into two two-page documents so it does not appear to be > memory related. We also added code to catch the IndexOutOfBoundsException and > make up to ten attempts to repeat, but it seems the save() either works the > first time or not at all. > We're thinking there are environmental factors here but we're now focused on > getting this nailed. Any advice or assistance will be welcomed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira