Xiaohan Zhang created PDFBOX-5737:
-------------------------------------

             Summary: java.lang.ArrayIndexOutOfBoundsException Bug Report
                 Key: PDFBOX-5737
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5737
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 3.0.0 PDFBox
            Reporter: Xiaohan Zhang
         Attachments: crash-38ee70b5cb74519b642c150694f601239f492168

We recently discovered a bug in the latest PDFBox release (3.0.0). Because we lack contextual knowledge of the PDFBox library, we cannot thoroughly fix bugs like this one ourselves, so we would welcome any plan the developers propose for fixing them.

# Test Program

    package com.test;

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.pdfbox.pdmodel.PDDocumentInformation;
    import org.apache.pdfbox.Loader;

    public class Entry {
        public static void main (String args[]) throws IOException {
            assert args.length == 1;
            try {
                File file = new File(args[0]);
                PDDocument document = Loader.loadPDF(file);
                PDDocumentInformation pdd = document.getDocumentInformation();
                System.out.println("Author of the document is :"+ pdd.getAuthor());
                System.out.println("Title of the document is :"+ pdd.getTitle());
                System.out.println("Subject of the document is :"+ pdd.getSubject());
                int noOfPages= document.getNumberOfPages();
                for (int i = 0; i < noOfPages; i++) {
                    PDPage page_doc = document.getPage(i);
                    System.out.println("Page:"+ i + ". Content: " + page_doc.getContents());
                }
                PDFTextStripper pdfStripper = new PDFTextStripper();
                String text = pdfStripper.getText(document);
                System.out.println("Full Content:"+ text);
                document.close();
            } catch (java.io.IOException ignore) { }
            System.out.println("end test, no crash");
        }
    }

# POC file

See the attachments.

# Crash Stack

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy: length -1 is negative
        at java.base/java.lang.System.arraycopy(Native Method)
        at java.base/java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
        at org.apache.pdfbox.filter.CCITTFaxFilter.decode(CCITTFaxFilter.java:75)
        at org.apache.pdfbox.filter.Filter.decode(Filter.java:96)
        at org.apache.pdfbox.filter.Filter.decode(Filter.java:238)
        at org.apache.pdfbox.cos.COSStream.createView(COSStream.java:196)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:51)
        at org.apache.pdfbox.pdfparser.BruteForceParser.bfSearchForObjStreams(BruteForceParser.java:336)
        at org.apache.pdfbox.pdfparser.BruteForceParser.rebuildTrailer(BruteForceParser.java:838)
        at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:250)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:127)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:156)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:466)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:348)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:303)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:246)
        at com.test.Entry.main(Entry.java:21)
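
# Possible mechanism (sketch, not PDFBox code)

The top two frames of the crash stack show a length of -1 reaching `PushbackInputStream.unread`. One way this can happen in plain Java is when the result of `InputStream.read(byte[])`, which is -1 at end of stream, is passed to `unread` unchecked. Whether `CCITTFaxFilter.decode` actually does this is an assumption based only on the stack trace. The class `UnreadSketch` and its methods `peekUnchecked` and `peekChecked` are hypothetical names used for illustration, and the 20-byte buffer size is arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical illustration only; this is not taken from the PDFBox sources.
public class UnreadSketch {

    // Reads a few header bytes and pushes them back without checking the read result.
    public static InputStream peekUnchecked(InputStream in) throws IOException {
        byte[] head = new byte[20];
        int n = in.read(head); // -1 if the stream is already at end of stream
        PushbackInputStream pb = new PushbackInputStream(in, head.length);
        pb.unread(head, 0, n); // a negative n is passed straight through
        return pb;
    }

    // Same idea, but only pushes back bytes that were actually read.
    public static InputStream peekChecked(InputStream in) throws IOException {
        byte[] head = new byte[20];
        int n = in.read(head);
        PushbackInputStream pb = new PushbackInputStream(in, head.length);
        if (n > 0) {
            pb.unread(head, 0, n);
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        try {
            peekUnchecked(new ByteArrayInputStream(new byte[0]));
            System.out.println("unchecked: no exception");
        } catch (IndexOutOfBoundsException e) {
            // ArrayIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException
            System.out.println("unchecked: " + e);
        }
        InputStream safe = peekChecked(new ByteArrayInputStream(new byte[0]));
        System.out.println("checked: next read returns " + safe.read());
    }
}
```

If this is indeed the cause, a guard such as the one in `peekChecked`, or an explicit `IOException` for an empty encoded stream, would turn the crash into an error that the test program's `catch (java.io.IOException ignore)` block already handles.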