[ https://issues.apache.org/jira/browse/PDFBOX-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
thomas menzel updated PDFBOX-186:
---------------------------------

    Attachment: PwC-Tech-Forecast-Spring-2009.pdf

Attached PwC-Tech-Forecast-Spring-2009.pdf.

I'm getting this exception too, but I don't think the PDF is corrupt, as it was generated with the PDF tools. At least that is how I understand its properties.

This is with version 0.7.4 from SourceForge.

I also tried the same PDF with version 0.8.0 and got a different error there, which is posted at PDFBOX-546.

> NullPointerException in getAllKids with corrupted pdf
> -----------------------------------------------------
>
>                 Key: PDFBOX-186
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-186
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>         Attachments: PwC-Tech-Forecast-Spring-2009.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1532246
> Originally submitted by ojaquemet on 2006-08-01 01:15.
> java.lang.NullPointerException
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>     at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>     at [...]
> Tested with PDFBox-0.7.2-log4j.jar and
> PDFBox-0.7.3-dev-20060731.jar
> Because the corrupted PDF is too big (7MB) to be
> attached here, you'll be able to find it there:
> http://olivier.jaquemet.free.fr/PDF-corrupted.pdf
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> I get this message too. How do you parse big PDFs?