[ https://issues.apache.org/jira/browse/PDFBOX-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-413.
---------------------------------------
Resolution: Invalid
Thanks for your help, Adrian.
> Text Extraction Does Not Extract Content Beyond First Page
> ----------------------------------------------------------
>
> Key: PDFBOX-413
> URL: https://issues.apache.org/jira/browse/PDFBOX-413
> Project: PDFBox
> Issue Type: Bug
> Environment: Ubuntu, OpenJDK 6
> Reporter: alvin
> Attachments: google.pdf
>
>
> Here is my attempt to extract plain text from a PDF using PDFBox:
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.setStartPage(1);
> stripper.setEndPage(5);
> LucenePDFDocument document = new LucenePDFDocument();
> Document luceneDocument = document.convertDocument(file);
> System.out.println("CONTENTS: "+luceneDocument.get("contents"));
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
> This is the result I get, and it never goes beyond page 1:
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Document<stored/uncompressed<path:/home/alvin/Desktop/google.pdf>
> stored/uncompressed<url:/home/alvin/Desktop/google.pdf>
> stored/uncompressed,indexed<modified:20090130112759> indexed<uid:
> Web Search Engine
> Sergey Brin and Lawrence Page
> Computer Science Department,
> Stanford University, Stanford, CA 94305, USA
> [email protected] and [email protected]
> Abstract
> In this paper, we present Google, a prototype of a large-scale search engine which makes heavy
> use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently
> and produce much more satisfying search results than existing systems. The proto>>
> Is this a bug?
--
This message is automatically generated by JIRA.