[ https://issues.apache.org/jira/browse/PDFBOX-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir closed PDFBOX-950.
---------------------------
Resolution: Invalid
Yes, I was trying to extract text from a PDF that contains only images.
Thanks.
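
For anyone who runs into the same symptom: an image-only PDF has no text layer, so a text extractor has nothing to return. Below is a minimal sketch of how a caller might tell "no extractable text" apart from a real failure. It assumes the extractor hands back either null or an HTML string; ExtractionCheck and hasExtractableText are hypothetical names and are not part of PDFBox.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): decides whether HTML produced by
// a text extractor contains any visible text at all.
final class ExtractionCheck {
    // Matches a whole <head>...</head> section, case-insensitively and across
    // lines, so that a document title is not counted as page text.
    private static final Pattern HEAD = Pattern.compile("(?is)<head\\b.*?</head>");
    // Matches any single markup tag, e.g. <p>, </body> or a DOCTYPE declaration.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private ExtractionCheck() {}

    // Returns false for null input and for markup that holds only whitespace
    // once the head section and all tags have been removed.
    static boolean hasExtractableText(String html) {
        if (html == null) {
            return false;
        }
        String withoutHead = HEAD.matcher(html).replaceAll("");
        String textOnly = TAG.matcher(withoutHead).replaceAll("");
        return !textOnly.trim().isEmpty();
    }
}
```

A caller could run this on the extractor's result and, when it returns false, treat the file as a scanned or image-only document instead of reporting an error.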
> Null from PDF
> -------------
>
> Key: PDFBOX-950
> URL: https://issues.apache.org/jira/browse/PDFBOX-950
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.4.0
> Environment: Windows XP [5.1.2600]
> java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) Client VM (build 19.0-b09, mixed mode, sharing)
> Reporter: Vladimir
>
> http://www.uss.com/corp/investors/sec_filings/3Q-2010-Earnings-Release.pdf
> The file opens correctly in Foxit Reader.
> The following code returns null:
> public static String getHtml(InputStream inputStream) {
>     PDDocument pdDocument = null;
>     String document = null;
>     try {
>         // Parse the stream and obtain the high-level document model
>         PDFParser parser = new PDFParser(inputStream);
>         parser.parse();
>         pdDocument = parser.getPDDocument();
>         // Extract the document text as HTML
>         PDFText2HTML pdf2html = new PDFText2HTML(StringUtil.UTF_8());
>         document = pdf2html.getText(pdDocument);
>     } catch (IOException e) {
>         e.printStackTrace();
>     } finally {
>         if (pdDocument != null) {
>             try {
>                 // PDDocument.close() also closes the underlying COSDocument
>                 pdDocument.close();
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
>     return document;
> }
> <dependency>
>     <groupId>org.apache.pdfbox</groupId>
>     <artifactId>pdfbox</artifactId>
>     <version>1.4.0</version>
> </dependency>
> <dependency>
>     <groupId>org.bouncycastle</groupId>
>     <artifactId>bcprov-jdk15</artifactId>
>     <version>1.45</version>
> </dependency>
> <dependency>
>     <groupId>org.bouncycastle</groupId>
>     <artifactId>bcmail-jdk15</artifactId>
>     <version>1.45</version>
> </dependency>