[jira] Updated: (PDFBOX-955) Can't extract b/w images from PDF

Tilman Hausherr (JIRA) Wed, 02 Feb 2011 06:21:58 -0800

     [ https://issues.apache.org/jira/browse/PDFBOX-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tilman Hausherr updated PDFBOX-955:
-----------------------------------

    Description: 
I wrote a test application using org.apache.pdfbox.ExtractImages to extract 
images as PNG. (This is the start of something bigger, which involves compiling 
statistics about the content of over a million pages in PDF files.) However, 
when I test on our own PDF files, every image I get is either all black or all 
white. I did get correct images from a file that had color images. To extract, 
I tried page.convertToImage() followed by ImageIO.write(), and I also tried 
PDFImageWriter; neither worked for b/w images.

The sample PDF is not confidential. It does produce the warning "getRGBImage 
returned NULL", but other PDFs that don't produce this warning (and are 
confidential) also fail.

  was:
I wrote a test application using org.apache.pdfbox.ExtractImages to extract 
images as PNG. (This is the start of something bigger, which involves compiling 
statistics about the content of over a million pages in PDF files.) However, 
when I test on our own PDF files, every image I get is either all black or all 
white. I did get correct images from a file that had color images. To extract, 
I tried page.convertToImage() followed by ImageIO.write(), and I also tried 
PDFImageWriter; neither worked for b/w images.

If I can attach a file in the next window, I will do it. The sample PDF is not 
confidential. It does produce the warning "getRGBImage returned NULL", but other 
PDFs that don't produce this warning (and are confidential) also fail.

    Environment: Windows XP Professional, Java 1.6.0_22, NetBeans 6.9.1 (was: Windows XP)

> Can't extract b/w images from PDF
> ---------------------------------
>
>                 Key: PDFBOX-955
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-955
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>         Environment: Windows XP Professional, Java 1.6.0_22, NetBeans 6.9.1
>            Reporter: Tilman Hausherr
>            Priority: Blocker
>              Labels: extract
>         Attachments: ExtractImages.java, d0000040-01.png, d0000040.pdf
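
One way to narrow this down is to test the ImageIO.write() step on its own. The sketch below is hedged and makes several assumptions. The class BlankImageCheck and its methods are hypothetical and not part of PDFBox. It uses only the JDK (java.awt.image.BufferedImage, javax.imageio.ImageIO). It builds a synthetic 1-bit b/w image (TYPE_BYTE_BINARY), writes it as PNG in memory, and reads it back. If the pixels survive the round trip, the all-black or all-white output more likely originates before the write, in page.convertToImage() or in image decoding, than in ImageIO itself.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/**
 * Hypothetical diagnostic helper (not part of PDFBox). It uses only the JDK,
 * so it checks the ImageIO.write() step independently of PDF rendering.
 */
public class BlankImageCheck {

    /** True if every pixel has the same ARGB value, i.e. the image is "all black" or "all white". */
    public static boolean isUniform(BufferedImage img) {
        int first = img.getRGB(0, 0);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    /** True if both images have the same size and the same ARGB value at every pixel. */
    public static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Encodes the image as PNG in memory with ImageIO.write(), then decodes it with ImageIO.read(). */
    public static BufferedImage pngRoundTrip(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "png", out)) {
            throw new IOException("no PNG writer for this image type");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    /** Synthetic 1-bit b/w image: white background with a black 8x8 square. */
    public static BufferedImage makeSample() {
        BufferedImage bw = new BufferedImage(32, 32, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < bw.getHeight(); y++) {
            for (int x = 0; x < bw.getWidth(); x++) {
                boolean inSquare = x >= 8 && x < 16 && y >= 8 && y < 16;
                bw.setRGB(x, y, inSquare ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return bw;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage original = makeSample();
        BufferedImage decoded = pngRoundTrip(original);
        System.out.println("original uniform: " + isUniform(original));
        System.out.println("decoded uniform:  " + isUniform(decoded));
        System.out.println("pixels preserved: " + samePixels(original, decoded));
    }
}
```

In a batch run over many pages, a check like isUniform() could also flag pages whose rendered image came back blank, so that they can be inspected instead of being silently counted.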

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
