[ 
https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roland Quast updated PDFBOX-1018:
---------------------------------

          Component/s:     (was: PDModel)
               Labels: pdfbox  (was: )
             Priority: Critical  (was: Major)
          Description: 
This bug has been reported in various other tickets submitted before. I am 
attempting to conclusively prove that this is an issue, and it needs to be 
attended to since all past tickets regarding this bug have been marked invalid.

I have attached a video showing very basic code that will reproduce the issue. 
I have also attached the code that causes the issue, as well as a PDF file that 
works (a color one), and a black and white PDF file that doesn't.

The main issue is that when reading a black and white PDF file (see attached 
black and white pdf file), the following message is displayed, and the output 
image is completely white.

26/05/2011 3:20:14 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
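
To make the "completely white" symptom checkable in a test rather than by eye, 
a minimal sketch follows. It uses only the standard java.awt.image.BufferedImage 
API; the class name BlankImageCheck and the method isAllWhite are hypothetical 
helpers introduced here for illustration, not part of PDFBox or of the attached 
code.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of PDFBox): detects a rendering that came out blank.
public class BlankImageCheck {

    /** Returns true when every pixel of the image is white, ignoring alpha. */
    public static boolean isAllWhite(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB returns ARGB; mask off the alpha byte and compare RGB to white.
                if ((image.getRGB(x, y) & 0x00FFFFFF) != 0x00FFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Under these assumptions, a reproduction could pass the BufferedImage returned by 
PDPage.convertToImage() to isAllWhite and fail the test when it returns true.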

We use PDFBox in our program for reading PDF files, and at least 50 percent of 
our customers' PDF files (from different scanners) cannot be read because of 
this issue. This is a complete showstopper, and I'd be more than happy to help 
in any way I can to resolve it.




  was:
When converting a PDPage of this pdf into an image, the resulting file is 
always a white image with no contents.

The following message appeared in the log output (it doesn't seem to be a 
duplicate of PDFBOX-794):

 ERROR                  filter.FlateFilter - Stop reading corrupt stream

Here's the code used to convert the image:

@Test
public void testConvertImage() {
        try {
                PDDocument pdDocument = PDDocument.load("pdf_causing_white_pages.pdf");
                List<PDPage> documentPageList = pdDocument.getDocumentCatalog().getAllPages();
                TestCase.assertNotNull(documentPageList);
                int pageNumber = 1;
                for (PDPage tmpPage : documentPageList) {
                        BufferedImage tempImage = tmpPage.convertToImage();
                        ImageIO.write(tempImage, "jpeg", new File("result_" + pageNumber + ".jpeg"));
                        pageNumber++;
                }
        } catch (FileNotFoundException e) {
                TestCase.fail(e.getMessage());
        } catch (IOException e) {
                TestCase.fail(e.getMessage());
        }
}



          Environment: JDK 1.6.0_22  (was: JDK 1.6.0_21)
    Affects Version/s: 1.2.0
                       1.2.1
                       1.5.0

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, 
> black_and_white.pdf, color.pdf
>
>
> This bug has been reported in various other tickets submitted before. I am 
> attempting to conclusively prove that this is an issue, and it needs to be 
> attended to since all past tickets regarding this bug have been marked 
> invalid.
> I have attached a video showing very basic code that will reproduce the 
> issue. I have also attached the code that causes the issue, as well as a PDF 
> file that works (a color one), and a black and white PDF file that doesn't.
> The main issue is that when reading a black and white PDF file (see attached 
> black and white pdf file), the following message is displayed, and the 
> output image is completely white.
> 26/05/2011 3:20:14 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke 
> process
> WARNING: getRGBImage returned NULL
> We use PDFBox in our program for reading PDF files, and at least 50 percent 
> of our customers' PDF files (from different scanners) cannot be read because 
> of this issue. This is a complete showstopper, and I'd be more than happy to 
> help in any way I can to resolve it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
