[ 
https://issues.apache.org/jira/browse/PDFBOX-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337103#comment-14337103
 ] 

John Hewson commented on PDFBOX-2128:
-------------------------------------

Here's a workaround for the underlying bug in Sun's JPEG decoder. If the 
"Inconsistent metadata read from JPEG stream" exception is encountered, then 
rather than assuming that the image is YCCK, it uses reflection to fetch the 
color space from Sun's JPEGImageReader. This allows us to tell YCCK and CMYK 
apart even in the presence of other metadata errors (or rather, other Sun 
bugs). Note that this preserves our handling of YCCK images such as those in 
PDFBOX-1348.

I think that we should still look at replacing the JPEG reader in 2.1 and 
shipping something more robust, but this fix solves the current problem.

> CMYK images are not supported correctly
> ---------------------------------------
>
>                 Key: PDFBOX-2128
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2128
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.5, 1.8.6, 2.0.0
>         Environment: Windows 7 Professional
> Running jvm: Java HotSpot(TM) 64-Bit Server VM - 1.6.0_26-b03 - 20.1-b02 - 
> Sun Microsystems Inc
>            Reporter: Ludovic Davoine
>              Labels: PDJpeg, cmyk, images
>             Fix For: 2.1.0
>
>         Attachments: 573636.pdf, DCTFilter.patch, 
> PDFBOX-2128-CAPITAL.pdf-1.png, PDFBOX-2128-CAPITAL.pdf-2.png, 
> PDFBOX-2128-PORSCHE_CMYK.pdf-1.png, PDFBOX-2128-PORSCHE_CMYK.pdf-2.png, 
> porsche_cmyk.pdf-2.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I have a PDF with CMYK images inside and i need to extract the images in the 
> RGB format. But the PDJpeg class seems to not work correctly; the colors are 
> bad.  Example:
> - Original image in te PDF : http://ludoda.free.fr/IMAGE_IN_PDF.jpg
> - Extracted image: http://ludoda.free.fr/IMAGE_EXTRACTED.jpg
> You can download the PDF : http://ludoda.free.fr/PORSCHE_CMYK.PDF
> and try my simple Test Case (I'm using PDFbox 1.8.5): 
> {code}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import javax.imageio.ImageIO;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDResources;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDJpeg;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
> public class TestCase {
>       
>       public static void main(String[] args) 
>       {
>               try 
>               {
>                       System.out.println("START EXTRACTING IMAGES...");
>                       read_pdf();
>                       System.out.println("COMPLETE");
>               }
>               catch (IOException ex) 
>               {
>                   System.out.println("" + ex);
>               }
>       }
>       public static void read_pdf() throws IOException 
>       {
>                   PDDocument document = null; 
>                   document = PDDocument.load("C:\\temp\\PORSCHE_CMYK.pdf");
>                   @SuppressWarnings("unchecked")
>                   List<PDPage> pages = 
> document.getDocumentCatalog().getAllPages();
>                   Iterator<PDPage> iter = pages.iterator(); 
>                   int i =1;
>                   while (iter.hasNext())
>                   {
>                       PDPage page = (PDPage) iter.next();
>                       PDResources resources = page.getResources();
>                       Map<String, PDXObject> pageImages = 
> resources.getXObjects();
>                       if (pageImages != null)
>                       { 
>                           Iterator<String> imageIter = 
> pageImages.keySet().iterator();
>                           while (imageIter.hasNext())
>                           {
>                               String key = (String) imageIter.next();
>                               if(pageImages.get(key) instanceof 
> PDXObjectImage)
>                               {
>                                       PDJpeg image = (PDJpeg) 
> pageImages.get(key);
>                                       
>                                       // Test 1 : write2file
>                                       
> image.write2file("C:\\workspace\\JAVA_PDFTools\\temp\\image" + i);
>                                       
>                                       // Test 2: getRGBImage
>                                       BufferedImage 
> bimage=image.getRGBImage();
>                                       File outputfile = new 
> File("C:\\workspace\\JAVA_PDFTools\\temp\\image" + i+"_buffered.jpg");
>                                       ImageIO.write(bimage, "jpg", 
> outputfile);
>                                       i ++;
>                               }
>                           }
>                       }
>                   }
>               }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to