[jira] [Commented] (PDFBOX-2128) CMYK images are not supported correctly

John Hewson (JIRA) Wed, 25 Feb 2015 11:08:46 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336996#comment-14336996
 ]


John Hewson commented on PDFBOX-2128:
-------------------------------------

As an open source project, PDFBox should not be adding features which allow 
proprietary plugins to be used. Delegating image decoding to external code 
opens a can of worms with respect to our internal implementation of image 
handing. Far better for PDFBox to ship with working image handling code. We 
currently use JAI for JBIG2 and JPX so we already have all the plugibility we 
need to have a fully functioning PDF reader.

Successfully being able to read JPEGs from a PDF is not a customisable user 
feature! It's not something which needs plugins - it's something which needs to 
work out-of-the box. We need to ship a working JPEG decoder with PDFBox. That 
means either using reflection to workaround Sun's bugs or adding a dependency 
to something like TwelveMonkeys.

> CMYK images are not supported correctly
> ---------------------------------------
>
>                 Key: PDFBOX-2128
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2128
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.5, 1.8.6, 2.0.0
>         Environment: Windows 7 Professional
> Running jvm: Java HotSpot(TM) 64-Bit Server VM - 1.6.0_26-b03 - 20.1-b02 - 
> Sun Microsystems Inc
>            Reporter: Ludovic Davoine
>              Labels: PDJpeg, cmyk, images
>             Fix For: 2.1.0
>
>         Attachments: 573636.pdf, DCTFilter.patch, 
> PDFBOX-2128-CAPITAL.pdf-1.png, PDFBOX-2128-CAPITAL.pdf-2.png, 
> PDFBOX-2128-PORSCHE_CMYK.pdf-1.png, PDFBOX-2128-PORSCHE_CMYK.pdf-2.png, 
> porsche_cmyk.pdf-2.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I have a PDF with CMYK images inside and i need to extract the images in the 
> RGB format. But the PDJpeg class seems to not work correctly; the colors are 
> bad.  Example:
> - Original image in te PDF : http://ludoda.free.fr/IMAGE_IN_PDF.jpg
> - Extracted image: http://ludoda.free.fr/IMAGE_EXTRACTED.jpg
> You can download the PDF : http://ludoda.free.fr/PORSCHE_CMYK.PDF
> and try my simple Test Case (I'm using PDFbox 1.8.5): 
> {code}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import javax.imageio.ImageIO;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDResources;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDJpeg;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
> public class TestCase {
>       
>       public static void main(String[] args) 
>       {
>               try 
>               {
>                       System.out.println("START EXTRACTING IMAGES...");
>                       read_pdf();
>                       System.out.println("COMPLETE");
>               }
>               catch (IOException ex) 
>               {
>                   System.out.println("" + ex);
>               }
>       }
>       public static void read_pdf() throws IOException 
>       {
>                   PDDocument document = null; 
>                   document = PDDocument.load("C:\\temp\\PORSCHE_CMYK.pdf");
>                   @SuppressWarnings("unchecked")
>                   List<PDPage> pages = 
> document.getDocumentCatalog().getAllPages();
>                   Iterator<PDPage> iter = pages.iterator(); 
>                   int i =1;
>                   while (iter.hasNext())
>                   {
>                       PDPage page = (PDPage) iter.next();
>                       PDResources resources = page.getResources();
>                       Map<String, PDXObject> pageImages = 
> resources.getXObjects();
>                       if (pageImages != null)
>                       { 
>                           Iterator<String> imageIter = 
> pageImages.keySet().iterator();
>                           while (imageIter.hasNext())
>                           {
>                               String key = (String) imageIter.next();
>                               if(pageImages.get(key) instanceof 
> PDXObjectImage)
>                               {
>                                       PDJpeg image = (PDJpeg) 
> pageImages.get(key);
>                                       
>                                       // Test 1 : write2file
>                                       
> image.write2file("C:\\workspace\\JAVA_PDFTools\\temp\\image" + i);
>                                       
>                                       // Test 2: getRGBImage
>                                       BufferedImage 
> bimage=image.getRGBImage();
>                                       File outputfile = new 
> File("C:\\workspace\\JAVA_PDFTools\\temp\\image" + i+"_buffered.jpg");
>                                       ImageIO.write(bimage, "jpg", 
> outputfile);
>                                       i ++;
>                               }
>                           }
>                       }
>                   }
>               }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (PDFBOX-2128) CMYK images are not supported correctly

Reply via email to