[ 
https://issues.apache.org/jira/browse/PDFBOX-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt England updated PDFBOX-981:
--------------------------------

    Attachment: PDColorSpaceFactory.java.diff

Patch for PDColorSpaceFactory

> PDColorSpaceFactory does not recognize colorspace DeviceGray (patch included 
> herein)
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-981
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-981
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.5.0
>            Reporter: Matt England
>              Labels: pdfbox
>         Attachments: PDColorSpaceFactory.java.diff, example.pdf
>
>
> I was trying to use PDFTextStripper to extract text from a large corpus of 
> PDF files. For some of them, the method
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpaceFactory.createColorSpace(
>  COSBase colorSpace, Map colorSpaces )
> fails to recognize the case where the colorSpace argument is a COSArray 
> whose first element corresponds to COSName.DEVICEGRAY. With that case added, 
> the files that failed with the stock pdfbox-1.5.0 parse successfully. Below 
> is a diff of my patched PDColorSpaceFactory that handles the case where the 
> colorspace name is DeviceGray. Incidentally, it occurs to me that another 
> (possibly better) approach is to call through to createColorSpace(String) 
> when no other case matches.
> % diff PDColorSpaceFactory.java.orig PDColorSpaceFactory.java
> 94a95,97
> > else if ( type.getName().equals( PDDeviceGray.NAME) ) {
> > retval = new PDDeviceGray();
> > }
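
The alternative mentioned in the description, falling through to the name-based lookup when no array-specific case matches, could be sketched roughly as follows. This is a self-contained illustration and not PDFBox code: ColorSpace, createFromName and createFromArray are hypothetical stand-ins for PDColorSpace, createColorSpace(String) and the COSArray branch of createColorSpace(COSBase, Map), and the set of recognized names is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class ColorSpaceFallbackSketch {

    /** Hypothetical stand-in for a resolved color space; not a PDFBox type. */
    static final class ColorSpace {
        final String name;

        ColorSpace(String name) {
            this.name = name;
        }
    }

    /** Name-only lookup, playing the role of createColorSpace(String). */
    static ColorSpace createFromName(String name) {
        switch (name) {
            case "DeviceGray":
            case "DeviceRGB":
            case "DeviceCMYK":
                return new ColorSpace(name);
            default:
                throw new IllegalArgumentException("Unknown color space: " + name);
        }
    }

    /** Array form: the first element is the color space name. */
    static ColorSpace createFromArray(List<String> array) {
        String type = array.get(0);
        if (type.equals("ICCBased")) {
            // Parameterized spaces need the remaining array elements;
            // real code would read them here. Simplified for the sketch.
            return new ColorSpace("ICCBased");
        }
        // No array-specific case matched: defer to the name-based lookup,
        // so simple names such as DeviceGray work in array form too.
        return createFromName(type);
    }

    public static void main(String[] args) {
        // An array holding only a device color space name.
        System.out.println(createFromArray(Arrays.asList("DeviceGray")).name);
    }
}
```

With the fallback in place, a name the name-based lookup already understands does not need its own branch in the array path, which is the gap the three-line patch closes for DeviceGray only.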

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
