[jira] [Commented] (PDFBOX-1438) Problems with Image Extraction from PDF

JIRA Tue, 13 Nov 2012 09:24:15 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496350#comment-13496350
 ]


Andreas Lehmkühler commented on PDFBOX-1438:
--------------------------------------------

Your code looks good to me, although it might be easier to use the 
ExtractImages class. [1]

The result is as expected. The PDF contains 2 images (one on each page), and 
both are extracted. The rest of the content consists of many lines, curves and 
boxes, which are drawn as vector graphics and can't be extracted as images. A 
possible workaround may be to convert each page to an image using 
PDFToImage [2], but the result would include the 2 small images as well.
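
For illustration, here is a minimal sketch of both approaches in one program. 
It assumes the PDFBox 1.7.x API (later major versions changed these calls) and 
has not been run against the attached file. The class name ExtractOrRender and 
the output prefixes "embedded-" and "page-" are hypothetical; only the input 
file name comes from the attachment.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

// Hypothetical class name; a sketch against the PDFBox 1.7.x API.
public class ExtractOrRender {
    public static void main(String[] args) throws IOException {
        PDDocument document = PDDocument.load(new File("Korrespondenz.PDF"));
        try {
            // getAllPages() returns an untyped list of PDPage objects in 1.x.
            List<?> pages = document.getDocumentCatalog().getAllPages();
            for (int i = 0; i < pages.size(); i++) {
                PDPage page = (PDPage) pages.get(i);

                // Approach 1: write out the image XObjects listed in the
                // page resources. Lines, curves and boxes are not images
                // and will not show up here.
                PDResources resources = page.findResources();
                Map<String, PDXObjectImage> images =
                        resources == null ? null : resources.getImages();
                if (images != null) {
                    for (Map.Entry<String, PDXObjectImage> entry : images.entrySet()) {
                        // write2file appends the file suffix for the image type.
                        entry.getValue().write2file("embedded-" + i + "-" + entry.getKey());
                    }
                }

                // Approach 2: render the whole page, vector content included.
                // The embedded images end up in this rendering as well.
                BufferedImage rendered = page.convertToImage();
                ImageIO.write(rendered, "png", new File("page-" + i + ".png"));
            }
        } finally {
            document.close();
        }
    }
}
```

The first approach yields only the 2 embedded images. The second yields one 
image per page that also contains the lines, curves and boxes.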


[1] 
http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java
[2] 
http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/PDFToImage.java
                
> Problems with Image Extraction from PDF
> ---------------------------------------
>
>                 Key: PDFBOX-1438
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1438
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>         Environment: Windows XP
>            Reporter: Christian Czech
>         Attachments: Korrespondenz_000.jpg, Korrespondenz_001.jpg, 
> Korrespondenz.PDF
>
>
> PDFBox doesn't extract images from the PDF document correctly.

