[jira] [Commented] (PDFBOX-1438) Problems with Image Extraction from PDF

Christian Czech (JIRA) Mon, 12 Nov 2012 05:31:18 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495250#comment-13495250
 ]


Christian Czech commented on PDFBOX-1438:
-----------------------------------------

Hi Andreas,

here is my code:

parser = new PDFStreamParser( cosStream );
Iterator<Object> iter = parser.getTokenIterator();
while( iter.hasNext() ) {
  Object next = iter.next();
  if( next instanceof COSObject ) {
    arguments.add( ((COSObject)next).getObject() );
  } else if( next instanceof PDFOperator ) {
    processOperator( (PDFOperator)next, arguments, nName );
  }
 ....

protected void processOperator(PDFOperator operator, List arguments, String nName) throws IOException {
  String operation = operator.getOperation();
  if (operation.equals("Do")) {
    COSName objectName = (COSName) arguments.get(0);
    Map xobjects = getResources().getXObjects();
    PDXObject xobject = (PDXObject) xobjects.get(objectName.getName());
    if (xobject instanceof PDXObjectImage) {
      PDXObjectImage image = (PDXObjectImage) xobject;
     ....
      String extension = image.getSuffix();
      ....      
      File outputFileTemp = new File(tempFileName);
      image.write2file(outputFileTemp);
    }
   ......

For the result, please see the additional attachments.
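
A note on the token loop above: the usual pattern is that operands accumulate in a list until an operator token arrives, and the list is emptied once that operator has been handled. Otherwise arguments.get(0) in the "Do" branch may return an operand left over from an earlier operator (for example a number belonging to "cm") instead of the XObject name. The excerpt elides the rest of the loop, so this may already be handled there. The sketch below shows the pattern with plain Java stand-ins only; Op, findPaintedImages and the name-to-subtype map are hypothetical and are not PDFBox classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OperandLoopSketch {

    /** Hypothetical stand-in for an operator token such as "Do" or "cm". */
    static final class Op {
        final String name;
        Op(String name) { this.name = name; }
    }

    /**
     * Walks the tokens, collecting operands until an operator appears.
     * Returns the names used by "Do" operators whose XObject subtype is
     * "Image". The map (name to subtype) is an assumed simplification of
     * a page's XObject resources.
     */
    static List<String> findPaintedImages(List<Object> tokens,
                                          Map<String, String> xobjects) {
        List<String> painted = new ArrayList<String>();
        List<Object> arguments = new ArrayList<Object>();
        for (Object token : tokens) {
            if (token instanceof Op) {
                Op op = (Op) token;
                if ("Do".equals(op.name) && !arguments.isEmpty()) {
                    // The first operand of "Do" is the XObject name.
                    String name = String.valueOf(arguments.get(0));
                    if ("Image".equals(xobjects.get(name))) {
                        painted.add(name);
                    }
                }
                // Operands belong to exactly one operator: reset them here.
                arguments.clear();
            } else {
                // Anything that is not an operator is an operand.
                arguments.add(token);
            }
        }
        return painted;
    }

    public static void main(String[] args) {
        Map<String, String> xobjects = new HashMap<String, String>();
        xobjects.put("Im0", "Image");
        xobjects.put("Fm1", "Form");

        // Models the content stream: q 100 0 0 100 0 0 cm /Im0 Do Q /Fm1 Do
        List<Object> tokens = Arrays.<Object>asList(
            new Op("q"), 100, 0, 0, 100, 0, 0, new Op("cm"),
            "Im0", new Op("Do"), new Op("Q"),
            "Fm1", new Op("Do"));

        System.out.println(findPaintedImages(tokens, xobjects)); // prints [Im0]
    }
}
```

Without the arguments.clear() call, the first operand seen by "Do" in this example would be the number 100 from the preceding "cm" operator, and no image would be found.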


                
> Problems with Image Extraction from PDF
> ---------------------------------------
>
>                 Key: PDFBOX-1438
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1438
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>         Environment: Windows XP
>            Reporter: Christian Czech
>         Attachments: Korrespondenz.PDF
>
>
> PDFBox doesn't extract images from PDF documents correctly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
