RE: extract images

Peter_Lenahan Tue, 20 Jan 2009 03:58:06 -0800

Abid,

This bug may be the same one that was just patched.
It fails on the same line of code as another bug report:
"RE: java.io.EOFException: Unexpected end of ZLIB input stream"


Please get the patch Andreas mentions and try it.

Good Luck,
Peter


Hi Peter,

I've checked all critical locations in org.apache.pdfbox.filter.FlateFilter
and provided a patch.
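The patch itself is not reproduced in this thread. As an illustration only, here is a minimal JDK-only sketch (Java 8 or later, not the PDFBox code that was committed) of one defensive approach to the EOFException discussed below: when InflaterInputStream reports "Unexpected end of ZLIB input stream", keep the bytes decoded so far instead of failing the whole document. The class name, the compress() demo helper, and the BUFFER_SIZE value are hypothetical. Other errors, such as the ZipException in Abid's trace, are still reported.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class TolerantFlateDecode {
    // Hypothetical buffer size; PDFBox's own BUFFER_SIZE may differ.
    private static final int BUFFER_SIZE = 2048;

    /**
     * Inflates zlib (/FlateDecode) data. If the input ends before the
     * compressed stream is complete, InflaterInputStream throws
     * EOFException("Unexpected end of ZLIB input stream"); in that case
     * the bytes decoded so far are returned instead of failing.
     */
    static byte[] decode(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        try (InputStream decompressor =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                out.write(buffer, 0, amountRead);
            }
        } catch (EOFException truncated) {
            // Truncated input: keep the partial output already written to 'out'.
        } catch (IOException e) {
            // Other errors (e.g. ZipException for a bad header) are not hidden.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Demo helper: zlib-compress a byte array. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = compress(original);
        // Simulate a stream that was cut off halfway.
        byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
        System.out.println("full: " + decode(compressed).length + " of " + original.length + " bytes");
        System.out.println("truncated: " + decode(truncated).length + " of " + original.length + " bytes");
    }
}
```

Whether silently accepting a truncated stream is acceptable depends on the use case: for rendering it may be better than failing, but for a corruption check like the one described below it would hide exactly the problem being looked for.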

Thank you for your help.

BR
Andreas

[email protected] wrote:
> In my earlier message I forgot to include the number of bytes available 
> (the variable mayRead) in the while condition. Version 2 is below.
> 
> 
>      int mayRead=compressedData.available(); // pjl
>      while ((mayRead > 0 && 
>             (amountRead = decompressor.read(buffer, 0, 
>                                Math.min(mayRead,BUFFER_SIZE))) != -1))
> 
> -----Original Message-----
> From: Lenahan, Peter
> Sent: Friday, January 16, 2009 10:26 AM
> To: [email protected]
> Subject: RE: java.io.EOFException: Unexpected end of ZLIB input stream 
> error message on UNIX box
> 
> I did a Google search on your issue (query: "InflaterInputStream read 
> Unexpected end of ZLIB"), which came up with about 854 results. There 
> are a couple of solutions.
> 
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4040920
> 
> Workaround:
> The workaround is to never attempt to read more bytes than the entry 
> contains. Call ZipEntry.getSize() to get the actual size of the entry, 
> then use this value to keep track of the number of bytes remaining in 
> the entry while reading from it.
> 
> This code change may solve the issue for PDFBox. 
> 
> In org.pdfbox.filter.FlateFilter.decode (FlateFilter.java:97), add a 
> Math.min() call to reduce the number of bytes you try to read:
> 
>     int mayRead = compressedData.available();
>     while ((amountRead = decompressor.read(buffer, 0,
>             Math.min(mayRead, BUFFER_SIZE))) != -1)
> 
> 
> 
> I found another potential cause of this error, with a solution, on the 
> Sun forums. It was described on Windows, but the same could happen on 
> UNIX. It suggests that the error can occur if several processes are 
> running against the same directory. Please check whether this is the 
> problem: are you running multiple processes to get the job done faster?
> 
> http://forums.sun.com/thread.jspa?threadID=5316308
> 
> paul.miner, "Re: Unexpected end of ZLIB input stream error while 
> compiling" (Jul 22, 2008 6:54 AM):
> 
> koko191 wrote:
> Main batch:
> start /B %SWIFT_LOCAL_HOME%\scripts\rmicAll.bat
> start /B %SWIFT_LOCAL_HOME%\scripts\create_jar.bat
> 
> The "start" command does not wait for the command to finish, so both 
> those batch files would be running in parallel. If they both work on 
> the same jar, this could be a problem.
> 
> If you want to run the batch files in sequence, use "call".
> 
> -----Original Message-----
> From: Balasubramaniam, Balaji
> [mailto:[email protected]]
> Sent: Tuesday, January 13, 2009 7:05 PM
> To: [email protected]
> Subject: java.io.EOFException: Unexpected end of ZLIB input stream 
> error message on UNIX box
> 
> Hello,
> 
> I'm trying to use PDFBox to identify whether a PDF file is corrupted. 
> We are automating a process that loops through a given folder and 
> counts how many of the PDF files are corrupted. The program works fine 
> on Windows XP (OS version: x86 Windows XP 5.1, Java version: Java 
> HotSpot(tm) Client VM 1.5.0-15-b04). When we run the application on a 
> UNIX box (OS version: PA_RISC2.0 HP-UX B.11.23, Java version: Java 
> HotSpot(tm) Client VM 1.5.0.11 jinteg:11.07.07-09:52 PA2.0(aCC_AP)), it 
> throws the following error.
> 
> NOTE: The error does not happen every time; it occurs only for some of 
> the PDF files. Those files are not corrupted: I can open them manually 
> and they display fine.
> 
> java.io.EOFException: Unexpected end of ZLIB input stream
>         at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
>         at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
>         at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
>         at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
>         at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
>         at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
>         at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
>         at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
>         at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
>         at com.tibco.pe.core.Job.a(Job.java:712)
>         at com.tibco.pe.core.Job.k(Job.java:501)
>         at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
>         at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
> 
> Here is a sample code snippet I use for the task:
> 
> PDDocument document = PDDocument.load(<input stream>);
> try {
>   List pages = document.getDocumentCatalog().getAllPages();
>   for (int i = 0; i < pages.size(); i++) {
>     PDPage page = (PDPage) pages.get(i);
>     PDStream contents = page.getContents();
>     if (contents == null) {
>       continue; // page has no content stream
>     }
>     try {
>       // Constructing the parser decodes the page's content stream.
>       PDFStreamParser parser = new PDFStreamParser(contents.getStream());
>     } catch (Exception e) {
>       System.err.println("This PDF cannot be read. Most likely it is corrupted: " + pdfFileName);
>     }
>   }
> } finally {
>   document.close();
> }
> 
> Could somebody shed some light on this one?
>
> Thank you.
> 
> 
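Balaji's goal is to scan a folder and count corrupted PDFs, and one suggestion above is that files may be in use by another process at the same time. As a cheap pre-check before handing a file to PDFBox, here is a JDK-only heuristic sketch (Java 7 or later; an assumption, not a PDFBox feature): a complete PDF begins with "%PDF-" and its last line is a "%%EOF" marker, so a file cut off while being copied or written usually lacks the trailing marker. The class name and the 1024-byte tail window are hypothetical choices. It cannot detect damaged streams inside an otherwise complete file, such as the ZLIB errors above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PdfTruncationCheck {
    // Hypothetical: how far from the end of the file to look for "%%EOF".
    private static final int TAIL_WINDOW = 1024;

    /** Heuristic: true if data starts with "%PDF-" and "%%EOF" appears near the end. */
    static boolean looksComplete(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < header.length || indexOf(data, header, 0, header.length) != 0) {
            return false;
        }
        byte[] eof = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int from = Math.max(0, data.length - TAIL_WINDOW);
        return indexOf(data, eof, from, data.length) >= 0;
    }

    /** Returns the first index of needle inside haystack[from, to), or -1. */
    private static int indexOf(byte[] haystack, byte[] needle, int from, int to) {
        outer:
        for (int i = from; i + needle.length <= to; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        int suspicious = 0;
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(folder, "*.pdf")) {
            for (Path pdf : pdfs) {
                if (!looksComplete(Files.readAllBytes(pdf))) {
                    System.out.println("Possibly truncated: " + pdf);
                    suspicious++;
                }
            }
        }
        System.out.println(suspicious + " suspicious file(s)");
    }
}
```

Files that pass this check still need to go through PDFBox; the check only filters out files that are obviously incomplete, which is cheap compared with parsing every page.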


--
The box said "requires Windows 9x/2000/XP or BETTER", so I installed 
Linux.



-----Original Message-----
From: Abid Hussain [mailto:[email protected]] 
Sent: Tuesday, January 20, 2009 6:17 AM
To: [email protected]
Subject: extract images

Hello everybody,

I'm trying to extract images from a PDF file, but it doesn't work... :-(

I tried ExtractImages.exe, which results in:
 >ExtractImages.exe "C:\path\to\pdf_file"
Exception in thread "main" java.lang.NullPointerException
         at org.pdfbox.ExtractImages.extractImages(ExtractImages.java:138)
         at org.pdfbox.ExtractImages.main(ExtractImages.java:72)

Then I tried to extract the images with code copied from the ExtractImages 
class. Here's a snippet:
PDXObjectImage image = (PDXObjectImage) images.get(key);
String name = getUniqueFileName(key, image.getSuffix());
image.write2file(name);

The execution of the last line results in:
java.util.zip.ZipException: unknown compression method
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
        at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:110)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
        at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
        at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
        at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
        at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
        at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)
        at org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:118)
        at de.thecode.pdf.pdfbox.ExtractImages.extractImages(ExtractImages.java:52)
        at de.thecode.pdf.pdfbox.ExtractImages.main(ExtractImages.java:30)

Does anybody know how to get image extraction working correctly?

Best regards,

Abid

-- 

Abid Hussain
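Abid's trace fails with "unknown compression method", which comes from the zlib header check: the first bytes of the image stream are not a valid zlib header, although /FlateDecode data is expected to be in zlib format. One hypothetical explanation is a producer that wrote raw deflate data without the two-byte zlib header. The JDK-only sketch below (Java 7 or later, not PDFBox code) tries the zlib format first and, on a ZipException, retries with new Inflater(true), which reads raw deflate. If the data is not deflate data at all, the retry fails as well and the error is rethrown. The class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

final class FlateHeaderFallback {

    /** Inflates data; nowrap=true means raw deflate without the zlib header/trailer. */
    private static byte[] inflate(byte[] data, boolean nowrap) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Inflater inflater = new Inflater(nowrap);
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data), inflater)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end(); // close() does not end a caller-supplied Inflater
        }
        return out.toByteArray();
    }

    /** Tries zlib first; on any ZipException (e.g. a bad header), retries as raw deflate. */
    static byte[] decode(byte[] data) {
        try {
            try {
                return inflate(data, false);
            } catch (ZipException badHeader) {
                return inflate(data, true);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: compress with (nowrap=false) or without (nowrap=true) the zlib wrapper. */
    static byte[] compress(byte[] raw, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "q 100 0 0 100 0 0 cm /Im1 Do Q".getBytes(StandardCharsets.US_ASCII);
        // Both the zlib-wrapped and the raw-deflate variant decode to the same bytes.
        System.out.println(new String(decode(compress(original, false)), StandardCharsets.US_ASCII));
        System.out.println(new String(decode(compress(original, true)), StandardCharsets.US_ASCII));
    }
}
```

Retrying only on ZipException keeps the normal path unchanged: well-formed zlib streams never reach the fallback.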
