Re: extract images

Abid Hussain Wed, 21 Jan 2009 09:44:12 -0800

Thanks for the help. Where can I find the provided patch? I looked in the JIRA but didn't find anything. Maybe I have overlooked something?


Regards,


Abid

[email protected] wrote:

Abid,

This bug may be the same bug that was just patched.
The line of code it is blowing up on is the same as another bug report.
" RE: java.io.EOFException: Unexpected end of ZLIB input stream"

Please get the Patch that Andreas talks about and try that.

Good Luck,
Peter


Hi Peter,

I've checked all critical locations in org.apache.pdfbox.filter.FlateFilter
and provided a patch.

Thank you for your help.

BR
Andreas

[email protected] wrote:
In the earlier message I forgot to add the number of bytes available, held in the variable mayRead, to the while statement. Version 2 is below.
     int mayRead = compressedData.available(); // pjl
     while (mayRead > 0 && (amountRead = decompressor.read(buffer, 0, Math.min(mayRead, BUFFER_SIZE))) != -1)
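
To see the failure outside PDFBox, here is a minimal, self-contained sketch. It is not the patch discussed in this thread and it uses only the JDK's java.util.zip classes. The class name TolerantInflate and the methods inflateLeniently and deflate are hypothetical names chosen for this example. It keeps the same loop shape as above and treats the EOFException raised by a truncated stream as the end of the data, keeping whatever was decoded up to that point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration only; not part of PDFBox.
final class TolerantInflate {
    private static final int BUFFER_SIZE = 2048;

    /** Inflates as much as possible; a truncated stream yields the bytes decoded so far. */
    public static byte[] inflateLeniently(InputStream compressedData) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        try (InflaterInputStream decompressor = new InflaterInputStream(compressedData)) {
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } catch (EOFException e) {
            // "Unexpected end of ZLIB input stream": the input ended early.
            // Assumption for this sketch: partial output is acceptable.
        }
        return result.toByteArray();
    }

    /** Compresses bytes in zlib format so the example has data to work with. */
    public static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "stream data stream data stream data".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(raw);
        // Cut off the trailing checksum bytes to simulate a truncated stream.
        byte[] truncated = Arrays.copyOf(compressed, compressed.length - 4);
        System.out.println(inflateLeniently(new ByteArrayInputStream(compressed)).length);
        System.out.println(inflateLeniently(new ByteArrayInputStream(truncated)).length);
    }
}
```

Whether silently accepting a truncated stream is appropriate depends on the use case. For a corruption check it may be better to report the exception than to swallow it.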
-----Original Message-----
From: Lenahan, Peter
Sent: Friday, January 16, 2009 10:26 AM
To: [email protected]
Subject: RE: java.io.EOFException: Unexpected end of ZLIB input stream error message on UNIX box
I did a Google search on your issue. There are a couple of solutions.
A search for "InflaterInputStream read Unexpected end of ZLIB" returned about 854 results, including:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4040920

Work Around
The workaround is to never attempt to read more bytes than the entry contains. Call ZipEntry.getSize() to get the actual size of the entry, then use this value to keep track of the number of bytes remaining in the entry while reading from it. To take the previous example:
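
The example from the Sun bug report is not reproduced in this message. As a stand-in, here is a minimal sketch of the workaround it describes, using only the JDK. The class name BoundedEntryRead and the methods readEntry and writeSampleZip are hypothetical names for this illustration. The reader asks ZipEntry.getSize() for the entry length and never requests more bytes than remain.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
final class BoundedEntryRead {

    /** Reads one entry, never asking for more bytes than the entry reports. */
    public static byte[] readEntry(ZipFile zip, ZipEntry entry) throws IOException {
        long remaining = entry.getSize(); // uncompressed size, or -1 if unknown
        if (remaining < 0) {
            throw new IOException("Entry size unknown: " + entry.getName());
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        try (InputStream in = zip.getInputStream(entry)) {
            while (remaining > 0) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n == -1) {
                    break; // stream ended earlier than the declared size
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        }
        return out.toByteArray();
    }

    /** Writes a one-entry zip to a temporary file so the example is self-contained. */
    public static File writeSampleZip(String entryName, byte[] content) throws IOException {
        File file = File.createTempFile("sample", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(file))) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "entry content".getBytes(StandardCharsets.UTF_8);
        File file = writeSampleZip("a.txt", content);
        try (ZipFile zip = new ZipFile(file)) {
            byte[] read = readEntry(zip, zip.getEntry("a.txt"));
            System.out.println(new String(read, StandardCharsets.UTF_8));
        }
    }
}
```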
This code change may solve the issue for PDFBox.
at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
Add the Math.min() to reduce the number of bytes you are trying to read.

                int mayRead = compressedData.available();
                while ((amountRead = decompressor.read(buffer, 0, Math.min(mayRead, BUFFER_SIZE))) != -1)
I found another potential issue like this, with a solution, on the Sun site.
It was described for Windows, but the same could happen on UNIX.
It suggests that the issue could happen if you are running several processes against the same directory. Please look this over to see if this is the problem. Are you running multiple processes to accomplish the job faster?
http://forums.sun.com/thread.jspa?threadID=5316308

paul.miner
Re: Unexpected end of ZLIB input stream error while compiling
Jul 22, 2008 6:54 AM (reply 1 of 2) (In reply to original post)
koko191 wrote:
Main batch :
start /B %SWIFT_LOCAL_HOME%\scripts\rmicAll.bat
start /B %SWIFT_LOCAL_HOME%\scripts\create_jar.bat
The "start" command does not wait for the command to finish, so both those batch files would be running in parallel. If they both work on the same jar, this could be a problem.
If you want to run the batch files in sequence, use "call".

-----Original Message-----
From: Balasubramaniam, Balaji
[mailto:[email protected]]
Sent: Tuesday, January 13, 2009 7:05 PM
To: [email protected]
Subject: java.io.EOFException: Unexpected end of ZLIB input stream error message on UNIX box
Hello,
I'm trying to use PdfBox to identify whether a PDF file is corrupted or not. We are trying to automate a process that loops through a given folder and counts how many of the PDF files are corrupted. This program works fine in a Windows XP environment (OS version: x86 Windows XP 5.1, Java version: Java HotSpot(tm) Client VM 1.5.0-15-b04). When we ran this application on a UNIX box (OS version: PA_RISC2.0 HP-UX B.11.23, Java version: Java HotSpot(tm) Client VM 1.5.0.11 jinteg:11.07.07-09:52 PA2.0 (aCC_AP)) it throws the following error.
NOTE: This error does not happen all the time. It is thrown only for some of the PDF files. Those PDF files are not corrupted; I can open them manually and they open fine.
java.io.EOFException: Unexpected end of ZLIB input stream
        at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
        at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
        at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
        at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
        at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
        at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
        at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
        at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
        at com.tibco.pe.core.Job.a(Job.java:712)
        at com.tibco.pe.core.Job.k(Job.java:501)
        at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
        at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
Sample code snippet I use to do the task:
PDDocument document = PDDocument.load(<input stream>);
List pages = document.getDocumentCatalog().getAllPages();
if (pages != null && pages.size() > 0) {
  // Check each page in turn.
  for (int i = 0; i < pages.size(); i++) {
    PDPage page = (PDPage) pages.get(i);
    PDStream contents = page.getContents();
    PDFStreamParser parser = null;
    try {
      // Constructing the parser decodes the content stream (see the stack trace above).
      parser = new PDFStreamParser(contents.getStream());
    } catch (Exception e) {
      System.err.println("This PDF cannot be read. Most possibly it could be corrupted. " + pdfFileName);
    }
  }
}
Could somebody shed some light on this one?
Thank you.
--
The package said "requires Windows 9x/2000/XP or BETTER", so I installed Linux.



-----Original Message-----
From: Abid Hussain [mailto:[email protected]]
Sent: Tuesday, January 20, 2009 6:17 AM
To: [email protected]
Subject: extract images

Hello everybody,

I'm trying to extract images from a pdf file which won't work...:-(

I tried the ExtractImages.exe which results in:
 >ExtractImages.exe "C:\path\to\pdf_file"
Exception in thread "main" java.lang.NullPointerException
         at org.pdfbox.ExtractImages.extractImages(ExtractImages.java:138)
         at org.pdfbox.ExtractImages.main(ExtractImages.java:72)

Then I tried to extract the images using code I copied from the ExtractImages 
class:
Here's a snippet:
PDXObjectImage image = (PDXObjectImage) images.get(key);
String name = getUniqueFileName(key, image.getSuffix());
image.write2file(name);

The execution of the last line results in:
java.util.zip.ZipException: unknown compression method
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
        at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:110)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
        at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
        at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
        at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
        at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
        at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
        at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)
        at org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:118)
        at de.thecode.pdf.pdfbox.ExtractImages.extractImages(ExtractImages.java:52)
        at de.thecode.pdf.pdfbox.ExtractImages.main(ExtractImages.java:30)
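
As a diagnostic aid, zlib reports "unknown compression method" when the first byte of the data does not declare the deflate method. The sketch below checks the two-byte zlib header as defined in RFC 1950. It is a minimal illustration using only plain Java, not a PDFBox API; the class name ZlibHeaderCheck and the method looksLikeZlib are hypothetical. It assumes the raw (still encoded) stream bytes can be obtained for inspection.

```java
// Hypothetical helper for illustration only; not part of PDFBox.
final class ZlibHeaderCheck {

    /** Returns true if the first two bytes form a valid zlib (RFC 1950) header. */
    public static boolean looksLikeZlib(byte[] data) {
        if (data == null || data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF; // compression method and flags
        int flg = data[1] & 0xFF; // flags, including the header check bits
        boolean isDeflate = (cmf & 0x0F) == 8;             // method 8 = deflate
        boolean windowOk = (cmf >> 4) <= 7;                // window size up to 32 KB
        boolean checkOk = ((cmf << 8) | flg) % 31 == 0;    // FCHECK rule
        return isDeflate && windowOk && checkOk;
    }

    public static void main(String[] args) {
        // 0x78 0x9C is a common zlib header; "%P" is the start of a PDF file, not zlib data.
        System.out.println(looksLikeZlib(new byte[] {0x78, (byte) 0x9C}));
        System.out.println(looksLikeZlib(new byte[] {'%', 'P'}));
    }
}
```

If the check fails on a stream that is declared as Flate-encoded, the data handed to the filter is probably not what the filter expects, which points at the stream contents or the filter chain rather than at the inflater itself.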

Does anybody know how to get the image extraction to work correctly?

Best regards,

Abid


--

Abid Hussain
