RE: extract images

Balasubramaniam, Balaji Wed, 21 Jan 2009 09:47:29 -0800

The patch is in the SVN repository. You have to update your workspace from the
SVN location

http://svn.apache.org/repos/asf/incubator/pdfbox/trunk/

and then build the project using Ant.

-----Original Message-----
From: Abid Hussain [mailto:[email protected]] 
Sent: Wednesday, January 21, 2009 9:44 AM
To: [email protected]
Subject: Re: extract images

Thanks for the help. Where can I find the provided patch? I looked in JIRA but
didn't find anything. Maybe I have overlooked something?

Regards,

Abid

[email protected] wrote:
> Abid,
> 
> This bug may be the same bug that was just patched.
> The line of code it is blowing up on is the same as another bug report.
> " RE: java.io.EOFException: Unexpected end of ZLIB input stream"
> 
> Please get the Patch that Andreas talks about and try that.
> 
> Good Luck,
> Peter
> 
> 
> Hi Peter,
> 
> I've checked all critical locations in org.apache.pdfbox.filter.FlateFilter
> and provided a patch.
> 
> Thank you for your help.
> 
> BR
> Andreas
> 
> [email protected] wrote:
>> In the earlier message I forgot to add the check on mayRead, the number
>> of bytes available, to the while statement. Version 2 is below.
>>
>>
>>      int mayRead=compressedData.available(); // pjl
>>      while ((mayRead > 0 && 
>>             (amountRead = decompressor.read(buffer, 0, 
>>                                Math.min(mayRead,BUFFER_SIZE))) != -1))
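As a self-contained illustration of the version 2 loop above, here is a minimal sketch that uses only the JDK. The class name BoundedInflate, the helper compress, and the BUFFER_SIZE value of 2048 are assumptions made for this example; they are not taken from FlateFilter. As in the posted snippet, mayRead is computed once before the loop, so the visible effect of the extra check is that an empty input is skipped instead of being handed to the decompressor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class BoundedInflate {
    // Assumed buffer size for the example; the real value is not given in the thread.
    private static final int BUFFER_SIZE = 2048;

    // Decompresses a zlib stream using the loop shape from "version 2".
    public static byte[] decode(InputStream compressedData) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        InflaterInputStream decompressor = new InflaterInputStream(compressedData);
        byte[] buffer = new byte[BUFFER_SIZE];
        int amountRead;
        // Computed once, as in the posted snippet; it is not refreshed in the loop.
        int mayRead = compressedData.available();
        while (mayRead > 0
                && (amountRead = decompressor.read(buffer, 0,
                        Math.min(mayRead, BUFFER_SIZE))) != -1) {
            result.write(buffer, 0, amountRead);
        }
        return result.toByteArray();
    }

    // Helper for the example only: produces a zlib stream to feed into decode().
    public static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "stream content".getBytes("UTF-8");
        byte[] roundTrip = decode(new ByteArrayInputStream(compress(plain)));
        System.out.println(new String(roundTrip, "UTF-8"));
        // An empty input never reaches decompressor.read(), so nothing is thrown.
        System.out.println(decode(new ByteArrayInputStream(new byte[0])).length);
    }
}
```

Note that the guard does not protect against a stream that is cut off in the middle: in that case InflaterInputStream still throws the EOFException once it runs out of input.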
>>
>> -----Original Message-----
>> From: Lenahan, Peter
>> Sent: Friday, January 16, 2009 10:26 AM
>> To: [email protected]
>> Subject: RE: java.io.EOFException: Unexpected end of ZLIB input stream 
>> error message on UNIX box
>>
>> I did a Google search on your issue. There are a couple of solutions.
>> The search "InflaterInputStream read Unexpected end of ZLIB" came up
>> with about 854 results.
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4040920
>>
>> Work Around  
>> The workaround is to never attempt to read more bytes than the entry 
>> contains. Call ZipEntry.getSize() to get the actual size of the entry, 
>> then use this value to keep track of the number of bytes remaining in 
>> the entry while reading from it. To take the previous example:
>>
>> This code change may solve the issue for PDFBox. 
>>
>> at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
>> Add the Math.min() to reduce the number of bytes you are trying to read.
>>
>>                 int mayRead = compressedData.available();
>>                 while ((amountRead = decompressor.read(buffer, 0,
>>                         Math.min(mayRead, BUFFER_SIZE))) != -1)
>>
>>
>>
>> I found another potential issue like this, with a solution, on the Sun
>> site. It was described for Windows, but the same could happen on UNIX.
>> It suggests that the issue could happen if you are running several 
>> processes against the same directory. Please look this over to see if 
>> this is the problem. Are you running multiple processes to accomplish 
>> the job faster?
>>
>> http://forums.sun.com/thread.jspa?threadID=5316308
>>
>> paul.miner wrote on Jul 22, 2008 6:54 AM (reply 1 of 2, in reply to
>> the original post, "Re: Unexpected end of ZLIB input stream error
>> while compiling"):
>>
>> koko191 wrote:
>> Main batch :
>> start /B %SWIFT_LOCAL_HOME%\scripts\rmicAll.bat
>> start /B %SWIFT_LOCAL_HOME%\scripts\create_jar.bat
>>
>> The "start" command does not wait for the command to finish, so both 
>> those batch files would be running in parallel. If they both work on 
>> the same jar, this could be a problem.
>>
>> If you want to run the batch files in sequence, use "call".
>>
>> -----Original Message-----
>> From: Balasubramaniam, Balaji
>> [mailto:[email protected]]
>> Sent: Tuesday, January 13, 2009 7:05 PM
>> To: [email protected]
>> Subject: java.io.EOFException: Unexpected end of ZLIB input stream 
>> error message on UNIX box
>>
>> Hello,
>>
>>  
>>
>> I'm trying to use PDFBox to check whether a PDF file is corrupted.
>> We are automating a process that loops through a given folder and
>> counts how many of the PDF files are corrupted. The program works
>> fine in a Windows XP environment (OS version: x86 Windows XP 5.1,
>> Java version: Java HotSpot(tm) Client VM 1.5.0-15-b04). When we ran
>> the application on a UNIX box (OS version: PA_RISC2.0 HP-UX B.11.23,
>> Java version: Java HotSpot(tm) Client VM 1.5.0.11
>> jinteg:11.07.07-09:52 PA2.0(aCC_AP)), it threw the following error.
>>
>>  
>>
>> NOTE: This error does not happen every time. It is thrown only for
>> some of the PDF files. Those files are not corrupted: I can open
>> them manually without problems.
>>
>>  
>>
>> java.io.EOFException: Unexpected end of ZLIB input stream
>>         at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
>>         at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
>>         at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
>>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
>>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
>>         at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
>>         at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
>>         at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
>>         at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:585)
>>         at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
>>         at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
>>         at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
>>         at com.tibco.pe.core.Job.a(Job.java:712)
>>         at com.tibco.pe.core.Job.k(Job.java:501)
>>         at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
>>         at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
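The exception at the top of this trace can be reproduced with the JDK alone, which may help when deciding whether a file is really damaged. The following is a minimal sketch, not the PDFBox patch discussed in this thread: it cuts a zlib stream in half, reads it through InflaterInputStream, and catches the EOFException so that the bytes decoded before the cut are kept. TolerantInflate, Result, inflate and deflate are hypothetical names introduced for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class TolerantInflate {
    // Decoded bytes plus a flag telling whether the input ended too early.
    public static final class Result {
        public final byte[] data;
        public final boolean truncated;

        Result(byte[] data, boolean truncated) {
            this.data = data;
            this.truncated = truncated;
        }
    }

    // Reads a zlib stream and keeps the partial output if the stream is cut short.
    public static Result inflate(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InflaterInputStream in = new InflaterInputStream(compressed);
        byte[] buffer = new byte[1024];
        boolean truncated = false;
        try {
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (EOFException e) {
            // "Unexpected end of ZLIB input stream": the input ran out early.
            truncated = true;
        }
        return new Result(out.toByteArray(), truncated);
    }

    // Helper for the example only: produces a zlib stream from plain bytes.
    public static byte[] deflate(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = new byte[4096];
        new Random(42).nextBytes(plain); // fixed seed keeps the run repeatable
        byte[] full = deflate(plain);
        byte[] half = Arrays.copyOf(full, full.length / 2);

        Result ok = inflate(new ByteArrayInputStream(full));
        Result cut = inflate(new ByteArrayInputStream(half));
        System.out.println("complete stream truncated? " + ok.truncated);
        System.out.println("halved stream truncated?   " + cut.truncated);
        System.out.println("bytes recovered from halved stream: " + cut.data.length);
    }
}
```

Whether partial output is acceptable depends on the caller; for a corruption check like the one described here, the truncated flag is probably the more useful piece of information.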
>>
>>  
>>
>> Sample code snippet I use to do the task.
>>
>>  
>>
>> PDDocument document = PDDocument.load(<input stream>);
>> List pages = document.getDocumentCatalog().getAllPages();
>> if (pages != null && pages.size() > 0) {
>>   PDPage page = (PDPage) pages.get(i);
>>   PDStream contents = page.getContents();
>>   PDFStreamParser parser = null;
>>   try {
>>     parser = new PDFStreamParser(contents.getStream());
>>   } catch (Exception e) {
>>     System.err.println("This PDF cannot be read. Most possibly it "
>>         + "could be corrupted. " + pdfFileName);
>>   }
>> }
>>
>>  
>>
>> Could somebody shed some light on this one?
>>
>>  
>>
>> Thank you.
>>
>>
> 
> 
> --
> The packaging said "requires Windows 9x/2000/XP or BETTER", so I
> installed Linux.
> 
> 
> 
> -----Original Message-----
> From: Abid Hussain [mailto:[email protected]] 
> Sent: Tuesday, January 20, 2009 6:17 AM
> To: [email protected]
> Subject: extract images
> 
> Hello everybody,
> 
> I'm trying to extract images from a PDF file, but it won't work... :-(
> 
> I tried the ExtractImages.exe which results in:
>  >ExtractImages.exe "C:\path\to\pdf_file"
> Exception in thread "main" java.lang.NullPointerException
>          at org.pdfbox.ExtractImages.extractImages(ExtractImages.java:138)
>          at org.pdfbox.ExtractImages.main(ExtractImages.java:72)
> 
> Then I tried to extract the images using code I copied from the
> ExtractImages class. Here's a snippet:
> PDXObjectImage image = (PDXObjectImage) images.get(key);
> String name = getUniqueFileName(key, image.getSuffix());
> image.write2file(name);
> 
> The execution of the last line results in:
> java.util.zip.ZipException: unknown compression method
>       at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
>       at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:110)
>       at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
>       at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
>       at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
>       at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
>       at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
>       at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
>       at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)
>       at org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:118)
>       at de.thecode.pdf.pdfbox.ExtractImages.extractImages(ExtractImages.java:52)
>       at de.thecode.pdf.pdfbox.ExtractImages.main(ExtractImages.java:30)
> 
> Does anybody know how to get the image extraction to work correctly?
> 
> Best regards,
> 
> Abid
> 

-- 

Abid Hussain
