ClassCastException in FlateFilter
---------------------------------

                 Key: PDFBOX-354
                 URL: https://issues.apache.org/jira/browse/PDFBOX-354
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
            Reporter: Jukka Zitting


[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1965266&group_id=78314&atid=552832

Hi, I'm trying to extract text from a pdf which can be found at

http://www.vattenfall.com/www/vf_com/vf_com/Gemeinsame_Inhalte/DOCUMENT/360168vatt/5965811xou/643131powe/892253fors/P0288421.pdf

and am getting a ClassCastException from within PDFBox. The full stacktrace
is:

java.lang.ClassCastException: org.pdfbox.cos.COSArray
at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:70)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:243)
at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
at org.pdfbox.pdmodel.common.COSStreamArray.getStreamTokens(COSStreamArray.java:141)
at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202)
at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at [my code].

The code leading up to the call to extract text is as follows:

org.pdfbox.pdfparser.PDFParser parser = new org.pdfbox.pdfparser.PDFParser(myInputStream);

parser.parse();

pdDocument = parser.getPDDocument();

setContent(new PDFTextStripper().getText(pdDocument));


I hope the formatting is ok! Have you encountered this error before and can
you suggest any causes or solutions?

Thanks,
Ben Kirby
[EMAIL PROTECTED]
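
A possible reading of the trace above, offered as an assumption and not
checked against the PDFBox source: the PDF specification allows a stream's
DecodeParms entry to be either a single dictionary or an array of
dictionaries (one per filter), so a decoder that casts that entry straight to
a dictionary would fail in exactly this way on a file that uses the array
form. The sketch below reproduces that failure mode and a type-checked
alternative. It deliberately uses java.util.Map and java.util.List as
stand-ins for COSDictionary and COSArray; the class and method names
(DecodeParmsSketch, unsafeParams, safeParams) are hypothetical and are not
PDFBox API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: Map plays the role of a dictionary, List the role of an array.
// None of these names come from PDFBox.
final class DecodeParmsSketch {

    // Unconditional cast: throws ClassCastException when the entry is an array.
    static Map<?, ?> unsafeParams(Object decodeParms) {
        return (Map<?, ?>) decodeParms;
    }

    // Checks the runtime type first. For the array form, picks the entry that
    // belongs to the filter at filterIndex. Returns null when no dictionary
    // is available for that filter.
    static Map<?, ?> safeParams(Object decodeParms, int filterIndex) {
        if (decodeParms instanceof Map) {
            return (Map<?, ?>) decodeParms;
        }
        if (decodeParms instanceof List) {
            List<?> list = (List<?>) decodeParms;
            if (filterIndex >= 0 && filterIndex < list.size()
                    && list.get(filterIndex) instanceof Map) {
                return (Map<?, ?>) list.get(filterIndex);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> dict = new HashMap<String, Object>();
        dict.put("Predictor", Integer.valueOf(12));

        // Array form: one parameter dictionary per filter.
        List<Object> array = new ArrayList<Object>();
        array.add(dict);

        System.out.println("safe lookup: " + safeParams(array, 0));
        try {
            unsafeParams(array);
        } catch (ClassCastException e) {
            System.out.println("unsafe lookup failed with ClassCastException");
        }
    }
}
```

If this reading is right, a build in which the filter checks the type of the
entry before casting should process the same file without the exception.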

[Comment on SourceForge]
Date: 2008-05-22 20:02
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

Ben,
I just ran the PDF through our text extraction test -- without error.

I'm not sure what to say about the error you've encountered.

I do find that since version 0.7.3 came out, Ben has updated the FlateFilter
source code.  Have you tried the latest code or just the 0.7.3 build?

[Comment on SourceForge]
Date: 2008-05-29 10:11
Sender: nobody
Logged In: NO 

Hi Daniel, sorry for the delay in responding. I've only tried the 0.7.3
build - I'll grab the latest code build now, and let you know how I get
on...

Thanks,
Ben

[Comment on SourceForge]
Date: 2008-05-29 10:30
Sender: nobody
Logged In: NO 

Hi again, sorry, but your 13/05 and 14/05 nightly build jars and zips seem
to be corrupt. Maven can't use the Maven jars, and WinRAR throws an error
while opening all of them, zips and jars.

Am I doing something wrong, or are they corrupt?

Thanks,
Ben

[Comment on SourceForge]
Date: 2008-07-31 11:52
Sender: bmk06
Logged In: YES 
user_id=1683216
Originator: YES

Hi again. I've just checked back for a response, and as there isn't any,
have gone to try the nightly builds again. However there don't seem to be
any! I'm trying http://www.pdfbox.org/dist - have they moved? I really want
to get this fixed in our build, so please could you let me know what's going
on.

Thanks,
Ben
                                                        
[Comment on SourceForge]
Date: 2008-07-31 12:05
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

In the near future, http://incubator.apache.org/projects/pdfbox.html will
be the place to look for PDFBox stuff.

I don't see nightly builds ... I'll see if I can find out about those.

[Comment on SourceForge]
Date: 2008-07-31 13:12
Sender: bmk06
Logged In: YES 
user_id=1683216
Originator: YES

Thanks Daniel - I came across the new site myself, but, you're right, I
couldn't see any nightly builds. If you could let me know what the plan is,
that'd be great, otherwise I'll check back next week.

Thanks for your help,
Ben

