ClassCastException in FlateFilter
---------------------------------
Key: PDFBOX-354
URL: https://issues.apache.org/jira/browse/PDFBOX-354
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Reporter: Jukka Zitting
[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1965266&group_id=78314&atid=552832
Hi, I'm trying to extract text from a pdf which can be found at
http://www.vattenfall.com/www/vf_com/vf_com/Gemeinsame_Inhalte/DOCUMENT/360168vatt/5965811xou/643131powe/892253fors/P0288421.pdf
and am getting a ClassCastException from within PDFBox. The full stacktrace
is:
java.lang.ClassCastException: org.pdfbox.cos.COSArray
at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:70)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:243)
at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
at org.pdfbox.pdmodel.common.COSStreamArray.getStreamTokens(COSStreamArray.java:141)
at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202)
at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at [my code].
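A note on reading the trace: the exception message names org.pdfbox.cos.COSArray, which means that FlateFilter.decode (line 70) received a COSArray and tried to cast it to some other type. The thread does not say which value was involved. One plausible candidate, stated here only as an assumption, is the stream's decode parameters, which the PDF format allows to be either a single dictionary or an array of dictionaries. The sketch below reproduces that failure pattern and a defensive alternative. PdfObject, PdfDictionary, PdfArray, unsafeParams and safeParams are hypothetical stand-ins for illustration; none of them are PDFBox APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecodeParamsSketch {
    // Hypothetical stand-ins for a PDF object model; NOT the real org.pdfbox.cos classes.
    interface PdfObject {}

    static final class PdfDictionary implements PdfObject {
        final Map<String, Integer> entries = new HashMap<>();
    }

    static final class PdfArray implements PdfObject {
        final List<PdfObject> items = new ArrayList<>();
    }

    // Blind cast: throws ClassCastException when the value is actually an array.
    static PdfDictionary unsafeParams(PdfObject value) {
        return (PdfDictionary) value;
    }

    // Defensive lookup: accepts a dictionary directly, or picks the entry at
    // filterIndex from an array. Returns null if no dictionary is available.
    static PdfDictionary safeParams(PdfObject value, int filterIndex) {
        if (value instanceof PdfDictionary) {
            return (PdfDictionary) value;
        }
        if (value instanceof PdfArray) {
            List<PdfObject> items = ((PdfArray) value).items;
            if (filterIndex >= 0 && filterIndex < items.size()
                    && items.get(filterIndex) instanceof PdfDictionary) {
                return (PdfDictionary) items.get(filterIndex);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PdfDictionary dict = new PdfDictionary();
        dict.entries.put("Predictor", 12);
        PdfArray array = new PdfArray();
        array.items.add(dict);

        try {
            unsafeParams(array);
        } catch (ClassCastException e) {
            System.out.println("blind cast failed with ClassCastException");
        }
        System.out.println("safe lookup found the dictionary: " + (safeParams(array, 0) == dict));
    }
}
```

Whether this is what happens at FlateFilter.java:70 would have to be confirmed against the PDFBox source for the build in use.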
The code leading up to the call to extract text is as follows:
org.pdfbox.pdfparser.PDFParser parser = new org.pdfbox.pdfparser.PDFParser(myInputStream);
parser.parse();
pdDocument = parser.getPDDocument();
setContent(new PDFTextStripper().getText(pdDocument));
I hope the formatting is ok! Have you encountered this error before and can
you suggest any causes or solutions?
Thanks,
Ben Kirby
[Comment on SourceForge]
Date: 2008-05-22 20:02
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
Ben,
I just ran the PDF through our text extraction test -- without error.
I'm not sure what to say about the error you've encountered.
I do find that since version 0.7.3 came out Ben has updated the FlateFilter
source code. Have you tried the latest code or just the 0.7.3 build?
[Comment on SourceForge]
Date: 2008-05-29 10:11
Sender: nobody
Logged In: NO
Hi Daniel, sorry for the delay in responding. I've only tried the 0.7.3
build - I'll grab the latest code build now, and let you know how I get
on...
Thanks,
Ben
[Comment on SourceForge]
Date: 2008-05-29 10:30
Sender: nobody
Logged In: NO
Hi again, sorry, but your 13/05 and 14/05 nightly build jars and zips seem
to be corrupt. Maven can't use the Maven jars, and WinRAR throws an error
while opening all of them, zips and jars.
Am I doing something wrong, or are they corrupt?
Thanks,
Ben
[Comment on SourceForge]
Date: 2008-07-31 11:52
Sender: bmk06
Logged In: YES
user_id=1683216
Originator: YES
Hi again. I've just checked back for a response, and as there isn't any,
have gone to try the nightly builds again. However there don't seem to be
any! I'm trying http://www.pdfbox.org/dist - have they moved? I really want
to get this fixed in our build, so could you please let me know what's going
on.
Thanks,
Ben
[Comment on SourceForge]
Date: 2008-07-31 12:05
Sender: danielwilson
Logged In: YES
user_id=1737686
Originator: NO
In the near future, http://incubator.apache.org/projects/pdfbox.html will
be the place to look for PDFBox stuff.
I don't see nightly builds ... I'll see if I can find out about those.
[Comment on SourceForge]
Date: 2008-07-31 13:12
Sender: bmk06
Logged In: YES
user_id=1683216
Originator: YES
Thanks Daniel - I came across the new site myself, but, you're right, I
couldn't see any nightly builds. If you could let me know what the plan is,
that'd be great, otherwise I'll check back next week.
Thanks for your help,
Ben