Hi everyone, I have written a program that has two PDF-related
requirements:
1. It must be able to return all of the text from the file
2. It must be able to find red text within the file
I have two different types of PDF files. One we can call a Job Output File,
which may or may not have red text in it. The other is a Job Location File,
which contains a table listing the locations of all the Job Output Files.
Originally I wrote the program with a custom text stripper that simply
adds a boolean state flag to track whether it found red in a given file. I
then overrode the processTextPosition method as follows:
[I found this approach through research; if there is a better one, by all
means share.]
@Override
protected void processTextPosition(TextPosition textPos)
{
    try
    {
        PDGraphicsState graphicsState = getGraphicsState();
        // IF the current text contains RED
        if (graphicsState.getNonStrokingColor().getJavaColor().getRed() == 255)
        {
            this.hasRed = true;
        }
    }
    catch (IOException ioe)
    {
        ioe.printStackTrace();
    }
}
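One thing I am unsure about in the check above: comparing only the red
channel to 255 is also true for white (255, 255, 255) and yellow
(255, 255, 0), not just red. Below is a minimal sketch of a stricter test,
assuming "red" means pure RGB red. RedCheck and isRed are names made up for
illustration and are not part of PDFBox; the sketch only uses java.awt.Color,
which is what getJavaColor() returns in my code.

```java
import java.awt.Color;

// Hypothetical helper for illustration only; not part of PDFBox.
final class RedCheck {

    // Assumption: "red" means pure RGB red (255, 0, 0).
    // Checking only getRed() == 255 would also accept white and yellow.
    static boolean isRed(Color c) {
        return c.getRed() == 255 && c.getGreen() == 0 && c.getBlue() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isRed(Color.RED));    // prints "true"
        System.out.println(isRed(Color.WHITE));  // prints "false"
        System.out.println(isRed(Color.YELLOW)); // prints "false"
    }
}
```

If the red in the Job Output Files is not exactly (255, 0, 0), the comparison
would need a tolerance instead of exact equality.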
If I run the program on a Job Output File it works flawlessly. If I run it
on a Job Location File (which will never have red in it), I get the
following warning:
org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule process
WARNING: java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
    at org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at MyPDFStripper.containsRed(IncrementalPDFStripper.java:68)
The program generates no warnings if I comment out the call to containsRed
when passing it a Job Location File. Knowing this, I could easily work
around the warning by handling Job Location Files differently (which I
would do anyway, but this is what testing is for, right?). My question to
all of you is: why am I getting this? Is the table of locations in the Job
Location File throwing off the text stripper? As far as I can tell, that is
the only difference between the two file types (neither contains images).
Thank you guys for your time!
Sincerely,
Aaron