Hi everyone, I have written a program with two PDF-related
requirements:


   1. It must be able to return all of the text from the file
   2. It must be able to find red text within the file


I have two different types of PDF files. One we can call a Job Output File,
which may or may not have red text in it. The other is a Job Location File
which contains a table with all of the locations of the Job Output Files.
Originally I wrote the program with a custom text stripper that simply
adds a boolean field to track whether it found red in a given file. I then
overrode the processTextPosition method as follows:

[I found this method through research, but if there is a better one,
by all means share.]

    @Override
    protected void processTextPosition(TextPosition textPos)
    {
        try
        {
            PDGraphicsState graphicsState = getGraphicsState();

            // If the current text's fill color has a full red channel
            if (graphicsState.getNonStrokingColor().getJavaColor().getRed() == 255)
            {
                this.hasRed = true;
            }
        }
        catch (IOException ioe)
        {
            ioe.printStackTrace();
        }
    }
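
A side note on the check itself: getRed() == 255 is also true for white,
yellow and magenta, since each of those has a full red channel. Below is a
minimal sketch of a stricter test using only java.awt.Color. The class
RedCheck, the helper isRed and the threshold of 64 are all hypothetical
choices for illustration, not part of PDFBox.

```java
import java.awt.Color;

public class RedCheck {
    // Hypothetical helper: a color counts as "red" only when the red channel
    // is at full intensity and both green and blue are low. The cutoff of 64
    // is an arbitrary assumption; tune it to the PDFs at hand.
    public static boolean isRed(Color c) {
        return c.getRed() == 255 && c.getGreen() < 64 && c.getBlue() < 64;
    }

    public static void main(String[] args) {
        System.out.println(isRed(Color.RED));    // pure red
        System.out.println(isRed(Color.WHITE));  // full red channel, but not red
        System.out.println(isRed(Color.YELLOW)); // full red channel, but not red
    }
}
```

In the stripper, the Color returned by getJavaColor() could be passed to
such a helper instead of comparing the red channel alone.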

If I run the program on a Job Output File it works flawlessly. If I run it
on a Job Location File (which will never have red in it), I get the
following warning:

org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule process
WARNING: java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
    at org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at MyPDFStripper.containsRed(IncrementalPDFStripper.java:68)


The program generates no warnings if I comment out the call to containsRed
when passing it a Job Location File. Knowing this, I could easily avoid the
warning by handling that case separately (which I will, but this is what
testing is for, right?). My question to all of you is: why am I getting
this? Is it because the locations table in the Job Location File is
throwing off the text stripper? As far as I can tell, that table is the
only difference between the two file types (neither contains images).
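
Going only by the stack trace above, the warning is raised inside
FillEvenOddRule.process, a drawing operator that casts the engine that
invoked it to PageDrawer. One plausible reading (an assumption, not
something confirmed here) is that the stripper is running with the page
drawer's operator set, and that the table is drawn with a fill operator
the Job Output Files happen not to use. The sketch below reproduces that
pattern with hypothetical stand-in classes; none of them are PDFBox's
real types.

```java
public class CastDemo {
    // Hypothetical stand-ins for a stream engine and two of its subclasses.
    static class StreamEngine {}
    static class Drawer extends StreamEngine {
        String fill() { return "filled"; }
    }
    static class Stripper extends StreamEngine {}

    // Stand-in for a drawing operator that assumes its context is a Drawer.
    // The cast fails for any other engine; the failure is caught and
    // reported as a warning rather than propagated.
    public static String processFill(StreamEngine context) {
        try {
            Drawer drawer = (Drawer) context;
            return drawer.fill();
        } catch (ClassCastException e) {
            return "WARNING: " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(processFill(new Drawer()));   // cast succeeds
        System.out.println(processFill(new Stripper())); // cast fails, warning
    }
}
```

If that reading is right, the warning would depend on which operators a
page uses rather than on its text, which would fit it showing up only for
the files that contain a table.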


Thank you guys for your time!
Sincerely,
Aaron