Re: Custom PDFTextStripper Warning (sometimes)

John Hewson Mon, 07 Jul 2014 11:14:40 -0700

Hi Aaron

You’re using the operator classes from the 
“org.apache.pdfbox.util.operator.pagedrawer” package with your custom 
PDFTextStripper subclass, but these classes are only for use with a PageDrawer. 
If you look at the top entry in the stack trace, 
“org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)”,
you’ll see that the code at this line is:


PageDrawer drawer = (PageDrawer)context;

But your context class is a PDFTextStripper (or at least a subclass of it), not 
a PageDrawer, so the cast fails. The solution is to not initialise your stripper 
with the .properties file that maps the PageDrawer operators. Take a look at some 
of the subclasses of PDFTextStripper that are already in PDFBox to see how this is done.
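
To make the failure mode concrete, here is a minimal, self-contained sketch that 
does not use PDFBox at all. Engine, Drawer, Stripper and FillOperator are 
hypothetical stand-ins for PDFStreamEngine, PageDrawer, your stripper and 
FillEvenOddRule; only the shape of the cast is taken from the real code. It 
assumes, as the stack trace suggests, that the operator is only invoked when the 
page content actually contains a fill operator ("f*"):

```java
import java.util.HashMap;
import java.util.Map;

class OperatorContextSketch {

    /** Hypothetical stand-in for an operator processor such as FillEvenOddRule. */
    interface Operator {
        String process(Engine context);
    }

    /** Hypothetical stand-in for PDFStreamEngine: maps operator names to processors. */
    static class Engine {
        private final Map<String, Operator> operators = new HashMap<>();

        void register(String name, Operator operator) {
            operators.put(name, operator);
        }

        String processOperator(String name) {
            Operator operator = operators.get(name);
            // Operators with no registered processor are simply skipped.
            return operator == null ? "skipped " + name : operator.process(this);
        }
    }

    /** Stand-ins for PageDrawer and a text stripper subclass. */
    static class Drawer extends Engine { }
    static class Stripper extends Engine { }

    /** Written for Drawer only, so it casts its context unconditionally. */
    static class FillOperator implements Operator {
        @Override
        public String process(Engine context) {
            Drawer drawer = (Drawer) context; // fails for any other Engine subclass
            return "filled path on " + drawer.getClass().getSimpleName();
        }
    }

    /** Processes one "f*" operator, optionally with the drawer-only mapping loaded. */
    static String run(Engine engine, boolean useDrawerOperators) {
        if (useDrawerOperators) {
            engine.register("f*", new FillOperator());
        }
        try {
            return engine.processOperator("f*");
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(new Drawer(), true));    // drawer with drawer operators
        System.out.println(run(new Stripper(), true));  // stripper with drawer operators
        System.out.println(run(new Stripper(), false)); // stripper without them
    }
}
```

In this sketch the stripper only fails when a fill operator is actually 
processed. If the same holds for your files, a plausible reason that only the 
Job Location File triggers the warning is that its table is drawn with filled 
paths, while the Job Output File contains none.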

-- John

On 7 Jul 2014, at 10:50, -A <[email protected]> wrote:

> Hi everyone; I have a program written that has two PDF function
> requirements:
> 
> 
>   1. It must be able to return all of the text from the file
>   2. It must be able to find red text within the file
> 
> 
> I have two different types of PDF files. One we can call a Job Output File,
> which may or may not have red text in it. The other is a Job Location File
> which contains a table with all of the locations of the Job Output Files.
> Originally I wrote the program with a custom text stripper which simply
> adds a state boolean to track whether it found red in a given file. I then
> created an overridden processTextPosition method that looks like the
> following:
> 
> [I found this method through researching but if there is a better method,
> by all means share]
> 
> @Override
>    protected void processTextPosition(TextPosition textPos)
>    {
>        try
>        {
>            PDGraphicsState graphicsState = getGraphicsState();
> 
>            // IF the current text contains RED
>            if (graphicsState.getNonStrokingColor().getJavaColor().getRed() == 255)
>            {
>                this.hasRed = true;
>            }
> 
>        }
>        catch (IOException ioe)
>        {
>            ioe.printStackTrace();
>        }
> 
>    }
> 
> If I run the program on a Job Output File it works flawlessly. If I run it
> on a Job Location File (which will never have red in it), I get the
> following warning:
> 
> org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule process
> WARNING: java.lang.ClassCastException: MyPDFStripper cannot be cast to
> org.apache.pdfbox.pdfviewer.PageDrawer
> java.lang.ClassCastException: MyPDFStripper cannot be cast to
> org.apache.pdfbox.pdfviewer.PageDrawer
> at org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)
> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> at MyPDFStripper.containsRed(IncrementalPDFStripper.java:68)
> 
> 
> The program will generate NO warnings if I comment out the method call for
> containsRed when passing it a Job Location File. Knowing this, I could get
> around this warning rather easily by handling this case differently (which
> it would be, but this is what testing is for; right?). But my question to
> all of you is, why am I getting this? Is it because this Job Location File
> has locations in a table that is throwing off the TextStripper? This is the
> only difference between the files (neither contains images) that I can tell.
> 
> 
> Thank you guys for your time!
> Sincerely,
> Aaron
