Hi Aaron,

You're using the operator classes from the "org.apache.pdfbox.util.operator.pagedrawer" package with your custom TextStripper, but these classes are only for use with a PageDrawer. If you look at the top entry in the stack trace, "org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)", you'll see that the code at this line is:

    PageDrawer drawer = (PageDrawer)context;

But your context class is TextStripper (or at least a subclass of it), not a PageDrawer.

The solution is not to initialise your TextStripper with the .properties file which maps PageDrawer operators. Take a look at some of the subclasses of TextStripper which are already in PDFBox to see how this is done.

-- John

On 7 Jul 2014, at 10:50, -A <[email protected]> wrote:

> Hi everyone; I have a program written that has two PDF function
> requirements:
>
> 1. It must be able to return all of the text from the file
> 2. It must be able to find red text within the file
>
> I have two different types of PDF files. One we can call a Job Output File,
> which may or may not have red text in it. The other is a Job Location File
> which contains a table with all of the locations of the Job Output Files.
> Originally I wrote the program with a custom text stripper which simply
> adds a state boolean to track whether it found red in a given file. I then
> created an overloaded processTextPosition method that looks like the
> following:
>
> [I found this method through researching but if there is a better method,
> by all means share]
>
> @Override
> protected void processTextPosition(TextPosition textPos)
> {
>     try
>     {
>         PDGraphicsState graphicsState = getGraphicsState();
>
>         // IF the current text contains RED
>         if (graphicsState.getNonStrokingColor().getJavaColor().getRed()
>                 == 255)
>         {
>             this.hasRed = true;
>         }
>     }
>     catch (IOException ioe)
>     {
>         ioe.printStackTrace();
>     }
> }
>
> If I run the program on a Job Output File it works flawlessly.
> If I run it on a Job Location File (which will never have red in it), I get
> the following warning:
>
> org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule process
> WARNING: java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
> java.lang.ClassCastException: MyPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
>     at org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule.process(FillEvenOddRule.java:56)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at MyPDFStripper.containsRed(IncrementalPDFStripper.java:68)
>
> The program will generate NO warnings if I comment out the method call for
> containsRed when passing it a Job Location File. Knowing this, I could get
> around this warning rather easily by handling this case differently (which
> it would be, but this is what testing is for; right?). But my question to
> all of you is, why am I getting this? Is it because this Job Location File
> has locations in a table that is throwing off the TextStripper? This is the
> only difference between the files (neither contains images) that I can tell.
>
> Thank you guys for your time!
> Sincerely,
> Aaron
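
A simplified sketch of the failure described in John's reply, written in plain Java rather than against PDFBox. The class names PageDrawer, TextStripper and FillEvenOddRule mirror those in the stack trace, but everything here (the StreamEngine base class, the two mapping methods, castFails) is a hypothetical stand-in and not PDFBox's API. It illustrates why an operator class that casts its context to PageDrawer fails when a text stripper is set up with a PageDrawer operator mapping, and why a mapping without drawing operators avoids the problem. That would also be consistent with the warning showing up only for files whose content streams contain a fill operator such as f*, though whether the table in the Job Location File is drawn that way is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the situation; none of these classes are PDFBox's own.
class OperatorCastDemo {

    /** Stand-in for an operator class such as FillEvenOddRule. */
    interface OperatorProcessor {
        void process(StreamEngine context);
    }

    /** Stand-in for a stream engine: looks up each operator in a mapping. */
    static class StreamEngine {
        private final Map<String, OperatorProcessor> operators;

        StreamEngine(Map<String, OperatorProcessor> operators) {
            this.operators = operators;
        }

        void processOperator(String name) {
            OperatorProcessor processor = operators.get(name);
            if (processor != null) {
                processor.process(this); // the engine passes itself as the context
            }
            // Operators without a mapping are skipped.
        }
    }

    static class PageDrawer extends StreamEngine {
        int fills = 0;

        PageDrawer(Map<String, OperatorProcessor> operators) {
            super(operators);
        }

        void fillPath() {
            fills++;
        }
    }

    static class TextStripper extends StreamEngine {
        TextStripper(Map<String, OperatorProcessor> operators) {
            super(operators);
        }
    }

    /** Drawing-only operator: assumes its context is a PageDrawer. */
    static class FillEvenOddRule implements OperatorProcessor {
        @Override
        public void process(StreamEngine context) {
            PageDrawer drawer = (PageDrawer) context; // throws for any other engine type
            drawer.fillPath();
        }
    }

    /** Mapping meant for a PageDrawer: "f*" (even-odd fill) is handled. */
    static Map<String, OperatorProcessor> pageDrawerMapping() {
        Map<String, OperatorProcessor> mapping = new HashMap<>();
        mapping.put("f*", new FillEvenOddRule());
        return mapping;
    }

    /** Mapping meant for text extraction: no drawing operators at all. */
    static Map<String, OperatorProcessor> textMapping() {
        return new HashMap<>();
    }

    /** Returns true if processing "f*" with the given engine throws a ClassCastException. */
    static boolean castFails(StreamEngine engine) {
        try {
            engine.processOperator("f*");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Text stripper configured with the drawing mapping: the cast fails.
        System.out.println(castFails(new TextStripper(pageDrawerMapping()))); // true
        // Text stripper configured with a text-only mapping: "f*" is ignored.
        System.out.println(castFails(new TextStripper(textMapping())));       // false
        // A real drawer with the drawing mapping works as intended.
        System.out.println(castFails(new PageDrawer(pageDrawerMapping())));   // false
    }
}
```

The point of the sketch is that the stripper's own code is not at fault: the operator mapping decides which operator classes run, so swapping the mapping is enough to make the cast go away.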
