thx for the hint.

Maruan Sahyoun
> On 30.07.2014 at 12:33, Andreas Lehmkühler <andr...@lehmi.de> wrote:
>
>> On 30 July 2014 at 08:12, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
>>
>> +1 for removing the .properties file if the new mechanism is easier to understand and handle. The discussion so far doesn’t provide that proof or any information about it.
>>
>> What would a replacement look like?
>>
>> OTOH if it’s a documentation issue we could also add some more information to the javadocs to explain the dependencies.
>>
>> We could add a register/unregister method to allow adding/removing custom operator handling, or provide a service discovery mechanism. This way we still have the old flexibility.
> There is already the method registerOperatorProcessor in PDFStreamEngine to register operators. In most cases it's called when processing the property file.
> In the case of preflight (see PreflightStreamEngine) those register calls are done directly within the constructor. There isn't any unregister method.
>
> BR
> Andreas Lehmkühler
>
>>
>> BR
>> Maruan
>>
>>> On 29.07.2014 at 21:48, John Hewson <j...@jahewson.com> wrote:
>>>
>>> Right, but we need to address the confusion and complexity that has been caused by .properties files, which made PDFBOX-2246 so tricky to figure out.
>>>
>>> Let's remove this wart!
>>>
>>> -- John
>>>
>>>> On 29 Jul 2014, at 10:44, Tilman Hausherr <thaush...@t-online.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> At this time, the problem I see and wanted to solve (PDFBOX-2246) exists regardless of whether we use a properties file or initialize directly in the code.
>>>>
>>>> Tilman
>>>>
>>>> On 29.07.2014 at 19:41, John Hewson wrote:
>>>>> On 29 Jul 2014, at 03:44, Andreas Lehmkühler <andr...@lehmi.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> it's not a black and white issue (comments inline)
>>>>>>
>>>>>>> On 29 July 2014 at 07:44, John Hewson <j...@jahewson.com> wrote:
>>>>>>>
>>>>>>> Yes, really I should have said subclasses of PDFStreamEngine - that's where the .properties file originates. I'd propose replacing the properties mechanism with a simple method containing the mapping which can be overridden in subclasses. Ultimately, users expect to be able to change the behaviour of a class by just subclassing the class.
>>>>>> PDFStreamEngine doesn't configure any operator set itself. The subclasses are supposed to configure their own set of operators depending on the particular use case. E.g. to extend the text extraction one has to subclass PDFTextStripper and so on.
>>>>> It’s PDFStreamEngine which implements the .properties mechanism though, via the PDFStreamEngine(Properties properties) constructor.
>>>>>
>>>>>> E.g. to extend the text extraction one has to subclass PDFTextStripper and so on.
>>>>> That’s true, but it’s only half the story. Don’t forget that the .properties files need to be copied and pasted elsewhere and modified, along with overriding which .properties file is passed in the constructor, if you want to truly override the class’ behaviour.
>>>>>
>>>>>>> We've seen a number of incidents of confusion on the mailing list due to the current design.
>>>>>> IMHO, most of the confusion is based on the lack of knowledge of the pdf spec. One can't understand how pdfbox works under the hood by simply looking at the code. One has to understand the pdf spec as well, at least the base concepts.
>>>>> I’m specifically talking about confusion surrounding how to override operators, and .properties files; this has come up before. This entire thread has been caused by PDFBox’s design and *not* the PDF spec.
>>>>>
>>>>>>> I'd say that to the modern Java developer having non-code runtime binding has become an anti-pattern, resulting in brittle code which can't easily be navigated in an IDE and which resists automated analysis and exhibits runtime failures despite compiling ok. This is one of those cases where the collective wisdom has just evolved over the years.
>>>>>> It depends on the given use case. All solutions have advantages and disadvantages. E.g. if someone wants to configure the PDFTextStripper without recompiling the code, it is quite handy to keep the configuration in a text file.
>>>>> Has anybody *ever* wanted to change the operators which PDFTextStripper is processing without recompiling the code? These are internal implementation details that shouldn’t be exposed in the first place - it’s not a “configuration” at all, especially as 99% of possible changes would just break PDFTextStripper.
>>>>>
>>>>>> In this case I'm neither for nor against a text-based config, but I tend to agree with John to have the different configurations in some method within the subclasses of PDFStreamEngine.
>>>>> As above, this isn’t “configuration” at all, it lacks even a basic use case. I don’t see any pros which aren’t fabricated for the sake of argument, but the cons are causing us significant problems right here, right now.
>>>>>
>>>>>> BR
>>>>>> Andreas Lehmkühler
>>>>>>
>>>>>>> -- John
>>>>>>>
>>>>>>>> On 28 Jul 2014, at 13:42, Tilman Hausherr <thaush...@t-online.de> wrote:
>>>>>>>>
>>>>>>>> I disagree - one doesn't *have* to pass a property file to PDFTextStripper and PageDrawer. The properties file for PDFTextStripper is optional. The property parameter was already there before it became an Apache project.
>>>>>>>>
>>>>>>>> Tilman
>>>>>>>>
>>>>>>>> On 28.07.2014 at 22:08, John Hewson wrote:
>>>>>>>>> We need to get rid of these .properties files, they’re causing endless confusion, not to mention that they hide runtime dependencies in text files.
>>>>>>>>>
>>>>>>>>> We should make it so that overriding a TextStripper, PageDrawer, etc. doesn’t require external .properties files. Currently Preflight works in this manner and it’s much clearer.
>>>>>>>>>
>>>>>>>>> I guess this is a legacy of the “old” ways of Java XML everything.
>>>>>>>>>
>>>>>>>>> -- John
>>>>>>>>>
>>>>>>>>>> On 27 Jul 2014, at 10:09, -A <aa...@hrtmn.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you, that works as promised and removes the warning. I'm still hoping to find a resource that better explains the pieces of PDFBox and how they work together. Unfortunately most posts on the internet are solely how and not why.
>>>>>>>>>>
>>>>>>>>>> Appreciate it!
>>>>>>>>>>
>>>>>>>>>> -Aaron
>>>>>>>>>>
>>>>>>>>>> On Sun, Jul 27, 2014 at 8:00 AM, Tilman Hausherr <thaush...@t-online.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> That didn't happen to me, but maybe it did happen to you with another file.
>>>>>>>>>>>
>>>>>>>>>>> Another solution would be to pass your own properties file, and it should have this content:
>>>>>>>>>>>
>>>>>>>>>>> =======================
>>>>>>>>>>> # Licensed to the Apache Software Foundation (ASF) under one or more
>>>>>>>>>>> # contributor license agreements. See the NOTICE file distributed with
>>>>>>>>>>> # this work for additional information regarding copyright ownership.
>>>>>>>>>>> # The ASF licenses this file to You under the Apache License, Version 2.0
>>>>>>>>>>> # (the "License"); you may not use this file except in compliance with
>>>>>>>>>>> # the License. You may obtain a copy of the License at
>>>>>>>>>>> #
>>>>>>>>>>> # http://www.apache.org/licenses/LICENSE-2.0
>>>>>>>>>>> #
>>>>>>>>>>> # Unless required by applicable law or agreed to in writing, software
>>>>>>>>>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>>>>>>>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>>>>>>>>> # See the License for the specific language governing permissions and
>>>>>>>>>>> # limitations under the License.
>>>>>>>>>>>
>>>>>>>>>>> # This table maps PDF stream operators to concrete OperatorProcessor
>>>>>>>>>>> # subclasses that are used by the PDFStreamEngine class to interpret the
>>>>>>>>>>> # PDF document. The classes configured here allow the PDFTextStripper
>>>>>>>>>>> # subclass of PDFStreamEngine to extract text content of the document.
>>>>>>>>>>>
>>>>>>>>>>> BT = org.apache.pdfbox.util.operator.BeginText
>>>>>>>>>>> cm = org.apache.pdfbox.util.operator.Concatenate
>>>>>>>>>>> Do = org.apache.pdfbox.util.operator.Invoke
>>>>>>>>>>> ET = org.apache.pdfbox.util.operator.EndText
>>>>>>>>>>> gs = org.apache.pdfbox.util.operator.SetGraphicsStateParameters
>>>>>>>>>>> q = org.apache.pdfbox.util.operator.GSave
>>>>>>>>>>> Q = org.apache.pdfbox.util.operator.GRestore
>>>>>>>>>>> T* = org.apache.pdfbox.util.operator.NextLine
>>>>>>>>>>> Tc = org.apache.pdfbox.util.operator.SetCharSpacing
>>>>>>>>>>> Td = org.apache.pdfbox.util.operator.MoveText
>>>>>>>>>>> TD = org.apache.pdfbox.util.operator.MoveTextSetLeading
>>>>>>>>>>> Tf = org.apache.pdfbox.util.operator.SetTextFont
>>>>>>>>>>> Tj = org.apache.pdfbox.util.operator.ShowText
>>>>>>>>>>> TJ = org.apache.pdfbox.util.operator.ShowTextGlyph
>>>>>>>>>>> TL = org.apache.pdfbox.util.operator.SetTextLeading
>>>>>>>>>>> Tm = org.apache.pdfbox.util.operator.SetMatrix
>>>>>>>>>>> Tr = org.apache.pdfbox.util.operator.SetTextRenderingMode
>>>>>>>>>>> Ts = org.apache.pdfbox.util.operator.SetTextRise
>>>>>>>>>>> Tw = org.apache.pdfbox.util.operator.SetWordSpacing
>>>>>>>>>>> Tz = org.apache.pdfbox.util.operator.SetHorizontalTextScaling
>>>>>>>>>>> w = org.apache.pdfbox.util.operator.SetLineWidth
>>>>>>>>>>> \' = org.apache.pdfbox.util.operator.MoveAndShow
>>>>>>>>>>> \" = org.apache.pdfbox.util.operator.SetMoveAndShow
>>>>>>>>>>>
>>>>>>>>>>> CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace
>>>>>>>>>>> cs=org.apache.pdfbox.util.operator.SetNonStrokingColorSpace
>>>>>>>>>>> rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
>>>>>>>>>>> G=org.apache.pdfbox.util.operator.SetStrokingGrayColor
>>>>>>>>>>> g=org.apache.pdfbox.util.operator.SetNonStrokingGrayColor
>>>>>>>>>>> K=org.apache.pdfbox.util.operator.SetStrokingCMYKColor
>>>>>>>>>>> k=org.apache.pdfbox.util.operator.SetNonStrokingCMYKColor
>>>>>>>>>>> RG=org.apache.pdfbox.util.operator.SetStrokingRGBColor
>>>>>>>>>>> rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
>>>>>>>>>>> SC=org.apache.pdfbox.util.operator.SetStrokingColor
>>>>>>>>>>> sc=org.apache.pdfbox.util.operator.SetNonStrokingColor
>>>>>>>>>>> SCN=org.apache.pdfbox.util.operator.SetStrokingColor
>>>>>>>>>>> scn=org.apache.pdfbox.util.operator.SetNonStrokingColor
>>>>>>>>>>>
>>>>>>>>>>> # The following operators are not relevant to text extraction,
>>>>>>>>>>> # so we can silently ignore them.
>>>>>>>>>>>
>>>>>>>>>>> b
>>>>>>>>>>> B
>>>>>>>>>>> b*
>>>>>>>>>>> B*
>>>>>>>>>>> BDC
>>>>>>>>>>> BI
>>>>>>>>>>> BMC
>>>>>>>>>>> BX
>>>>>>>>>>> c
>>>>>>>>>>> d
>>>>>>>>>>> d0
>>>>>>>>>>> d1
>>>>>>>>>>> DP
>>>>>>>>>>> EI
>>>>>>>>>>> EMC
>>>>>>>>>>> EX
>>>>>>>>>>> f
>>>>>>>>>>> F
>>>>>>>>>>> f*
>>>>>>>>>>> h
>>>>>>>>>>> i
>>>>>>>>>>> ID
>>>>>>>>>>> j
>>>>>>>>>>> J
>>>>>>>>>>> l
>>>>>>>>>>> m
>>>>>>>>>>> M
>>>>>>>>>>> MP
>>>>>>>>>>> n
>>>>>>>>>>> re
>>>>>>>>>>> ri
>>>>>>>>>>> s
>>>>>>>>>>> S
>>>>>>>>>>> sh
>>>>>>>>>>> v
>>>>>>>>>>> W
>>>>>>>>>>> W*
>>>>>>>>>>> y
>>>>>>>>>>>
>>>>>>>>>>> =======================
>>>>>>>>>>>
>>>>>>>>>>> Tilman
>>>>>>>>>>>
>>>>>>>>>>> On 27.07.2014 at 15:54, -A wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Tilman;
>>>>>>>>>>>> That is somewhat embarrassing. At one point I brought this to the mailing list (because of the following warning) and was told to remove that line because the TextStripper wasn't actually a PageDrawer. The functionality still worked after that, however.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a way to do this without the warning, perhaps something within PageDrawer?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> -Aaron
>>>>>>>>>>>>
>>>>>>>>>>>> WARNING: java.lang.ClassCastException: IncrementalPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
>>>>>>>>>>>> java.lang.ClassCastException: IncrementalPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
>>>>>>>>>>>> at org.apache.pdfbox.util.operator.pagedrawer.AppendRectangleToPath.process(AppendRectangleToPath.java:46)
>>>>>>>>>>>> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
>>>>>>>>>>>> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>>>>>>>>>>>> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>>>>>>>>>>>> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>>>>>>>>>>>> at IncrementalPDFStripper.containsRed(IncrementalPDFStripper.java:90)
>>>>>>>>>>>> at IncrementalPDFStripper.main(IncrementalPDFStripper.java:56)
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jul 27, 2014 at 5:47 AM, Tilman Hausherr <thaush...@t-online.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It is even easier than I thought - replace super() with this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> super(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties", true));
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 27.07.2014 at 13:03, Tilman Hausherr wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> After having written the text below, I tested by including the "rg" operator in the properties list and now it worked. I also tested deleting your println and instead adding this if the text is red:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> System.out.print (textPos.getCharacter());
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and so I got this output:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 21_Key .1295 R~Wall Prof LinP 0.003 0.004 0.000
>>>>>>>>>>>>>> true
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> which is exactly what is red in the PDF.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another (probably better) way to do it would be to derive not from PDFTextStripper but from PDFStreamEngine, and construct it with
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties")
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> see also http://stackoverflow.com/a/9157714/535646
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 27.07.2014 at 12:14, Tilman Hausherr wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> Do you still have the code that worked?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not the text extraction specialist here, but what I did was to look in the uncompressed source of the PDF. The stream has code like this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 0 0 0 rg
>>>>>>>>>>>>>>> 0 0.5019 0 rg
>>>>>>>>>>>>>>> 1 0 0 rg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The first line sets the colour to black, the second to green, the third to red. And from what I saw, it can't work at all, because the "rg" operator isn't processed when extracting text, because PDFTextStripper.properties doesn't contain the "rg" operator. (The operator is in another list, which is used when rendering.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So that is what puzzles me. I think it can't work at all. But you said it did work at one time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 27.07.2014 at 07:43, Tilman Hausherr wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> Please upload the PDF somewhere and post the URL; PDF files are removed from the mailing list.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 27.07.2014 at 02:35, -A wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello again. I've been trying to figure out this issue that has come up for me, and in my research I found someone posting on StackOverflow (http://stackoverflow.com/questions/10844271/how-to-get-font-color-using-pdfbox) a similar issue where they could not read any colors from a PDF. The user posted the code and someone else took it, ran it, and reported that it worked. The user's approach was different than mine, but alas.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm not sure at this point what is going on. I have stepped through each individual character and checked the PDGraphicsState object, and even when I am looking at an open file with visibly red text (attached) the debugger only reports DeviceGray. If I print out the ColorSpace name from the PDGraphicsState this is what is printed - for every character.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would appreciate it if someone could perhaps run the attached text stripper with the attached PDF file and report back if it actually prints true instead of false, as it does for me. Since I saw this occurrence elsewhere I'd like to rule that out - in case an IDE setting of some sort may be causing this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It should be noted that I began using PDFBox with 1.8.5 and had this code working fine. Still with 1.8.5 yesterday it was failing. Upgrading to 1.8.6 yielded the same results.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If this is an actual issue I do not mind attempting to solve it, if someone has a general idea where to point me so as to prevent needless meddling with graphics state objects. Or, if this should be reported I can do that as well.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Aaron
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Previous Message:*
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've attached an updated stripper file with the only addition being a main function to test the class specifically.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When run with the PDF I have also attached, it indeed does not recognize the red text.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At this point it seems that this issue is solely dependent on PDFBox. I'll stay tuned for some insight hopefully. If any other information is needed, let me know!
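
For readers following the thread, here is a minimal sketch of the design John proposes: the operator mapping lives in an overridable method instead of a .properties file. This is not PDFBox code. Handler, SketchStreamEngine, ColorAwareEngine and registerOperators are hypothetical names made up for illustration; only the operator names (Tj, rg) and the DeviceGray symptom come from the thread. The sketch also shows why an operator that is missing from the mapping, such as "rg", is skipped and leaves the colour at its default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Small driver; all names in this file are hypothetical and not part of PDFBox.
class OperatorMappingSketch {
    // Feeds "1 0 0 rg" followed by a text-showing operator to the engine
    // and returns the colour the engine ends up with.
    static String run(SketchStreamEngine engine) {
        engine.processOperator("rg", Arrays.asList("1", "0", "0"));
        engine.processOperator("Tj", Arrays.asList("red text"));
        return engine.nonStrokingColor;
    }

    public static void main(String[] args) {
        System.out.println(run(new SketchStreamEngine())); // prints "DeviceGray"
        System.out.println(run(new ColorAwareEngine()));   // prints "DeviceRGB 1 0 0"
    }
}

// Hypothetical stand-in for an operator processor.
interface Handler {
    void process(SketchStreamEngine engine, List<String> operands);
}

// Hypothetical stand-in for a stream engine whose operator mapping is plain code.
class SketchStreamEngine {
    private Map<String, Handler> handlers; // built lazily on first use
    final List<String> unhandled = new ArrayList<>();
    final StringBuilder text = new StringBuilder();
    String nonStrokingColor = "DeviceGray"; // default until a colour operator runs

    // Subclasses override this method to add or replace operators.
    protected void registerOperators(Map<String, Handler> map) {
        map.put("Tj", (engine, operands) -> engine.text.append(operands.get(0)));
    }

    void processOperator(String name, List<String> operands) {
        if (handlers == null) {
            handlers = new HashMap<>();
            registerOperators(handlers);
        }
        Handler handler = handlers.get(name);
        if (handler == null) {
            unhandled.add(name); // unmapped operators are skipped
            return;
        }
        handler.process(this, operands);
    }
}

// Extending the behaviour needs only a subclass, no copied text file.
class ColorAwareEngine extends SketchStreamEngine {
    @Override
    protected void registerOperators(Map<String, Handler> map) {
        super.registerOperators(map);
        map.put("rg", (engine, operands) -> {
            engine.nonStrokingColor = "DeviceRGB " + String.join(" ", operands);
        });
    }
}
```

The mapping is built on first use rather than in the constructor, so the overridden method never runs before the subclass is fully constructed. Whether this would be easier to handle than the properties file is the open question Maruan raises; the sketch only illustrates what such a replacement might look like.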