Hello, I'm jumping in to ask about the following warning, which I get when using the default PDFTextStripper class:

WARNING: Count in xref table is 0 at offset 96825

Attached is the document that causes it. I thought it might have to do with the properties file that Tilman Hausherr pointed out to me, but it didn't. This isn't a big issue, as the program still works, but if I could get rid of the warning so I don't have to look at it, so much the better! I'm also getting into the PDF spec. If there is anything I can help with if the properties file becomes an active issue (even just testing), let me know.

Thanks,
-Aaron

On Wed, Jul 30, 2014 at 11:10 AM, John Hewson <j...@jahewson.com> wrote:

> On 29 Jul 2014, at 23:12, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
>
> > +1 for removing the .properties file if the new mechanism is easier to understand and handle. The discussion so far doesn't provide that proof or any information about it.
> >
> > What would a replacement look like?
>
> Basically like registerOperatorProcessor(), as used in PreflightStreamEngine.
>
> > OTOH, if it's a documentation issue, we could also add more information to the javadocs to explain the dependencies.
> >
> > We could add a register/unregister method to allow adding/removing custom operator handling, or provide a service discovery mechanism. This way we still have the old flexibility.
>
> As Andreas notes, there's a registerOperatorProcessor method which does this, so the mechanism is already in place. The problem is not that we don't have the mechanism, it's that we're using .properties files at all. The list of operators can't be controlled from both code and .properties lists; one source has to be authoritative. Otherwise we'd end up with a situation where an operator is disabled in a .properties file and then re-enabled in code. Currently we have a situation where that could happen.
>
> Therefore, removing the .properties is the only workable solution.
> It's important to note that it's very, very unlikely that anybody is using the .properties files in a use case where they are not also making some code changes, so the supposed benefit of "not having to recompile" never existed. Adding an operator would always require compile-time changes to PDFBox so that the PDFStreamEngine subclasses actually do something with the new operator.
>
> -- John
>
> > BR
> > Maruan
> >
> > On 29.07.2014 at 21:48, John Hewson <j...@jahewson.com> wrote:
> >
> >> Right, but we need to address the confusion and complexity caused by the .properties files, which made PDFBOX-2246 so tricky to figure out.
> >>
> >> Let's remove this wart!
> >>
> >> -- John
> >>
> >> On 29 Jul 2014, at 10:44, Tilman Hausherr <thaush...@t-online.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> At this time, the problem I see and wanted to solve (PDFBOX-2246) exists regardless of whether we use a properties file or initialize directly in the code.
> >>>
> >>> Tilman
> >>>
> >>> On 29.07.2014 19:41, John Hewson wrote:
> >>>> On 29 Jul 2014, at 03:44, Andreas Lehmkühler <andr...@lehmi.de> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> It's not a black-and-white issue (comments inline).
> >>>>>
> >>>>>> John Hewson <j...@jahewson.com> wrote on 29 July 2014 at 07:44:
> >>>>>>
> >>>>>> Yes, really I should have said subclasses of PDFStreamEngine; that's where the .properties file originates. I'd propose replacing the properties mechanism with a simple method containing the mapping, which can be overridden in subclasses. Ultimately, users expect to be able to change the behaviour of a class just by subclassing it.
> >>>>> PDFStreamEngine doesn't configure any operator set itself. The subclasses are supposed to configure their own set of operators depending on the particular use case. E.g.
> >>>>> to extend the text extraction, one has to subclass PDFTextStripper, and so on.
> >>>> It's PDFStreamEngine which implements the .properties mechanism, though, via the PDFStreamEngine(Properties properties) constructor.
> >>>>
> >>>>> E.g. to extend the text extraction, one has to subclass PDFTextStripper, and so on.
> >>>> That's true, but it's only half the story. Don't forget that the .properties files need to be copied and pasted elsewhere and modified, along with overriding which .properties file is passed in the constructor, if you want to truly override the class' behaviour.
> >>>>
> >>>>>> We've seen a number of incidents of confusion on the mailing list due to the current design.
> >>>>> IMHO, most of the confusion is based on a lack of knowledge of the PDF spec. One can't understand how PDFBox works under the hood by simply looking at the code. One has to understand the PDF spec as well, at least the basic concepts.
> >>>> I'm specifically talking about confusion surrounding how to override operators and .properties files; this has come up before. This entire thread has been caused by PDFBox's design and *not* the PDF spec.
> >>>>
> >>>>>> I'd say that to the modern Java developer, non-code runtime binding has become an anti-pattern, resulting in brittle code which can't easily be navigated in an IDE, which resists automated analysis, and which exhibits runtime failures despite compiling OK. This is one of those cases where the collective wisdom has simply evolved over the years.
> >>>>> It depends on the given use case. All solutions have advantages and disadvantages. E.g. if someone wants to configure the PDFTextStripper without recompiling the code, it is quite handy to keep the configuration in a text file.
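A minimal sketch of what the text-file approach boils down to, for readers following this exchange: each `operator = fully.qualified.ClassName` line is turned into a handler by reflection, and a bare key (as in the ignore list of the properties file Tilman posted) maps to nothing. `OpHandler`, `RecordingHandler`, `PropertiesDrivenEngine` and `PropertiesDrivenDemo` are hypothetical names, not PDFBox classes, and PDFBox's own `PDFStreamEngine(Properties properties)` constructor may differ in detail. The sketch illustrates both sides of the argument: the table can change without recompiling the engine, but a misspelled class name compiles fine and only fails when the table is loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical handler interface (not PDFBox's OperatorProcessor).
interface OpHandler {
    void process(List<String> operands);
}

// Hypothetical handler that a properties file can name by its class name.
class RecordingHandler implements OpHandler {
    static int calls = 0;

    public RecordingHandler() {
    }

    @Override
    public void process(List<String> operands) {
        calls++;
    }
}

// Builds its operator table from "operator = fully.qualified.ClassName" lines.
class PropertiesDrivenEngine {
    private final Map<String, OpHandler> handlers = new HashMap<>();

    PropertiesDrivenEngine(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String operator : props.stringPropertyNames()) {
            String className = props.getProperty(operator).trim();
            if (className.isEmpty()) {
                continue; // bare key such as "re": the operator is silently ignored
            }
            try {
                // Nothing checks this name at compile time; a typo only fails here.
                Class<?> cls = Class.forName(className);
                handlers.put(operator, (OpHandler) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load " + className + " for " + operator, e);
            }
        }
    }

    boolean handles(String operator) {
        return handlers.containsKey(operator);
    }

    void processOperator(String operator, List<String> operands) {
        OpHandler handler = handlers.get(operator);
        if (handler != null) {
            handler.process(operands);
        }
    }
}

class PropertiesDrivenDemo {
    public static void main(String[] args) {
        String config = "Tj = " + RecordingHandler.class.getName() + "\nre\n";
        PropertiesDrivenEngine engine = new PropertiesDrivenEngine(config);
        engine.processOperator("Tj", List.of("(Hello)"));
        System.out.println(engine.handles("Tj") + " " + engine.handles("re") + " " + RecordingHandler.calls);
        try {
            new PropertiesDrivenEngine("Tj = org.example.ShowTxt\n"); // misspelled class name
        } catch (IllegalStateException e) {
            System.out.println("runtime failure: " + e.getCause());
        }
    }
}
```

Running `PropertiesDrivenDemo` should print `true false 1` and then report a `ClassNotFoundException` for the misspelled entry, which is the kind of failure John describes as happening at runtime despite compiling OK.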
> >>>> Has anybody *ever* wanted to change the operators which PDFTextStripper is processing without recompiling the code? These are internal implementation details that shouldn't be exposed in the first place. It's not a "configuration" at all, especially as 99% of possible changes would just break PDFTextStripper.
> >>>>
> >>>>> In this case I'm neither for nor against a text-based config, but I tend to agree with John about having the different configurations in some method within the subclasses of PDFStreamEngine.
> >>>> As above, this isn't "configuration" at all; it lacks even a basic use case. I don't see any pros which aren't fabricated for the sake of argument, but the cons are causing us significant problems right here, right now.
> >>>>
> >>>>> BR
> >>>>> Andreas Lehmkühler
> >>>>>
> >>>>>> -- John
> >>>>>>
> >>>>>>> On 28 Jul 2014, at 13:42, Tilman Hausherr <thaush...@t-online.de> wrote:
> >>>>>>>
> >>>>>>> I disagree: one doesn't *have* to pass a properties file to PDFTextStripper and PageDrawer. The properties file for PDFTextStripper is optional. The properties parameter was already there before it became an Apache project.
> >>>>>>>
> >>>>>>> Tilman
> >>>>>>>
> >>>>>>> On 28.07.2014 22:08, John Hewson wrote:
> >>>>>>>> We need to get rid of these .properties files. They're causing endless confusion, not to mention that they hide runtime dependencies in text files.
> >>>>>>>>
> >>>>>>>> We should make it so that overriding a TextStripper, PageDrawer, etc. doesn't require external .properties files; Preflight currently works in this manner, and it's much clearer.
> >>>>>>>>
> >>>>>>>> I guess this is a legacy of the "old" Java way of putting everything in XML.
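The code-only alternative John argues for can be sketched the same way, assuming the operator table is built entirely by method calls, in the spirit of the `registerOperatorProcessor()` method that the thread says PreflightStreamEngine already uses. `CodeConfiguredEngine`, `ColorTrackingEngine`, `CodeConfiguredDemo`, `register` and `unregister` are hypothetical names for illustration, not PDFBox API. The base class registers its defaults in its constructor and a subclass adds `rg` with an ordinary call, so the compiler checks every handler and there is one authoritative source for the operator list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical engine whose operator table lives only in code.
class CodeConfiguredEngine {
    private final Map<String, Consumer<List<String>>> handlers = new HashMap<>();

    CodeConfiguredEngine() {
        // Defaults for this engine; the handler bodies are placeholders in this sketch.
        register("BT", operands -> { });
        register("ET", operands -> { });
        register("Tj", operands -> { });
    }

    protected final void register(String operator, Consumer<List<String>> handler) {
        handlers.put(operator, handler);
    }

    protected final void unregister(String operator) {
        handlers.remove(operator);
    }

    boolean handles(String operator) {
        return handlers.containsKey(operator);
    }

    void processOperator(String operator, List<String> operands) {
        Consumer<List<String>> handler = handlers.get(operator);
        if (handler != null) {
            handler.accept(operands);
        }
    }
}

// Subclass that additionally tracks the fill colour set by "r g b rg".
class ColorTrackingEngine extends CodeConfiguredEngine {
    private String fillRgb = "0 0 0";

    ColorTrackingEngine() {
        super(); // base defaults: BT, ET, Tj
        register("rg", operands -> fillRgb = String.join(" ", operands));
    }

    String fillRgb() {
        return fillRgb;
    }
}

class CodeConfiguredDemo {
    public static void main(String[] args) {
        ColorTrackingEngine engine = new ColorTrackingEngine();
        engine.processOperator("rg", List.of("1", "0", "0"));
        System.out.println(engine.fillRgb());
    }
}
```

The subclass calls `register` from its own constructor instead of overriding a method that the base constructor invokes; that avoids running subclass code before the subclass's own fields are initialized.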
> >>>>>>>>
> >>>>>>>> -- John
> >>>>>>>>
> >>>>>>>>> On 27 Jul 2014, at 10:09, -A <aa...@hrtmn.net> wrote:
> >>>>>>>>>
> >>>>>>>>> Thank you, that works as promised and removes the warning. I'm still hoping to find a resource that better explains the pieces of PDFBox and how they work together. Unfortunately, most posts on the internet only explain how, not why.
> >>>>>>>>>
> >>>>>>>>> Appreciate it!
> >>>>>>>>>
> >>>>>>>>> -Aaron
> >>>>>>>>>
> >>>>>>>>> On Sun, Jul 27, 2014 at 8:00 AM, Tilman Hausherr <thaush...@t-online.de> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> That didn't happen to me, but maybe it did happen to you with another file.
> >>>>>>>>>>
> >>>>>>>>>> Another solution would be to pass your own properties file, with this content:
> >>>>>>>>>>
> >>>>>>>>>> =======================
> >>>>>>>>>> # Licensed to the Apache Software Foundation (ASF) under one or more
> >>>>>>>>>> # contributor license agreements.  See the NOTICE file distributed with
> >>>>>>>>>> # this work for additional information regarding copyright ownership.
> >>>>>>>>>> # The ASF licenses this file to You under the Apache License, Version 2.0
> >>>>>>>>>> # (the "License"); you may not use this file except in compliance with
> >>>>>>>>>> # the License.  You may obtain a copy of the License at
> >>>>>>>>>> #
> >>>>>>>>>> #      http://www.apache.org/licenses/LICENSE-2.0
> >>>>>>>>>> #
> >>>>>>>>>> # Unless required by applicable law or agreed to in writing, software
> >>>>>>>>>> # distributed under the License is distributed on an "AS IS" BASIS,
> >>>>>>>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >>>>>>>>>> # See the License for the specific language governing permissions and
> >>>>>>>>>> # limitations under the License.
> >>>>>>>>>>
> >>>>>>>>>> # This table maps PDF stream operators to concrete OperatorProcessor
> >>>>>>>>>> # subclasses that are used by the PDFStreamEngine class to interpret the
> >>>>>>>>>> # PDF document. The classes configured here allow the PDFTextStripper
> >>>>>>>>>> # subclass of PDFStreamEngine to extract text content of the document.
> >>>>>>>>>>
> >>>>>>>>>> BT = org.apache.pdfbox.util.operator.BeginText
> >>>>>>>>>> cm = org.apache.pdfbox.util.operator.Concatenate
> >>>>>>>>>> Do = org.apache.pdfbox.util.operator.Invoke
> >>>>>>>>>> ET = org.apache.pdfbox.util.operator.EndText
> >>>>>>>>>> gs = org.apache.pdfbox.util.operator.SetGraphicsStateParameters
> >>>>>>>>>> q = org.apache.pdfbox.util.operator.GSave
> >>>>>>>>>> Q = org.apache.pdfbox.util.operator.GRestore
> >>>>>>>>>> T* = org.apache.pdfbox.util.operator.NextLine
> >>>>>>>>>> Tc = org.apache.pdfbox.util.operator.SetCharSpacing
> >>>>>>>>>> Td = org.apache.pdfbox.util.operator.MoveText
> >>>>>>>>>> TD = org.apache.pdfbox.util.operator.MoveTextSetLeading
> >>>>>>>>>> Tf = org.apache.pdfbox.util.operator.SetTextFont
> >>>>>>>>>> Tj = org.apache.pdfbox.util.operator.ShowText
> >>>>>>>>>> TJ = org.apache.pdfbox.util.operator.ShowTextGlyph
> >>>>>>>>>> TL = org.apache.pdfbox.util.operator.SetTextLeading
> >>>>>>>>>> Tm = org.apache.pdfbox.util.operator.SetMatrix
> >>>>>>>>>> Tr = org.apache.pdfbox.util.operator.SetTextRenderingMode
> >>>>>>>>>> Ts = org.apache.pdfbox.util.operator.SetTextRise
> >>>>>>>>>> Tw = org.apache.pdfbox.util.operator.SetWordSpacing
> >>>>>>>>>> Tz = org.apache.pdfbox.util.operator.SetHorizontalTextScaling
> >>>>>>>>>> w = org.apache.pdfbox.util.operator.SetLineWidth
> >>>>>>>>>> \' = org.apache.pdfbox.util.operator.MoveAndShow
> >>>>>>>>>> \" = org.apache.pdfbox.util.operator.SetMoveAndShow
> >>>>>>>>>>
> >>>>>>>>>> CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace
> >>>>>>>>>> cs=org.apache.pdfbox.util.operator.SetNonStrokingColorSpace
> >>>>>>>>>> G=org.apache.pdfbox.util.operator.SetStrokingGrayColor
> >>>>>>>>>> g=org.apache.pdfbox.util.operator.SetNonStrokingGrayColor
> >>>>>>>>>> K=org.apache.pdfbox.util.operator.SetStrokingCMYKColor
> >>>>>>>>>> k=org.apache.pdfbox.util.operator.SetNonStrokingCMYKColor
> >>>>>>>>>> RG=org.apache.pdfbox.util.operator.SetStrokingRGBColor
> >>>>>>>>>> rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
> >>>>>>>>>> SC=org.apache.pdfbox.util.operator.SetStrokingColor
> >>>>>>>>>> sc=org.apache.pdfbox.util.operator.SetNonStrokingColor
> >>>>>>>>>> SCN=org.apache.pdfbox.util.operator.SetStrokingColor
> >>>>>>>>>> scn=org.apache.pdfbox.util.operator.SetNonStrokingColor
> >>>>>>>>>>
> >>>>>>>>>> # The following operators are not relevant to text extraction,
> >>>>>>>>>> # so we can silently ignore them.
> >>>>>>>>>>
> >>>>>>>>>> b
> >>>>>>>>>> B
> >>>>>>>>>> b*
> >>>>>>>>>> B*
> >>>>>>>>>> BDC
> >>>>>>>>>> BI
> >>>>>>>>>> BMC
> >>>>>>>>>> BX
> >>>>>>>>>> c
> >>>>>>>>>> d
> >>>>>>>>>> d0
> >>>>>>>>>> d1
> >>>>>>>>>> DP
> >>>>>>>>>> EI
> >>>>>>>>>> EMC
> >>>>>>>>>> EX
> >>>>>>>>>> f
> >>>>>>>>>> F
> >>>>>>>>>> f*
> >>>>>>>>>> h
> >>>>>>>>>> i
> >>>>>>>>>> ID
> >>>>>>>>>> j
> >>>>>>>>>> J
> >>>>>>>>>> l
> >>>>>>>>>> m
> >>>>>>>>>> M
> >>>>>>>>>> MP
> >>>>>>>>>> n
> >>>>>>>>>> re
> >>>>>>>>>> ri
> >>>>>>>>>> s
> >>>>>>>>>> S
> >>>>>>>>>> sh
> >>>>>>>>>> v
> >>>>>>>>>> W
> >>>>>>>>>> W*
> >>>>>>>>>> y
> >>>>>>>>>>
> >>>>>>>>>> =======================
> >>>>>>>>>>
> >>>>>>>>>> Tilman
> >>>>>>>>>>
> >>>>>>>>>> On 27.07.2014 15:54, -A wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Tilman,
> >>>>>>>>>>> That is somewhat embarrassing. At one point I brought this to the mailing list (because of the following warning) and was told to remove that line because the TextStripper wasn't actually a PageDrawer. The functionality still worked after that, however.
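The ClassCastException in the log that follows looks like the result of mixing operator tables: at least one handler from the `pagedrawer` package, `AppendRectangleToPath`, casts its engine to `PageDrawer`, so loading PageDrawer.properties into a PDFTextStripper subclass such as `IncrementalPDFStripper` makes that cast fail. The toy model below reproduces the pattern with hypothetical classes (`ToyEngine`, `ToyDrawer`, `ToyStripper`, `ToyProcessor`, `UnsafeRectangle`, `SafeRectangle`, `ToyCastDemo`), not PDFBox code. The assumption that the engine logs the exception and keeps going is inferred from the WARNING prefix and Aaron's report that the stripper still worked, not from PDFBox's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an operator processor that assumes a specific engine type.
interface ToyProcessor {
    void process(ToyEngine context);
}

abstract class ToyEngine {
    final List<String> warnings = new ArrayList<>();

    // Assumption: a failing processor is logged as a warning and processing continues.
    void processOperator(String operator, ToyProcessor processor) {
        try {
            processor.process(this);
        } catch (RuntimeException e) {
            warnings.add("WARNING: " + operator + ": " + e);
        }
    }
}

class ToyDrawer extends ToyEngine {
    int rectangles = 0;
}

class ToyStripper extends ToyEngine {
}

// Unconditional cast, in the style of a renderer-only rectangle handler.
class UnsafeRectangle implements ToyProcessor {
    @Override
    public void process(ToyEngine context) {
        ToyDrawer drawer = (ToyDrawer) context; // ClassCastException for a ToyStripper
        drawer.rectangles++;
    }
}

// Defensive variant: does nothing when the engine cannot draw.
class SafeRectangle implements ToyProcessor {
    @Override
    public void process(ToyEngine context) {
        if (context instanceof ToyDrawer) {
            ((ToyDrawer) context).rectangles++;
        }
    }
}

class ToyCastDemo {
    public static void main(String[] args) {
        ToyStripper stripper = new ToyStripper();
        stripper.processOperator("re", new UnsafeRectangle());
        stripper.processOperator("re", new SafeRectangle());
        stripper.warnings.forEach(System.out::println);
    }
}
```

The defensive variant only shows why the mismatch is noisy rather than fatal. Judging by Aaron's reply above, the cleaner fix is the one in Tilman's properties file: register only operators whose handlers do not need a `PageDrawer`.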
> >>>>>>>>>>>
> >>>>>>>>>>> Is there a way to do this without the warning, perhaps something within PageDrawer?
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you,
> >>>>>>>>>>> -Aaron
> >>>>>>>>>>>
> >>>>>>>>>>> WARNING: java.lang.ClassCastException: IncrementalPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
> >>>>>>>>>>> java.lang.ClassCastException: IncrementalPDFStripper cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
> >>>>>>>>>>>     at org.apache.pdfbox.util.operator.pagedrawer.AppendRectangleToPath.process(AppendRectangleToPath.java:46)
> >>>>>>>>>>>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
> >>>>>>>>>>>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> >>>>>>>>>>>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> >>>>>>>>>>>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> >>>>>>>>>>>     at IncrementalPDFStripper.containsRed(IncrementalPDFStripper.java:90)
> >>>>>>>>>>>     at IncrementalPDFStripper.main(IncrementalPDFStripper.java:56)
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Jul 27, 2014 at 5:47 AM, Tilman Hausherr <thaush...@t-online.de> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> It is even easier than I thought. Replace super() with this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> super(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties", true));
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tilman
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 27.07.2014 13:03, Tilman Hausherr wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> After writing the text below, I tested by including the "rg"
> >>>>>>>>>>>>> operator in the properties list, and now it worked. I also tested deleting your println and instead adding this if the text is red:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> System.out.print(textPos.getCharacter());
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> and got this output:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 21_Key .1295 R~Wall Prof LinP 0.003 0.004 0.000 true
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> which is exactly what is red in the PDF.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Another (probably better) way to do it would be to derive not from PDFTextStripper but from PDFStreamEngine, and construct it with
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties")
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> See also http://stackoverflow.com/a/9157714/535646
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Tilman
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 27.07.2014 12:14, Tilman Hausherr wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Do you still have the code that worked?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm not the text extraction specialist here, but what I did was look at the uncompressed source of the PDF. The stream has code like this:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 0 0 0 rg
> >>>>>>>>>>>>>> 0 0.5019 0 rg
> >>>>>>>>>>>>>> 1 0 0 rg
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The first line sets the fill color to black, the second to green, the third to red. And from what I saw, it can't work at all: the "rg" operator isn't processed when extracting text, because PDFTextStripper.properties doesn't contain it. (The operator is in another list, which is used when rendering.)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So that is what puzzles me. I think it can't work at all.
> >>>>>>>>>>>>>> But you said it did work at one time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Tilman
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 27.07.2014 07:43, Tilman Hausherr wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Please upload the PDF somewhere and post the URL; PDF files are removed from the mailing list.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Tilman
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 27.07.2014 02:35, -A wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hello again. I've been trying to figure out an issue that has come up for me, and in my research I found someone on StackOverflow (http://stackoverflow.com/questions/10844271/how-to-get-font-color-using-pdfbox) with a similar issue: they could not read any colors from a PDF. The user posted their code, and someone else took it, ran it, and reported that it worked. That user's approach was different from mine, but alas.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm not sure at this point what is going on. I have stepped through each individual character and checked the PDGraphicsState object, and even when I am looking at an open file with visibly red text (attached), the debugger only reports DeviceGray. If I print out the ColorSpace name from the PDGraphicsState, that is what is printed for every character.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would appreciate it if someone could run the attached text stripper with the attached PDF file and report back whether it actually prints true rather than false, which is what it prints for me.
Since I saw this > >>>>>>>>>>>>>>>> occurrence > >>>>>>>>>>>>>>>> elsewhere I'd like to rule that out - in case an IDE > setting of > >>>>>>>>>>>>>>>> some > >>>>>>>>>>>>>>>> sort > >>>>>>>>>>>>>>>> may be causing this? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> It should be noted that I began using PDFBox with 1.8.5 > and had > >>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>> code working fine. Still with 1.8.5 yesterday it was > failing. > >>>>>>>>>>>>>>>> Upgrading to > >>>>>>>>>>>>>>>> 1.8.6 yielded the same results. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> If this is an actual issue I do not mind attempting to > solve it if > >>>>>>>>>>>>>>>> someone may have a general idea where to point me as to > prevent > >>>>>>>>>>>>>>>> needless > >>>>>>>>>>>>>>>> meddling with graphics state objects. Or, if this should > be > >>>>>>>>>>>>>>>> reported > >>>>>>>>>>>>>>>> I can > >>>>>>>>>>>>>>>> do that as well. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks! > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> -Aaron > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> *Previous Message:* > >>>>>>>>>>>>>>>> * > >>>>>>>>>>>>>>>> * > >>>>>>>>>>>>>>>> * > >>>>>>>>>>>>>>>> * > >>>>>>>>>>>>>>>> I've attached an updated stripper file with the only > addition being > >>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>> main function to test the class specifically. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> When ran with the PDF I have also attached it indeed does > not > >>>>>>>>>>>>>>>> recognize the red text. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> At this point it seems that this issue is solely > dependent on > >>>>>>>>>>>>>>>> PDFBox. > >>>>>>>>>>>>>>>> I'll stay tuned for some insight hopefully. If any other > >>>>>>>>>>>>>>>> information > >>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>> needed, let me know! > >>> > >> > > > >