Tim, I have extracted the .pptx PowerPoint file containing the Prague footer. I want to write a unit test for POI that finds the Prague string, so I can figure out why Prague was not included in the Tika regression test output using POI 3.15 beta 3 but was found using POI 3.15 beta 1.
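Before going through POI's extractors, one quick sanity check is to find which part of the .pptx package (a slide, a slide layout, or a slide master) actually holds the footer text, since a .pptx is just a zip of XML parts. The sketch below is stdlib-only and is not POI's API; the class name `PptxStringScan` and the method `partsContaining` are hypothetical names I made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: scans the raw XML parts of a .pptx (an OOXML zip
// package) for a literal string. It does not use POI at all.
class PptxStringScan {

    /** Returns the names of the .xml parts whose raw content contains needle. */
    static List<String> partsContaining(InputStream pptx, String needle) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Only look at XML parts such as ppt/slides/slide1.xml,
                // ppt/slideLayouts/*.xml and ppt/slideMasters/*.xml.
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue;
                }
                // Read the current entry fully; read() returns -1 at the end of this entry.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                if (xml.contains(needle)) {
                    hits.add(entry.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PptxStringScan some-deck.pptx
        try (InputStream in = new FileInputStream(args[0])) {
            for (String part : partsContaining(in, "Prague")) {
                System.out.println(part);
            }
        }
    }
}
```

If "Prague" only shows up in a layout or master part and not in any `ppt/slides/slideN.xml`, that might hint that the difference between beta 1 and beta 3 is in how (or whether) master/layout placeholder text is extracted, but that's a guess to confirm with an actual POI test. One caveat: a literal search can miss the string if PowerPoint split it across several text runs.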
Could you point me to the Tika code that generated the potential regressions zip file in TIKA-2013, or the POI class/function that is used to extract the text from a document? Also, is the pptx file shareable and ASL 2.0 licensed so that it can be included as part of POI's unit test suite?

On Fri, Aug 12, 2016 at 6:52 PM, Javen O'Neal <javenon...@gmail.com> wrote:
> On Aug 12, 2016 11:39, "Allison, Timothy B." <talli...@mitre.org> wrote:
>> ...the two potential content regressions may be caused by something at the
>> Tika level. If anyone has time to take a look, that'd be great.
>
> I can take a look this weekend.
>
> Did you use the same Tika code with different POI versions for these tests
> (so that we can attribute the change in behavior to a POI commit, regardless
> of whether the bug is in Tika or POI)?