Hi Tim,

I gave it another try, and it looks like only the thumbnail file name is reported. `ToTextContentHandler` is used by default.
I can try again with 2.4.1 RC later.

Thanks, Sergey

On Sat, Apr 30, 2022 at 2:08 PM Sergey Beryozkin <sberyoz...@gmail.com> wrote:

> Hi Tim,
>
> Thanks for a quick fix. I missed your answer yesterday; I will check soon
> and let you know.
>
> Cheers, Sergey
>
> On Fri 29 Apr 2022, 16:49 Tim Allison, <talli...@apache.org> wrote:
>
>> Hi Sergey,
>> That the thumbnail file name showed up in the stream is a bug I
>> introduced in 2.3.x. I missed it in the fix in 2.4.0 (TIKA-3711), but
>> I just fixed it now (TIKA-3745).
>> Are you not seeing "Hello Quarkus" at all, or is it just not the
>> only text (contains vs. equals)? I am seeing "Hello Quarkus" in at
>> least 2.4.0-rc1.
>>
>> On Fri, Apr 29, 2022 at 10:54 AM Sergey Beryozkin <sberyoz...@gmail.com>
>> wrote:
>> >
>> > Hi Tim, All,
>> >
>> > I have a simple test that reads string content from an ODT doc, and it
>> > is failing. PDF and Excel are fine, but something is going on with the
>> > ODT parsing.
>> >
>> > quarkus.odt in
>> > https://github.com/quarkiverse/quarkus-tika/blob/main/integration-tests/src/main/resources/
>> > is expected to return a "Hello Quarkus" string,
>> >
>> > but now the test fails with
>> >
>> > Expected: is "Hello Quarkus"
>> > Actual: Thumbnails/thumbnail.png.
>> >
>> > AutoDetectParser is used to parse, using a standard sequence:
>> >
>> > https://github.com/quarkiverse/quarkus-tika/blob/main/runtime/src/main/java/io/quarkus/tika/TikaParser.java#L85
>> >
>> > Maybe it is an auto-detection issue; the media type which is used is
>> > here:
>> >
>> > https://github.com/quarkiverse/quarkus-tika/blob/main/integration-tests/src/test/java/io/quarkus/it/tika/TikaParserTest.java#L25
>> >
>> > Any hints will be appreciated.
>> >
>> > Thanks, Sergey
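To tell a parser problem apart from a document problem, one can look inside the ODT package directly, without Tika. The sketch below is a debugging aid, not part of Tika or the quarkus-tika extension; `OdtInspector`, `entryNames` and `paragraphs` are hypothetical names. It uses only the JDK and assumes a standard ODF package: a ZIP archive whose body text lives in `<text:p>` elements of `content.xml`, with the preview image stored as a separate `Thumbnails/thumbnail.png` entry.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical debugging helper: inspects an ODT package using only the JDK. */
final class OdtInspector {
    // ODF text namespace; body paragraphs are <text:p> elements in content.xml.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private OdtInspector() {}

    /** Lists every entry name in the package, e.g. content.xml or Thumbnails/thumbnail.png. */
    static List<String> entryNames(byte[] odt) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(odt))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /**
     * Returns the text of each <text:p> element in content.xml; every other entry
     * (thumbnail, styles, metadata) is skipped. Simplification: <text:h> headings
     * are ignored and whitespace elements such as <text:s/> are not expanded.
     */
    static List<String> paragraphs(byte[] odt) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(odt))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (!"content.xml".equals(e.getName())) {
                    continue;
                }
                DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
                f.setNamespaceAware(true);
                // Refuse DOCTYPE declarations so a crafted document cannot trigger XXE.
                f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes();
                Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList ps = doc.getElementsByTagNameNS(TEXT_NS, "p");
                List<String> out = new ArrayList<>();
                for (int i = 0; i < ps.getLength(); i++) {
                    out.add(ps.item(i).getTextContent());
                }
                return out;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (Exception ex) {
            // ParserConfigurationException or SAXException from the XML step
            throw new IllegalStateException("could not parse content.xml", ex);
        }
        throw new IllegalStateException("content.xml not found");
    }
}
```

For example, `OdtInspector.paragraphs(Files.readAllBytes(Path.of("quarkus.odt")))` applied to the test resource should list "Hello Quarkus" if the document itself is fine, which would point at the extracted text stream rather than the file. It also bears on Tim's question: a test asserting `contains("Hello Quarkus")` passes even when extra lines such as the thumbnail name are present, while an `equals` assertion does not.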