Hi Tim

I gave it another try, and it looks like only the thumbnail file name is
reported. `ToTextContentHandler` is used by default.

I can try again with 2.4.1 RC later

Thanks, Sergey


On Sat, Apr 30, 2022 at 2:08 PM Sergey Beryozkin <sberyoz...@gmail.com>
wrote:

> Hi Tim
>
> Thanks for the quick fix. I missed your answer yesterday; I will check soon
> and let you know.
>
> Cheers Sergey
>
>
> On Fri 29 Apr 2022, 16:49 Tim Allison, <talli...@apache.org> wrote:
>
>> Hi Sergey,
>>   That the thumbnail file name showed up in the stream is a bug I
>> introduced in 2.3.x.  I missed it in the fix in 2.4.0 (TIKA-3711), but
>> I just fixed it now (TIKA-3745).
>>   Are you not seeing "Hello Quarkus" at all, or is it just not the
>> only text -- contains vs equals?  I am seeing "Hello Quarkus" in at
>> least the 2.4.0-rc1.
>>
>> On Fri, Apr 29, 2022 at 10:54 AM Sergey Beryozkin <sberyoz...@gmail.com>
>> wrote:
>> >
>> > Hi Tim, All
>> >
>> > I have a simple test reading string content from an ODT doc that is
>> > failing. PDF and Excel are good, but something is going on with the
>> > ODT parsing.
>> >
>> > quarkus.odt in
>> > https://github.com/quarkiverse/quarkus-tika/blob/main/integration-tests/src/main/resources/
>> > is expected to return a "Hello Quarkus" string
>> >
>> > but now the test fails with
>> >
>> > Expected: is "Hello Quarkus"
>> >   Actual: Thumbnails/thumbnail.png.
>> >
>> > AutoDetectParser is used to parse, using a standard sequence:
>> >
>> > https://github.com/quarkiverse/quarkus-tika/blob/main/runtime/src/main/java/io/quarkus/tika/TikaParser.java#L85
>> >
>> > Maybe it is an auto-detection issue; the media type that is used is here:
>> >
>> > https://github.com/quarkiverse/quarkus-tika/blob/main/integration-tests/src/test/java/io/quarkus/it/tika/TikaParserTest.java#L25
>> >
>> > Any hints will be appreciated
>> >
>> > Thanks, Sergey
>>
>
