[
https://issues.apache.org/jira/browse/TIKA-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950943#comment-17950943
]
Tim Allison commented on TIKA-4411:
-----------------------------------
Two issues above are now fixed. We're getting 3 new ooxml exceptions and a few
ooxml->zip changes in detection, but I _think_ these are improvements.
The new tika-eval fixes appear to work. The reports now include the internal
path for attachments, and there are no complaints when opening xlsx in
LibreOffice.
I'm now noticing a regression in some xhtml files (such as
{{bug_trackers/MOZILLA/1534195-1623599/MOZILLA-1554250-6.xhtml}}). For some of
these files, there are fewer "common tokens", or, if there are more, they are
xhtml tags. The signals for this regression were in the earlier run...I just
didn't notice them. :(
I'll look into these today.
> Run the 3.2.0 release process
> -----------------------------
>
> Key: TIKA-4411
> URL: https://issues.apache.org/jira/browse/TIKA-4411
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: reports-3.2.0-pre-rc1.tgz, reports-3.2.0.tgz
>
>