[
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340249#comment-17340249
]
Tim Allison edited comment on TIKA-3164 at 5/6/21, 2:56 PM:
------------------------------------------------------------
Reports are here:
https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz
These compare the latest 4.x vs. 5.0.1-snapshot. There's a new NPE in WMF
parsing, and it looks like we're missing a bunch of attachments.
I also need to look into why there's less content coming out of
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ...
Parse times seem to be slower for OOXML than in 4.x, but that could be an
artifact of the mood of the VM at the time of running...
The missing attachments and the reduced spreadsheetml content could be Tika
issues rather than POI issues. I need to take a look.
was (Author: [email protected]):
Reports are here:
https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz
These compare the latest 4.x vs. 5.0.1-snapshot. There's a new NPE in WMF
parsing, and it looks like we're missing a bunch of attachments.
I also need to look into why there's less content coming out of
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ... this
could be a Tika item, not POI...
Parse times seem to be slower for ooxml than in 4.x, but that could be an
artifact of the mood of the vm at the time of running...
> Upgrade to POI 5.0.0 when available
> -----------------------------------
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>