[
https://issues.apache.org/jira/browse/TIKA-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077797#comment-18077797
]
Tim Allison edited comment on TIKA-4683 at 5/2/26 11:48 AM:
------------------------------------------------------------
New reports attached: unpack is a known issue. Some churn in octet-stream
files getting detected as text. Some diffs in encoding detection... I'll take
a deeper look on Monday, but nothing immediately leaps out.
Also: gzip file names, missing embedded files in msoffice.
More zero-byte file exceptions and some churn in ole vs. msoffice in embedded
doc detection... further look on Monday.
I'll take another look early Monday (EST), but I think we're good enough for
4.0.0-ALPHA?
> Prep for 4.0.0-ALPHA release
> ----------------------------
>
> Key: TIKA-4683
> URL: https://issues.apache.org/jira/browse/TIKA-4683
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: reports-20260429.tar.gz, reports-20260502.tar.gz,
> reports-4.0.0-20260411.tgz, reports.tar.gz
>
>