[ https://issues.apache.org/jira/browse/TIKA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721512#comment-17721512 ]
Tim Allison commented on TIKA-4037:
-----------------------------------
We added detection for this file format.
However, the file that was shared with me privately causes commons-compress
to identify it as a magic-less tar file.
As a last-resort fallback, if no magic is found in the file, commons-compress
tries to read the first record as if from a tar file and then checks the
checksum. In the file that was shared with me, the first "entry" has a length
of 0, so the checksum is correctly 0. If we're able to share the triggering
file, we may want to ask commons-compress whether they'd be willing to make
their detection a bit stricter and ignore entries with length 0 when they
confirm the checksum.
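
To make the proposed stricter check concrete, here is a rough sketch. It is
not the commons-compress code: it follows the POSIX ustar header layout (size
field at offset 124, checksum field at offset 148), and the class name, the
method names and the {{rejectEmptyEntries}} flag are all made up for
illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a magic-less tar heuristic; not the commons-compress implementation.
final class TarHeaderSketch {
    static final int HEADER_SIZE = 512;
    static final int SIZE_OFFSET = 124;   // ustar: 12-byte octal size field
    static final int SIZE_LEN = 12;
    static final int CHKSUM_OFFSET = 148; // ustar: 8-byte octal checksum field
    static final int CHKSUM_LEN = 8;

    // Parse an octal ASCII field: skip leading spaces, stop at the first non-octal byte.
    static long parseOctal(byte[] buf, int offset, int len) {
        long value = 0;
        int i = offset;
        int end = offset + len;
        while (i < end && buf[i] == ' ') {
            i++;
        }
        for (; i < end; i++) {
            byte b = buf[i];
            if (b < '0' || b > '7') {
                break;
            }
            value = (value << 3) + (b - '0');
        }
        return value;
    }

    // Sum of all header bytes (unsigned), counting the checksum field itself as spaces.
    static long computeChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChecksumField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN;
            sum += inChecksumField ? ' ' : (header[i] & 0xFF);
        }
        return sum;
    }

    // Lenient mode accepts any header with a matching checksum.
    // Strict mode additionally rejects a first entry whose size is 0.
    static boolean looksLikeTar(byte[] header, boolean rejectEmptyEntries) {
        if (header.length < HEADER_SIZE) {
            return false;
        }
        if (parseOctal(header, CHKSUM_OFFSET, CHKSUM_LEN) != computeChecksum(header)) {
            return false;
        }
        long size = parseOctal(header, SIZE_OFFSET, SIZE_LEN);
        return !rejectEmptyEntries || size > 0;
    }

    // Test helper: build a minimal header with the given entry size and a valid checksum.
    static byte[] headerWithSize(long size) {
        byte[] h = new byte[HEADER_SIZE];
        h[0] = 'a'; // one-character entry name
        byte[] sizeField = String.format("%011o", size).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(sizeField, 0, h, SIZE_OFFSET, sizeField.length);
        byte[] chk = String.format("%06o", computeChecksum(h)).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(chk, 0, h, CHKSUM_OFFSET, chk.length);
        h[CHKSUM_OFFSET + 6] = 0;   // conventional terminator: NUL then space
        h[CHKSUM_OFFSET + 7] = ' ';
        return h;
    }

    public static void main(String[] args) {
        byte[] empty = headerWithSize(0);
        System.out.println("lenient: " + looksLikeTar(empty, false));
        System.out.println("strict:  " + looksLikeTar(empty, true));
    }
}
```

With a check along these lines, a zero-length first "entry" would no longer be
enough on its own to make a file count as tar.
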
Within Tika, the problem is that the other detectors run before the magic
detector. If they return anything other than {{octet-stream}} or a base type
of what the magic detector finds, the magic detector's result is ignored.
In other words, we implicitly trust the other detectors and ignore the magic
detection if an earlier detector has found something. I'm not sure there's an
easy improvement on the Tika side.
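
The precedence rule described above can be sketched as follows. This is a
simplified, self-contained illustration and does not use the Tika API: the
class, the {{combine}} method, the parent map and the
{{image/x-example-bitmap}} type name are all placeholders.

```java
import java.util.Map;

// Hypothetical sketch of the "trust the earlier detector" rule; not Tika's actual code.
final class DetectorOrderSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // Placeholder child -> parent relation standing in for a real media type registry.
    static final Map<String, String> PARENT = Map.of(
            "application/x-gtar", "application/x-tar");

    // True if 'type' has 'ancestor' somewhere above it in the parent chain.
    static boolean isSpecializationOf(String type, String ancestor) {
        for (String t = PARENT.get(type); t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // The magic result wins only if the earlier detectors found nothing
    // (octet-stream) or found a base type of the magic result.
    static String combine(String earlier, String magic) {
        if (OCTET_STREAM.equals(earlier) || isSpecializationOf(magic, earlier)) {
            return magic;
        }
        return earlier;
    }

    public static void main(String[] args) {
        // The problem case: an earlier detector says tar, magic says bitmap.
        System.out.println(combine("application/x-tar", "image/x-example-bitmap"));
    }
}
```

In the problem case, the earlier tar result is unrelated to the type the magic
detector finds, so the magic result is dropped.
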
> Add detection for os2 bitmap array files
> ----------------------------------------
>
> Key: TIKA-4037
> URL: https://issues.apache.org/jira/browse/TIKA-4037
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
> Fix For: 2.8.1
>
>
> http://fileformats.archiveteam.org/wiki/OS/2_Bitmap_Array