[ https://issues.apache.org/jira/browse/TIKA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018817#comment-18018817 ]
ASF GitHub Bot commented on TIKA-4476:
--------------------------------------
THausherr commented on PR #2315:
URL: https://github.com/apache/tika/pull/2315#issuecomment-3266016287
I also asked Copilot with GPT4, and it didn't come up with anything useful.
However, GPT5 did raise the point I mentioned, plus another one:
===
Possible correctness edge case: isExclusivelyAudio() uses allMatch over
directories filtered to Mp4MediaDirectory. If that filtered set is empty,
allMatch returns true and you’ll incorrectly force audio/mp4. This can happen
if metadata-extractor didn’t produce any Mp4MediaDirectory entries. A safer
check would be:
Return true iff there is at least one Mp4SoundDirectory and no
Mp4VideoDirectory.
Example:
    hasSound = anyMatch(d instanceof Mp4SoundDirectory)
    hasVideo = anyMatch(d instanceof Mp4VideoDirectory)
    return hasSound && !hasVideo
===
So the question is, could an MP4 file with "empty directories" exist at all?
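To make the edge case concrete, here is a minimal, self-contained sketch of both variants. It is not the code from the PR: the nested classes are stand-ins that only mimic the hierarchy I assume metadata-extractor uses (Mp4SoundDirectory and Mp4VideoDirectory extending Mp4MediaDirectory), and the method name allMatchCheck is made up for illustration.

```java
import java.util.List;

public class AudioOnlyCheck {
    // Stand-ins for metadata-extractor's directory types. Assumption: the real
    // classes share this hierarchy; these are NOT the library classes.
    static class Directory {}
    static abstract class Mp4MediaDirectory extends Directory {}
    static class Mp4SoundDirectory extends Mp4MediaDirectory {}
    static class Mp4VideoDirectory extends Mp4MediaDirectory {}

    // Hypothetical shape of the check under review: allMatch over the
    // directories filtered to Mp4MediaDirectory. On an empty filtered
    // stream, allMatch is vacuously true.
    static boolean allMatchCheck(List<Directory> dirs) {
        return dirs.stream()
                .filter(d -> d instanceof Mp4MediaDirectory)
                .allMatch(d -> d instanceof Mp4SoundDirectory);
    }

    // Suggested alternative: at least one sound directory and no video directory.
    static boolean isExclusivelyAudio(List<Directory> dirs) {
        boolean hasSound = dirs.stream().anyMatch(d -> d instanceof Mp4SoundDirectory);
        boolean hasVideo = dirs.stream().anyMatch(d -> d instanceof Mp4VideoDirectory);
        return hasSound && !hasVideo;
    }

    public static void main(String[] args) {
        // A container for which no media directories were produced.
        List<Directory> noMedia = List.of(new Directory());
        System.out.println("allMatchCheck:      " + allMatchCheck(noMedia));
        System.out.println("isExclusivelyAudio: " + isExclusivelyAudio(noMedia));
    }
}
```

The two variants agree for sound-only and for sound-plus-video input. They disagree when there are no media directories at all, and they would also disagree if a sound directory appears next to some other non-video media directory, so the stricter allMatch could be kept with an added non-empty check if that matters.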
> Audio only MP4 files should be typed audio/mp4
> ----------------------------------------------
>
> Key: TIKA-4476
> URL: https://issues.apache.org/jira/browse/TIKA-4476
> Project: Tika
> Issue Type: Bug
> Reporter: Tom Brisland
> Priority: Minor
>
> At present an MP4 container with exclusively audio is still typed as
> video/mp4.
> Ideally we'd want to categorise these files as audio/mp4 instead.