[ https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731249#comment-17731249 ]
Hudson commented on TIKA-4060:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/])
TIKA-4060 Test AAC files, based on testWAV.wav, one without ID3, one with dummy ID3 values (nick: [https://github.com/apache/tika/commit/500900d67ede02e87440caa9f67501d3fe59b770])
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/testAACid3.aac
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/testAAC.aac


> Add magic to audio/aac in tika-mimetypes.xml
> --------------------------------------------
>
>                 Key: TIKA-4060
>                 URL: https://issues.apache.org/jira/browse/TIKA-4060
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Gregory Lepore
>            Priority: Minor
>             Fix For: 2.8.1
>
>         Attachments:
> 067aece423d8694a891a61a45ac0e870914bc1314ef510ac40b36ca3397843ef,
> cb1bec08898db7a733b42ac44bdd76b6177cd3a07a2435a83fd99b7453d564d1
>
>
> Currently tika-mimetypes only recognizes audio/aac files by the file
> extension. PRONOM recently added support for identifying AAC files, but the
> signature is tricky. There are two signatures, shown below in PRONOM format.
> Curly braces denote a gap: the subsequent pattern may start anywhere between
> the two given byte counts after the preceding bytes.
>
> The first pattern is pretty basic; the second pattern is the first pattern
> preceded by an ID3 header of up to 2048 bytes.
>
> ||Name|Audio Data Transport Stream sig.1|
> ||Description|An FF pattern from BOF with variation of byte stream|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)|
>
> ||Name|Audio Data Transport Stream sig.2|
> ||Description|ID3 tag variation with variable byte stream|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|494433\{0-2045}FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)|
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
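
For readers who want to see how the two PRONOM signatures quoted above behave on raw bytes, here is a minimal Java sketch. It is only an illustration under stated assumptions: the class AdtsSignatureSketch and its methods are hypothetical names, not part of Tika, and this is not how the magic entries in tika-mimetypes.xml are implemented. The byte tables are copied from the Value rows (the repeated 60 in the last group is listed once), and the {0-2045} gap is read as "0 to 2045 arbitrary bytes after the ID3 marker".

```java
/** Hypothetical helper for illustration only; not part of Apache Tika. */
final class AdtsSignatureSketch {

    // Allowed values for the 2nd, 3rd and 4th bytes, copied from the PRONOM Value rows.
    private static final int[] SECOND = {0xF0, 0xF1, 0xF8, 0xF9};
    private static final int[] THIRD = {
        0x40, 0x41, 0x44, 0x45, 0x48, 0x49, 0x4C, 0x4D,
        0x50, 0x51, 0x54, 0x55, 0x58, 0x59, 0x5C, 0x5D,
        0x60, 0x61, 0x64, 0x65, 0x68, 0x69, 0x6C, 0x6D, 0x70, 0x71,
        0x80, 0x81, 0x84, 0x85, 0x88, 0x89, 0x8C, 0x8D,
        0x90, 0x91, 0x94, 0x95, 0x98, 0x99, 0x9C, 0x9D,
        0xA0, 0xA1, 0xA4, 0xA5, 0xA8, 0xA9, 0xAC, 0xAD, 0xB0, 0xB1};
    private static final int[] FOURTH = {
        0x00, 0x01, 0x20, 0x40, 0x41, 0x60, 0x80, 0x81, 0xA0, 0xC0, 0xC1, 0xE0};

    // Upper bound of the {0-2045} gap in signature 2.
    private static final int MAX_GAP = 2045;

    private static boolean oneOf(int value, int[] allowed) {
        for (int a : allowed) {
            if (a == value) {
                return true;
            }
        }
        return false;
    }

    /** True if the 4-byte FF pattern starts exactly at the given offset. */
    static boolean matchesAdtsAt(byte[] data, int offset) {
        if (offset < 0 || offset + 4 > data.length) {
            return false;
        }
        return (data[offset] & 0xFF) == 0xFF
                && oneOf(data[offset + 1] & 0xFF, SECOND)
                && oneOf(data[offset + 2] & 0xFF, THIRD)
                && oneOf(data[offset + 3] & 0xFF, FOURTH);
    }

    /** Signature 1: the FF pattern at the beginning of the file. */
    static boolean matchesSig1(byte[] data) {
        return matchesAdtsAt(data, 0);
    }

    /** Signature 2: "ID3" (49 44 33), a gap of 0 to 2045 bytes, then the FF pattern. */
    static boolean matchesSig2(byte[] data) {
        if (data.length < 3 || data[0] != 0x49 || data[1] != 0x44 || data[2] != 0x33) {
            return false;
        }
        for (int gap = 0; gap <= MAX_GAP; gap++) {
            if (matchesAdtsAt(data, 3 + gap)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] bare = {(byte) 0xFF, (byte) 0xF1, 0x50, (byte) 0x80};
        System.out.println(matchesSig1(bare)); // prints true
    }
}
```

Scanning every offset inside the gap is the simplest reading of the signature; a real detector would likely cap the scan at the number of bytes actually buffered.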