[ 
https://issues.apache.org/jira/browse/TIKA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726220#comment-17726220
 ] 

Tim Allison commented on TIKA-4054:
-----------------------------------

These look great, [~g...@rhobard.com]. Would you be able to add proposed 
mime-types for each?  Thank you!

> Add various file identifications to reduce application/octet-stream
> -------------------------------------------------------------------
>
>                 Key: TIKA-4054
>                 URL: https://issues.apache.org/jira/browse/TIKA-4054
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Gregory Lepore
>            Priority: Major
>
> Catch-all task for format identification data covering file types that are 
> currently identified as application/octet-stream. Most of the data is from PRONOM.
>  
> SPSS Data File
> ||External signatures|File extension: sav|
> ||Internal signatures|
> ||Name|SPSS Data File|
> ||Description|BOF: $FL2@(#)|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|24464C3240282329|
> |
> |
>  
> Amiga Disk File
>  
> ||External signatures|File extension: adf|
> ||Internal signatures|
> ||Name|Amiga Disk File|
> ||Description|BOF: ‘DOS’ followed by ‘00\|01\|02\|03\|04\|05\|06\|07’ 
> depending on the format of the disk. More information on the internal 
> signature can be found here: http://lclevy.free.fr/adflib/adf_info.html#p41|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|444F53(00\|01\|02\|03\|04\|05\|06\|07)|
> |
> |
>  
> JEOL NMR Spectroscopy
> ||External signatures|File extension: jdf|
> ||Internal signatures| |
> ||Name|JDF NMR Spectroscopy big endian|
> ||Description|Big Endian: BOF: 4A454F4C2E4E4D52 (JEOL.NMR)|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|4A454F4C2E4E4D52|
> | | |
> ||Name|JDF little endian|
> ||Description|Little Endian: 524D4E2E4C4F454A (RMN.LOEJ)|
> ||Byte sequences| |
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|524D4E2E4C4F454A|
>  
> ASPRS Lidar Data Exchange Format
> ||External signatures|File extension: las
> File extension: laz|
> ||Internal signatures|
> ||Name|ASPRS Lidar Data Exchange Format 1.2|
> ||Description|ASCII header: LASF, followed after 20 bytes by version number 
> 1.2|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Byte order| |
> ||Value|4C415346\{20}0102\{78}[00:99]|
> |
> |
>  
> ASPRS Lidar Data Exchange Format v1.1
>  
> ||External signatures|File extension: las
> File extension: laz|
> ||Internal signatures|
> ||Name|ASPRS Lidar Data Exchange Format 1.1|
> ||Description|ASCII header: LASF, followed after 20 bytes by version number 
> 1.1|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Byte order| |
> ||Value|4C415346\{20}0101\{78}[00:99]|
> |
> |
>  
> 3D Studio
> ||External signatures|File extension: 3ds|
> ||Internal signatures|
> ||Name|3D Studio (V1)|
> ||Description|Primary chunk ID, chunk length, version subchunk ID, chunk 
> length, version, 3D-editor chunk ID.|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Byte order|Little-endian|
> ||Value|4D4D\{4}02000A000000(03\|04)\{3}3D3D|
> |
> ||Name|3D Studio (V2)|
> ||Description|Primary chunk ID, chunk length, 3D-editor chunk ID|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|4D4D\{4}3D3D|
> |
> |
>  
> TAP (ZX Spectrum)
> ||External signatures|File extension: tap|
> ||Internal signatures|
> ||Name|TAPZX|
> ||Description|…\{20}ÿ|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|130000\{20}FF|
> |
> |
>  
> Sibelius
> ||External signatures|File extension: sib|
> ||Internal signatures|
> ||Name|Sibelius|
> ||Description|Absolute from beginning of file, magic bytes: .SIBELIUS|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|0F534942454C495553|
> |
> |
>  
> Portable Sound Format
> ||External signatures|File extension: psf
> File extension: psf1
> File extension: psflib
> File extension: minipsf
> File extension: minipsf1
> File extension: gsf
> File extension: gsflib
> File extension: minigsf|
> ||Internal signatures|
> ||Name|Portable Sound Format|
> ||Description|BOF: PSFx, where x represents one of the following values for 
> which PSF has been adapted 4th byte: 0x01: Playstation (PSF1) 0x02: 
> Playstation 2 (PSF2) 0x11: Sega Saturn (SSF) 0x12: Sega Dreamcast (DSF) 0x13: 
> Sega Genesis 0x21: Nintendo 64 (USF) 0x22: GameBoy Advance (GSF) 0x23: Super 
> NES (SNSF) 0x41: Capcom QSound (QSF) Format description: 
> http://web.archive.org/web/20140125155137/http://wiki.neillcorlett.com/PSFFormat|
> ||Byte sequences|
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Maximum Offset|0|
> ||Byte order| |
> ||Value|505346(01\|02\|11\|12\|13\|21\|22\|23\|41)|
> |
> |
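
The byte-sequence values quoted above use PRONOM-style notation: pairs of hex digits for literal bytes, \{n} to skip n arbitrary bytes, (a\|b) for alternatives, and [lo:hi] for an inclusive byte range. The backslashes appear to be markup escaping added by Jira rather than part of the signature. As a rough illustration of how such a value could be checked against the start of a file, here is a minimal Python sketch. The names pronom_to_regex and identify are hypothetical, only the constructs that appear in this issue are handled, and this is not how Tika's own detector is implemented.

```python
import re


def pronom_to_regex(signature):
    """Translate a PRONOM-style byte sequence into a compiled bytes regex.

    Hypothetical helper: supports only hex pairs, {n}, (a|b) and [lo:hi].
    """
    sig = signature.replace("\\", "")  # drop the backslash escaping seen in the mail
    parts = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch == "{":  # {n}: skip exactly n arbitrary bytes
            end = sig.index("}", i)
            parts.append(b".{%d}" % int(sig[i + 1:end]))
            i = end + 1
        elif ch == "[":  # [lo:hi]: one byte in an inclusive range
            end = sig.index("]", i)
            lo, hi = sig[i + 1:end].split(":")
            parts.append(b"[\\x%s-\\x%s]" % (lo.encode(), hi.encode()))
            i = end + 1
        elif ch in "(|)":  # alternation maps directly onto regex syntax
            parts.append(ch.encode())
            i += 1
        else:  # a literal byte written as two hex digits
            parts.append(b"\\x" + sig[i:i + 2].encode())
            i += 2
    # DOTALL so that "." also matches a newline byte
    return re.compile(b"".join(parts), re.DOTALL)


# A few of the values from the issue, copied with their mail escaping intact.
SIGNATURES = {
    "SPSS Data File": r"24464C3240282329",
    "Amiga Disk File": r"444F53(00\|01\|02\|03\|04\|05\|06\|07)",
    "3D Studio (V2)": r"4D4D\{4}3D3D",
    "ASPRS LAS 1.2": r"4C415346\{20}0102\{78}[00:99]",
}


def identify(data):
    """Return the names of all signatures that match at offset 0 of data."""
    return [name for name, sig in SIGNATURES.items()
            if pronom_to_regex(sig).match(data)]
```

Because every signature listed in the issue is "Absolute from BOF" with offset 0, re.match (which anchors at the start of the buffer) is enough here; a non-zero offset or maximum offset would need extra handling.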



