[ 
https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194854#comment-13194854
 ] 

Nick Burch commented on TIKA-851:
---------------------------------

From
http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFChap1/qtff1.html#//apple_ref/doc/uid/TP40000939-CH203-BBCGDDDF
"Generally speaking, atoms can be present in any order. Do not conclude that a 
particular atom is not present until you have parsed all the atoms in the file.

An exception is the file type atom, which typically identifies the file as a 
QuickTime movie. If present, this atom precedes any movie atom, movie data, 
preview, or free space atoms. If you encounter one of these other atom types 
prior to finding a file type atom, you may assume the file type atom is not 
present. (This atom is introduced in the QuickTime File Format Specification 
for 2004, and is not present in QuickTime movie files created prior to 2004)."

So, if there is an ftyp atom, it should come first, and if the first atom isn't
an ftyp then there isn't one. The AtomicParsley link is handy; it should help
with producing a metadata-extracting parser.
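
As a rough illustration of that rule, here is a minimal sketch of a check on
the first atom. It assumes the usual atom layout (a 4-byte big-endian size, a
4-byte type, and for ftyp a 4-byte major brand right after). The class
FtypSniffer, its method names, and the small brand-to-type table are all
hypothetical and are not part of Tika; the audio/mp4 mapping for "M4A " in
particular is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Tika: inspects the first atom of a file. */
final class FtypSniffer {

    private FtypSniffer() {}

    /**
     * Returns the four-character major brand if the data starts with an
     * ftyp atom, or null if the first atom is something else or the data
     * is too short to tell.
     */
    static String majorBrand(byte[] header) {
        // Bytes 0-3: atom size, bytes 4-7: atom type, bytes 8-11: major brand.
        if (header == null || header.length < 12) {
            return null;
        }
        String type = new String(header, 4, 4, StandardCharsets.US_ASCII);
        if (!"ftyp".equals(type)) {
            // Per the quoted spec text, no leading ftyp means no ftyp at all.
            return null;
        }
        return new String(header, 8, 4, StandardCharsets.US_ASCII);
    }

    /**
     * Illustrative mapping only (assumed, not exhaustive). Returns null when
     * there is no ftyp atom or the brand is not in this small table, so the
     * caller can fall back to other detection.
     */
    static String guessType(byte[] header) {
        String brand = majorBrand(header);
        if (brand == null) {
            return null;
        }
        if (brand.equals("M4V ")) {
            return "video/x-m4v";
        }
        if (brand.equals("M4A ")) {
            return "audio/mp4"; // assumed type for M4A
        }
        if (brand.equals("qt  ")) {
            return "video/quicktime";
        }
        return null;
    }
}
```

Only the first 12 bytes of the file are needed, so something like
FtypSniffer.guessType(first12Bytes) could run before any full atom walk. A
file with no ftyp atom may still be an older QuickTime movie, which is why the
sketch returns null in that case instead of guessing.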
                
> M4V and M4A detection invalid
> -----------------------------
>
>                 Key: TIKA-851
>                 URL: https://issues.apache.org/jira/browse/TIKA-851
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Alexander Chow
>             Fix For: 1.1
>
>         Attachments: TIKA-851.patch
>
>
> When the mime type of an M4V file is detected using its name only, it returns 
> video/x-m4v.  When it is detected using the InputStream (hence utilising the 
> MagicDetector), it incorrectly returns video/quicktime.
> Using the sample M4V file from Apple's [knowledge 
> base|http://support.apple.com/kb/HT1425]:
> {code:title=TikaTest.java}
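> import java.io.File;
> import java.io.InputStream;
>
> import org.apache.tika.detect.DefaultDetector;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.mime.MimeTypes;
>
> public class TikaTest {
>     public static void main(String[] args) throws Exception {
>         String userHome = System.getProperty("user.home");
>         File file = new File(userHome + "/Desktop/sample_iPod.m4v");
>         InputStream is = TikaInputStream.get(file);
>         Detector detector = new DefaultDetector(
>                 MimeTypes.getDefaultMimeTypes());
>         // Metadata carrying only the file name, for name-based detection
>         Metadata metadata = new Metadata();
>         metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
>         // Content plus name, content alone, then name alone
>         System.out.println("File + filename: " + detector.detect(is, metadata));
>         System.out.println("File only:       " + detector.detect(is, new Metadata()));
>         System.out.println("Filename only:   " + detector.detect(null, metadata));
>     }
> }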
> {code}
> Renders the output:
> {code}
> File + filename: video/quicktime
> File only:       video/quicktime
> Filename only:   video/x-m4v
> {code}
> Moreover, if the same test is run against an M4A file, the results are even 
> more incorrect:
> {code}
> File + filename: video/quicktime
> File only:       video/quicktime
> Filename only:   application/octet-stream
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
