[ 
https://issues.apache.org/jira/browse/TIKA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoni Mylka updated TIKA-813:
------------------------------

    Attachment: testWEBARCHIVE.webarchive
                tika-813.patch

A second version of the patch, which includes a unit test based on the file 
kindly provided by Andrzej. It turns out that the bplist magic had to be given 
a higher priority than the (X)HTML magics, which also match because the HTML 
markup occurs later on in the file (it's a saved web page, after all).
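
To illustrate why the priority matters, here is a minimal, self-contained 
sketch of priority-based magic matching. It is not Tika's actual detector 
code: the Magic class, the detect() method, the search windows and the 
priority values 40/50/60 are all hypothetical and chosen only for the demo.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class MagicPrioritySketch {

    /** Hypothetical magic rule: a byte pattern searched at offsets 0..maxOffset. */
    static final class Magic {
        final String type;
        final int priority;
        final byte[] pattern;
        final int maxOffset;

        Magic(String type, int priority, String pattern, int maxOffset) {
            this.type = type;
            this.priority = priority;
            this.pattern = pattern.getBytes(StandardCharsets.US_ASCII);
            this.maxOffset = maxOffset;
        }

        boolean matches(byte[] data) {
            for (int start = 0;
                 start <= maxOffset && start + pattern.length <= data.length;
                 start++) {
                byte[] window = Arrays.copyOfRange(data, start, start + pattern.length);
                if (Arrays.equals(window, pattern)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Returns the type of the matching rule with the highest priority. */
    static String detect(byte[] data, List<Magic> magics) {
        Magic best = null;
        for (Magic m : magics) {
            if (m.matches(data) && (best == null || m.priority > best.priority)) {
                best = m;
            }
        }
        return best == null ? "application/octet-stream" : best.type;
    }

    /** Two illustrative rules; only the bplist priority varies. */
    static List<Magic> rules(int bplistPriority) {
        return Arrays.asList(
            // HTML markup may appear anywhere in the first bytes of the file.
            new Magic("text/html", 50, "<html", 8192),
            // The bplist header sits at the very start of the file.
            new Magic("application/x-webarchive", bplistPriority, "bplist00", 0));
    }

    /** A fake webarchive: bplist header followed by embedded HTML. */
    static byte[] sampleFile() {
        return "bplist00....WebMainResource....<html><body>hi</body></html>"
            .getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = sampleFile();
        System.out.println(detect(data, rules(40))); // prints "text/html"
        System.out.println(detect(data, rules(60))); // prints "application/x-webarchive"
    }
}
```

Both rules match the sample, so the outcome depends only on which one carries 
the higher priority, which is the situation the test file exposed.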


                
> Webarchive detection.
> ---------------------
>
>                 Key: TIKA-813
>                 URL: https://issues.apache.org/jira/browse/TIKA-813
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.1
>            Reporter: Antoni Mylka
>         Attachments: Apache_Tika.webarchive, testWEBARCHIVE.webarchive, 
> tika-813.patch
>
>
> I'd like to be able to detect .webarchive files. They are a special case 
> of the Apple Binary Property list format. They are generated by the Safari 
> browser and contain all the files that comprise a web page within a single 
> container file.
> Can anyone supply an example file? All the ones I have are confidential and I 
> don't have a Mac myself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
