[ 
https://issues.apache.org/jira/browse/TIKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372020#comment-14372020
 ] 

Tyler Palsulich commented on TIKA-1293:
---------------------------------------

Looks good to me. Any objections to adding this magic so that Netscape 
bookmark files are detected as HTML?

> Netscape bookmark files are not being detected as HTML
> ------------------------------------------------------
>
>                 Key: TIKA-1293
>                 URL: https://issues.apache.org/jira/browse/TIKA-1293
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>            Reporter: Phil Lester
>         Attachments: bookmarks.txt
>
>
> We are able to circumvent HTML file type detection by using the standard 
> Netscape bookmark file doctype (<!DOCTYPE NETSCAPE-Bookmark-file-1>) and 
> changing the file extension to .txt. Standard HTML elements can then be 
> included in the file. Some browsers (such as Firefox) will detect the .txt 
> file as HTML and display it accordingly when it is downloaded.
> We were able to resolve this by adding a custom mime-type for text/html that 
> included a match pattern for the Netscape doctype:
> <match value="&lt;!DOCTYPE NETSCAPE-Bookmark-file-1" type="string" offset="0:64"/>
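
For context, the match element quoted above might sit inside a custom
mime-types file along the following lines. This is a minimal sketch, not the
reporter's actual configuration: it assumes the file is picked up from the
classpath as org/apache/tika/mime/custom-mimetypes.xml, and the priority
value is an illustrative guess rather than something stated in the report.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Adds a magic rule to text/html for Netscape bookmark files. -->
  <mime-type type="text/html">
    <!-- priority="50" is an assumption chosen for illustration only. -->
    <magic priority="50">
      <!-- offset="0:64" lets the doctype start anywhere in the first 64 bytes. -->
      <match value="&lt;!DOCTYPE NETSCAPE-Bookmark-file-1" type="string"
             offset="0:64"/>
    </magic>
  </mime-type>
</mime-info>
```

The offset range, rather than a fixed offset of 0, means the match should
still succeed when the doctype is preceded by a byte order mark or leading
whitespace.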



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
